  1. Oct 2024
    1. Author response:

      We thank both reviewers for their thorough and insightful feedback, which will contribute to improving our manuscript. In summary, the key concerns raised include the potential induction of GLV volatiles due to plant handling, limitations in the design of the "wind tunnel" bioassay, and the need for a deeper analysis of specific volatile compounds that contribute to the success of push-pull systems. We are happy to revise the entire manuscript according to all comments of the reviewers. This includes clarification of our methodology and providing a more reflective discussion on how physical stress might have influenced volatile emissions. Additionally, we will conduct new experiments with a modified bioassay setup to address concerns about directional cues and airflow control, minimizing cross-contamination. While the identification of individual compounds was beyond the scope of this study, we acknowledge its importance and propose it as a direction for future research.

      Reviewer #1 (Public review):

      Summary:

      The manuscript of Odermatt et al. investigates the volatiles released by two species of Desmodium plants and the response of herbivores to maize plants alone or in combination with these species. The results show that Desmodium releases volatiles in both the laboratory and the field. Maize grown in the laboratory also released volatiles, in a similar range. While female moths preferred to oviposit on maize, the authors found no evidence that Desmodium volatiles played a role in lowering attraction to or oviposition on maize.

      Strengths:

      The manuscript is a response to recently published papers that presented conflicting results with respect to whether Desmodium releases volatiles constitutively or in response to biotic stress, the level at which such volatiles are released, and the behavioral effect it has on the fall armyworm. These questions are relevant as Desmodium is used in a textbook example of pest-suppressive sustainable intercropping technology called push-pull, which has supported tens of thousands of smallholder farmers in suppressing moth pests in maize. A large number of research papers over more than two decades have implied that Desmodium suppresses herbivores in push-pull intercropping through the release of large amounts of volatiles that repel herbivores. This premise has been questioned in recent papers. Odermatt et al. thus contribute to this discussion by testing the role of odors in oviposition choice. The paper confirms that ovipositing FAW preferred maize, and also confirmed that odors released from Desmodium appeared not important in their bioassays.

      The paper is a welcome addition to the literature and adds quality headspace analyses of Desmodium from the laboratory and the field. Furthermore, the authors, some of whom have since long contributed to developing push-pull, also find that Desmodium odors are not significant in their choice between maize plants. This advances our knowledge of the mechanisms through which push-pull suppresses herbivores, which is critically important to evolving the technique to fit different farming systems and translating this mechanism to fit with other crops and in other geographical areas.

      Thank you for your careful assessment of our manuscript.

      Weaknesses:

      Below I outline the major concerns:

      (1) Clear induction of the experimental plants, and lack of reflective discussion around this: from literature data and previous studies of maize and Desmodium, it is clear that the plants used in this study, particularly the Desmodium, were induced. Maize appeared to be primarily manually damaged, possibly due to sampling (release of GLV, but little to no terpenoids, which is indicative of mostly physical stress and damage; see, for example, one of the coauthors' own papers, Tamiru et al. 2011), whereas Desmodium releases a blend of many compounds (many terpenoids indicative of herbivore induction). Erdei et al. also clearly show that under controlled conditions maize, silver leaf and green leaf Desmodium release volatiles in very low amounts. While the condition of the plants in Odermatt et al. may be reflective of situations in push-pull fields, the authors should elaborate on the above in the discussion (see comments) such that the readers understand the plants' condition during the experiments. This is particularly important because it has been assumed that Desmodium releases typical herbivore-induced volatiles constitutively, which is not the case (see Erdei et al. 2024). This reflection is currently lacking in the manuscript.

      We acknowledge the need for a more reflective discussion on the possible causes of GLV (green leaf volatiles) emission, particularly regarding physical damage. Although the field plants were carefully handled, it is possible that some physical stress may have contributed to the release of GLVs. We will ensure the revised manuscript reflects this nuanced interpretation. However, we will also explain more clearly that our aim was to capture the volatile emission of plants used by farmers under realistic conditions and moth responses to these plants, not to be able to attribute the volatile emission to a specific cause. We think that this is also clear in the manuscript. However, we plan to revise relevant passages throughout the manuscript to ensure that we do not make any claims about the reason for volatile emissions, and that our claims regarding these plants and their headspace being representative of the system as practiced by farmers are supported. In the revised manuscript we will explain better that the volatile profiles comprise a majority of non-GLV compounds. As shown in figure 1, the majority of the substances that were found in the headspace of the sampled plants of Desmodium intortum or Desmodium incanum are non-GLV monoterpenes, sesquiterpenes, or aromatic compounds. We will also note that the experimental plants used in the study were grown in insect proof screenhouses and were checked for any insect damage before volatile collection and bioassay.

      (2) Lack of controls that would have provided context to the data: The experiments lack important controls that would have helped in the interpretation:

      (2a) The authors did not control the conditions of the plants. To understand the release of volatiles and their importance in the field, the authors should have included controlled herbivory in both maize and Desmodium. This would have placed the current volatile profiles in a herbivory context. Now the volatile measurements hang in midair, leading to discussions that are not well anchored (and should be rephrased thoroughly, see eg lines 183-188). It is well known that maize releases only very low levels of volatiles without abiotic and biotic stressors. However, this changes upon stress (GLVs by direct, physical damage and eg terpenoids upon herbivory, see above). Erdei et al. confirm this pattern in Desmodium. Not having these controls, means that the authors need to put the data in the context of what has been published (see above).

      We appreciate this concern. Our study aimed to capture the real-world conditions of push-pull fields, where Desmodium and maize grow in natural environments without the direct induction of herbivory for experimental purposes. We will update the discussion to provide better context based on existing literature regarding the volatile release under stress conditions. We agree that in further studies it would be important to carry out experiments under different environmental conditions, including herbivore damage. However, this was not within the scope of the present study.

      (2b) It would also have been better if the authors had sampled maize from the field while sampling Desmodium. Together with the above point (inclusion of herbivore-induced maize and Desmodium), the levels of volatile release by Desmodium would have been placed into context.

      We acknowledge that sampling maize and other intercrop plants, such as edible legumes, alongside Desmodium in the push-pull field would have allowed us to make direct comparisons of the volatile profiles of different plants in the push-pull system under shared field conditions. Again, this should be done in future experiments but was beyond the scope of the present study. Given the number of samples we could handle within cost and workload constraints, we chose to focus on Desmodium because there is much less literature on the volatile profiles of field-grown Desmodium than of maize plants in the field: we are aware of one study attempting to measure field volatile profiles from Desmodium intortum (Erdei et al. 2024) and no study attempting this for Desmodium incanum. We will point out this justification for our focus on Desmodium in the manuscript. Additionally, we will suggest in the discussion that future studies should measure volatile profiles from maize and intercrop legumes alongside Desmodium and border grass in push-pull fields.

      (2c) To put the volatiles release in the context of push-pull, it would have been important to sample other plants which are frequently used as intercrop by smallholder farmers, but which are not considered effective as push crops, particularly edible legumes. Sampling the headspace of these plants, both 'clean' and herbivore-induced, would have provided a context to the volatiles that Desmodium (induced) releases in the field - one would expect unsuccessful push crops to not release any of these 'bioactive' volatiles (although 'bioactive' should be avoided) if these odors are responsible for the pest suppressive effect of Desmodium. Many edible intercrops have been tested to increase the adoption of push-pull technology but with little success.

      Again, we very much agree that such measurements are important for the longer-term research program in this field. But again, for the current study this would have exploded the size of the required experiment. Regarding bioactivity, we have been careful to use the phrase "potentially bioactive", or to cite other studies showing bioactivity, where we have not demonstrated bioactivity ourselves.

      Because of the lack of the above, the conclusions the authors can draw from their data are weakened. The data are still valuable in the current discussion around push-pull, provided that a proper context is given in the discussion along the points above.

      We agree that our study is limited to its specific aims. Therefore, we think the revisions will make these more explicit and help to avoid misleading claims.

      (3) 'Tendency' of the authors to accept the odor hypothesis (i.e. that Desmodium odors are responsible for repelling FAW and thereby reduce infestation in maize under push-pull management) in spite of their own data: The authors tested the effects of odor in oviposition choice, both in a cage assay and in a 'wind tunnel'. From the cage experiments, it is clear that FAW preferred maize over Desmodium, confirming other reports (including Erdei et al. 2024). However, when choosing between two maize plants, one of which was placed next to Desmodium to which FAW had no tactile access (taste, structure, etc.), FAW chose equally. Similarly, in their wind tunnel setup (this term should not be used to describe the assay, see below), no preference was found between maize odor in the presence or absence of Desmodium. This too confirms results obtained by Erdei et al. (but adds an important element by using Desmodium plants that had been induced and released volatiles, contrary to Erdei et al. 2024). Even though no support was found for repellency by Desmodium odors, the authors in many instances in the manuscript (lines 30-33, 164-169, 202, 279, 284, 304-307, 311-312, 320) appear to elevate non-significant tendencies as being important. This misleads readers into thinking that these interactions were significant, an impression that the discussion in fact reinforces. The authors should stay true to their own data obtained when testing the hypothesis of whether odors play a role in the pest-suppressive effect of push-pull.

      We appreciate this feedback and agree that we may have overstated claims that could not be supported by strict significance tests. However, we believe that non-significant tendencies can still provide valuable insights. In the revised version of the manuscript, we will ensure a clear distinction between statistically significant findings and non-significant trends and remove any language that may imply stronger support for the odor hypothesis than what the data show.

      (4) Oviposition bioassay: with so many assays in close proximity, it is hard to certify that the experiments are independent. Please discuss this in the appropriate place in the discussion.

      We have pointed this out in lines 275 – 279 of the submitted manuscript. Furthermore, we include detailed captions for figure 4 - supporting figure 3 and figure 4 - supporting figure 4. We are aware that in all such experiments there is a danger of between-treatment interference, which we will point out for our specific case. We will also mention that this common caveat does not invalidate experimental designs that use replication and randomization, and that such designs assume that insects can select a suitable oviposition site against a background of such confounding factors under realistic conditions. We will also mention explicitly that with our experimental setup we tried to minimize interference between treatments by spacing and temporal staggering.

      (5) The wind tunnel has a number of issues (besides being poorly detailed):

      (5a) The setup which the authors refer to as a 'wind tunnel' does not qualify as a wind tunnel. First, there is no directional flow: there are two flows entering the setup at opposite sides. Second, the flow is way too low for moths to orient in (in a wind tunnel, wind should be presented as a directional cue). Only around 1.5 l/min enters the wind tunnel in a volume of approximately 90 l, which does not create any directional flow. Solution: change 'wind tunnel' throughout the text to a dual choice setup/assay.

      We agree with these criticisms and will change the terminology accordingly. We also plan to conduct an additional experiment with a no-choice arena that provides conditions closer to a true wind tunnel. The setup of the added experiment features an odor entry point at only one side of the chamber to create a more directional airflow. Each treatment (maize alone, maize + D. intortum, maize + D. incanum, and a control with no plants) will be tested separately, with only one treatment conducted per evening to avoid cross-contamination.

      (5b) There is no control over the flows in the flight section of the setup. It is very well possible that moths at the release point may only sense one of the 'options'. Please discuss this.

      We will add this to the discussion. The newly planned assays also address this concern by using a setup with laminar flow.

      (5c) Too low a flow (1.5 l per minute) implies largely stagnant air, which means cross-contamination between experiments. An experiment takes 5 minutes, but it takes minimally 1.5 hours at these flows to replace the flight chamber air (but in reality much longer as the fresh air does not replace the old air, but mixes with it). The setup does not seem to be equipped with e.g. fans to quickly vent the air out of the setup. See comments in the text. Please discuss the limitations of the experimental setup at the appropriate place in the discussion.

      We will add these limitations to the discussion and will address these concerns with new experiments (see answer 5a).
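
      The dilution argument in (5c) can be illustrated with a minimal sketch. It assumes a perfectly mixed chamber, in which incoming air dilutes the old air exponentially instead of displacing it; the flow (1.5 l/min) and approximate volume (90 l) are the values quoted in the review, while the function name and the perfect-mixing model are assumptions made here for illustration only.

```python
import math

def fraction_old_air(flow_l_per_min: float, volume_l: float, minutes: float) -> float:
    """Fraction of the original chamber air still present after `minutes`,
    assuming perfect mixing (fresh air dilutes rather than displaces old air)."""
    return math.exp(-flow_l_per_min * minutes / volume_l)

FLOW = 1.5     # l/min, as quoted in the review
VOLUME = 90.0  # l, approximate chamber volume quoted in the review

# Compare one 5-minute trial with the 1.5 hours mentioned in the review.
for minutes in (5, 90):
    remaining = fraction_old_air(FLOW, VOLUME, minutes)
    print(f"after {minutes:>3} min: {remaining:.0%} of the old air remains")
```

      Under these assumptions, about 92% of the old air is still present after a 5-minute trial and about 22% after 1.5 hours, which is consistent with the reviewer's point that mixing makes the real replacement time much longer than a simple displacement estimate.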

      (5d) The stimulus air enters through a tube (what type of tube, diameter, length, etc.) containing pressurized air (how was the air obtained?) into bags (type of bag, how is it sealed?), and the efflux goes directly into the flight chamber (how, nozzle?). However, it seems that there is no control of the efflux. How was leakage prevented, particularly how were the bags sealed airtight around the plants?

      We will add the missing information to the methods and provide details about types of bags, manufacturers, and pre-treatments. In short, Teflon tubes connected bagged plants to the bioassay setup and air was pumped in at an overpressure, so leakage was not eliminated but contamination from ambient air was avoided.

      (5e) The plants were bagged in very narrowly fitting bags. The maize plants look bent and damaged, which probably explains the GLVs found in the samples. The Desmodium in the picture (Figure 5 supplement, which we should assume is at least a representative picture?) appears to be rather crammed into the bag with maize and looks in rather poor condition to start with (perhaps also indicating why they release these volatiles?). It would be good to describe the sampling of the plants in detail and explain that the way they were handled may have caused the release of GLVs.

      We will add a more detailed description of the plant handling and bagging processes to the methods to clarify how the plants were treated during all assays reported in the submitted manuscript and the newly planned assays. This will address concerns about the possible influence of plant stress, such as GLV emission due to bagging, on the results. We politely disagree that the maize plants were damaged and that the Desmodium plants were not representative of those encountered in the field. The Desmodium plant pictured was D. incanum, which has sparser foliage and smaller leaves than D. intortum.

      (6) Figure 1 seems redundant as a main figure in the text. Much of the information is not pertinent to the paper. It can be used in a review on the topic. Or perhaps if the authors strongly wish to keep it, it could be placed in the supplemental material.

      We think that Figure 1 provides essential information about the push-pull system and the FAW. To our knowledge, this partly contradictory evidence so far has not been synthesized in the literature. We realize that such a figure would more commonly be provided in a review article, but we do not think that the small number of studies on this topic so far justify a stand-alone review. Instead, the introduction to our manuscript includes a brief review of these few studies, complemented by the visual summary provided in Figure 1 and a detailed supplementary table. We will revise the figure and associated text in the introduction to highlight its relevance for the current study and to reduce redundant information.

      Reviewer #2 (Public review):

      Based on the controversy of whether the Desmodium intercrop emits bioactive volatiles that repel the fall armyworm, the authors conducted this study to assess the effects of the volatiles from Desmodium plants in the push-pull system on the oviposition behavior of FAW. This topic is interesting and the results are valuable for understanding the push-pull system for the management of FAW, a serious pest. The methodology used in this study is valid, leading to reliable results and conclusions. I just have a few concerns and suggestions for improving this paper:

      (1) The volatiles emitted from D. incanum were analyzed and their effects on the oviposition behavior of FAW moth were confirmed. However, it would be better and useful to identify the specific compounds that are crucial for the success of the push-pull system.

      We fully agree that identifying specific volatile compounds responsible for the push-pull effect would provide valuable insights into the underlying mechanisms of the system. However, the primary focus of this study was to address the still unresolved question whether Desmodium emits volatiles at all under field conditions, and the secondary aim was to test whether we could demonstrate a behavioral effect of Desmodium headspace on FAW moths. Before conducting our experiments, we carefully considered the option of using single volatile compounds and synthetic blends in bioassays. We decided against this because we judged that the contradictory evidence in the literature was not a sufficient basis for composing representative blends. Furthermore, we think it is an important first step to test for behavioral responses to the headspaces of real plants. We consider bioassays with pure compounds to be important for confirmation and more detailed investigation in future studies. There was also contradictory evidence in the literature regarding moth responses to plants. We thus opted to focus on experiments with whole plants to maintain ecological relevance.

      (2) It would be good to add "symbols" of significance in Figure 4 (D).

      We report the statistical significance of the parameters in Figure 4 (D) in Table 3. While testing significance between groups is a standard approach, we used a more robust model-based analysis to assess the effects of multiple factors simultaneously. We will clarify this in the figure legend and provide a cross-reference to Table 3 for readers to easily find the statistical details.

      (3) Figure A is difficult for readers to understand.

      Unfortunately, it is not entirely clear which specific figure is being referred to as "Figure A" in this comment. We kindly request further clarification on which figure needs improvement, and we will make adjustments accordingly to ensure that all figures are easily comprehensible for readers.

      (4) It would be good for the discussion to cover in more depth the functions of the important volatile compounds identified here, in comparison with the results of previous studies.

      Our study does not provide strong evidence that specific volatiles from Desmodium plants are important determinants of FAW oviposition or choice in the push-pull system. Therefore, we prefer to refrain from detailed discussions of the potential importance of individual compounds. However, in the revised version, we will indicate specifically which of the volatiles we identified overlap with those previously reported from Desmodium, as only the total numbers are summarized in the discussion of the submitted paper.

    1. Reviewer #1 (Public review):

      Summary & Assessment:

      The catalytic core of the eukaryotic decapping complex consists of the decapping enzyme DCP2 and its key activator DCP1. In humans, there are two paralogs of DCP1, DCP1a and DCP1b, that are known to interact with DCP2 and recruit additional cofactors or coactivators to the decapping complex; however, the mechanisms by which DCP1 activates decapping and the specific roles of DCP1a versus DCP1b, remain poorly defined. In this manuscript, the authors used CRISPR/Cas9-generated DCP1a/b knockout cells to begin to unravel some of the differential roles for human DCP1a and DCP1b in mRNA decapping, gene regulation, and cellular metabolism. While this manuscript presents some new and interesting observations on human DCP1 (e.g. human DCP1a/b KO cells are viable and can be used to investigate DCP1 function; only the EVH1 domain, and not its disordered C-terminal region which recruits many decapping cofactors, is apparently required for efficient decapping in cells; DCP1a and b target different subsets of mRNAs for decay and may regulate different aspects of metabolism), there is one key claim about the role of DCP1 in regulating DCP2-mediated decapping that is still incompletely or inconsistently supported by the presented data in this revised version of the manuscript.

      Strengths & well-supported claims:

      • Through in vivo tethering assays in CRISPR/Cas9-generated DCP1a/b knockout cells, the authors show that DCP1 depletion leads to significant defects in decapping and the accumulation of capped, deadenylated mRNA decay intermediates.
      • DCP1 truncation experiments reveal that only the EVH1 domain of DCP1 is necessary to rescue decapping defects in DCP1a/b KO cells.
      • RNA and protein immunoprecipitation experiments suggest that DCP1 acts as a scaffold to help recruit multiple decapping cofactors to the decapping complex (e.g. EDC3, DDX6, PATL1, PNRC1, and PNRC2), but that none of these cofactors are essential for DCP2-mediated decapping in cells.
      • The authors investigated the differential roles of DCP1a and DCP1b in gene regulation through transcriptomic and metabolomic analysis and found that these DCP1 paralogs target different mRNA transcripts for decapping and have different roles in cellular metabolism and their apparent links to human cancers. (Although I will note that I can't comment on the experimental details and/or rigor of the transcriptomic and metabolomic analyses, as these are outside my expertise.)

      Weaknesses & incompletely supported claims:

      (1) One of the key mechanistic claims of the paper is that "DCP1a can regulate DCP2's cellular decapping activity by enhancing DCP2's affinity to RNA, in addition to bridging the interactions of DCP2 with other decapping factors. This represents a pivotal molecular mechanism by which DCP1a exerts its regulatory control over the mRNA decapping process." Similar versions of this claim are repeated in the abstract and discussion sections. However, this claim appears to be at odds with the observations that: (a) in vitro decapping assays with immunoprecipitated DCP2 show that DCP1 knockout does not significantly affect the enzymatic activity of DCP2 (Fig 2C&D; I note that there may be a very small change in DCP2 activity shown in panel D, but this may be due to slightly different amounts of immunoprecipitated DCP2 used in the assay); and (b) the authors show only weak changes in relative RNA levels immunoprecipitated by DCP2 with versus without DCP1 (~2-3 fold change in Fig 3H, where expression of the EVH1 domain, previously shown in this manuscript to fully rescue the DCP1 KO decapping defects in cells, looks to be almost within error of the control in terms of increasing RNA binding). If DCP1 pivotally regulates decapping activity by enhancing RNA binding to DCP2, why is no difference in in vitro decapping activity observed in the absence of DCP1, and very little change observed in the amounts of RNA immunoprecipitated by DCP2 with the addition of the DCP1 EVH1 domain?

      In the revised manuscript and in their response to initial reviews, the authors rightly point out that in vivo effects may not always be fully reflected by or recapitulated in in vitro experiments due to the lack of cellular cofactors and simpler environment for the in vitro experiment, as compared to the complex environment in the cell. I fully agree with this of course! And further completely agree with the authors that this highlights the critical importance of in cell experiments to investigate biological functions and mechanisms! However, because the in vitro kinetic and IP/binding data both suggest that the DCP1 EVH1 domain has minimal to no effects on RNA decapping or binding affinity, while the in cell data suggest the EVH1 domain alone is sufficient to rescue large decapping defects in DCP1a/b KO cells (and that all the decapping cofactors tested were dispensable for this), I would argue there is insufficient evidence here to make a claim that (maybe weakly) enhanced RNA binding induced by DCP1 is what is regulating the cellular decapping activity. Maybe there are as-yet-untested cellular cofactors that bind to the EVH1 domain of DCP1 that change either RNA recruitment or the kinetics of RNA decapping in cells; we can't really tell from the presented data so far. Furthermore, even if it is the case that the EVH1 domain modestly enhances RNA binding to DCP2, the authors haven't shown that this effect is what actually regulates the large change in DCP2 activity upon DCP1 KO observed in the cell.

      Overall, while I absolutely appreciate that there are many possible reasons for the differences observed in the in vitro versus in cell RNA decapping and binding assays, because this discrepancy between those data exists, it seems difficult to draw any clear conclusions about the actual mechanisms by which DCP1 helps regulate RNA decapping by DCP2. For example, in the cell it could be that DCP1 enhances RNA binding, or recruits unidentified cofactors that themselves enhance RNA binding, or that DCP1 allosterically enhances DCP2-mediated decapping kinetics, or a combination of these, etc; my point is that without in vitro data that clearly support one of those mechanisms and links this mechanism back to cellular DCP2 decapping activity (for example, in cell data that show EVH1 mutants that impair RNA binding fail to rescue DCP1 KO decapping defects), it's difficult to attribute the observed in cell effects of DCP1a/b KO and rescue by the EVH1 domain directly to enhancement of RNA binding (precisely because, as the authors describe, the decapping process and regulation may be very complex in the cell!).

      This contradiction between the in vitro and in-cell decapping data undercuts one of the main mechanistic takeaways from the first half of the paper; I still think this conclusion is overstated in the revised manuscript.

      Additional minor comment:

      • Related to point (1) above, the kinetic analysis presented in Fig 2C shows that the large majority of transcript is mostly decapped at the first 5 minute timepoint; it may be that DCP2-mediated decapping activity is actually different in vitro with or without DCP1, but that this is being missed because the reaction is basically done in less than 5 minutes under the conditions being assayed (i.e. these are basically endpoint assays under these conditions). It may be that if kinetics were done under conditions to slow down the reaction somewhat (e.g. lower Dcp2 concentration, lower temperatures), so that more of the kinetic behavior is captured, the apparent discrepancy between in vitro and in-cell data would be much less. Indeed, previous studies have shown that in yeast, Dcp1 strongly activates the catalytic step (kcat) of decapping by ~10-fold, and reduces the KM by only ~2 fold (Floor et al, NSMB 2010). It might be beneficial to use purified proteins here, if possible, to better control reaction conditions.

      In their response to initial reviews, the authors comment that they tried to purify human DCP2 from E coli, but were unable to obtain active enzyme in this way. Fair enough! I will only comment that just varying the relative concentration of immunoprecipitated DCP2 would likely be enough to slow down the reaction and see if activity differences are seen in different kinetic regimes, without the need to obtain fully purified / recombinant Dcp2.
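
      The endpoint-assay concern raised in the minor comment can be sketched numerically. The example below is purely illustrative: it assumes simple pseudo-first-order kinetics with an observed rate proportional to enzyme concentration, a hypothetical baseline rate constant, and a 20-fold activation that combines the ~10-fold kcat and ~2-fold KM effects cited for yeast Dcp1. None of these numbers come from the manuscript under review.

```python
import math

def fraction_decapped(k_obs_per_min: float, minutes: float) -> float:
    """Fraction of capped RNA converted after `minutes`, assuming simple
    pseudo-first-order kinetics (substrate well below KM, enzyme in excess)."""
    return 1.0 - math.exp(-k_obs_per_min * minutes)

# Hypothetical rate constants, chosen only for illustration.
K_WITHOUT_DCP1 = 0.6      # per min at the original enzyme amount (assumed)
FOLD_ACTIVATION = 20.0    # ~10x kcat and ~2x lower KM, as cited for yeast
K_WITH_DCP1 = K_WITHOUT_DCP1 * FOLD_ACTIVATION

TIMEPOINT_MIN = 5.0       # first timepoint mentioned in the review

# Assumption: the observed rate scales linearly with the amount of enzyme.
for dilution in (1.0, 20.0):
    without = fraction_decapped(K_WITHOUT_DCP1 / dilution, TIMEPOINT_MIN)
    with_dcp1 = fraction_decapped(K_WITH_DCP1 / dilution, TIMEPOINT_MIN)
    print(f"enzyme diluted {dilution:>2.0f}x: "
          f"-DCP1 {without:.0%}, +DCP1 {with_dcp1:.0%} decapped at 5 min")
```

      With these assumed values, both conditions are close to complete at 5 minutes with undiluted enzyme (about 95% versus 100%), whereas a 20-fold dilution separates them clearly (about 14% versus 95%). This is the sense in which a fast reaction can behave as an endpoint assay and hide a real rate difference.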

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Removing claims of causality: To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly:

      "Electrophysiological dynamics of salience, default mode, and frontoparietal networks during episodic memory formation and recall: A multi-experiment iEEG replication".

      Control analyses directly comparing AI and IFG: As per the reviewer’s suggestion, we have carried out additional control analyses by directly comparing the net inward/outward balance between the AI and the IFG. Our analysis revealed that the net outflow for the AI is significantly higher compared to the IFG during both encoding and recall phases, a pattern that was replicated across all four experiments. 

      These findings further highlight the unique role of the AI as a key hub in coordinating network interactions during episodic memory formation and retrieval, distinguishing it from a key anatomically adjacent prefrontal region implicated in cognitive control.

      We have incorporated these results into the manuscript (see new Figure S6 and updated Results section). 

      Control analyses directly comparing task with resting state: As per the reviewer’s suggestion, we compared the AI's net outflow during task periods to resting state, finding significantly higher outflow during both encoding and recall across all experiments (ps < 0.05). These results provide further evidence for an enhanced role of the AI's net directed information flow to the DMN and FPN during memory processing compared to the resting state.

      We have incorporated these results into the manuscript (see new Figure S9 and updated Results section). 

      Control analysis using every region of the brain outside the considered networks: We appreciate the reviewer's suggestion to conduct additional control analyses. However, we have concerns about implementing this approach for several reasons:

      (1) Hypothesis-driven research: Our study was designed based on a strong hypothesis derived from prior fMRI studies, which have consistently shown that the salience network (SN), anchored by the anterior insula (AI), plays a critical role in regulating the engagement and disengagement of the default mode network (DMN) and frontoparietal network (FPN) across diverse cognitive tasks.

      (2) Risk of p-hacking: Running analyses on a large number of brain regions outside our networks of interest without a priori hypotheses could lead to p-hacking, a practice strongly criticized in the scientific community, including by eLife editors (Makin & Orban de Xivry, 2019). Such an approach could potentially yield spurious results and undermine the validity of our findings.

      (3) Principled control region selection: Our choice of the inferior frontal gyrus (IFG) as a control region was hypothesis-driven, based on its (a) anatomical adjacency to the AI, (b) involvement in cognitive control functions, including response inhibition, and (c) frequent coactivation with the AI in fMRI studies.

      (4) Robustness of current findings: Our PTE analysis involving the IFG, along with the additional control analyses requested by the reviewer (comparing the task-related net balance of the AI with the IFG and with resting state, see response to reviewer comment 2.1), strongly support a key role for the AI in orchestrating large-scale network dynamics during memory processes.

      (5) Specificity of findings: The contrast between AI and IFG results demonstrates that our observed patterns are not general to all task-active regions but are specific to the AI's role in network coordination. 

      We believe that our current analyses, including the additional controls, provide a comprehensive and rigorous examination of the AI's role in memory-related network dynamics. Adding analyses of numerous additional regions without clear hypotheses could potentially dilute the focus and interpretability of our results. 

      However, we acknowledge the importance of considering broader network interactions. In future studies, we could explore the role of other key regions in a hypothesis-driven manner, potentially expanding our understanding of the complex interactions between multiple brain networks during memory processes.

      These revisions, combined with our rigorous methodologies and comprehensive analyses, provide compelling support for the central claims of our manuscript. We believe these changes significantly enhance the scientific contribution of our work.

      Our point-by-point responses to the reviewers' comments are provided below.

      Reviewer 1:

      (1.1) Because phase-transfer entropy is referenced as a "causal" analysis in this investigation (PTE), I believe it is important to highlight for readers recent discussions surrounding the description of "causal mechanisms" in neuroscience (see "Confusion about causation" section from Ross and Bassett, 2024, Nature Neuroscience). A large proportion of neuroscientists (myself included) use "causal" only to refer to a mechanism whose modulation or removal (with direct manipulation, such as by lesion or stimulation) is known to change or control a given outcome (such as a successful behavior). As Ross and Bassett highlight, it is debatable whether such mechanistic causality is captured by Granger "causality" (a.k.a. Granger prediction) or the parametric PTE, and imprecise use of "causation" may be confusing. The authors have defined in the revised Introduction what their definition of "causality" is within the context of this investigation. 

      We appreciate the reviewer's feedback in terms of the terminology used in our manuscript. To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly. 

      Reviewer 2:

      (2.1) Clarifying the new control analyses. The authors have been responsive to our feedback and implemented several new analyses. The use of a pre-task baseline period and a control brain region (IFG) definitively help to contextualize their results, and the findings shown in the revision do suggest that (1) relative to a pre-task baseline, directed interactions from the AI are stronger and (2) relative to a nearby region, the IFG, the AI exhibits greater outward-directed influence. 

      However, it is difficult to draw strong quantitative conclusions from the analyses as presented, because they do not directly statistically contrast the effect in question (directed interactions with the FPN and DMN) between two conditions (e.g. during baseline vs. during memory encoding/retrieval). As I understand it, in their main figures the authors ask, "Is there statistically greater influence from the AI to the DMN/FPN in one direction versus another?" And in the AI they show greater "outward" PTE than "inward" PTE from other networks during encoding/retrieval. The balance of directed information favors an outward influence from the AI to DMN/FPN. 

      But in their new analyses, they simply show that the degree of "outward" PTE is greater during task relative to baseline in (almost) all tasks. I believe a more appropriately matched analysis would be to quantify the inward/outward balance during task states, quantify the inward/outward balance during rest states, and then directly statistically compare the two. It could be that the relative balance of directed information flow is nonsignificantly changed between task and rest states, which would be important to know. 

      We thank the reviewer for this suggestion. We have now run an additional analysis directly comparing the inward/outward balance during the task versus the rest states. To quantify the net inward/outward balance, we calculated the net outflow as the difference between the total outgoing information and total incoming information (PTE(out)–PTE(in)). This analysis revealed that net outflow during task periods is significantly higher compared to rest, during both encoding and recall, and across the four experiments (ps < 0.05). These results provide further evidence for an enhanced role of the AI's net directed information flow to the DMN and FPN during memory processing compared to the resting state. These new results have now been included in the revised manuscript (page 12).
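
      As a reading aid, the net outflow measure described in this response can be written out as follows. This is a sketch only: the response states just PTE(out)–PTE(in), so the symbol names and the choice to sum over the two target networks are assumptions about what "total" means here, not the authors' stated formula.

      \[
      \mathrm{NetOutflow}(\mathrm{AI}) \;=\; \sum_{r \,\in\, \{\mathrm{DMN},\, \mathrm{FPN}\}} \Big[ \mathrm{PTE}(\mathrm{AI} \to r) \;-\; \mathrm{PTE}(r \to \mathrm{AI}) \Big]
      \]

      Under this reading, a positive value means the AI sends more directed information to the DMN and FPN than it receives from them, and the task-versus-rest comparison tests whether this quantity is larger during encoding and recall than at rest.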

      Likewise, a similar principle applies to their IFG analysis. They show that the IFG tends to have an "inward" balance of influence from the DMN/FPN (the opposite of the AI's effect), but this does not directly answer whether the AI occupies a statistically unique position in terms of the magnitude of its influence on other regions. More appropriate, as I suggest above, would be to quantify the relative balance of inward/outward influence, both for the IFG and the AI, and then directly compare those two quantities. (Given the inversion of the direction of effect, this is likely to be a significant result, but I think it deserves a careful approach regardless.)

      We appreciate the reviewer's suggestion. As per the reviewer’s suggestion, we directly compared the net inward/outward balance between the AI and the IFG. Specifically, we compared the net outflow (PTE(out)–PTE(in)) for the AI with the IFG. This analysis revealed that the net outflow for the AI is significantly higher compared to the IFG during both encoding and recall, and across the four experiments. These findings further highlight a key role for the AI in orchestrating large-scale network dynamics during memory processes. The AI's pattern of directed information flow stands in contrast to that of the IFG, despite their anatomical proximity and shared involvement in cognitive control processes. This dissociation underscores the specificity of the AI's function in coordinating network interactions during memory formation and retrieval. These new results have now been included in our revised manuscript (page 11). 

      (2.2) Consider additional control regions. The authors justify their choice of IFG as a control region very well. In my original comments, I perhaps should have been more clear that the most compelling control analyses here would be to subject every region of the brain outside these networks (with good coverage) to the same analysis, quantify the degree of inward/outward balance, and then see how the magnitude of the AI effect stacks up against all possible other options. If the assertion is that the AI plays a uniquely important role in these memory processes, showing how its influence stacks up against all possible "competitors" would be a very compelling demonstration of their argument. 

      We thank the reviewer for this suggestion. However, please note that running a large number of unplanned analyses across a large number of brain regions (every region of the brain outside these networks) and comparing their dynamics to the AI without a hypothesis or solid principle amounts to p-hacking, which has previously been strongly criticized by the eLife editors (Makin & Orban de Xivry, 2019). Our study was strongly driven by a solid hypothesis based on prior fMRI studies that have shown that the SN, anchored by the anterior insula (AI), plays a critical role in regulating the engagement and disengagement of the DMN and FPN across diverse cognitive tasks (Bressler & Menon, 2010; Cai et al., 2016; Cai, Ryali, Pasumarthy, Talasila, & Menon, 2021; Chen, Cai, Ryali, Supekar, & Menon, 2016; Kronemer et al., 2022; Raichle et al., 2001; Seeley et al., 2007; Sridharan, Levitin, & Menon, 2008). Moreover, our selection of the IFG as a control region for comparison was also very strongly hypothesis driven, due to its anatomical adjacency to the AI, its involvement in a wide range of cognitive control functions including response inhibition (Cai, Ryali, Chen, Li, & Menon, 2014), and its frequent co-activation with the AI in fMRI studies. Furthermore, the IFG has been associated with controlled retrieval of memory (Badre, Poldrack, Paré-Blagoev, Insler, & Wagner, 2005; Badre & Wagner, 2007; Wagner, Paré-Blagoev, Clark, & Poldrack, 2001), making it a compelling region for comparison. Our findings related to the PTE analysis involving the IFG and also the additional control analyses requested by the reviewer (directly comparing the task-related net balance of the AI with the IFG and also to resting state, please see response to reviewer comment 2.1) strongly highlight a key role of the AI in orchestrating large-scale network dynamics during memory processes.

      We believe that our current analyses, including the additional controls, provide a comprehensive and rigorous examination of the AI's role in memory-related network dynamics. Adding analyses of numerous additional regions without clear hypotheses could potentially dilute the focus and interpretability of our results.

      However, we acknowledge the importance of considering broader network interactions. In future studies, we could explore the role of other key regions in a hypothesis-driven manner, potentially expanding our understanding of the complex interactions between multiple brain networks during memory processes.

      (2.3) Reporting of successful vs. unsuccessful memory results. I apologize if I was not clear in my original comment (2.7, pg. 13 of the response document) regarding successful vs. unsuccessful memory. The fact that no significant difference was found in PTE between successful/unsuccessful memory is a very important finding that adds valuable context to the rest of the manuscript. I believe it deserves a figure, at least in the Supplement, so that readers can visualize the extent of the effect in successful/unsuccessful trials. This is especially important now that the manuscript has been reframed to focus more directly on claims regarding episodic memory processing; if that is indeed the focus, and their central analysis does not show a significant effect conditionalized on the success of memory encoding/retrieval, it is important that readers can see these data directly.

      As per the reviewer’s suggestion, we have now included figures showing the results of the successful versus unsuccessful comparison in the Supplementary Materials of the revised manuscript (Figures S10, S11).

      (2.4) Claims regarding causal relationships in the brain. I understand that the authors have defined "causal" in a specific way in the context of their manuscript; I do believe that as a matter of clear and transparent scientific communication, the authors nonetheless bear a responsibility to appreciate how this word may be erroneously interpreted/overinterpreted and I would urge further review of the manuscript to tone down claims of causality. Reflective of this, I was very surprised that even as both reviewers remarked on the need to use the word "causal" with extreme caution, the authors added it to the title in their revised manuscript.

      We thank the reviewer for this suggestion. To avoid confusion, we have now removed claims of causality from our manuscript and also changed the title of the manuscript accordingly. 

      References 

      Badre, D., Poldrack, R. A., Paré-Blagoev, E. J., Insler, R. Z., & Wagner, A. D. (2005). Dissociable controlled retrieval and generalized selection mechanisms in ventrolateral prefrontal cortex. Neuron, 47(6), 907-918. doi:10.1016/j.neuron.2005.07.023

      Badre, D., & Wagner, A. D. (2007). Left ventrolateral prefrontal cortex and the cognitive control of memory. Neuropsychologia, 45(13), 2883-2901. doi:10.1016/j.neuropsychologia.2007.06.015

      Bressler, S. L., & Menon, V. (2010). Large-scale brain networks in cognition: emerging methods and principles. Trends in Cognitive Sciences, 14(6), 277-290. doi:10.1016/j.tics.2010.04.004

      Cai, W., Chen, T., Ryali, S., Kochalka, J., Li, C. S., & Menon, V. (2016). Causal Interactions Within a Frontal-Cingulate-Parietal Network During Cognitive Control: Convergent Evidence from a Multisite-Multitask Investigation. Cereb Cortex, 26(5), 2140-2153. doi:10.1093/cercor/bhv046

      Cai, W., Ryali, S., Chen, T., Li, C. S., & Menon, V. (2014). Dissociable roles of right inferior frontal cortex and anterior insula in inhibitory control: evidence from intrinsic and task-related functional parcellation, connectivity, and response profile analyses across multiple datasets. J Neurosci, 34(44), 14652-14667. doi:10.1523/jneurosci.3048-14.2014

      Cai, W., Ryali, S., Pasumarthy, R., Talasila, V., & Menon, V. (2021). Dynamic causal brain circuits during working memory and their functional controllability. Nat Commun, 12(1), 3314. doi:10.1038/s41467-021-23509-x

      Chen, T., Cai, W., Ryali, S., Supekar, K., & Menon, V. (2016). Distinct Global Brain Dynamics and Spatiotemporal Organization of the Salience Network. PLOS Biology, 14(6), e1002469. doi:10.1371/journal.pbio.1002469

      Kronemer, S. I., Aksen, M., Ding, J. Z., Ryu, J. H., Xin, Q., Ding, Z., . . . Blumenfeld, H. (2022). Human visual consciousness involves large scale cortical and subcortical networks independent of task report and eye movement activity. Nat Commun, 13(1), 7342. doi:10.1038/s41467-022-35117-4

      Makin, T. R., & Orban de Xivry, J. J. (2019). Ten common statistical mistakes to watch out for when writing or reviewing a manuscript. eLife, 8. doi:10.7554/eLife.48175

      Raichle, M. E., MacLeod, A. M., Snyder, A. Z., Powers, W. J., Gusnard, D. A., & Shulman, G. L. (2001). A default mode of brain function. Proc Natl Acad Sci U S A, 98(2), 676-682. doi:10.1073/pnas.98.2.676

      Seeley, W. W., Menon, V., Schatzberg, A. F., Keller, J., Glover, G. H., Kenna, H., . . . Greicius, M. D. (2007). Dissociable Intrinsic Connectivity Networks for Salience Processing and Executive Control. Journal of Neuroscience, 27(9), 2349-2356. doi:10.1523/JNEUROSCI.5587-06.2007

      Sridharan, D., Levitin, D. J., & Menon, V. (2008). A critical role for the right fronto-insular cortex in switching between central-executive and default-mode networks. Proceedings of the National Academy of Sciences, 105(34), 12569-12574. doi:10.1073/pnas.0800005105

      Wagner, A. D., Paré-Blagoev, E. J., Clark, J., & Poldrack, R. A. (2001). Recovering meaning: left prefrontal cortex guides controlled semantic retrieval. Neuron, 31(2), 329-338. doi:10.1016/s0896-6273(01)00359-2

    1. Welcome to this video which will be a fairly high level introduction to YAML.

      Now YAML stands for YAML Ain't Markup Language, and for any keen observers, that's a recursive acronym.

      Now I want this video to be brief but I think it's important that you understand YAML's structure.

      So let's jump in and get started.

      YAML is a language which is human readable and designed for data serialization.

      Now that's a mouthful but put simply it's a language for defining data or configuration which is designed to be human readable.

      At a high level, a YAML document is an unordered collection of key value pairs, with each key separated from its value by a colon.

      It's important that you understand this lack of order.

      At this top level there is no requirement to order things in a certain way.

      Although there may be conventions and standards none of that is imposed by YAML.

      An example key value pair might be the key being cat1 and the value being raffle, one of my cats.

      In this example, both the key and the value are just normal strings.

      We could further populate our YAML file with a key of cat2 and a value of truffles, another cat of mine.

      Or a key of cat3 and a value of penny, and a key of cat4 and a value of winky.

      These are all strings.
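
      As a minimal sketch, the four key value pairs just described might be written like this. The on-screen example is not part of this transcript, so the exact layout and spellings are assumptions based on the narration.

      ```yaml
      # Top-level key value pairs; every key and value here is a plain string.
      cat1: raffle
      cat2: truffles
      cat3: penny
      cat4: winky
      ```

      Because the top level is unordered, listing cat4 before cat1 would describe the same data.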

      Now YAML supports other types: numbers such as one and two, floating point values such as 1.337, booleans, so true or false, and even null, which represents nothing.
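
      A short sketch of those scalar types follows. The key names are hypothetical and chosen only for illustration; the values are the ones mentioned in the narration.

      ```yaml
      count: 2          # integer
      ratio: 1.337      # floating point value
      enabled: true     # boolean
      nothing: null     # null, which represents no value
      label: "2"        # quoted, so this stays a string rather than a number
      ```

      The last line hints at why quoting can matter: without the quotes, a parser would typically read the value as the number 2.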

      Now YAML also supports other types, and one of those is the list, known as an array or by other names depending on which, if any, programming languages you're used to.

      A list is essentially an ordered set of values and in YAML we can represent a list by having a key let's say Adrian's cats.

      And then as a value we might have something that looks like this, a comma separated set of values inside square brackets.

      Now this is known as inline format where the list is placed where you expect the value to be after the key and the colon.

      Now the same list can also be represented like this where you have the key and then a colon and then you go to a new line and each item in the list is represented by hyphen and then the value.

      Now notice how some of the values are actually enclosed in speech marks or quotation marks and some are not.

      This is optional.

      All of these are valid.

      Often though it's safer for you to enclose things, as it allows you to be more precise and it avoids confusion.

      Now in YAML indentation really matters.

      Indentation is always done using spaces and the same level of indentation means that the things are within the same structure.

      So we know that because all of these list items are indented by the same amount they're all part of the same list.

      We know they're a list because of the hyphens.

      So the same indent, always using hyphens, means that they're all part of the same list, the same structure.

      Now these two styles are two methods for expressing the same thing.

      A key called Adrian's cats whose value is a list.

      This is the same structure.

      It represents the same data.
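
      The two list styles described above can be sketched as follows. The key name adrianscats is a hypothetical spelling of the key in the narration, and the mix of quoted and unquoted values is illustrative. First the inline format:

      ```yaml
      # Inline format: the list sits where the value goes, after the key and colon.
      adrianscats: ["raffle", "truffles", "penny", "winky"]
      ```

      And the same list with one item per line:

      ```yaml
      # Each item is a hyphen plus a value, all at the same indentation (spaces only).
      adrianscats:
        - "raffle"
        - truffles
        - 'penny'
        - winky
      ```

      Both forms should load as one key whose value is an ordered list of four strings; the quotation marks are optional for these values.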

      Now there's one final thing which I want to cover with YAML and that's a dictionary.

      A dictionary is just a data structure.

      It's a collection of key value pairs which are unordered.

      A YAML template has a top level dictionary.

      It's a collection of key value pairs.

      So let's look at an example.

      Now this looks much more complicated but it's not if you just follow it through from the start.

      So we start with a key value pair.

      Adrian's cats at the top.

      So the key is Adrian's cats and the value is a list.

      And we can tell that it's a list because of the hyphens which are the same level of indentation.

      But, and this is important, notice how for each list item we don't just have the hyphen and a value.

      Instead we have the hyphen and for each one we have a collection of key value pairs.

      So for the final list item at the bottom we have a dictionary containing a number of key value pairs.

      The first has a key of name with a value of winky.

      The second a key color with a value of white.

      And then, for this final list item only, a key of num of eyes and a value of one.

      And each item in this list, each value is a dictionary.

      A collection of one or more key value pairs.

      So values can be strings, numbers, floats, booleans, lists or dictionaries or a combination of any of them.

      Note how the color key value pair in the top list item, so the raffle dictionary at the top, its value is a list.

      So in this structure that's on screen now, we have Adrian's cats, which as a value has a list.

      Each value in the list is a dictionary.

      Each dictionary contains a name key with a value and a color key with a value.

      And then the third item in the list also has a num of eyes key and a value.

      Now using YAML key value pairs, lists and dictionaries allows you to build complex data structures in a way which once you have practice is very human readable.

      In this case, it's a database of somebody's cats.
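
      A sketch of the nested structure walked through above is shown here. It is a reconstruction from the narration rather than the original slide: the key spellings (adrianscats, numofeyes) and the colors for the first two cats are assumptions, while winky's color of white and one eye come from the narration.

      ```yaml
      adrianscats:                # key whose value is a list
        - name: raffle            # first list item: a dictionary
          color: [black, white]   # this value is itself a list (colors assumed)
        - name: truffles
          color: mixed            # assumed value
        - name: winky             # final list item
          color: white
          numofeyes: 1            # extra key present only on this item
      ```

      Reading it from the top: one key, whose value is a list of three dictionaries, each holding its own key value pairs.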

      Now YAML can be read into an application or written out by an application.

      And YAML is commonly used for the storage and passing of configuration.

      For now thanks for watching, go ahead, complete the video and when you're ready I'll look forward to you joining me in the next.

    1. Welcome back.

      In this fundamentals video, I want to briefly talk about Kubernetes, which is an open source container orchestration system.

      You use it to automate the deployment, scaling and management of containerized applications.

      At a super high level, Kubernetes lets you run containers in a reliable and scalable way, making efficient use of resources, and lets you expose your containerized applications to the outside world or your business.

      It's like Docker, only with robots to automate it and super intelligence for all of the thinking.

      Now, Kubernetes is a cloud agnostic product, so you can use it on premises and within many public cloud platforms.

      Now, I want to keep this video to a super high level architectural overview, but that's still a lot to cover.

      So let's jump in and get started.

      Let's quickly step through the architecture of the Kubernetes cluster.

      A cluster in Kubernetes is a highly available cluster of compute resources, and these are organized to work as one unit.

      The cluster starts with a cluster control plane, which is the part which manages the cluster.

      It performs scheduling, application management, scaling and deployment, and much more.

      Compute within a Kubernetes cluster is provided via nodes, and these are virtual or physical servers, which function as a worker within the cluster.

      These are the things which actually run your containerized applications.

      Running on each of the nodes is software, and at minimum, this is containerd or another container runtime, which is the software used to handle your container operations.

      And next, we have kubelet, which is an agent to interact with the cluster control plane.

      And kubelet on each of the nodes communicates with the cluster control plane using the Kubernetes API.

      Now, this is the top level functionality of the Kubernetes cluster.

      The control plane orchestrates containerized applications which run on nodes.

      But now let's explore the architecture of control planes and nodes in a little bit more detail.

      On this diagram, I've zoomed in a little.

      We have the control plane at the top and a single cluster node at the bottom, complete with the minimum Docker and kubelet software running for control plane communications.

      Now, I want to step through the main components which might run within the control plane and on the cluster nodes.

      Keep in mind, this is a fundamental level video.

      It's not meant to be exhaustive.

      Kubernetes is a complex topic, so I'm just covering the parts that you need to understand to get started.

      Now, the cluster will also likely have many more nodes.

      It's rare that you only have one node unless this is a testing environment.

      Now, first, I want to talk about pods, and pods are the smallest unit of computing within Kubernetes.

      You can have pods which have multiple containers and provide shared storage and networking for those containers.

      But it's very common to see a one-container, one-pod architecture, which as the name suggests, means each pod contains only one container.

      Now, when you think about Kubernetes, don't think about containers.

      Think about pods.

      You're going to be working with pods and you're going to be managing pods.

      The pods handle the containers within them.

      Architecturally, you would generally only run multiple containers in a pod when those containers are tightly coupled and require close proximity and rely on each other in a very tightly coupled way.
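
      As an illustration of the tightly coupled case, here is a minimal sketch of a pod with two containers sharing a volume. All names, labels and image tags are hypothetical; a one-container pod would look the same with a single entry under containers.

      ```yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: web-with-sidecar        # hypothetical name
        labels:
          app: web                    # hypothetical label
      spec:
        containers:
          - name: web                 # main application container
            image: nginx:1.27
            volumeMounts:
              - name: shared-logs
                mountPath: /var/log/nginx
          - name: log-reader          # helper container reading the same files
            image: busybox:1.36
            command: ["sh", "-c", "tail -F /logs/access.log"]
            volumeMounts:
              - name: shared-logs
                mountPath: /logs
        volumes:
          - name: shared-logs
            emptyDir: {}              # scratch volume that lives only as long as the pod
      ```

      Both containers are scheduled together onto the same node and share the pod's network and the shared-logs volume, which is the kind of close coupling that justifies putting them in one pod.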

      Additionally, although you'll be exposed to pods, you'll rarely manage them directly.

      Pods are non-permanent things.

      In order to get the maximum value from Kubernetes, you need to view pods as temporary things which are created, do a job, and are then disposed of.

      Pods can be deleted when finished, evicted for lack of resources, or lost if the node itself fails.

      They aren't permanent and aren't designed to be viewed as highly available entities.

      There are other things linked to pods which provide more permanence, but more on that elsewhere.

      So now let's talk about what runs on the control plane.

      Firstly, I've already mentioned this one, the API, known formally as kube-apiserver.

      This is the front end for the control plane.

      It's what everything generally interacts with to communicate with the control plane, and it can be scaled horizontally for performance and to ensure high availability.

      Next, we have etcd, and this provides a highly available key value store.

      So a simple database running within the cluster, which acts as the main backing store for data for the cluster.

      Another important control plane component is kube-scheduler, and this is responsible for constantly checking for any pods within the cluster which don't have a node assigned.

      And then it assigns a node to that pod based on resource requirements, deadlines, affinity, or anti-affinity, data locality needs, and any other constraints.

      Remember, nodes are the things which provide the raw compute and other resources to the cluster, and it's this component which makes sure the nodes get utilized effectively.
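
      To make the scheduler's inputs concrete, here is a minimal sketch of a pod that declares resource requirements and a node affinity rule. The pod name, image and request sizes are hypothetical; kubernetes.io/arch is a standard node label.

      ```yaml
      apiVersion: v1
      kind: Pod
      metadata:
        name: scheduled-example       # hypothetical name
      spec:
        containers:
          - name: app
            image: nginx:1.27
            resources:
              requests:               # the scheduler only considers nodes with this much free
                cpu: "250m"
                memory: "128Mi"
        affinity:
          nodeAffinity:
            requiredDuringSchedulingIgnoredDuringExecution:
              nodeSelectorTerms:
                - matchExpressions:
                    - key: kubernetes.io/arch
                      operator: In
                      values: ["amd64"]
      ```

      With a spec like this, the pod stays unscheduled until a node exists that has the requested CPU and memory available and matches the affinity rule.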

      Next, we have an optional component, the Cloud Controller Manager, and this is what allows Kubernetes to integrate with any cloud providers.

      It's common that Kubernetes runs on top of other cloud platforms such as AWS, Azure, or GCP, and it's this component which allows the control plane to closely interact with those platforms.

      Now, it is entirely optional, and if you run a small Kubernetes deployment at home, you probably won't be using this component.

      Now, lastly, in the control plane is the kube-controller-manager, and this is actually a collection of processes.

      We've got the node controller, which is responsible for monitoring and responding to any node outages; the job controller, which is responsible for running pods in order to execute jobs; and the endpoint controller, which populates endpoints in the cluster. More on this in a second, but this is something that links services to pods.

      Again, I'll be covering this very shortly.

      And then the service account and token controller, which is responsible for account and API token creation.

      Now, again, I haven't spoken about services or endpoints yet, just stick with me.

      I will in a second.

      Now, lastly, on every node is something called k-proxy, known as kube-proxy, which coordinates networking with the cluster control plane.

      It helps implement services and configures rules allowing communications with pods from inside or outside of the cluster.

      You might have a Kubernetes cluster, but you're going to want some level of communication with the outside world, and that's what kube-proxy provides.

      Now, that's the architecture of the cluster and nodes in a little bit more detail, but I want to finish this introduction video with a few summary points of the terms that you're going to come across.

      So, let's talk about the key components.

      So, we start with the cluster, and conceptually, this is a deployment of Kubernetes.

      It provides management, orchestration, healing, and service access.

      Within a cluster, we've got the nodes which provide the actual compute resources, and pods run on these nodes.

      A pod is one or more containers, and it's the smallest admin unit within Kubernetes, and often, as I mentioned previously, you're going to see the one container, one pod architecture.

      Simply put, it's cleaner.

      Now, a pod is not a permanent thing, it's not long-lived.

      The cluster can and does replace them as required.

      Services provide an abstraction from pods, so the service is typically what you will understand as an application.

      An application can be containerized across many pods, but the service is the consistent thing, the abstraction.

      Service is what you interact with if you access a containerized application.
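
      A minimal sketch of a service is shown below. The name web and the label app: web are hypothetical; the selector is what ties the service to whichever pods currently carry that label.

      ```yaml
      apiVersion: v1
      kind: Service
      metadata:
        name: web                 # hypothetical name; the stable thing clients talk to
      spec:
        selector:
          app: web                # traffic goes to any pod with this label
        ports:
          - port: 80              # port exposed by the service
            targetPort: 80        # port the containers listen on
      ```

      Pods matching the selector can be replaced at any time without clients noticing, because they address the service rather than an individual pod.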

      Now, we've also got a job, and a job is an ad hoc thing inside the cluster.

      Think of it as the name suggests, as a job.

      A job creates one or more pods, runs until it completes, retries if required, and then finishes.

      Now, jobs might be used as back-end isolated pieces of work within a cluster.
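
      As a sketch of that run-to-completion behavior, a job could be declared like this. The name, image and command are hypothetical placeholders for a real piece of batch work.

      ```yaml
      apiVersion: batch/v1
      kind: Job
      metadata:
        name: one-off-report          # hypothetical name
      spec:
        backoffLimit: 3               # retry a failed pod up to three times
        template:
          spec:
            restartPolicy: Never      # a failed pod is replaced rather than restarted in place
            containers:
              - name: report
                image: busybox:1.36
                command: ["sh", "-c", "echo generating report && sleep 5"]
      ```

      The job creates a pod, waits for the command to exit successfully, retries within the backoff limit if it fails, and is then marked complete.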

      Now, something new that I haven't covered yet, and that's Ingress.

      Ingress is how something external to the cluster can access a service.

      So, you have external users, they come into an Ingress, that's routed through the cluster to a service, the service points at one or more pods, which provides the actual application.

      So, Ingress is something that you will have exposure to when you start working with Kubernetes.

      And next is an Ingress controller, and that's a piece of software which actually arranges for the underlying hardware to allow Ingress.

      For example, there is an AWS Load Balancer Ingress controller, which uses Application and Network Load Balancers to allow the Ingress.

      But there are also other controllers such as Nginx and others for various cloud platforms.

      Now, finally, and this one is really important, generally it's best to architect things within Kubernetes to be stateless from a pod perspective.

      Remember, pods are temporary.

      If your application has any form of long-running state, then you need a way to store that state somewhere.

      Now, state can be session data, but also data in the more traditional sense.

      Any storage in Kubernetes by default is ephemeral, provided locally by a node, and thus, if a pod moves between nodes, then that storage is lost.

      Conceptually, think of this like instance store volumes running on AWS EC2.

      Now, you can configure persistent storage known as persistent volumes or PVs, and these are volumes whose lifecycle lives beyond any single pod that is using them.

      And this is how you would provision normal long-running storage to your containerized applications.
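      As a hedged sketch of what that can look like, a pod usually asks for persistent storage through a persistent volume claim (PVC), which the cluster binds to a PV. The claim name, size, image, and mount path below are assumptions for illustration only, and the details of how the PV itself is provisioned are left out.

```python
import json

# Hypothetical example: claim name, size, image and mount path are assumptions.
pvc = {
    "apiVersion": "v1",
    "kind": "PersistentVolumeClaim",
    "metadata": {"name": "app-data"},
    "spec": {
        "accessModes": ["ReadWriteOnce"],
        "resources": {"requests": {"storage": "1Gi"}},
    },
}

pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "app"},
    "spec": {
        "containers": [
            {
                "name": "app",
                "image": "nginx",
                # The container sees the persistent storage at this path.
                "volumeMounts": [{"name": "data", "mountPath": "/data"}],
            }
        ],
        # The volume refers to the claim by name, so the data outlives this pod.
        "volumes": [
            {"name": "data", "persistentVolumeClaim": {"claimName": "app-data"}}
        ],
    },
}

if __name__ == "__main__":
    print(json.dumps([pvc, pod], indent=2))
```

      If the pod is replaced, a new pod that references the same claim picks up the same data.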

      Now, the details of this are a little bit beyond this introduction level video, but I wanted you to be aware of this functionality.

      OK, so that's a high-level introduction to Kubernetes.

      It's a pretty broad and complex product, but it's super powerful when you know how to use it.

      This video only scratches the surface.

      If you're watching this as part of my AWS courses, then I'm going to have follow-up videos which step through how AWS implements Kubernetes with their EKS service.

      If you're taking any of the more technically deep AWS courses, then there may be other deep-dive videos into specific areas that you need to be aware of.

      So there may be additional videos covering individual topics at a much deeper level.

      If there are no additional videos, then don't worry, because that's everything that you need to be aware of.

      Thanks for watching this video.

      Go ahead and complete the video, and when you're ready, I look forward to you joining me in the next.

    1. And if you subscribe to the idea that language learning in general is difficult, you may not even start to learn at all.

      I think this is very true specifically regarding age. In class we talked about the common idea that you can't learn a language after the age of 13. This idea can make it very intimidating for people wanting to start learning a new language as an adult. In reality the truth behind the rumor is that you can't completely gain a new accent after 13 but even that is not always the case.

    1. Descartes presents one of the most well-discussed arguments for scepticism – the view that we cannot have knowledge – by asking the reader to consider the possibility that she is dreaming.

      Having scepticism to a certain extent can help the human mind come up with various questions, allowing people to think deeply about a topic and find value in it. However, while it may cause one to think thoroughly about what they do on a daily basis, it may also pose a threat to them by conjuring up unnecessary and troublesome scenarios.

    2. One answer to this question is pragmatic – philosophy teaches you to think and write logically and clearly. This, we tell our students, will be of use to them no matter what path they pursue. We advertise philosophy, then, as a broadly useful means to a variety of ends.

      There are many different perspectives as to why one should study philosophy, and while this perspective may seem simple, being able to "think and write logically and clearly" is an extremely useful skill. Regardless of major or occupation, everyone has to use these skills every day, as they allow us to communicate more easily.

    3. The deep underlying idea is that if we have to choose a social and political arrangement without knowing the position that we may occupy in society, we will choose fair principles to govern our social and political institutions. My teacher had our class re-enact a scenario very much like this one in class. We discussed the principles that would govern our imagined society before we picked our fate out of a hat. Until that point in my young life, I had never thought about justice in that way. The power of this exercise contributed in no small way to my becoming a philosopher. I have recreated a similar activity in various classes I have taught. The discussion it generates among students is reliably superb, but the best moment is when students discover their fate – whether they end up being a doctor or a garbage truck driver or a poor young mother – and have to reckon (at least for that class period) with their principles. Many philosophers have persuasively criticized Rawls’ use of the original position as an argumentative tool. But we often forget, I think, how successfully it harnesses the power of the imagination to construct an alternative vision of what society could be like.

      This seems like a good way to push people toward seeing things more justly and humanely, since they cannot be sure how their own policies will affect their as-yet-unknown life.

    4. During the first round of this exercise, students inevitably take so many fish that there are none left in the lake. Students then discuss what has happened and what they ought to do differently in the next round. Some students have strong intuitions that everybody should take an equal amount, while others insist that all that matters is that in the end there are enough fish left to repopulate the lake. Not only is this exercise pedagogically engaging, but it leads students to develop proposals and to evaluate them critically.

      It is hard to think ahead when you have to look after yourself and take care of those you love. This is why we fail to make considerable change as a population, regardless of the changes individuals may make.

    1. Author response:

      The following is the authors’ response to the current reviews.

      We thank the reviewers and editor for their positive assessment of our work. For the Version of Record, we have made small revisions addressing the remaining concerns of reviewer #3. We have also reformatted the supplementary material to conform to eLife’s style.

      While the manuscript was under review, we discussed our work with Bill Bialek, who suggested clarifying the effect of cell rearrangements on genetic patterns. Using the tracked cell trajectories we found that the highly coordinated intercalations in the germ band preserve the relative AP positions of cells. We have added an Appendix subsection (Appendix 1.5) explaining this finding and highlighting its relevance in a short paragraph added to the discussion.

      Reviewer #2

      Main comment from 1st review:

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      Comments on the revised version:

      My main concern was that the authors did not use the analysis of mutant contexts such as Snail and Twist to confirm their predictions. They made a series of modifications, clarifying their conclusions. In particular, they now include an analysis of a Snail mutant and show that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished, supporting the idea that isogonal strain could be used as an indicator of external forces (Fig. 7 and S6).

      They further discuss their results in the context of what was published regarding the mutant backgrounds (fog, torso-like, scab, corkscrew, ksr) where midgut invagination is disrupted, and where germ band buckles, and propose that this supports the importance of internal versus external forces driving GBE.

      Overall, these modifications, in addition to clarifications in the text, clearly strengthen the manuscript.

      We thank the reviewer for assessing our manuscript again and are happy to hear that they find the added data on the snail mutant convincing and that our revised manuscript is stronger.

      Reviewer #3

      In their article "The Geometric Basis of Epithelial Convergent Extension", Brauns and colleagues present a physical analysis of drosophila axis extension that couples in toto imaging of cell contours (previously published dataset), force inference, and theory. They seek to disentangle the respective contributions of active vs passive T1 transitions in the convergent extension of the lateral ectoderm (or germband) of the fly embryo.

      The revision made by the authors has greatly improved their work, which was already very interesting, in particular the use of force inference throughout intercalation events to identify geometric signatures of active vs passive T1s, and the tension/isogonal decomposition. The new analysis of the Snail mutant adds a lot to the paper and makes their findings on the criteria for T1s very convincing.

      About the tissue scale issues raised during the first round of review. Although I do not find the new arguments fully convincing (see below), the authors did put a lot of effort to discuss the role of the adjacent posterior midgut (PMG) on extension, which is already great. That will certainly provide the interested readers with enough material and references to dive into that question.

      We appreciate the referee’s positive assessment of our manuscript and their careful reading and constructive feedback. In particular, we are happy to hear that the referee finds our added data on the snail mutant very convincing and finds that the extended discussion on the role of the PMG is helpful. We address the remaining concerns in our detailed response below.

      I still have some issues with the authors' interpretation on the role of the PMG, and on what actually drives the extension. Although it is clear that T1 events in the germ band are driven by active local tension anisotropy (which the authors show but was already well-established), it does not show that the tissue extension itself is powered by these active T1s. Their analysis of "fence" movies from Collinet et al 2015 (Tor mutants and Eve RNAi) is not fully convincing. Indeed, as the authors point out themselves, there is no flow in Tor mutant embryos, even though tension anisotropy is preserved. They argue that in Tor embryos the absence of PMG movement leaves no room for the germband to extend properly, thus impeding the flow. That suggests that the PMG acts as a barrier in Tor mutants - What is it attached to, then?

      We thank the referee for pointing out this omission: The PMG is attached to the vitelline membrane in the scab domain (Munster et al. Nature 2019) and is also obstructed from moving by more anterior-lying tissue (amnioserosa). It therefore acts as an obstacle for GBE if it fails to invaginate (e.g. in a Tor embryo). We have clarified this in the discussion of the Tor mutants.

      The authors also argue that the posterior flow is reduced in "fenced" Eve RNAi embryos (which have less/no tension anisotropy), to justify their claim that it is the anisotropy that drives extension. However, previous data, including some of the authors' (Irvine and Wieschaus, 1994 - Fig 8), show that the first, rapid phase of germband extension is left completely unaffected in Eve mutants (that lack active tension anisotropy). Although intercalation in Eve mutants is not quantified in that reference, this was later done by others, showing that it is strongly reduced.

      The quantification of GBE in Irvine and Wieschaus 1994 was based on the position of the PMG from bright field imaging, making it hard to distinguish the contributions of ventral furrow, PMG, and germ band, particularly during the early phase of GBE where all these processes happen simultaneously. More detailed quantifications based on PIV analysis of in toto light-sheet imaging show significantly reduced tissue flow in eve mutants after the completion of ventral furrow invagination (Lefebvre et al., eLife 2023). That the initial fast flow is driven by ventral furrow invagination, not by the PMG is apparent from twist/snail embryos where the initial phase is significantly slower (Lefebvre et al., eLife 2023, Gustafson et al., Nat Comms 2022). We have added these references to the re-analysis and discussion of the Collinet et al 2015 experiments.

      Similarly, the Cyto-D phenotype from Clement et al 2017, in which intercalation is also strongly reduced, also displays normal extension.

      We agree that a careful quantification of tissue flow in Cyto-D-treated embryos would be interesting. Whether they show normal extension is not clear from the Clement et al. 2017 paper, as no quantification of total tissue flow is performed and no statements regarding extension are made there.

      Reviewer #3 (Recommendations For The Authors):

      • A lot of typos / grammar mistakes / repetitions are still found here and there in the paper. Authors should plan a careful re-reading prior to final publication.

      We have carefully checked the manuscript and fixed the typos and grammar mistakes.

      • I failed to point to a very relevant reference in the previous round of review, which I think the authors should cite and comment: A review by Guirao & Bellaiche on the mechanics of intercalation in the fly germband, which notably discusses the passive/active and stress-relaxing/stress-generating nature of T1s. (Guirao and Bellaiche, Current opinions in cell biology 2017), in particular figures 1 and 2.

      We thank the referee for pointing us to this relevant reference which we now cite in the introduction.

      • Any new arguments/discussion the authors see fit to include in the paper to comment on the Eve/Tor phenotypes. As far as I am concerned, I am not fully convinced at the moment (see review), but I think the paper has other great qualities and findings, and now (since the first round of review) sufficiently discusses that particular matter. I leave it up to the authors how much (more) they want to delve into this in their final version!

      We have added clarifications and references to the discussion of the Eve/Tor phenotypes.


      The following is the authors’ response to the original reviews.

      Public Review:

      Joint Public Review:

      Summary:

      Brauns et al. work to decipher the respective contribution of active versus passive contributions to cell shape changes during germ band elongation. Using a novel quantification tool of local tension, their results suggest that epithelial convergent extension results from internal forces.

      Reading this summary, and the eLife assessment, we realized that we failed to clearly communicate important aspects of our findings in the first version of our manuscript. We therefore decided to largely restructure and rewrite the abstract and introduction to emphasize that:

      ● Our analysis method identifies active vs passive contributions to cell and tissue shape changes during epithelial convergent extension

      ● In the context of Drosophila germ band extension, this analysis provides evidence for a major role for internal driving forces rather than external pulling force from neighboring tissue regions (posterior midgut), thus settling a question that has been debated due to apparently conflicting evidence from different experiments.

      ● Our findings have important implications for local, bottom-up self-organization vs top-down genetic control of tissue behaviors during morphogenesis.

      Strengths:

      The approach developed here, tension isogonal decomposition, is original and the authors made the demonstration that we can extract comprehensive data on tissue mechanics from this type of analysis.

      They present an elegant diagram that quantifies how active and passive forces interact to drive cell intercalations.

      The model qualitatively recapitulates the features of passive and active intercalation for a T1 event.

      Regions of high isogonal strains are consistent with the proximity of known active regions.

      We think this statement is somewhat ambiguous and does not summarize our findings precisely. A more precise statement would be that high isogonal strain identifies regions of passive deformation, which is caused by adjacent active regions.

      They define a parameter (the LTC parameter) which encompasses the geometry of the tension triangles and allows the authors to define a criterion for T1s to occur.

      The data are clearly presented, going from cellular scale to tissue scale, and integrating modeling approaches to complement the thoughtful description of tension patterns.

      Weaknesses:

      The modeling is interesting, with the integration of tension through tension triangulation around vertices and thus integrating force inference directly in the vertex model. However, the authors are not using it to test their hypothesis and support their analysis at the tissue level. Thus, although interesting, the analysis at the tissue level stays mainly descriptive.

      We fully agree that a full tissue scale model is crucial to support the claims about tissue scale self-organization we make in the discussion. However, the full analysis of such a model is beyond the scope of the present manuscript. We have therefore split off that analysis into a companion manuscript (Claussen et al. 2023). In this paper, we show that the key results of the tissue-scale analysis of the Drosophila embryo, in particular the order-to-disorder transition associated with slowdown of tissue flow, are reproduced and rationalized by our model.

      We now refer more closely to this companion paper to point the reader to the results presented there.

      Major points:

      (1) The authors mention that from their analysis, they can predict what is the tension threshold required for intercalations in different conditions and predict that in Snail and Twist mutants the T1 tension threshold would be around √2. Since movies of these mutants are most probably available, it would be nice to confirm these predictions.

      This is an excellent suggestion. We have included an analysis of a recording of a Snail mutant, which is presented in the new Figures 4 and S6. As predicted, we find that isogonal deformations in the ventro-lateral regions are absent when the external pulling force of the VF is abolished. Further, in the absence of isogonal deformation, T1 transitions indeed occur at a critical tension of approx. √2, as predicted by our model. Both of these results provide important experimental evidence for our model and for isogonal strain as a reliable indicator of external forces.

      (2) While the formalism is very elegant and convincing, and also convincingly allows making sense of the data presented in the paper, it is not all that clear whether the claims are compatible with previous experimental observations. In particular, it has been reported in different papers (including Collinet et al NCB 2015, Clement et al Curr Biol 2017) that affecting the initial Myosin polarity or the rate of T1s does not affect tissue-scale convergent extension. The authors should analyze or discuss the Tor phenotype (no extension with myosin anisotropy) and the Eve/Runt phenotype (extension without Myosin anisotropy), which seem to contradict an extension mostly driven by myosin anisotropy.

      We are happy to read that the referees find our approach elegant and convincing. The referees correctly point out that we have failed to clearly communicate how our findings connect to the existing literature on Drosophila GBE. Indeed, the conflicting results reported in the literature on what drives GBE – internal forces (myosin anisotropy) or external forces (pulling by the posterior midgut) – were a motivation for our study. We have extensively rewritten the introduction, results section (“Isogonal strain identifies regions of passive tissue deformation”), and discussion (“Internal and external contributions to germ band extension”) in response to the referee’s request.

      In brief, distinguishing active internal vs passive external driving of tissue flow has been a fundamental open question in the literature on morphogenesis. Our tension-isogonal decomposition now provides a way to answer this question on the cell scale, by identifying regions of passive deformation due to external forces. As we now explain more clearly, our analysis shows that germ band extension is predominantly driven by internal tension dynamics, and not pulling forces from the posterior midgut.

      We put this cell-scale evidence into the context of previous experimental observations on the tissue scale: Genetic mutants (fog, torso-like, scab, corkscrew, ksr), where posterior midgut invagination is disrupted (Muenster et al. 2019, Smits et al. 2023). In these mutants, the germ band buckles forming ectopic folds or twists into a corkscrew shape as it extends, pointing towards a buckling instability characteristic of internally driven extensile flows.

      To address the apparently conflicting evidence from Collinet et al. 2015, we carried out a quantitative re-analysis of the data presented in that reference (see new SI section 3 and Fig. S11). The results support the conclusion that the majority of GBE flow is driven internally, thus resolving the apparent conflict.

      Lastly, as far as we understand, Clement et al. 2017 appears to be compatible with our picture of active T1 transitions. Clement et al. report that the actin cortex, when loaded by external forces, behaves visco-elastically with a relaxation time of the order of minutes, in line with our model for emerging interfaces post T1.

      We again thank the referees for prompting us to address these important issues and believe that including their discussion has significantly strengthened our manuscript.

      Recommendations for the authors:

      Minor points:

      - Fig 2 : authors should state in the main text at which scale the inverse problem is solved. (Intercalating quartet, if I understood correctly from the methods) ? and they should explain and justify their choice (why not computing the inverse at a larger scale).

      We have rephrased the first sentence of the section “Cell scale analysis” to clarify that we use local tension inference. This local inference is informative about the relative tension of one interface to its four neighbors. The focus on this local level is justified because we are interested in local cell behaviors, namely rearrangements. Tension inference is also most robust on the local level, since this is where force balance, the underlying physical determinant of the link between mechanics and geometry, resides. In global tension inference, spurious large scale gradients can appear when small deviations from local force balance accumulate over large distances. We have added a paragraph in SI Sec. 1.4 to explain these points.

      -Fig 2 : how should one interpret that tension after passive intercalation (amnioserosa) is higher than before. On fig 2E, tension has not converged yet on the plot, what happens after 20 minutes ?

      Recall that the inferred tension is the total tension on an interface. While on contracting interfaces, the majority of this tension will be actively generated by myosin motors, on extending interfaces there is also a contribution carried by passive crosslinkers. The passive tension can be effectively viewed as viscous dissipation on the elongating interface as crosslinkers turn over (Clement et al. 2017). Note that this passive tension is explicitly accounted for in the model presented in Fig. 5. Notably, it is crucial for the T1 process to resolve in a new extending junction. In the amnioserosa, the tension post T1 remains elevated because the amnioserosa is continually stretched by the convergence of the germ band. The tension hence does not necessarily converge back to 1. However, our estimates for the tension after 20 mins post T1 are very noisy because most of the T1s happen relatively late in the movie (past the 25 min mark) and therefore there are only a few T1s where we can track the post-T1 dynamics for more than 20 mins.

      We have added a brief explanation of the high post-T1 tension at the end of the section entitled “Relative tension dynamics distinguishes active and passive intercalations”. Further, we have moved up the section describing the minimal model right after the analysis of the relative tension during intercalations. We believe that this helps the reader better understand these findings before moving on to the tension-isogonal decomposition which generalizes them to the tissue scale.

      Page 7-8 / Figure 3: It is unclear how the decomposition into 1) physical shape 2) tension shape 3) isogonal shape works exactly. A more detailed explanation and clearer illustration of what a quartet is and its labels could help.

      We have added a more detailed explanation in the main text. See our response to the longer question regarding this point below.

      -What exactly defines the boundary curve in figure 3E? How is it computed?

      We have added a sentence in the caption for Fig. 3E explaining that the boundary curve is found by solving Eq. (1) with l set to zero for the case of a symmetric quartet. We have also added a brief explanation immediately below Eq. (1) pointing out that this equation defines the T1 threshold in the space of local tensions T_i in terms of the isogonal length l_iso.

      -The authors should consider incorporating some details described in the SI file to the main text to clarify some points, as long as the accessible style of the manuscript can be kept. The points mentioned below may also be clarified in the SI doc. The specific points that could be elaborated are: Page 7-8 / Figure 3: It is unclear how the decomposition into 1) physical shape 2) tension shape 3) isogonal shape works exactly. A more detailed explanation and clearer illustration of what a quartet is and its labels could help. The mapping to Maxwell-Cremona space is fine, but which subset is the quartet? For a set of 4 cells with two shared vertices and a junction, aren't there 5 different tension vectors? Are we talking two closed force triangles? Separately, how do you exactly decompose the deformation (of 4 full cell shapes or a subset?) into isogonal and non-isogonal parts? What is the least squares fit done over - is this system underdetermined? Is this statistically averaged or computed per quartet and then averaged?

      We thank the referees for pointing us to unclear passages in our presentation. We hope that our revisions have resolved the referee’s questions. As described above, we have clarified the tension-isogonal decomposition in the main text. We have also revised the corresponding SI section (1.5) to address the above questions. A sketch of the quartet with labels is found in SI Fig. S7A which we now refer to explicitly in the main text.

      We always consider force-balance configurations, i.e. closed force triangles. Therefore in the “kite” formed by two adjacent tension triangles, only three tension vectors are independent.

      The decomposition of deformation is performed as follows: For each of the four cells, the center of mass c_i is calculated. Next, tension inference is performed to find the two tension triangles with tension vectors T_ij. Now there are three independent centroidal vectors c_j - c_i and three corresponding independent tension vectors T_ij. We define the isogonal deformation tensor I_quartet as the tensor that maps the centroidal vectors to the tension vectors. In general this is not possible exactly, because I_quartet has only three independent components, but there are six equations.
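      One way to read this overdetermined fit is as a least-squares problem. The form below is a sketch under assumptions that are not stated in the response itself: that I_quartet is a symmetric 2×2 tensor (hence three independent components), that the residual is an unweighted Euclidean norm, and that the sum runs over the three independent pairs (ij) in the quartet.

```latex
% Assumed least-squares form; symmetry of I and the unweighted norm are assumptions.
I_{\mathrm{quartet}}
  = \underset{I = I^{\mathsf{T}}}{\arg\min}
    \sum_{(ij)} \bigl\lVert \mathbf{T}_{ij} - I \,(\mathbf{c}_j - \mathbf{c}_i) \bigr\rVert^{2}
```

      With three vector pairs in two dimensions this gives six scalar equations for three unknowns, so the system is overdetermined and generally only an approximate solution exists.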

      The plots in Fig. 3C, C’ are obtained by performing this decomposition for each intercalating quartet individually. The data is then aligned in time and ensemble averages are calculated for each timepoint.

      For tissue-scale analysis in Fig. 6, the decomposition is performed for individual vertices (i.e. the corresponding centroidal and tension triangles) and then averaged locally to find the isogonal strain fields shown in Fig. 6B, B’.

      - Line 468: "Therefore, tissue-scale anisotropy of active tension is central to drive and orient convergent-extension flow [10, 57, 59, 60]." Authors almost never mention the contribution of the PMG to tissue extension. Yet it is known to be crucial (convergent extension in Tor mutants is very much affected). Please discuss this point further.

      The referees raise an important point: as discussed in our response to major point (2), we now explicitly discuss the role of internal (active tension) and external (PMG pulling) forces during germ band extension. Please see our response to major point (2) for the changes we made to the manuscript to address this.

      In particular, we now explain that in mutants where PMG invagination is impaired (fog, torso-like, torso, scab, corkscrew), the germ band buckles out of plane or extends in a twisted, corkscrew fashion (Smits et al. 2023). This shows that the germ band generates extensile forces largely internally. In torso mutants, the now stationary PMG acts as a barrier which blocks GBE; the germ band buckles as a response.

      The role of PMG invagination hence lies not in creating pulling forces to extend the germ band, but rather in “making room” to allow for its orderly extension. As shown by the genetic mutants just discussed, the synchronization of PMG invagination and GBE is crucial for successful gastrulation.

      -Typos:

      Line 74: how are intercalations are

      Line 84: vertices vertices

      Line 233: very differently

      Line 236: are can

      Line 390: energy which is the isogonal mode must

      Line 1585: reveals show

      Line 603: area

      Line 618: in terms of on the

      We have fixed these typos.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This valuable study revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. The authors provide evidence that 1) non time-reversible models sometimes perform better than general time-reversible models when inferring phylogenetic trees out of simulated viral genome sequence data sets, and that 2) non time-reversible models can fit the real data better than the reversible substitution models commonly used in phylogenetics, a finding consistent with previous work. However, the methods are incomplete in supporting the main conclusion of the manuscript, that is that non time-reversible models should be incorporated in the model selection process for these data sets.

      The non-reversible models should be incorporated in the model selection process not because they perform significantly better, but only because they do not perform worse than the reversible models and because the true biochemical processes of nucleotide substitution support non-reversibility.

      Reviewer #1 (Public Review):

      The study by Sianga-Mete et al revisits the effects of substitution model selection on phylogenetics by comparing reversible and non-reversible DNA substitution models. This topic is not new, previous works already showed that non-reversible, and also covarion, substitution models can fit the real data better than the reversible substitution models commonly used in phylogenetics. In this regard, the results of the present study are not surprising. Specific comments are shown below.

      True.

      Major comments

      It is well known that non-reversible models can fit the real data better than the commonly used reversible substitution models, see for example,

      https://academic.oup.com/sysbio/article/71/5/1110/6525257

      https://onlinelibrary.wiley.com/doi/10.1111/jeb.14147?af=R

      The manuscript indicates that the results (better fitting of non-reversible models compared to reversible models) are surprising but I do not think so, I think the results would be surprising if the reversible models provide a better fitting.

      I think the introduction of the manuscript should be increased with more information about non-reversible models and the diverse previous studies that already evaluated them. Also I think the manuscript should indicate that the results are not surprising, or more clearly justify why they are surprising.

      The surprising finding is that NREV12 performed better than NREV6 for double-stranded DNA viruses; NREV6 was expected to perform better given the biochemical processes discussed in the introduction.

      In the introduction and/or discussion I missed a discussion about the recent works on the influence of substitution model selection on phylogenetic tree reconstruction. Some works indicated that substitution model selection is not necessary for phylogenetic tree reconstruction, https://academic.oup.com/mbe/article/37/7/2110/5810088 https://www.nature.com/articles/s41467-019-08822-w https://academic.oup.com/mbe/article/35/9/2307/5040133

      While others indicated that substitution model selection is recommended for phylogenetic tree reconstruction, https://www.sciencedirect.com/science/article/pii/S0378111923001774 https://academic.oup.com/sysbio/article/53/2/278/1690801 https://academic.oup.com/mbe/article/33/1/255/2579471

      The results of the present study seem to support this second view. I think this study could be improved by providing a discussion about this aspect, including the specific contribution of this study to that.

      In our conclusion we have stated that: The lack of available data regarding the proportions of viral life cycles during which genomes exist in single and double stranded states makes it difficult to rationally predict the situations where the use of models such as GTR, NREV6 and NREV12 might be most justified: particularly in light of the poor overall performance of NREV6 and GTR relative to NREV12 with respect to describing mutational processes in viral genome sequence datasets. We therefore recommend case-by-case assessments of NREV12 vs NREV6 vs GTR model fit when deciding whether it is appropriate to consider the application of non-reversible models for phylogenetic inference and/or phylogenetic model-based analyses such as those intended to test for evidence of natural selection or the existence of molecular clocks.

      The real data was downloaded from the Los Alamos HIV database. I am wondering if there was any criterion for selecting the sequences or if all the sequences in the database for every studied virus category were analysed. Also, was any quality filter applied? How were gaps and ambiguous nucleotides considered? Notice that these aspects could affect the fit of the models to the data.

      We selected a varying number of sequences from the database for each studied virus type. We performed quality filtering by re-aligning the sequences for each virus type using AliView.

      How the non-reversible model and the data are compared considering the non-reversible substitution process? In particular, given an input MSA, how to know if the nucleotide substitution goes from state x to state y or from state y to state x in the real data if there is not a reference (i.e., wild type) sequence? All the sequences are mutants and one may not have a reference to identify the direction of the mutation, which is required for the non-reversible model. Maybe one could consider that the most abundant state is the wild type state but that may not be the case in reality. I think this is a main problem for the practical application of non-reversible substitution models in phylogenetics.

      True.

      Reviewer #2 (Public Review):

      The authors evaluate whether non time reversible models fit better data presenting strand-specific substitution biases than time reversible models. Specifically, the authors consider what they call NREV6 and NREV12 as candidate non time-reversible models. On the one hand, they show that AIC tends to select NREV12 more often than GTR on real virus data sets. On the other hand, they show using simulated data that NREV12 leads to inferred trees that are closer to the true generating tree when the data incorporates a certain degree of non time-reversibility. Based on these two experimental results, the authors conclude that "We show that non-reversible models such as NREV12 should be evaluated during the model selection phase of phylogenetic analyses involving viral genomic sequences". This is a valuable finding, and I agree that this is potentially good practice. However, I miss an experiment that links the two findings to support the conclusion: in particular, an experiment that solves the following question: does the best-fit model also lead to better tree topologies?

      Because NREV12 leads to inferred trees that are closer to the true generating tree than those inferred under GTR, the simulations show that the best-fit model (in this case NREV12) leads to better tree topologies.

      On simulated data, the significance of the difference between GTR and NREV12 inferences is evaluated using a paired t test. I miss a rationale or a reference to support that a paired t test is suitable to measure the significance of the differences of the wRF distance. Also, the results show that on average NREV12 performs better than GTR, but a pairwise comparison would be more informative: for how many sequence alignments does NREV12 perform better than GTR?

      We used the paired t-test because it is the most widely used test for comparing mean values between two matched samples when the differences between pairs are normally distributed. The wRF distances meet these conditions.

      The paired t-test incorporates the pairwise comparison, and the side-by-side boxplots show the pairwise wRF comparisons.
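      The pairwise view requested by the reviewer can be sketched alongside the paired t statistic. The following is a minimal illustration only: the function name `paired_summary` and the input values are hypothetical placeholders, not results or code from the study. It assumes one wRF distance per alignment for each model, with the two lists in matched order.

```python
import math
import statistics


def paired_summary(wrf_gtr, wrf_nrev12):
    """Paired t statistic plus per-alignment win counts for matched wRF lists."""
    if len(wrf_gtr) != len(wrf_nrev12):
        raise ValueError("inputs must be matched pairs")
    # Positive difference: the NREV12 tree is closer to the true tree.
    diffs = [g - n for g, n in zip(wrf_gtr, wrf_nrev12)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    t_stat = mean_diff / (sd_diff / math.sqrt(n))
    return {
        "t": t_stat,
        "df": n - 1,
        "nrev12_better": sum(d > 0 for d in diffs),
        "gtr_better": sum(d < 0 for d in diffs),
        "ties": sum(d == 0 for d in diffs),
    }


# Placeholder values only, NOT data from the study.
gtr = [0.42, 0.35, 0.51, 0.40, 0.38]
nrev12 = [0.40, 0.36, 0.45, 0.37, 0.38]
summary = paired_summary(gtr, nrev12)
print(summary)
```

      Reporting the win counts next to the t statistic answers "for how many alignments does NREV12 perform better?" directly, which a mean difference alone does not.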

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      The reversible and non-reversible models used in this study assume that all the sites evolve under the same substitution matrix, which can be unrealistic. This aspect could be mentioned.

      Done.

      The manuscript indicates that "a phylogenetic tree was inferred from an alignment of real sequences (Avian Leukosis virus) with an average sequence identity (API) of ~90%.". I was wondering under which substitution model that phylogenetic tree reconstruction was performed? could the use of that model bias posterior results in terms of favoring results based on such a model?

      We have stated on page ….. that the GTR+G model was used to reconstruct the tree. Yes, the use of the GTR+G model could bias the posterior results, as we have stated on page ….

      I was wondering which specific R function was used to calculate the weighted Robinson-Foulds metric. I think this should be included in the manuscript.

      We stated that we used the weighted Robinson-Foulds metric (wRF; implemented in the R phangorn package (Schliep, 2011)).
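      For readers unfamiliar with the metric, the weighted Robinson-Foulds distance can be sketched as the sum, over all bipartitions (splits) found in either tree, of the absolute difference in branch length, where a split absent from one tree counts as length zero. The sketch below is an illustration under that textbook definition; it is not the phangorn implementation, and the helper names (`canonical`, `to_splits`, `weighted_rf`) and the toy four-taxon trees are hypothetical.

```python
def canonical(side, taxa):
    """Represent a split by the side that does not contain the reference taxon."""
    side = frozenset(side)
    reference = min(taxa)
    return frozenset(taxa) - side if reference in side else side


def to_splits(branches, taxa):
    """Map {one side of a split: branch length} to canonical split keys."""
    return {canonical(side, taxa): length for side, length in branches.items()}


def weighted_rf(splits_a, splits_b):
    """Sum of absolute branch-length differences over the union of splits."""
    all_splits = set(splits_a) | set(splits_b)
    return sum(abs(splits_a.get(s, 0.0) - splits_b.get(s, 0.0)) for s in all_splits)


# Toy unrooted trees on taxa A-D (placeholder values, not study data).
taxa = "ABCD"
true_tree = to_splits({"A": 0.1, "B": 0.2, "C": 0.1, "D": 0.3, "AB": 0.05}, taxa)
inferred = to_splits({"A": 0.1, "B": 0.25, "C": 0.1, "D": 0.3, "AC": 0.04}, taxa)

distance = weighted_rf(true_tree, inferred)
print(round(distance, 4))
```

      Here the two trees disagree on the internal split (AB|CD versus AC|BD) and on one terminal branch length, so both topology and branch lengths contribute to the distance.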

      Despite a minority, several datasets fitted better with a reversible model than with a non-reversible model. I think that should be clearly indicated.

      In addition, in my opinion the AIC does not sufficiently penalize the number of parameters of the models and favors the non-reversible models over the reversible models, but this is only my opinion based on the definition of AIC and it is not supported. Thus, I think the comparison between phylogenetic trees reconstructed under different substitution models was a good idea (but see also my second major comment).

      Noted.

      When comparing phylogenetic trees I was wondering if one should consider the effect of the estimation method and quality of the studied data? For example, should bootstrap values be estimated for all the ancestral nodes and only ancestral nodes with high support be evaluated in the comparison among trees?

      Yes, the estimation method and the quality of the studied data should be considered. This does not matter for RF, but it does for weighted RF. When building the trees using RAxML, only high-support nodes are added to the tree.

      In Figure 3, I do not see (by eye) significant differences among the models. I see in the legend that the statistical evaluation was based on a t test but I am not much convinced. Maybe it is only my view. Exactly, which pairs of datasets are evaluated with the t test? Next, I would expect that the influence of the substitution model on the phylogenetic tree reconstruction is higher at large levels of nucleotide diversity because with more substitution events there is more information to see the effects of the model. However, the t test seems to show that differences are only at low levels of nucleotide diversity (and large DNR), what could be the cause of this?

      The paired t-test compares the wRF distances between the true tree and the trees inferred using the GTR model versus the wRF distances between the true tree and the trees inferred using the NREV12 model.

      The reason why the influence of the NREV12 model on the reconstructed tree is not significantly higher at large levels of nucleotide diversity could be that, beyond a certain level, the DNR values are simply unrealistic.

      Can the user perform substitution model selection (i.e., AIC) among reversible and non-reversible substitution models with IQTREE? If yes, then doing that should be the recommendation from this study, correct?

      But, can DNR be estimated from a real dataset? DNR seems to be the key factor (Figure 3) for the phylogenetic analysis under a proper model.

      Substitution model selection can be performed among reversible and non-reversible models using both HyPhy and IQTREE, and we have recommended that model tests be done as a first step before tree building. Estimating DNR from real datasets requires a substitution rate matrix of a non-reversible model.

      The manuscript has many text errors (including typos and incorrect citations). For example, many citations in page 20 show "Error! Reference source not found.". I think authors should double check the manuscript before submitting. Also, some text is not formally written. For example, "G represents gamma-distributed rates", rates of what? The text should be clear for readers that are not familiar with the topic (i.e., G represents gamma-distributed substitution rates among sites). In general, I recommend a detailed revision of the whole text of the manuscript.

      Done.

      Reviewer #2 (Recommendations For The Authors):

      The authors reference Baele et al., 2010 for describing NREV6 and NREV12. I suggest using the same name used in the referenced paper: GNR-SYM and GNR respectively. Although I do not think there is a standard name for these models, I would use a previously used one.

      We have built studies based on the names NREV6 and NREV12. We would like to keep the naming as standard for our studies.

      GTR and NREV12 models are already described in many other papers. I do not see the need to include such an extensive description. Also, a reference should be included to the discrete Gamma rate categories [1]

      We included the extensive description to help readers who are not familiar with these models understand them better, since we have given the models our own names, which differ from those used in other papers.

      We have added a reference for the discrete gamma rate categories as recommended (Yang, 1994).

      To evaluate the exhaustiveness and correctness of the results, I would recommend publishing as supplementary material the simulated data sets or the scripts for generating the data set, the scripts or command lines for the analysis, and the versions of the software used (e.g., IQTREE). Also, to strongly support the main conclusion of the manuscript, I suggest adding to the simulations section results the RF-distances of the best-fit selected model under AIC, AICc, and BIC as well.

      We will submit all the needed datasets. The RF distance results for the simulated data are available and will be submitted. However, we cannot add them to the main document, as this would create very long data tables.

      In some instances, it is mentioned that the selection criterion used is AIC, while in others, AIC-c is referenced. Even in the table captions, both terms are mixed. It should be made clearer which criterion is being employed, as AIC is not suitable for addressing the overparameterization of evolutionary models, given that it does not account for the sample size. A previous pre-print of this article [2] does not mention AIC-c, but also explicitly includes the formulas for AIC that do not take the sample size into account, and reports the same results as this manuscript, which indicates that AIC and not AIC-c was used here. This should be clarified. It is recommended to use AIC-c instead of AIC, especially if the ratio of sample size to model parameters is low [3]. Two points should be noted here: some authors consider tree branch lengths as free model parameters and others do not. In this paper it is not specified how the model parameters are counted. AIC tends to select more parameterized models than AIC-c, and overparameterization can lead to different tree inferences, as evidenced in Hoff et al., 2016. Therefore, it is expected that NREV12 is more frequently selected than NREV6 and GTR.
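      The difference between the two criteria can be made concrete with the standard formulas, AIC = 2k - 2 ln L and AICc = AIC + 2k(k + 1)/(n - k - 1), where k is the number of free parameters and n the sample size. The sketch below is illustrative only: the log-likelihoods and the sample size are invented placeholders, and the parameter counts (8 for GTR, 11 for NREV12) are one common convention that counts substitution model parameters only, ignoring branch lengths and rate heterogeneity. As the comment above notes, the manuscript does not state how parameters were counted.

```python
def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik


def aicc(log_lik, k, n):
    """AIC with the small-sample correction term 2k(k+1)/(n-k-1)."""
    if n - k - 1 <= 0:
        raise ValueError("AICc is undefined when n <= k + 1")
    return aic(log_lik, k) + (2 * k * (k + 1)) / (n - k - 1)


# Hypothetical fits (placeholder values, not from the study).
n_sites = 50  # deliberately small so the correction is visible
fits = {
    "GTR": {"log_lik": -1000.0, "k": 8},    # assumed parameter count
    "NREV12": {"log_lik": -996.5, "k": 11},  # assumed parameter count
}

scores = {
    name: {
        "AIC": aic(f["log_lik"], f["k"]),
        "AICc": aicc(f["log_lik"], f["k"], n_sites),
    }
    for name, f in fits.items()
}
best_aic = min(scores, key=lambda m: scores[m]["AIC"])
best_aicc = min(scores, key=lambda m: scores[m]["AICc"])
print(best_aic, best_aicc)
```

      With these placeholder numbers the uncorrected AIC prefers the richer model while AICc prefers the simpler one, which is the behaviour the comment describes when the ratio of sample size to parameters is low. For long alignments the correction term shrinks and the two criteria converge.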

      In my opinion, a pairwise comparison between GTR and NREV12 performance is of great interest here, and the whiskers plots are not useful. Scatterplots would display the results better.

      Boxplots are meant to offer a simplified view of the results, as the paired t-test performs all of the comparisons. We will provide the scatter plots as supplementary information so that readers can see the full detail, as recommended.

      Some references are missing

      Missing references added

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      This paper seeks to understand the upstream regulation and downstream effectors of glycolysis in retinal progenitor cells, using mouse retinal explants as the main model system. The paper presents evidence that high glycolysis in retinal progenitor cells is required for their proliferation and timely differentiation into photoreceptors. Retinal glycolysis increases after the deletion of Pten. The authors suggest that high glycolysis controls cell proliferation and differentiation by promoting intracellular alkalinization, beta-catenin acetylation and stabilization, and consequent activation of the canonical Wnt pathway.

      Strengths:

      (1) The experiments showing that PFKFB3 overexpression is sufficient to increase the proliferation of retinal progenitors (which are already highly dividing cells) and photoreceptor differentiation are striking and the result is unanticipated. It suggests that glycolytic flux is normally limiting for proliferation in embryos.

      In our BrdU birthdating experiment, we showed that PFKFB3 expression drives the precocious differentiation of retinal progenitor cells (RPCs) into photoreceptors. However, we did not determine if there is an associated change in the number of dividing RPCs. To examine the proliferative status of PFKFB3-overexpressing RPCs, we will perform short-term BrdU labeling to measure the number of RPCs in S-phase of the cell cycle. Additionally, we will count the number of RPCs expressing pHH3, a mitotic marker, and Ki67, a marker of cycling cells in all cell cycle phases.

      (2) Likewise the result that an increase in pH from 7.4 to 8.0 is sufficient to increase proliferation implies that pH regulation may have instructive roles in setting the tempo of retinal development and embryonic cell proliferation. Similarly, the results show that acetate supplementation increases proliferation (I think this result should be moved to the main figures).

      We thank the reviewer for these positive comments on our work. We will move the acetate data to the main figure as requested.

      Weaknesses:

      (1) Epistatic experiments to test if changes in pH mediate the effects of glycolysis on photoreceptor differentiation, or if Wnt activation is the main downstream effector of glycolysis in controlling differentiation are not presented.

      Traditionally, epistasis is tested using double knock-out (DKO) studies with null mutant alleles. If two genes operate in the same pathway, the downstream phenotype prevails, whereas phenotypic worsening is observed if two genes act in parallel pathways. Our data suggests the following order of events: Pten↓ → glycolysis↑ → intracellular pH↑ → Wnt signaling↑ → photoreceptor differentiation. In this model, Wnt signaling is the downstream-most effector. To test our epistatic model, we will assess RPC proliferation and the differentiation of Crx+ photoreceptor precursors with the following assays:

      (1) To confirm that Wnt signaling acts downstream of Pten, we will generate DKOs of Pten and Ctnnb1, a downstream effector of Wnt signaling. We know that fewer photoreceptors are generated in single Pten-cKO and Ctnnb1-cKO retinas, with a disruption of the outer nuclear layer only in Ctnnb1-cKOs. If Pten and Wnt act in the same pathway, Pten;Ctnnb1 DKOs will resemble single Ctnnb1-cKOs.

      (2) While epistasis is traditionally examined using genetic mutants, we will perform proxy experiments using pharmacological agents. To test whether Wnt activation acts downstream of a pH increase, we will activate Wnt signaling with recombinant Wnt3a at high and low pH. While low pH inhibits photoreceptor differentiation, if Wnt signaling is downstream, it should promote differentiation even at low pH. Conversely, we will alter pH in the presence of a Wnt inhibitor, FH535, which should block the positive effects of high pH on photoreceptor differentiation.

      (3) To test whether Wnt activation acts downstream of glycolysis to increase photoreceptor differentiation, we will apply recombinant Wnt3a to retinal explants while simultaneously inhibiting glycolysis with 2DG. While 2DG inhibits photoreceptor differentiation, if Wnt signaling is downstream, it should still be able to promote differentiation.

      (4) To test whether pharmacological inhibition of Wnt signaling reverses the effects of high glycolytic activity in Pten cKO retinas, we will treat wild-type and Pten-cKO retinas with the Wnt inhibitor FH535 and/or the glycolytic inhibitor 2DG.

      (2) It is likely that metabolism changes ex vivo vs in vivo, and therefore stable isotope tracing experiments in the explants may not reflect in vivo metabolism.

      We agree with the reviewer that metabolism likely changes ex vivo compared to in vivo. However, we did not perform stable isotope tracing experiments to directly examine glycolytic flux in this study. While outside the scope of the current study, this type of analysis is an important future direction that we will bring up in the discussion.

      (3) The retina at P0 is composed of both progenitors and differentiated cells. It is not clear if the results of the RNA-seq and metabolic analysis reflect changes in the metabolism of progenitors, or of mature cells, or changes in cell type composition rather than direct metabolic changes in a specific cell type.

      We mined a scRNA-seq dataset to show that Pgk1, a rate-limiting enzyme for glycolysis, is specifically elevated in early-stage RPCs versus later-stage RPCs. We have since analysed additional glycolytic pathway genes, and observed a similar enrichment of Pfkl, Eno1 and Slc16a3 transcripts in early RPCs, while other genes were equally expressed in both early and late RPCs.

      To functionally demonstrate that there are differences in glycolysis between early and late RPCs, we will use CD133 to sort RPCs at E15 (early) and P0 (late). We will perform qPCR on sorted cells to validate the transcriptional differences in glycolytic gene expression. Additionally, we will perform two proxy measures of glycolysis: 1) We will measure lactate levels in sorted RPCs at both stages, and 2) We will use a Seahorse assay and assess ECAR in sorted RPCs at both stages.

      (4) The biochemical links between elevated glycolysis and pH and beta-catenin stability are unclear. White et al found that higher pH decreased beta-catenin stability (JCB 217: 3965) in contrast to the results here. Oginuma et al found that inhibition of glycolysis or beta-catenin acetylation does not affect beta-catenin stability (Nature 584:98), again in contrast to these results. Another paper showed that acidification inhibits Wnt signaling by promoting the expression of a transcriptional repressor and not via beta-catenin stability (Cell Discovery 4:37). There are also additional papers showing increased pH can promote cell proliferation via other mechanisms (e.g. Nat Metab 2:1212). It is possible that there is organ-specificity in these signaling pathways however some clarification of these divergent results is warranted.

      The pleiotropic actions of Wnt signaling on cell proliferation and differentiation are well known, even shifting from pro-proliferative to anti-proliferative depending on tissue or cell type. It is thus not surprising that different studies found unique effects of pH and glycolysis on β-catenin modifications and the activation of downstream signaling. Thus, as suggested by the reviewer, the difference between our data and other studies could be attributed to tissue and organism. In our revision, we will more fully assess our findings in the context of published studies, as recommended by the reviewer.

      To summarize our data, in the developing retina, we found that non-phosphorylated β-catenin protein levels increase in Pten-cKO retinas in vivo, while conversely, non-phosphorylated β-catenin protein levels decrease upon 2DG treatment and at low pH 6.5 in vitro.

      The Oginuma et al. 2020 (Nature 584: 98-101) study was performed on the chick tailbud and investigated lineage decisions by neuromesodermal progenitors in the presomitic mesoderm. In this context, WNT activity, glycolysis and pHi all decline in tandem, complementary to our findings. However, Oginuma et al. found that while phosphorylated and non-phosphorylated β-catenin levels do not vary, K49 β-catenin acetylation is reduced at low pHi. In their system, K49 β-catenin acetylation is associated with a switch in cell fate choice from neural to mesodermal in the chick tailbud. We will now assess this modification.

      Hauck et al. 2021 (Cell Death & Differentiation 28:1398-1417) found that by mutating Pkm, a rate-limiting glycolytic enzyme, β-catenin can more efficiently shuttle to the nucleus to activate Wnt-signaling and promote cardiomyocyte proliferation. This study highlights the importance of examining β-catenin protein levels in both cytoplasmic and nuclear fractions. They also examined transcriptional targets of Wnt signaling, such as Axin2, Ccnd1, Myc, Sox2 and Tnnt3, which we will also now assess.

      In a separate study in cancer cells, high pH leads to increased expression of Ccnd1, a β-catenin target gene, and promotes proliferation (Koch et al. 2020. Nat Metab. 2:1212-1222). These findings are consistent with our demonstration that β-catenin levels are stabilized at pH 8, and RPC proliferation is enhanced. A separate study by Melnik et al 2018 (Cell Discovery 4:37) performed in cancer cells found that acidification induced by metformin indirectly suppresses Wnt signaling by activating the DDIT3 transcriptional repressor, consistent with our data showing low pH suppresses β-catenin stability. Melnik et al also used Mcl inhibitors, as we did in our study, and showed that this treatment blocked Wnt signaling. While we did not look at the impact of CNCn on Wnt signaling, we did see a decline in proliferation, as expected if Wnt levels are low. The relationship between CNCn and Wnt activity will now be assessed.

      The one study that fits less well is from Czowski and White (BioRxiv), where they found that higher pH levels decrease β-catenin levels in the cytoplasm, nucleus and junctional complexes in MDCK cells. In this study, the authors altered pH using inhibitors for a sodium-proton exchanger and a sodium bicarbonate transporter. The Oginuma paper instead used the ionophores nigericin and valinomycin to equilibrate intracellular pHi to media pH, which we will now incorporate into our study.

      In summary, to more comprehensively examine the link between Pten loss, glycolytic activity, pHi and Wnt signaling, we will examine levels of phosphorylated, non-phosphorylated and K49 acetylated β-catenin after each manipulation (i.e., Pten loss, pH manipulations, CNCn treatment, glycolysis inhibition, acetate treatments). For pH manipulations, we will use nigericin and valinomycin to equilibrate pH. These studies will be performed on cytoplasmic and nuclear fractions from CD133+ MACS-enriched RPCs, to add cell type and stage specificity to our study. We will also use qPCR to examine Wnt signaling genes, such as Axin2, Ccnd1, Myc, Sox2 and Tnnt3.

      (5) The gene expression analysis is not completely convincing. E.g. the expression of additional glycolytic genes should be shown in Figure 1. It is not clear why Hk1 and Pgk1 are specifically shown, and conclusions about changes in glycolysis are difficult to draw from the expression of these two genes. The increase in glycolytic gene expression in the Pten-deficient retina is generally small.

      See response to point 3.

      (6) Is it possible that glycolytic inhibition with 2DG slows down the development and production of most newly differentiated cells rather than specifically affecting photoreceptor differentiation?

      We thank the reviewer for this excellent suggestion. We will examine the impact of 2DG on the differentiation of other retinal cell types, including bipolar and amacrine cells and Müller glia. For technical reasons, we will exclude ganglion cells, which die in culture and cannot be examined in explants, and horizontal cells, which are a rare cell type and hence difficult to quantify accurately.

      (7) Are the prematurely-born cells caused by PFKFB3 overexpression photoreceptors as assessed by morphology or markers (in addition to position)?

      We will immunostain treated retinas with additional cell-type specific markers to examine rod and cone photoreceptor numbers and morphologies.

      Reviewer #2 (Public review):

      Summary:

      The manuscript by Hanna et al., addresses the question of energy metabolism in the retina, a neuronal tissue with an inordinately high energy demand. Paradoxically, the retina appears to employ to a large extent glycolysis to satisfy its energetic needs, even though glycolysis is far less efficient than oxidative phosphorylation (OXPHOS). The focus of the present study is on the early development of the retina and the retinal progenitor cells (RPCs) that proliferate and differentiate to form the seven main classes of retinal neurons. The authors use different genetic and pharmacological manipulations to drive the metabolism of RPCs or the retina towards higher or lower glycolytic activity. The results obtained suggest that increased glycolytic activity in early retinal development produces a more rapid differentiation of RPCs, resulting in a more rapid maturation of photoreceptors and photoreceptor segment growth. The study is significant in that it shows how metabolic activity can determine cell fate decisions in retinal neurons.

      Strengths:

      This study provides important findings that are highly relevant to the understanding of how early metabolism governs the development of the retina. The outcomes of this study could be relevant also for human diseases that affect early retinal development, including retinopathy of prematurity where an increased oxygenation likely causes a disturbance of energy metabolism.

      We thank the reviewer for these positive comments on our study.

      Weaknesses:

      The restriction to only relatively early developmental time points makes it difficult to assess the consequences of the different manipulations on the (more) mature retina. Notably, it is conceivable that early developmental manipulations, while producing relevant effects in the young post-natal retina, may "even out" and may no longer be visible in the mature, adult retina.

      While we agree that it would be interesting to observe the long-term consequences of our manipulations, we are limited by our retinal explant model, which can at best be cultured for 2 weeks in vitro. Additional limitations include the lack of photoreceptor outer segment development in our in vitro model. However, we can perform more extensive analyses of our genetic models in vivo (i.e., Pten-cKO, cyto-PFKB3-GOF, Ctnnb1-cKO). For these lines, we will focus on more in-depth analyses of photoreceptor differentiation and outer segment maturation using additional markers and one later stage of development.

      Reviewer #3 (Public review):

      Summary:

      This study examines the metabolic regulation of progenitor proliferation and differentiation in the developing retina. The authors observe dynamic changes in glycolytic gene expression in retinal progenitors and use various strategies to test the role of glycolysis. They find that elevated glycolysis in Pten-cKO retinas results in alteration of RPC fate, while inhibition of glycolysis has converse effects. They specifically test the role of elevated glycolysis using dominant active cytoPFKB3, which demonstrates the selective effects of elevated glycolysis on progenitor proliferation and rod differentiation. They then show that elevated glycolysis modulates both pHi and Wnt signaling, and provide evidence that these pathways impact proliferation and differentiation of progenitors, particularly affecting rod photoreceptor differentiation.

      Strengths:

      This is a compelling and rigorous study that provides an important advance in our understanding of metabolic regulation of retina development, addressing a major gap in knowledge. A key strength is that the study utilizes multiple genetic and pharmacological approaches to address how both increased or decreased glycolytic flux affect retinal progenitor proliferation and differentiation. They discover elevated Wnt signaling pathway genes in Pten cKO retina, revealing a potential link between glycolysis and Wnt pathway activation. Altogether the study is comprehensive and adds to the growing body of evidence that regulation of glycolysis plays a key role in tissue development.

      We thank the reviewer for these positive comments on our study.

      Weaknesses:

      (1) Following the expression of cytoPFKB3, which results in increased glycolytic flux, BrdU labeling was performed at E12.5 and increased labeled cells were detected in the outer nuclear layer. However, whether these are cones or rods is not established. The rest of the analysis is focused on the precocious maturation of rhodopsin-labeled outer segments, and the major conclusions emphasize rod photoreceptor differentiation. Therefore, it is unclear whether there is an effect on cone differentiation for either Pten cKO or cytoPFKB3 transgenic retina. It is also not established whether rods are born precociously. Presumably, this would be best detected by BrdU labeling at later embryonic stages.

      We agree with the reviewer that we should expand our study to also examine cone differentiation and outer segment maturation, which we will now do by adding additional markers to our study.

      (2) The authors find that there is upregulation of multiple Wnt pathway components in Pten cKO retina. They further show that inhibiting Wnt signaling phenocopies the effects of reducing glycolysis. However, they do not test whether pharmacological inhibition of Wnt signaling reverses the effects of high glycolytic activity in Pten cKO retinas. Thus the argument that Wnt is a key downstream effector pathway regulating rod photoreceptor differentiation is weak.

      See Reviewer 1, point 1

      (3) The use of sodium acetate to force protein acetylation is quite non-specific and will have effects beyond beta-catenin acetylation (which the authors acknowledge). Thus it is a stretch to state that "forced activation of beta-catenin acetylation" mimics the impact of Pten loss/high glycolytic activity in RPCs since the effects could be due to acetylation of other proteins.

      As outlined in our response to Reviewer #1, point 4, we will now assess K49 β-catenin acetylation levels, as conducted by Oginuma et al. This analysis will allow us to determine whether β-catenin acetylation is altered with manipulations of Pten, glycolysis, pH or acetate treatments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      One major issue arises in Figure 4, the recording of VLPO Ca2+ activity. In Lines 211-215, they stated that they injected AAV2/9-DBH-GCaMP6m into the VLPO, while activating LC NE neurons. As they claimed in line 157, DBH is a specific promoter for NE neurons. This implies an attempt to label NE neurons in the VLPO, which is problematic because NE neurons are not present in the VLPO. This raises concerns about their viral infection strategy since Ca2+ activity was observed in their photometry recording. This means that the DBH promoter could randomly label some non-NE neurons. Is the DBH promoter widely used? The authors should list references. Additionally, they should quantify the labeling efficiency of both DBH and TH-cre throughout the paper.

      In Figure 5, we found that the VLPO received the noradrenergic projection from the LC, indicating that the recorded Ca2+ activity may come from the axon fibers of this projection. Similarly, Gunaydin et al. (2014) demonstrated that fiber photometry can be used to selectively record from neuronal projections.

      We appreciate the reviewer's insightful suggestion to elaborate on the DBH promoter. We have now expanded our discussion to address DBH (pg. 18): “DBH (dopamine-beta-hydroxylase), located in the inner membrane of noradrenergic and adrenergic neurons, is an enzyme that catalyzes the conversion of dopamine to norepinephrine and therefore plays an important role in noradrenergic neurotransmission. DBH is a marker of noradrenergic neurons. Zhou et al. (2020) confirmed that their probe specifically labeled noradrenergic neurons by immunolabeling for DBH. Recently, the DBH promoter has been used in several studies (e.g., Han et al., 2024; Lian et al., 2023). DBH-Cre mice are widely used to specifically label noradrenergic neurons (e.g., Li et al., 2023; Breton-Provencher et al., 2022; Liu et al., 2024). It is difficult to distinguish the roles of NE and DA neurons when using the TH promoter in the VLPO. Therefore, we used the DBH promoter, which labels more specifically. The LC is the main noradrenergic nucleus of the central nervous system. In our study, we injected rAAV-DBH-GCaMP6m-WPRE (Figures 2 and 8) and rAAV-DBH-EGFP-S'miR-30a-shRNA GABAA receptor)-3’-miR30a-WPRES (Figure 9) into the LC. The results showed that the DBH promoter could specifically label noradrenergic neurons in the LC, while non-specific labeling outside the LC was almost absent.”

      As suggested, we have quantified the labeling efficiency of both DBH and TH-cre throughout the revised manuscript (Fig.2D; Fig.3D, N-O; Fig.4E-F, J, L; Fig.5E, L; Fig.6L, S, X; Fig.7G).

      A similar issue arises with chemogenetic activation in Fig. 5 L-R, where the authors used TH-cre and DIO-Gq virus to label VLPO neurons. Were they labelling VLPO NE or DA neurons for recording? The authors have to clarify this.

      As previously addressed in response to Comment #1, we agree that it is difficult to distinguish the roles of NE and DA neurons when using the TH promoter in the VLPO. Therefore, we injected a mixture of DBH-Cre-AAV and AAV-EF1a-DIO-hChR2(H134R)-eYFP/AAV-Ef1a-DIO-hM3Dq-mCherry viruses into the bilateral LC and AAV-EF1a-DIO-hChR2(H134R)-eYFP/AAV-Ef1a-DIO-hM3Dq-mCherry virus into the bilateral VLPO. Moreover, we quantified the labeling efficiency of DBH in the LC to demonstrate that this promoter can specifically label NE neurons (Fig. 5). Importantly, these corrections did not alter the outcomes of our results. Both optogenetic and chemogenetic activation of LC-NE terminals in the VLPO can effectively promote midazolam recovery (Fig. 5G, N).

      Another related question pertains to the specificity of LC NE downstream neurons in the VLPO. For example, do they preferentially modulate GABAergic or glutamatergic neurons?

      Our study primarily aimed to explore the role of the LC-VLPO NEergic neural circuit in modulating midazolam recovery. We acknowledge that our evidence for the role of LC NE downstream neurons in the VLPO, derived from activation of LC-NE terminals and pharmacological intervention in the VLPO (Fig.5, Fig.6, Fig.8, Fig.9), is limited. Accordingly, we now present the VLPO’s role as a promising direction for future research in the limitation section of our revised manuscript: “This study shows that the LC-VLPO NEergic neural circuit plays an important role in modulating midazolam recovery. However, the specificity of LC NE downstream neurons in the VLPO is not explained in this paper and is our next research direction. VLPO neurons and their downstream regulatory mechanisms may involve neural systems other than the NE system, and the deeper and more complex mechanisms need to be further investigated.”

      In Figure 1A-D, in the measurement of the dosage-dependent effect of Mida in LORR, was only one batch of testing performed? If more than one batch of mice was used, error bars should be presented in 1B. Also, the rationale for testing TH expression levels after Mida is not clear. Is the change in TH expression level related to NE activation specifically? If so, they should cite references.

      As recommended, we have added error bars and revised the graph of the LORR rate in the revised manuscript (Fig. 1A-B; Fig. 9G-H).

      We agree that the use of TH as a marker of NE activation is controversial, so in the revised manuscript, we directly determined central norepinephrine content to reflect the change of NE activity after midazolam administration (Fig. 1D).

      Regarding the photometry recording of LC NE neurons during the entire process of midazolam injection in Fig. 2 and Fig. 4, it is unclear what time=0 stands for. If I understand correctly, the authors were comparing spontaneous activity during the four phases. Additionally, they only show traces lasting for 20s in Fig. 2F and Fig. 4L. How did the authors select data for analysis, and what criteria were used? The authors should also quantify the average Ca2+ activity and Ca2+ transient frequency during each stage instead of only quantifying Ca2+ peaks. In line 919, the legend for Figure 2D, they stated that it is the signal at the BLA; did they also record from the BLA?

      In this study, we used fiber photometry to record calcium signals, a fluorescence-based method that tracks changes in calcium. The fluorescence signal is usually divided into segments according to behavior, and each segment is aligned to the specific behavioral event, which is defined as time=0. The mean calcium fluorescence signal in the time window 1.5s or 1s before the behavioral event is taken as the baseline fluorescence intensity (F0). The difference between the fluorescence intensity at the occurrence of the behavior and the baseline fluorescence intensity is divided by the difference between the baseline fluorescence intensity and the offset value. That is, the value ΔF/F0 represents the change in calcium fluorescence intensity when the event occurs. The results of the analysis are commonly represented by two kinds of graphs, namely heat maps and peri-event plots (e.g., Cheng et al., 2022; Gan-Or et al., 2023; Wei et al., 2018). In Fig. 2, the time points for awake, midazolam injection, LORR and RORR in mice were each selected as time=0, while in Fig. 4, RORR in mice was selected as time=0. The 20s duration of the selected traces was based on the length of a complete Ca2+ signal. We have explained the Ca2+ recording experiment more specifically in the figure legends and methods sections of our revised manuscript.
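
      The ΔF/F0 computation described in this response can be illustrated with a minimal sketch. It is not the authors' analysis code: the function name, the sample trace, the sampling times, and the default offset of 0 are all hypothetical, and only the formula (F − F0) / (F0 − offset) with a pre-event baseline window is taken from the description above.

```python
from statistics import mean


def delta_f_over_f0(trace, times, baseline_window=(-1.5, 0.0), offset=0.0):
    """Peri-event dF/F0 for one event-aligned fluorescence segment.

    trace: fluorescence values of the segment (hypothetical units)
    times: time of each sample relative to the behavioral event (time = 0)
    baseline_window: (start, end) in seconds before the event used for F0
    offset: instrument offset value subtracted from F0 in the denominator
    """
    start, end = baseline_window
    baseline = [f for f, t in zip(trace, times) if start <= t < end]
    if not baseline:
        raise ValueError("no samples fall inside the baseline window")
    f0 = mean(baseline)  # baseline fluorescence intensity F0
    denominator = f0 - offset
    if denominator == 0:
        raise ValueError("F0 equals the offset; dF/F0 is undefined")
    return [(f - f0) / denominator for f in trace]


# Hypothetical segment sampled every 0.5 s around an event at time = 0.
times = [-1.5, -1.0, -0.5, 0.0, 0.5, 1.0]
trace = [100.0, 100.0, 100.0, 110.0, 120.0, 105.0]
dff = delta_f_over_f0(trace, times)
dff_with_offset = delta_f_over_f0(trace, times, offset=20.0)
```

      Averaging such segments across events gives the peri-event plot, and stacking them row by row gives the heat map mentioned above.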

      Regarding the BLA, we sincerely apologize for our carelessness: the signals we recorded were from the LC rather than the BLA. We have carefully checked and corrected similar problems in the revised manuscript.

      Reviewer 2:

      In the figure legends, abbreviations used in the figures should be defined wherever possible. For example, "LORR" in Figure 1.

      As suggested, we have defined the abbreviations used in the figures wherever possible in the revised manuscript.

      Additional recommendations:

      The main conceptual issue in the paper is the inflation of the conclusion regarding the mechanism of sedation induced by midazolam. The authors did not reveal the full mechanism but rather the relative contribution of the NE system. Several conclusions in the text should be edited to take this into account, starting from the title. I think the following examples are more appropriate: 'NE contribution to rebooting unconsciousness caused by midazolam' or 'NE contribution to reverse the sedation induced by midazolam'.

      As suggested, we have moderated the assertions about the mechanism of sedation induced by midazolam in several conclusions, starting from the title (Lines 1, 125, 150, 169, 202, 237, 482), to present a more measured interpretation in the manuscript.

      Line 178-179, the authors state 'these suggest that intranuclear ... suppresses recovery from midazolam administration'. In fact, this intervention prolonged or postponed recovery from midazolam.

      In our revised manuscript, we have corrected this inappropriate term (Line 178).

      Pharmacology part (page 12) that aimed to pinpoint which NE receptor is implicated would suffer from specificity issues.

      In relation to the specificity issue, the focus on VLPO might be rational but again other areas are most likely involved given the pharmacological actions of midazolam.

      In the revised manuscript, we have discussed the specificity issues concerning NE receptors and the areas involved in midazolam-induced altered consciousness: “In addition, given the pharmacological actions of midazolam, other areas may also be involved. Current studies suggest that the neural network involved in the recovery of consciousness consists of the prefrontal cortex, basal forebrain, brain stem, hypothalamus and thalamus. The role of these regions in midazolam recovery remains to be further investigated. Therefore, we will apply more specific experimental methods to determine the importance of the LC-VLPO NEergic neural circuit and related NE receptors in midazolam recovery, and conduct further studies on other relevant brain regions, hoping to more fully elucidate the mechanism of midazolam recovery in the future”.

      Line 274, the authors used 'inhibitory EEG activity'. What does it mean? A description of which rhythm-related power density is affected would be more objective.

      Example of conclusion inflation: in line 477, the word 'contributes' is better than 'mediates' if the specificity issue is taken into account.

      As suggested, we have revised our wording in the revised manuscript (pg. 13-14).

      References

      Gunaydin LA, Grosenick L, Finkelstein JC, et al. Natural neural projection dynamics underlying social behavior. Cell. 2014;157(7):1535-1551. doi:10.1016/j.cell.2014.05.017

      Zhou N, Huo F, Yue Y, Yin C. Specific Fluorescent Probe Based on "Protect-Deprotect" To Visualize the Norepinephrine Signaling Pathway and Drug Intervention Tracers. J Am Chem Soc. 2020;142(41):17751-17755. doi:10.1021/jacs.0c08956

      Han S, Jiang B, Ren J, et al. Impaired Lactate Release in Dorsal CA1 Astrocytes Contributed to Nociceptive Sensitization and Comorbid Memory Deficits in Rodents. Anesthesiology. 2024;140(3):538-557. doi:10.1097/ALN.0000000000004756

      Lian X, Xu Q, Wang Y, et al. Noradrenergic pathway from the locus coeruleus to heart is implicated in modulating SUDEP. iScience. 2023;26(4):106284. Published 2023 Feb 27. doi:10.1016/j.isci.2023.106284

      Li C, Sun T, Zhang Y, et al. A neural circuit for regulating a behavioral switch in response to prolonged uncontrollability in mice. Neuron. 2023;111(17):2727-2741.e7. doi:10.1016/j.neuron.2023.05.023

      Breton-Provencher V, Drummond GT, Feng J, Li Y, Sur M. Spatiotemporal dynamics of noradrenaline during learned behaviour. Nature. 2022;606(7915):732-738. doi:10.1038/s41586-022-04782-2

      Liu Q, Luo X, Liang Z, et al. Coordination between circadian neural circuit and intracellular molecular clock ensures rhythmic activation of adult neural stem cells. Proc Natl Acad Sci U S A. 2024;121(8):e2318030121. doi:10.1073/pnas.2318030121

      Cheng J, Ma X, Li C, et al. Diet-induced inflammation in the anterior paraventricular thalamus induces compulsive sucrose-seeking. Nat Neurosci. 2022;25(8):1009-1013. doi:10.1038/s41593-022-01129-y

      Gan-Or B, London M. Cortical circuits modulate mouse social vocalizations. Sci Adv. 2023;9(39):eade6992. doi:10.1126/sciadv.ade6992

      Wei YC, Wang SR, Jiao ZL, et al. Medial preoptic area in mice is capable of mediating sexually dimorphic behaviors regardless of gender. Nat Commun. 2018;9(1):279. Published 2018 Jan 18. doi:10.1038/s41467-017-02648-0

    1. Reviewer #3 (Public review):

      Summary:

      This study used transcranial direct current stimulation administered using small 'high-definition' electrodes to modulate neural activity within the non-human primate prefrontal cortex during both wakefulness and anaesthesia. Functional magnetic resonance imaging (fMRI) was used to assess the neuromodulatory effects of stimulation. The authors report on the modification of brain dynamics during and following anodal and cathodal stimulation during wakefulness and following anodal stimulation at two intensities (1 mA, 2 mA) during anaesthesia. This study provides some possible support that prefrontal direct current stimulation can alter neural activity patterns across wakefulness and sedation in monkeys. However, the reported findings need to be considered carefully against several important methodological limitations.

      Strengths:

      A key strength of this work is the use of fMRI-based methods to track changes in brain activity with good spatial precision. Another strength is the exploration of stimulation effects across wakefulness and sedation, which has the potential to provide novel information on the impact of electrical stimulation across states of consciousness.

      Weaknesses:

      The lack of a sham stimulation condition is a significant limitation. For instance, how can the authors be sure that results were not affected by drowsiness or fatigue as a result of the experimental procedure?

      In the anaesthesia condition, the authors investigated the effects of two intensities of stimulation (1 mA and 2 mA). However, a potential confound here relates to the possibility that the initial 1 mA stimulation block might have caused plasticity-related changes in neural activity that could have interfered with the following 2 mA block due to the lack of a sufficient wash-out period. Hence, I am not sure any findings from the 2 mA block can really be interpreted as completely separate from the initial 1 mA stimulation period, given that they were administered consecutively. Several previous studies have shown that same-day repeated tDCS stimulation blocks can influence the effects of neuromodulation (e.g., Bastani and Jaberzadeh, 2014, Clin Neurophysiol; Monte-Silva et al., J. Neurophysiology).

      The different electrode placement for the two anaesthetised monkeys (i.e., Monkey R: F3/O2 montage, Monkey N: F4/O1 montage) is problematic, as it is likely to have resulted in stimulation over different brain regions. The authors state that "Because of the small size of the monkey's head, we expected that tDCS stimulation with these two symmetrical montages would result in nearly equivalent electric fields across the monkey's head and produce roughly similar effects on brain activity"; however, I am not totally convinced of this, and it really would need E-field models to confirm. It is also more likely that there would in fact be notable differences in the brain regions stimulated as the authors used HD-tDCS electrodes, which are generally more focal.

      Given the very small sample size, I think it is also important to consider the possibility that some results might also be impacted by individual differences in response to stimulation. For instance, in the discussion (page 9, paragraph 2) the authors contrast findings observed in awake animals versus anaesthetised animals. However, different monkeys were examined for these two conditions, and there were only two monkeys in each group (monkeys J and Y for awake experiments [both male], and monkeys R and N [male and female] for the anaesthesia condition). From the human literature, it is well known that there is a considerable amount of inter-individual variability in response to stimulation (e.g., Lopez-Alonso et al., 2014, Brain Stimulation; Chew et al., 2015, Brain Stimulation), therefore I wonder if some of these differences could also possibly result from differences in responsiveness to stimulation between the different monkeys? At the end of the paragraph, the authors also state "Our findings also support the use of tDCS to promote rapid recovery from general anesthesia in humans...and suggest that a single anodal prefrontal stimulation at the end of the anesthesia protocol may be effective." However, I'm not sure if this statement is really backed-up by the results, which failed to report "any behavioural signs of awakening in the animals" (page 7)?

    1. Author response:

      We are pleased that the reviewers found our study thought-provoking and appreciate the care they have taken in providing constructive feedback. Focusing on the main issues raised by the reviewers, we provide here a provisional response to the Public Comments and outline our revision plan.

      A) Reviewers 1 and 2 were concerned that our task and analyses were limited by the fact that we only tested the model based on biases in movement direction (angular biases) and did not examine biases in movement extent (radial biases).

      While we think the angular biases provide a sufficient test to compare the set of models presented in the paper, we appreciate that there was a missed opportunity to also look at movement extent.  Looking at predictions concerning both movement direction and extent would provide a stronger basis for model comparison. To this end, we will take a two-step approach:

      (1) Re-analysis of existing datasets from experiments that involve a pointing task (movements terminate at the target position) rather than a shooting task (movements terminate further than the target distance).  We will conduct a model comparison using these data. 

      (2) If we are unable to obtain a suitable dataset or datasets because we cannot access individual data or there are too few participants, we will conduct a new experiment using a pointing task.  We will use these new data to evaluate whether the transformation model can accurately predict biases in both movement direction and extent.

      We will incorporate those new results in our revision.

      B) Reviewer 3 noted that model fitting was based on group average data. They questioned if this was representative across individuals and how well the model would account for individual patterns of reach biases.

      To address this issue, we propose to do the following:

      (1) We will first fit the model to individual data in Exp 1 and assess whether a two-peak function, the signature of the transformation model, is characteristic of most of the fits. We recognize that the results at the individual level may not support the model. This could occur because the model is not correct. Alternatively, the model could be correct but difficult to evaluate at the individual level for several reasons. First, the data set may be underpowered at the individual level. Second, motor biases can be idiosyncratic (e.g., within-subject correlation is greater than between-subject correlation), a point we noted in the original submission. Third, as observed in previous studies, transformation biases also show considerable individual variability (Wang et al, 2020); as such, even if the model is correct, a two-peaked function may not hold for all individuals.

      (2) If the individual variability is too large to draw meaningful conclusions, we will conduct a new experiment in which we measure motor and proprioceptive biases. Our plan would be to collect a large data set from a limited number of participants.  These data should allow us to evaluate the models on an individual basis, including using each participant’s own transformation/proprioceptive bias function to predict their motor biases.

      C) The reviewers have comments regarding the assumptions and form of the different models. Reviewer 3 questioned the visual bias model presented in the paper, and Reviewers 2 and 3 suggested additional visual bias/ biomechanical models to consider.

      We agree that what we call a visual bias effect is not confined to the visual modality: It is observed when the target is presented visually or proprioceptively, and is manifest in reaching movements, saccades, and key presses that adjust a dot to match the remembered target (Kosovicheva & Whitney, 2017; Yousif et al. 2023). As such, the bias may reflect a domain-general distortion in the representation of goals within polar space. We refer to this component as a "visual bias" because it is associated with the representation of the visual target in the reaching task.

      We do think the version of the visual bias model in the original submission is reasonable given that the bias pattern has been observed in perceptual tasks with stimuli that were very similar to ours (e.g., Kosovicheva & Whitney, 2017). We have explored other perceptual models in evaluating the motor biases observed in Experiment 1. For example, several models discuss how visual biases may depend on the direction of a moving object or the orientation of an object (Wei & Stocker, 2015; Patten, Mannion & Clifford, 2017). However, these models failed to account for the motor biases observed in our experiments, an unsurprising outcome since the models were not designed to capture biases in perceived location. There are also models of visual bias associated with viewing angle (e.g., based on retina/head position). Since we allow free viewing, these biases are unlikely to make substantive contributions to the biases observed in our reaching tasks.

      Given that some readers are likely to share the reviewers’ concerns on this issue, we will extend our discussion to describe alternative visual models and provide our arguments about why these do not seem relevant/appropriate for our study.

      In terms of biomechanical models, we plan to explore at least one alternative model, the MotorNet Model (https://elifesciences.org/articles/88591). This recently published model combines a six-muscle planar arm model with artificial neural networks (ANNs) to generate a control policy. The model has been used to predict movement curvature in various contexts.  We will focus on its utility to predict biases in reaching to visual targets.

      D) Reviewer 1 had concerns with how we measured the transformation bias. In particular, they asked why the data from Wang et al (2020) are used as an estimate of transformation biases, and not as the joint effects of visual and proprioceptive biases in the sensed target and hand location, respectively.

      We define transformation error as the misalignment between the visual target and the hand position. We quantify this transformation bias by referencing studies that used a matching task in which participants match their unseen hand to a visual target, or vice versa. Errors observed in these tasks are commonly attributed to proprioceptive bias, although they could also reflect a contribution from visual bias. We utilized the same data set to simulate both the transformation bias model and the proprioceptive bias model.

      Although it may seem that we are simply renaming concepts, the concept of transformation error addresses biases that arise during motor planning. For the proprioceptive bias model, the bias only influences the perceived start position but not the goal, since proprioception will influence the perceived position of the hand before the movement begins. In contrast, the transformation bias model proposes that movements are planned toward a target whose location is biased due to discrepancies between visual and proprioceptive representations.

      The question then arises whether measurements of proprioceptive bias also reflect a transformation bias. We believe that the transformation bias is influenced by proprioceptive feedback, or at the very least, proprioceptive and transformation bias share a common source of error and thus, are highly correlated. We will revise the Introduction and Results sections to more clearly articulate these relationships and assumptions.

      E) Reviewer 3 asked whether the oblique effect in visual perception could account for our motor bias.

      The potential link between the oblique effect and the observed motor bias is an intriguing idea, one that we had not considered. However, after giving this some thought, we see several arguments against the idea that the oblique effect accounts for the pattern of motor biases.

      First, by the oblique effect, variance is greater for diagonal orientations compared to cardinal orientations. These differences in perceptual variability can explain the bias pattern in visual perception through a Bayesian efficient coding model (Wei & Stocker, 2015). We note that even though participants showed large variability for stimuli at diagonal orientations, the bias for these stimuli was close to zero. As such, we do not think it can explain the motor bias function given the large bias for targets at locations along the diagonal axes.

      Second, the reviewer suggested an "oblique effect" within the motor system, proposing that motor variability is greater for diagonal directions due to increased visual bias. If this hypothesis is correct, a visual bias model should account for the motor bias observed, particularly for diagonal targets. In other words, when estimating the visual bias from a reaching task, a similar bias pattern should emerge in tasks that do not involve movement. However, this prediction is not supported in previous studies. For example, in a position judgment task that is similar to our task but without the reaching response, participants exhibited minimal bias along the diagonals (Kosovicheva & Whitney, 2017).

      Despite our skepticism, we will keep this idea in mind during the revision, investigating variability in movement across the workspace.

    1. Reviewer #1 (Public review):

      Summary:

      In the abstract and throughout the paper, the authors boldly claim that their evidence, from the largest set of data ever collected on inattentional blindness, supports the views that "inattentionally blind participants can successfully report the location, color, and shape of stimuli they deny noticing", "subjects retain awareness of stimuli they fail to report", and "these data...cast doubt on claims that awareness requires attention." If their results were to support these claims, this study would overturn 25+ years of research on inattentional blindness, resolve the rich vs. sparse debate in consciousness research, and critically challenge the current majority view in cognitive science that attention is necessary for awareness.

      Unfortunately, these extraordinary claims are not supported by extraordinary (or even moderately convincing) evidence. At best, the results support the more modest conclusion: If sub-optimal methods are used to collect retrospective reports, inattentional blindness rates will be overestimated by up to ~8% (details provided below in comment #1). This evidence-based conclusion means that the phenomenon of inattentional blindness is alive and well as it is even robust to experiments that were specifically aimed at falsifying it. Thankfully, improved methods already exist for correcting the ~8% overestimation of IB rates that this study successfully identified.

      Comments:

      (1) In experiment 1, data from 374 subjects were included in the analysis. As shown in figure 2b, 267 subjects reported noticing the critical stimulus and 107 subjects reported not noticing it. This translates to a 29% IB rate, if we were to only consider the "did you notice anything unusual Y/N" question. As reported in the results text (and figure 2c), when asked to report the location of the critical stimulus (left/right), 63.6% of the "non-noticer" group answered correctly. In other words, 68 subjects were correct about the location while 39 subjects were incorrect. Importantly, because the location judgment was a 2-alternative-forced-choice, the assumption was that if 50% (or at least not statistically different than 50%) of the subjects answered the location question correctly, everyone was purely guessing. Therefore, we can estimate that ~39 of the subjects who answered correctly were simply guessing (because 39 guessed incorrectly), leaving 29 subjects from the non-noticer group who may have indeed actually seen the location of the stimulus. If these 29 subjects are moved to the noticer group, the corrected rate of IB for experiment 1 is 21% instead of 29%. In other words, relying only on the "Y/N did you notice anything" question leads to an overestimate of IB rates by 8%. This modest level of inaccuracy in estimating IB rates is insufficient for concluding that "subjects retain awareness of stimuli they fail to report", i.e. that inattentional blindness does not exist.

      In addition, this 8% inaccuracy in IB rates only considers one side of the story. Given the data reported for experiment 1, one can also calculate the number of subjects who answered "yes, I did notice something unusual" but then reported the incorrect location of the critical stimulus. This turned out to be 8 subjects (or 3% of the "noticer" group). Some would argue that it's reasonable to consider these subjects as inattentionally blind, since they couldn't even report where the critical stimulus they apparently noticed was located. If we move these 8 subjects to the non-noticer group, the 8% overestimation of IB rates is reduced to 6%.

      The same exercise can and should be carried out on the other 4 experiments. However, the authors do not report the subject numbers for any of the other experiments, i.e., how many subjects answered Y/N to the noticing question and how many in each group correctly answered the stimulus feature question. From the limited data reported (only total subject numbers and d' values), the effect sizes in experiments 2-5 were all smaller than in experiment 1 (d' for the non-noticer group was lower in all of these follow-up experiments), so it can be safely assumed that the ~6-8% overestimation of IB rates was smaller in these other four experiments. In a revision, the authors should consider reporting these subject numbers for all 5 experiments.
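
      The arithmetic in comment (1) can be reproduced with a short sketch. The subject counts are the ones quoted above for experiment 1; the function and variable names are hypothetical, and the guessing correction simply assumes, as the comment does, that on a two-alternative question every incorrect guess is matched by one correct guess.

```python
def corrected_ib_rates(n_total, n_non_noticers, n_non_noticers_correct,
                       n_noticers_incorrect=0):
    """Return (raw IB rate, corrected IB rate) as proportions.

    The raw rate uses only the yes/no noticing question. The corrected
    rate moves non-noticers who exceed the guessing baseline into the
    noticer group and, optionally, moves noticers who got the location
    wrong into the non-noticer group.
    """
    raw_rate = n_non_noticers / n_total
    n_incorrect = n_non_noticers - n_non_noticers_correct
    # Correct answers beyond those expected from pure guessing.
    n_above_guessing = max(n_non_noticers_correct - n_incorrect, 0)
    n_blind = n_non_noticers - n_above_guessing + n_noticers_incorrect
    return raw_rate, n_blind / n_total


# Experiment 1: 374 subjects, 107 non-noticers, 68 of them correct on location.
raw, corrected = corrected_ib_rates(374, 107, 68)
# Same data, also treating the 8 noticers with an incorrect location as blind.
_, corrected_both = corrected_ib_rates(374, 107, 68, n_noticers_incorrect=8)

print(f"raw {raw:.1%}, corrected {corrected:.1%}, both {corrected_both:.1%}")
```

      With these counts the raw rate rounds to 29%, the guessing-corrected rate to 21%, and the rate with both adjustments to 23%, matching the 8% and 6% overestimates given in the comment.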

      (2) Because classic IB paradigms involve only one critical trial per subject, the authors used a "super subject" approach to estimate sensitivity (d') and response criterion (c) according to signal detection theory (SDT). Some readers may have issues with this super subject approach, but my main concern is with the lack of precision used by the authors when interpreting the results from this super subject analysis.

      Only the super subject had above-chance sensitivity (and it was quite modest, with d' values between 0.07 and 0.51), but the authors over-interpret these results as applying to every subject. The methods and analyses cannot determine if any individual subject could report the features above-chance. Therefore, the following list of quotes should be revised for accuracy or removed from the paper as they are misleading and are not supported by the super subject analysis:

      "Altogether this approach reveals that subjects can report above-chance the features of stimuli (color, shape, and location) that they had claimed not to notice under traditional yes/no questioning" (p.6)

      "In other words, nearly two-thirds of subjects who had just claimed not to have noticed any additional stimulus were then able to correctly report its location." (p.6)

      "Even subjects who answer "no" under traditional questioning can still correctly report various features of the stimulus they just reported not having noticed, suggesting that they were at least partially aware of it after all." (p.8)

      "Why, if subjects could succeed at our forced-response questions, did they claim not to have noticed anything?" (p.8)

      "we found that observers could successfully report a variety of features of unattended stimuli, even when they claimed not to have noticed these stimuli." (p.14)

      "our results point to an alternative (and perhaps more straightforward) explanation: that inattentionally blind subjects consciously perceive these stimuli after all... they show sensitivity to IB stimuli because they can see them." (p.16)

      "In other words, the inattentionally blind can see after all." (p.17)

      (3) In addition to the d' values for the super subject being slightly above zero, the authors attempted an analysis of response bias to further question the existence of IB. By including in some of their experiments critical trials in which no critical stimulus was presented, but asking subjects the standard Y/N IB question anyway, the authors obtained false alarm and correct rejection rates. When these FA/CR rates are taken into account along with hit/miss rates when critical stimuli were presented, the authors could calculate c (response criterion) for the super subject. Here, the authors report that response criteria are biased towards saying "no, I didn't notice anything". However, the validity of applying SDT to classic Y/N IB questioning is questionable.

      For example, with the subject numbers provided in Box 1 (the 2x2 table of hits/misses/FA/CR), one can ask, 'how many subjects would have needed to answer "yes, I noticed something unusual" when nothing was presented on the screen in order to obtain a non-biased criterion estimate, i.e., c = 0?' The answer turns out to be 800 subjects (out of the 2761 total subjects in the stimulus-absent condition), or 29% of subjects in this condition.
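
      The logic behind this threshold follows from the standard SDT definition of the criterion, c = -(z(H) + z(FA)) / 2, which is zero only when the false-alarm rate equals one minus the hit rate. The sketch below is a minimal illustration under stated assumptions: the 0.71 hit rate is a placeholder chosen to be consistent with the 29% figure above, not a value copied from Box 1, and the function names are hypothetical.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def d_prime(hit_rate, fa_rate):
    """Sensitivity: distance between signal and noise distributions."""
    return _z(hit_rate) - _z(fa_rate)


def criterion(hit_rate, fa_rate):
    """Response criterion c; positive values mean a bias towards 'no'."""
    return -0.5 * (_z(hit_rate) + _z(fa_rate))


def unbiased_fa_rate(hit_rate):
    """False-alarm rate at which c = 0, since z(FA) must equal -z(H)."""
    return 1.0 - hit_rate


# Placeholder hit rate (assumption), with the 2761 stimulus-absent subjects
# mentioned in the text.
assumed_hit_rate = 0.71
needed = unbiased_fa_rate(assumed_hit_rate)
print(f"FA rate needed for c = 0: {needed:.0%} "
      f"(about {needed * 2761:.0f} of 2761 subjects)")
print(f"c at that FA rate: {criterion(assumed_hit_rate, needed):.3f}")
```

      Any false-alarm rate below that value yields a positive c, so very low false-alarm rates for salient, unusual stimuli will register as a conservative bias almost by construction.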

      In the context of these IB paradigms, it is difficult to imagine 29% of subjects claiming to have seen something unusual when nothing was presented. Here, it seems that we may have reached the limits of extending SDT to IB paradigms, which are very different than what SDT was designed for. For example, in classic psychophysical paradigms, the subject is asked to report Y/N as to whether they think a threshold-level stimulus was presented on the screen, i.e., to detect a faint signal in the noise. Subjects complete many trials and know in advance that there will often be stimuli presented and the stimuli will be very difficult to see. In those cases, it seems more reasonable to incorrectly answer "yes" 29% of the time, as you are trying to detect something very subtle that is out there in the world of noise. In IB paradigms, the stimuli are intentionally designed to be highly salient (and unusual), such that with a tiny bit of attention they can be easily seen. When no stimulus is presented and subjects are asked about their own noticing (especially of something unusual), it seems highly unlikely that 29% of them would answer "yes", which is the rate of FAs that would be needed to support the null hypothesis here, i.e., of a non-biased criterion. For these reasons, the analysis of response bias in the current context is questionable and the results claiming to demonstrate a biased criterion do not provide convincing evidence against IB.

      (4) One of the strongest pieces of evidence presented in the entire paper is the single data point in Figure 3e showing that in Experiment 3, even the super subject group that rated their non-noticing as "highly confident" had a d' score significantly above zero. Asking for confidence ratings is certainly an improvement over simple Y/N questions about noticing, and if this result were to hold, it could provide a key challenge to IB. However, this result hinges on a single data point, it was not replicated in any of the other 4 experiments, and it can be explained by methodological limitations. I strongly encourage the authors (and other readers) to follow up on this result, in an in-person experiment, with improved questioning procedures.

      In the current Experiment 3, the authors asked the standard Y/N IB question, and then asked how confident subjects were in their answer. Asking back-to-back questions, the second one with a scale that pertains to the first one (including a tricky inversion, e.g., "yes, I am confident in my answer of no"), may be asking too much of some subjects, especially subjects paying half-attention in online experiments. This procedure is likely to introduce a sizeable degree of measurement error.

      An easy fix in a follow-up study would be to ask subjects to rate their confidence in having noticed something with a single question using an unambiguous scale:

      On the last trial, did you notice anything besides the cross?

      (1) I am highly confident I didn't notice anything else
      (2) I am confident I didn't notice anything else
      (3) I am somewhat confident I didn't notice anything else
      (4) I am unsure whether I noticed anything else
      (5) I am somewhat confident I noticed something else
      (6) I am confident I noticed something else
      (7) I am highly confident I noticed something else

      If we were to re-run this same experiment, in the lab where we can better control the stimuli and the questioning procedure, we would most likely find a d' of zero for subjects who were confident or highly confident (1-2 on the improved scale above) that they didn't notice anything. From there on, the d' values would gradually increase, tracking along with the confidence scale (from 3-7 on the scale). In other words, we would likely find a data pattern similar to that plotted in Figure 3e, but with the first data point on the left moving down to zero d'. In the current online study with the successive (and potentially confusing) retrospective questioning, a handful of subjects could have easily misinterpreted the confidence scale (e.g., inverting the scale) which would lead to a mixture of genuine high-confidence ratings and mistaken ratings, which would result in a super subject d' that falls between zero and the other extreme of the scale (which is exactly what the data in Fig 3e shows).
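
      The mixture argument above can be made concrete with a toy calculation. This is a sketch under purely hypothetical assumptions: genuine high-confidence non-noticers are taken to guess at chance, subjects who inverted the scale are given arbitrary response rates of 0.9 and 0.1, and none of these numbers come from the paper's data.

```python
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def pooled_d_prime(frac_mislabelled, hit_seen=0.9, fa_seen=0.1):
    """Super-subject d' for a bin mixing guessers with mislabelled noticers.

    frac_mislabelled: hypothetical share of the bin that inverted the scale.
    hit_seen / fa_seen: hypothetical response rates for subjects who saw it.
    """
    # Genuine non-noticers respond at chance (0.5) whatever the stimulus was.
    hit = (1 - frac_mislabelled) * 0.5 + frac_mislabelled * hit_seen
    fa = (1 - frac_mislabelled) * 0.5 + frac_mislabelled * fa_seen
    return _z(hit) - _z(fa)


for frac in (0.0, 0.05, 0.1, 0.2, 1.0):
    print(f"mislabelled share {frac:4.0%}: pooled d' = {pooled_d_prime(frac):.2f}")
```

      Under these assumptions, even a small share of mislabelled subjects lifts the pooled d' above zero while leaving it far below the value for subjects who actually saw the stimulus.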

      One way to check on this potential measurement error using the existing dataset would be to conduct additional analyses that incorporate the confidence ratings from the 2AFC location judgment task. For example, were there any subjects who reported being confident or highly confident that they didn't see anything, but then reported being confident or highly confident in judging the location of the thing they didn't see? If so, how many? In other words, how internally (in)consistent were subjects' confidence ratings across the IB and location questions? Such an analysis could help screen out subjects who made a mistake on the first question and corrected themselves on the second, as well as subjects who weren't reading the questions carefully enough. As far as I could tell, the confidence rating data from the 2AFC location task were not reported anywhere in the main paper or supplement.

      (5) In most (if not all) IB experiments in the literature, a partial attention and/or full attention trial (or set of trials) is administered after the critical trial. These control trials are very important for validating IB on the critical trial, as they must show that, when attended, the critical stimuli are very easy to see. If a subject cannot detect the critical stimulus on the control trial, one cannot conclude that they were inattentionally blind on the critical trial, e.g., perhaps the stimulus was just too difficult to see (e.g., too weak, too brief, too far in the periphery, too crowded by distractor stimuli, etc.), or perhaps they weren't paying enough attention overall or failed to follow instructions. In the aggregate data, rates of noticing the stimuli should increase substantially from the critical trial to the control trials. If noticing rates are equivalent on the critical and control trials one cannot conclude that attention was manipulated.

      It is puzzling why the authors decided not to include any control trials with partial or full attention in their five experiments, especially given their online data collection procedures where stimulus size, intensity, eccentricity, etc. were uncontrolled and variable across subjects. Including such trials could have actually helped them achieve their goal of challenging the IB hypothesis, e.g., excluding subjects who failed to see the stimulus on the control trials might have reduced the inattentional blindness rates further. This design decision should at least be acknowledged and justified (or noted as a limitation) in a revision of this paper.

      (6) In the discussion section, the authors devote a short paragraph to considering an alternative explanation of their non-zero d' results in their super subject analyses: perhaps the critical stimuli were processed unconsciously and left a trace such that when later forced to guess a feature of the stimuli, subjects were able to draw upon this unconscious trace to guide their 2AFC decision. In the subsequent paragraph, the authors relate these results to above-chance forced-choice guessing in blindsight subjects, but reject the analogy based on claims of parsimony.

      First, the authors dismiss the comparison of IB and blindsight too quickly. In particular, the results from experiment 3, in which some subjects adamantly (confidently) deny seeing the critical stimulus but guess a feature at above-chance levels (at least at the super subject level and assuming the online subjects interpreted and used the confidence scale correctly), seem highly analogous to blindsight. Importantly, the analogy is strengthened if the subjects who were confident in not seeing anything also reported not being confident in their forced-choice judgments, but as mentioned above this data was not reported.

      Second, the authors fail to mention an even more straightforward explanation of these results, which is that ~8% of subjects misinterpreted the "unusual" part of the standard IB question used in experiments 1-3. After all, colored lines and shapes are pretty "usual" for psychology experiments and were present in the distractor stimuli everyone attended to. It seems quite reasonable that some subjects answered this first question, "no, I didn't see anything unusual", but then when told that there was a critical stimulus and asked to judge one of its features, adjusted their response by reconsidering, "oh, ok, if that's the unusual thing you were asking about, of course I saw that extra line flash on the left of the screen". This seems like a more parsimonious alternative compared to either of the two interpretations considered by the authors: (1) IB does not exist, (2) super-subject d' is driven by unconscious processing. Why not also consider: (3) a small percentage of subjects misinterpreted the Y/N question about noticing something unusual. In experiments 4-5, they dropped the term "unusual" but do not analyze whether this made a difference nor do they report enough of the data (subject numbers for the Y/N question and 2AFC) for readers to determine if this helped reduce the ~8% overestimate of IB rates.

      (7) The authors use sub-optimal questioning procedures to challenge the existence of the phenomenon this questioning is intended to demonstrate. A more neutral interpretation of this study is that it is a critique of methods in IB research, not a critique of IB as a manipulation or phenomenon. The authors neglect to mention the dozens of modern IB experiments that have improved upon the simple Y/N IB questioning methods. For example, in Michael Cohen's IB experiments (e.g., Cohen et al., 2011; Cohen et al., 2020; Cohen et al., 2021), he uses a carefully crafted set of probing questions to conservatively ensure that subjects who happened to notice the critical stimuli have every possible opportunity to report seeing them. In other experiments (e.g., Hirschhorn et al., 2024; Pitts et al., 2012), researchers not only ask the Y/N question but then follow this up by presenting examples of the critical stimuli so subjects can see exactly what they are being asked about (recognition-style instead of free recall, which is more sensitive). These follow-up questions include foil stimuli that were never presented (similar to the stimulus-absent trials here), and ask for confidence ratings of all stimuli. Conservative, pre-defined exclusion criteria are employed to improve the accuracy of their IB-rate estimates. In these and other studies, researchers are very cautious about trusting what subjects report seeing, and in all cases, still find substantial IB rates, even to highly salient stimuli. The authors should consider at least mentioning these improved methods, and perhaps consider using some of them in their future experiments.

    1. batch = min(zone_managed_pages(zone) >> 10, SZ_1M / PAGE_SIZE);
       batch /= 4;             /* We effectively *= 4 below */
       if (batch < 1)
               batch = 1;

       /*
        * Clamp the batch to a 2^n - 1 value. Having a power
        * of 2 value was found to be more likely to have
        * suboptimal cache aliasing properties in some cases.
        *
        * For example if 2 tasks are alternately allocating
        * batches of pages, one task can end up with a lot
        * of pages of one half of the possible page colors
        * and the other with pages of the other colors.
        */
       batch = rounddown_pow_of_two(batch + batch/2) - 1;

      Determines the number of pages per allocation batch using a heuristic, then clamps the result to a (2^n - 1) value to minimize cache aliasing issues.

      but I think it may also be categorized as a configuration policy because the code execution depends on CONFIG_MMU.
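
      To see what the heuristic produces for different zone sizes, the arithmetic can be modelled outside the kernel. The following is a rough user-space sketch in Python, not kernel code: the 4 KiB default page size is an assumption, `zone_batch` is a hypothetical name, and `rounddown_pow_of_two` is reimplemented here only to mirror the kernel helper of the same name.

```python
SZ_1M = 1 << 20  # one mebibyte, as in the kernel's size constants


def rounddown_pow_of_two(n):
    """Largest power of two that is <= n (n must be at least 1)."""
    if n < 1:
        raise ValueError("n must be >= 1")
    return 1 << (n.bit_length() - 1)


def zone_batch(managed_pages, page_size=4096):
    """Model of the quoted batch heuristic; page_size is an assumed default."""
    # Roughly 1/1024 of the zone, capped at 1 MiB worth of pages.
    batch = min(managed_pages >> 10, SZ_1M // page_size)
    batch //= 4  # the quoted comment notes this is multiplied back later
    if batch < 1:
        batch = 1
    # Clamp to 2^n - 1 so the batch size is never itself a power of two.
    return rounddown_pow_of_two(batch + batch // 2) - 1


for pages in (1_000, 1 << 16, 1 << 20):
    print(f"{pages:>9} managed pages -> batch {zone_batch(pages)}")
```

      In this model a zone of 2^20 pages gets a batch of 63, a zone of 2^16 pages gets 15, and a very small zone ends up with a batch of 0.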

    1. -carson s This is annoying: I had written out a page note before, but it was deleted when I came back to this page. As such, this note will be shorter. Anyway, I liked this podcast overall, especially the discussion on group dynamics towards the end. That was really fascinating. I think that I agree with the idea that people should learn to disagree with each other and that the way to do that is by practicing. I mean, we learn everything else by practice, so why not this too? I also appreciated the point that reasonable people can disagree on things and that we need to be open to the possibility that we may be wrong. That is something that I try to keep in mind day to day.

    1. I teach about shifting paradigms and talk about the discomfort it can cause. White students learning to think more critically about questions of race and racism may go home for the holidays and suddenly see their parents in a different light.

      I really connect with this sentence because, as a teacher, I’ve noticed that challenging students' established ways of thinking can be uncomfortable for them, and I always try to acknowledge that discomfort. When we ask students to rethink long-held beliefs, especially around sensitive topics like race and racism, it can be unsettling. I feel it's my responsibility to guide them through this discomfort, helping them see that growth often comes from questioning old ideas and embracing new perspectives, even when it’s difficult.

    2. The unwillingness to approach teaching from a standpoint that includes awareness of race, sex, and class is often rooted in the fear that classrooms will be uncontrollable, that emotions and passions will not be contained. To some extent, we all know that whenever we address in the classroom subjects that students are passionate about there is always a possibility of confrontation, forceful expression of ideas, or even conflict. In much of my writing about p

      Educational inequality has a ripple effect that goes far beyond the classroom, shaping the entire course of a student's life. The fact that success in today’s world is so closely tied to a college degree highlights just how deep this problem runs. It’s frustrating to think that a student's potential is often dictated by the resources their family can provide, rather than their talents or drive. Wealthier students have the advantage of tutors, better schools, extracurricular activities, and financial stability, while students from lower-income families may be just as capable but are held back by factors beyond their control.

  2. inst-fs-iad-prod.inscloudgate.net
    1. ican dream and its practice has demographic and historical as well as individual and structural causes. In the United States, class is connected with race and immigration; the poor are disproportionately African Americans or recent immigrants, especially from Latin America. Legal racial discrimination was abolished in American schooling during the last half century (an amazing accomplishment in itself), but prejudice and racial hierarchy remain, and racial or ethnic inequities reinforce class disparities. This overlap adds more difficulties to the already difficult relationship between individual and collective goals of the American dream, in large part because it adds anxieties about diversity and citizenship to concerns about opportunity and competition. The fact that class and race or ethnicity are so intertwined and so embedded in the structure of schooling may provide the greatest barrier of all to the achievement of the dream for all Americans, and helps explain much of the contention, confusion, and irrationality in public education.

      It’s frustrating to see how intertwined class and race continue to be in the U.S. education system. While we’ve made strides in abolishing legal discrimination, the lingering effects of historical inequities still impact students today. It’s like we’ve removed some of the barriers, but others remain firmly in place, making it really tough for many kids to achieve the American dream. The point about anxiety around diversity and citizenship is particularly interesting. It seems like there’s this constant struggle between wanting to uphold the values of opportunity and competition while also addressing the realities of inequality. This creates a complicated dynamic where policies and practices often reflect more about societal fears than about truly supporting all students. It’s also alarming to think that these intertwined issues can create such a significant barrier to educational achievement. It raises questions about how we can create a more equitable system that not only acknowledges these disparities but actively works to dismantle them. We need to focus on comprehensive solutions that tackle both class and racial inequities, rather than treating them as separate issues.

    1. Unlike the situation in the rest of the welfare state, educational benefits cannot be tied to employment.

      This is interesting as you may want to think that the more we put into education the more 'employment' we get. However, this is not the case.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Comment 1. Clinical Data on Patient Brain Samples: The inclusion of specific details such as postmortem intervals and the age at disease onset for patient brain samples would be valuable. These factors could significantly affect the quality of the tissues and their relevance to the study. Moreover, given the large variation in disease duration between PD and PDD, it’s important to consider disease duration as a potential confounding factor, especially when concluding that PDD patients have a more severe form of synucleinopathy compared to PD.

      We thank the reviewer for this valuable comment. We have included the post-mortem interval (PMI) and age at death in Table S1, which shows the clinicopathological information. Changes on page 16. As suggested by the reviewer, we included a discussion of the large variation in disease duration between PD and PDD cases. We noted that DLB cases also have shorter disease durations but still demonstrate seeding kinetics similar to PDD. Therefore, we hypothesise that the molecular differences we observed between different diseases were due to the strain properties or higher pathological load (seen in both PDD and DLB) and are unlikely to be due to the disease duration. Changes on pages 9-11, lines 204-212.

      Comment 2. Inclusion of Healthy Controls in Multiple Tests: Given the importance of healthy controls in scientific studies, especially those involving human brain samples, the authors could consider using healthy controls in more tests to strengthen the robustness of the findings. Expanding the use of healthy controls in biochemical profiling and phosphorylation profiles would provide a better basis for comparison and clarify the significance of results in a disease context. This will help the authors to elaborate on the interpretation of results, for example, in Figure 3, where the authors claim that PD brains show mostly monomeric αSyn forms (lines 119 and 120, and also 222 and 223). Does this imply the absence of αSyn pathology in PD brains? Are there differences from healthy controls? What are these low molecular weight bands (<15 kD) (lines 125-126), and are they also present in healthy controls? Also, we do not have a perfect pS129-specific (anti-pαSyn) antibody. They are known for non-specific labeling. Investigating the phosphorylation levels in healthy controls and comparing them to PD brains, especially considering the predominance of monomeric (healthy αSyn?) in PD brains, would help clarify the observed changes.

      We agree with the reviewer’s assessment and consider this an important suggestion. We performed biochemical profiling and immunogold imaging with the three HC cases and presented the results in Figure 4. aSyn in healthy controls was completely digested by PK. The low MW bands were absent in PD and HC, and there was no difference in the PK profiles. However, this may be due to the low pathology load and amount of pathological aSyn in the selected PD brains. Additional comments were added to the results. Changes are on pages 4 (lines 136-137) and page 7 (Figure 4).

      Comment 3. Age of Healthy Controls: Providing information about the age at death for healthy controls is crucial, as age can impact the accumulation of aSyn. Also include if the brain samples were age-matched, or analyses were age-adjusted.

      We have described the age of each patient, and the analyses were age-adjusted. Changes on page 16 (Table S1).

      Comment 4. Braak Staging Discrepancy: The study reports the same Braak staging for both PD and PDD, despite the significant difference in disease duration. Maybe other reviewers with clinical experience might have a better take on this. This observation merits discussion in the paper, allowing readers to better understand the implications of this finding.

      Addressed: Our PD and PDD cases are Braak stage 6, indicating that the LB pathology had progressed to the neocortex. It's important to note that Braak stage represents only where the LB pathology has spread and does not indicate anything about the load of LBs. However, our immunohistochemistry results (page 20) show that PDD demonstrates a higher LB load than PD cases in the entorhinal cortex. As the reviewer has suggested, this comment has been amended in the manuscript. Changes on pages 9-11, lines 204-212.

      Comment 5. Citation of Relevant Studies: The paper should consider citing and discussing a recent celebrated study on PD biomarkers that used thousands of cerebrospinal fluid (CSF) samples from different PD patient cohorts to demonstrate the effectiveness of SAA as a biochemical assay for diagnosing PD and its subtypes.

      As suggested by the reviewer, we included this study in the discussion. Changes on page 12, lines 275-278.

      Reviewer 3 (Public Review):

      The experiments are missing two important controls: 1) what do fibrils generated by different in vitro fibril preparations made from recombinant synuclein protein look like; and 2) the use of CSF from the same patients whose brain tissue was used to assess whether CSF and brain seeds look and behave identically. The latter is perhaps the most important question of all - namely how representative are CSF seeds of what is going on in patients’ brains?

      We thank the reviewers for this valuable comment. Although in vitro preformed fibrils (PFFs) made from recombinant aSyn are still important sources for cellular and animal studies to generate disease models and investigate mechanisms, many studies have now turned to human brain-amplified fibrils, considering them to more closely represent the human structure. Therefore, our study was designed to specifically address this hypothesis by comparing human-derived and SAA-amplified fibrils. It would be interesting to compare these structures also to PFFs, but this was beyond the scope of our study. Comparing the CSF and brain seeds from the same patients would be very interesting indeed but also difficult, as this would require biosample collection during life followed by brain donation. The SAA cannot be done from the PM CSF due to contamination with blood. However, we are in a privileged position to examine such a comparison soon with our longitudinal Discovery cohort, where some participants have donated their brains. These future studies will address the critical question of whether the CSF seeds reflect those in the brain.

      In their discussion the authors do not comment on the obvious differences in the conditions leading to the formation of seeds in the brain and in the artificial conditions of the seeding assay. Why should the two sets of conditions be expected to yield similar morphologies, especially since the extracted fibrils are subjected to harsh conditions for solubilization and re-suspension?

      We agree with the reviewer that the formation of seeds in the brain and the SAA reaction conditions are very different, and one would not expect similar fibrillar morphologies. However, the theory is that pathological seeds are known to amplify through templated seeding, where seeds copy their intrinsic properties to the growing SAA fibrils. Thus, numerous studies use the SAA fibrils as model fibrils to investigate the different aSyn strains. Our study aimed to test whether the SAA fibrils are representative models of the brain fibrils. We included a more explicit comment on this discussion. Changes on page 3, lines 78-83.

      Finally, the key experiment was not performed - would the resultant seeds from SAA preparations from the different nosological entities produce different pathologies when injected into animal brains? But perhaps this is the subject of a future manuscript.

      We agree this is an essential experiment to build on our conclusion. Animal studies would be imperative to assess whether the SAA fibrils reflect the brain fibrils’ toxicity. However, these were beyond the scope of the present study but are being performed in collaboration with some expert groups.

      Furthermore, the authors comment on phosphorylation patterns, stating that the resultant seeds are less heavily phosphorylated than the original material. Again, this should not be surprising, since the SAA assay conditions are not known to contain the enzymes necessary to phosphorylate synuclein. The discussion of PTMs is limited to pS-129 phosphorylation. What about other PTMs? How does the pattern of PTMs affect the seeding pattern?

      We agree with the reviewer that other PTMs should be explored, but this was beyond the scope of this study. Here, we could focus on pS129, which has multiple reliable antibodies that also work with immunogold-TEM.

      Lastly, the manuscript contains no data on how the diagnostic categories were assigned at autopsy. This information should be included in the supplementary material.

      Clinical and neuropathological diagnostic criteria are now included in Table S1. Changes on page 16, lines 448-461.

      Reviewer 1 (Recommendations for the authors):

      (1) Remove a duplicate sentence in line 94-96.

      Addressed: Thank you for pointing this out. The duplicated sentence has been corrected. Changes are on page 4, lines 105-106.

      (2) Figure 1 Placement of Healthy Controls: Moving the graph representing healthy controls from the supplementary materials to the main figures could help readers better appreciate the results of diseased states.

      The healthy control SAA curves were moved to the main figure. Changes are on page 5, Figure 2.

      (3) Commenting on Case 2 Healthy Control: In the discussion section, you may comment on the case of the healthy control that showed amplification towards the end. While definitive conclusions may be challenging, acknowledging the possibility of incidental Lewy bodies or the prodromal phase of the disease would add depth to the analysis. But make sure to include the age information for healthy controls.

      We believe this is an important point to discuss in the manuscript. We have referenced other studies with similar observations and stated that it is currently unknown what this phenomenon reflects (page 11, lines 221-226). The age information of the healthy control subjects was added to Table S1.

      (4) Figure S3 Clarity: To enhance the clarity of Figure S3, consider adding a reference marker or arrow in the low-magnification image that points to the region being magnified in the insets. This visual cue will make it easier for readers to connect the detailed insets with the corresponding area in the broader image.

      In Figure S3, we included a reference arrow in the low-magnification images to clarify where the higher-magnification images are taken. Changes are on page 19, Figure S3.

      Reviewer 2 (Recommendations for the authors):

      (1) A major issue confronting the field is the conflation of the PMCA and RT-QuIC assays (the latter of which was used here). The decision to rename and combine the two under the umbrella of SAAs does a major disservice to the field for many reasons. Recognizing that the push for this did not come from the authors, clarifying the differences in their Introduction would be very useful. I suggest this, in large part, because in the prion field, PMCA is known to amplify prion strains with high fidelity whereas the product from RT-QuIC does not. In fact, the RT-QuIC product for PrP is not even infectious, while the synuclein field uses it as a means to generate material for subsequent studies. Highlighting these differences would certainly strengthen the arguments the authors are making about the inadequacy of the synuclein RT-QuIC approach in research.

      We thank the reviewers for these very valuable comments. We have included a further introduction on PMCA and RT-QuIC, explaining the differences and clearly stating our selection of the RT-QuIC method in this paper (page 3, lines 55-68). In addition, we have highlighted that, unlike PMCA, the RT-QuIC end-products are non-infectious and biologically dissimilar to the seed protein. Combined with our results, the findings demonstrate the methodological limitation of RT-QuIC in reproducing the seed fibrils and replicating their intrinsic biophysical information.

      (2) On page 4, sentences starting on lines 94 and 95 are a duplication.

      The duplicated sentence has been corrected. Changes are on page 4, lines 105-106.

      (3) In the Results, noting that the pSyn staining on the RT-QuIC fibrils is coming from the human patient sample used to seed the reaction would be useful. This is mentioned in the Discussion, but the lack of mention in the Results made me pause reading to double check the methods. I think this could also be addressed a bit more clearly in the Abstract.

      We have clarified this in the Results and Abstract. Changes are on page 1 (lines 21-22) and page 9 (lines 192-194).

      (4) On page 8 line 188, change was to were in the sentence, “First, faster seeding kinetics was...”

      This grammar error has been corrected. Changes are on page 9, line 200.

      (5) The authors may want to comment on the unexpected finding that despite the RT-QuIC fibrils having a difference in twisted vs straight filaments, all 4 seeded reactions gave identical results in the conformational stability assay.

      Addressed: We want to thank the reviewer for this comment and have highlighted the unexpected finding with a comment on what could be causing the identical results in the conformational stability assay. Changes are on page 12, lines 297-303.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The study "Endogenous oligomer formation underlies DVL2 condensates and promotes Wnt/β-catenin signaling" by Senem Ntourmas et al. contributes to the understanding of phase separation in Dishevelled (DVL) proteins, specifically focusing on DVL2. It builds upon existing research by investigating the endogenous complexes of DVL2 using ultracentrifugation and contrasting them with DVL1 and DVL3 behavior. The study identifies a DVL2-specific region involved in condensate formation and introduces the "two-step" concept of DVL2 condensate formation, enriching the field's knowledge. 

      Strengths: 

      A notable strength of this study is the validation of endogenous DVL2 complexes, providing insights into its behavior compared to DVL1 and DVL3. The functional validation of the DVL C-terminus (here termed conserved domain 2, CD2) and the identification of DVL2-specific regions (here termed LCR4) involved in condensate formation are significant contributions that complement the current knowledge on the importance of the DVL DIX domain, DEP domain and intrinsically disordered regions between the DIX and PDZ domains. Additionally, the introduction of the concept where oligomerization (step 1) precedes condensate formation (step 2) is an interesting hypothesis, which can be further experimentally challenged in the future.

      We thank the reviewer for her/his interest in our work and for acknowledging our significant contributions to the understanding of DVL2 phase separation.   

      Weaknesses: 

      However, the applicability of the findings to full-length DVL2 protein, hence the physiological relevance, is limited. This is mostly due to the fact that the authors almost completely depend on the set of DVL2 mutants, which lack the (i) DEP domain and (ii) nuclear export signal (NES). These variants fail to establish DEP domain-mediated interactions, including those with FZD receptors. Of note, the DEP domain itself represents a dimerization/tetramerization interface, which could affect the protein condensate formation of these mutants. Possibly even more importantly, the used mutants localize to the nucleus, which has different biochemical & biophysical properties than the cytoplasm, where DVL typically resides, which in turn affects the condensate formation. On top of that, in the nucleus, most of the DVL binding partners, including relevant kinases, which were reported to affect protein condensate formation, are missing.

      The most convincing way to address this valid concern and to support a physiologically relevant role of our findings is to extend our experiments with full-length DVL2, which we did in line with the suggestion in point two (please see below). In addition, we address the specific issues as follows:

      We completely agree that interaction through the DEP domain contributes to condensate formation, as thoroughly demonstrated in studies by Melissa Gammons and Mariann Bienz, and to complex formation (Fig. 2B, C). We deleted this domain on purpose for our mapping experiments, since we obtained more consistent results without any additional contribution of the DEP domain. Once we mapped CFR and identified crucial amino acids within CFR (VV, FF), we demonstrated that CFR-mediated interaction contributes to complex formation, condensate formation and pathway activation in the context of full-length DVL2 (Fig. 7A-G). 

      We also agree that the nuclear localization may affect condensate formation because of the reasons mentioned by the reviewer or others, such as differences in DVL2 protein concentration. However, later proof-of-concept experiments in full-length DVL2 confirmed that CFR and its identified crucial amino acids (VV, FF), which were mapped in this rather artificial nuclear context, contribute to the typical cytosolic condensate formation of DVL2 (Fig. 7C, D). Moreover, we also observed cells with cytosolic condensates for the NES-lacking DVL2 constructs, although to a lower extent as compared to cells with nuclear condensates. A new analysis of NES-lacking key constructs focusing exclusively on cells with cytosolic condensates revealed similar differences between the DVL2 mutants as were observed before when investigating cells with nuclear (and cytosolic) condensates (new Fig. S3E, F), suggesting that the detected differences are not due to nuclear localization but reflect the overall condensation capacity. 

      In addition, our condensate-challenging experiments (osmotic shock, 1,6-hexanediol) suggested that cytosolic condensates of full-length DVL2 and nuclear CFR-mediated condensates of deletion proteins lacking the DEP domain behave quite similarly (Fig. 6A-C).

      Second, the use of an overexpression system, while suitable for comparing DVL2 protein condensate features, falls short in functional assays. The study could benefit from employing established "rescue systems" using DVL1/2/3 knockout cells and re-expression of DVL variants for more robust functional assessments. 

      We used the suggested established rescue system of DVL1/2/3 knockout cells (T-REx DVL1/2/3 triple knockout cells and T-REx DVL1/2/3 RNF43 ZNRF3 penta knockout cells, which are even more sensitive towards DVL re-expression as they lack RNF43/ZNRF3-mediated degradation of DVL activating receptors; both cell lines from the Bryja lab). Upon overexpression, our key mutants DVL2 VV-AA FF-AA and ∆CFR showed markedly reduced pathway activation compared to WT DVL2 (new Figs. 7F and S5J), as we observed before. Especially in the DVL1/2/3 triple knockout cells, DVL2 VV-AA FF-AA hardly activated the pathway and was as inactive as the established M2 mutant (new Fig. 7F). Most importantly, while re-expression of WT DVL2 at close to endogenous expression levels fully rescued Wnt3a-induced pathway activation in DVL1/2/3 knockout cells, DVL2 VV-AA FF-AA revealed significantly reduced rescue capacity and was almost as inactive as DVL2 M2 (new Figs. 7G and S5K). 

      Furthermore, the discussion and introduction overlook some essential aspects of DVL biology. One such example is the importance of the open/close conformation of DVL and its effects on DVL phase separation and activity. In the context of this study, it is important to say that this conformational plasticity is mediated by the DVL C-terminus (CD2 in this study). The second example is the reported roles of DVL1 and DVL3, which can both mediate the Wnt3a signal. How can this be interpreted when DVL1 and DVL3 lack LCR4 and still form condensates? 

      We included the open/close conformation of DVL in our manuscript (introduction p. 3 and new discussion paragraph p. 10) and discussed it in the context of our findings. It is intriguing to speculate that Wnt-induced opening of DVL2 increases the accessibility of LCR4 and CD2, thereby triggering pre-oligomerization and subsequent phase separation of DVL2 (see discussion).

      We extended the last paragraph of the discussion to interpret the roles of DVL1 and DVL3 lacking LCR4 (see p. 10). In short, the general ability of DVL1 and DVL3 to form condensates and to activate the Wnt pathway can be potentially explained through the other interaction sites (DIX, DEP, intrinsically disordered region). However, previous studies suggest that the DVL paralogs exhibit (quantitative) differences in Wnt pathway activation and that all three paralogs have to interact at a certain ratio for optimal pathway activation. In this context, a physiologic role for DVL2 LCR4 may be to promote the formation of these DVL1/2/3 assemblies and/or to enhance the stability of these assemblies.

      In order to increase the physiological relevance of the study, I would recommend analyzing several key mutants in the context of the full-length DVL2 protein using the rescue/complementation system. Further, a more thorough discussion and connections with the existing literature on DVL protein condensates/puncta/LLPS can improve the impact of the study. 

      We thank the reviewer for her/his suggestions to improve our study, which we addressed as detailed above.

      Reviewer #2 (Public Review): 

      Summary: 

      The authors aimed to identify which regions of DVL2 contribute to its endogenous/basal clustering, as well as the relevance of such domains to condensate/phase separation and WNT activation. 

      Strengths: 

      A strength of the study is the focus on endogenous DVL2 to set up the research questions, as well as the incorporation of various techniques to tackle it. I also found it quite interesting that DVL2-CFR addition to DVL1 increased its MW in density gradients. 

      We thank the reviewer for her/his interest in our work and the constructive suggestions to improve our study.

      Weaknesses: 

      I think that several of the approaches of the manuscript are subpar to achieve the goals and/or support several of the conclusions. For example: 

      (1) Although endogenous DVL2 indeed seems to form complexes (Figure 1A), neither the number of proteins involved nor whether those are homo-complexes can be determined with a density gradient. Super-resolution imaging or structural analyses are needed to support these claims. 

      We agree that it will be very interesting to study the nature of the detected endogenous complexes in detail and we will consider this for any follow-up study, as structural analyses were out of scope for the revision of the presented manuscript. To address the issue, we mentioned that the calculation of about eight DVL2 molecules per complex is based on the assumption of homotypic complexes (results p. 4) and we discussed why we think that homotypic complexes are the most likely assumption based on the currently available (limited) data (discussion p. 8).

      (2) Follow-up analyses of the relevance of the DVL2 domains solely rely on overexpressed proteins. However, there were previous questions arising from o/e studies that prompted the focus on endogenous, physiologically relevant DVL interactions, clustering, and condensate formation.

      Although the title, conclusions, and relevance all point to the importance of this study for understanding endogenous complexes, only Figures 1A and B deal with endogenous DVL2. 

      We think that the biochemical detection of endogenous DVL2 complexes itself represents a valuable contribution to the understanding of endogenous DVL clustering, especially (i) since it is still actively debated in the field whether and to what extent endogenous DVL assemblies exist (see introduction) and (ii) since recent studies addressing this issue rely on fluorescent tagging of the endogenous protein, which, despite its benefits, carries the risk of artificially affecting DVL assembly. The follow-up analysis predominantly strengthens this key finding through (i) associating the detected complexes with established (DEP domain) and newly mapped (LCR4) DVL2 interaction sites, which we think is crucial to validate our biochemical approach, and (ii) linking the complexes with condensate formation and pathway activation for functional insights.

      In addition, we performed new experiments with re-expression of DVL2 and our key mutants at close to endogenous expression levels in DVL1/2/3 knockout cells, supporting a physiologically relevant role of our findings (new Figs. 7G and S5K, please also see point (5) below).

      (3) Mutants lacking activity/complex formation, e.g. DVL2_1-418, may need further validation. For instance, DVL2_1-506 (same mutant but with DEP) seems to form condensates and it is functional in WNT signalling (King et al., 20223). These differences could be caused by the lack of DEP domain in this particular construct and/or folding differences. 

      We would definitely expect that DVL2 1-506 exhibits increased condensate formation and pathway activation as compared to DVL2 1-418, since the DEP domain was thoroughly characterized as interaction domain in the Bienz lab and the Gammons lab (see references), which we confirmed in our assays (Fig. 2B-D). However, as the DEP domain is an established DVL2 interaction site, we were not interested to further characterize the DEP domain but to explain the marked difference in complex formation between DVL2 ∆DEP and 1-418 (Fig. 2A-C), which could not be associated with any known DVL2 interaction site and which we finally mapped to CFR (Fig. 4A-D). 

      Since fusion of the newly-characterized interaction site CFR to DVL2 1-418 (1-418+CFR) rescued complex formation, condensate formation and signaling activity (Fig. 3B-E and Fig. 4C, D), we think that the lacking activity/complex formation of DVL2 1-418 is more likely due to missing interaction sites than due to folding problems. However, as it is hard to exclude folding differences of deletion mutants, we confirmed the CFR activity through loss-of-function experiments in the context of full-length DVL2 with minimal point mutations (Fig. 7A-G, VV,FF). 

      (4) The key mutants, DeltaCFR and VV/FF, only show mild phenotypes. The authors' results suggest that these regions contribute but are not necessary for 1) complex formation (Density gradient Figures 7A and B), 2) condensate formation (Figures 7C and D), and 3) WNT activity (Figure 7E). Of note, Figure 7C shows examples for the mutants with no condensates while the quantification indicates that 50% of the cells do have condensates. 

      Condensate formation and Wnt pathway activation by DVL VV-AA FF-AA were reduced by more than 50% as compared to WT (Fig. 7D, E). We consider these marked differences, since loss of function always ranges between 0% and 100%. In newly performed experiments in DVL1/2/3 knockout cells, the differences were even more pronounced, see point (5) below.

      Yes, Fig. 7C shows an example to qualitatively visualize the change in condensate formation, while Fig. 7D provides the corresponding quantification allowing quantitative assessment of the differences.

      (5) Most of the o/e analyses (including all reporter assays) should be performed in DVL1-3 KO cells in order to explore specifically the behaviour of the investigated mutants. 

      As suggested, we employed DVL1/2/3 knockout cells for performing reporter assays (T-REx DVL1/2/3 triple knockout cells and T-REx DVL1/2/3 RNF43 ZNRF3 penta knockout cells, which are even more sensitive towards DVL re-expression as they lack RNF43/ZNRF3-mediated degradation of DVL activating receptors; both cell lines from the Bryja lab). Here, we focused on key mutants in the context of full-length DVL2, as they are closest to the physiologic situation. Upon overexpression, DVL2 VV-AA FF-AA and DVL2 ∆CFR showed markedly reduced pathway activation as compared to WT DVL2 (new Figs. 7F and S5J). Especially in the DVL1/2/3 triple knockout cells, DVL2 VV-AA FF-AA hardly activated the pathway and was as inactive as the established M2 mutant (new Fig. 7F). Moreover, re-expression at close to endogenous expression levels revealed that DVL2 VV-AA FF-AA less efficiently rescued Wnt3a-induced pathway activation as compared to WT (Figs. 7G and S5K).

      (6) How comparable are condensates found in the cytoplasm (usually for wt DVL) with those located in the nucleus (DEP mutants)? 

      In principle, cytosolic condensates could differ from nuclear condensates due to various reasons, such as different protein concentration, different availability of interaction partners or different biochemical/biophysical properties (please also see point 1 of reviewer 1). In our condensate-challenging experiments (osmotic shock, 1,6-hexanediol), cytosolic condensates of full-length DVL2 and nuclear condensates of DVL2 mutants behaved quite similarly (Fig. 6A-C).

      We are confident that the differences between different DEP mutants in our mapping experiments are not due to nuclear localization but reflect the overall condensation capacity because later proof-of-concept experiments demonstrated that CFR, which was identified in these mapping experiments, contributes to cytosolic condensate formation in the context of full-length DVL2 (Fig. 7C, D). Moreover, a new analysis focusing only on cells with cytosolic condensates, which can also be observed for DEP mutants to a low extent, revealed similar differences between key DEP mutants as observed before (Fig. S3E, F; for details please also see point 1 of reviewer 1).

      Several studies in the last two decades have analysed the relevance of DVL homo- and hetero-clustering by relying on overexpressed proteins. Recent studies also explored the possibility of DVL undergoing liquid-liquid phase separation following similar principles. As highlighted by the authors in the introduction, there is a need to understand DVL dynamics under endogenous/physiological conditions. Recent super-resolution studies aimed at that question by characterising endogenously edited DVL2. The authors seemed to aim in the same direction with their initial findings (Figure 1A) but quickly moved to o/e proteins without going back to the initial question. This reviewer thinks that to support their conclusions and advance in this important question, the authors should introduce the relevant mutations in the endogenous locus (e.g. by Cas9+ donor template encoding the required 3' exons, as done by others before for WNT components, including DVL2) and determine their impact in the above-indicated processes.

      We agree that genomic editing of the DVL2 locus would be the cleanest system to study the relevance of CFR at endogenous expression levels. As we did not have the resources to generate the suggested cells, we, as an alternative, transiently re-expressed DVL2 and the respective mutants at low levels that were close to the endogenous expression levels in DVL1/2/3 triple knockout cells (Fig. S5K). These experiments revealed that DVL2 VV-AA FF-AA less efficiently rescued Wnt3a-induced pathway activation as compared to DVL2 WT (Fig. 7G).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This is a detailed description of the role of PKCδ in Drosophila learning and memory. The work is based on a previous study (Placais et al. 2017) that has already shown that for the establishment of long-term memory, the repetitive activity of MP1 dopaminergic neurons via the dopamine receptor DAMB is essential to increase mitochondrial energy flux in the mushroom body. 

      In this paper, the role of PKCδ is now introduced. PKCδ is a molecular link between the dopaminergic system and the mitochondrial pyruvate metabolism of mushroom body Kenyon cells. For this purpose, the authors establish a genetically encoded FRET-based fluorescent reporter of PKCδ-specific activity, δCKAR. 

      Strengths: 

      This is a thorough study of the long-term memory of Drosophila. The work is based on the extensive, high-quality experience of the senior authors. This is particularly evident in the convincing use of behavioral assays and imaging techniques to differentiate and explore various memory phases in Drosophila. The study also establishes a new reporter to measure the activity of PKCδ - the focus of this study - in behaving animals. The authors also elucidate how recurrent spaced training sessions initiate a molecular gating mechanism, linking a dopaminergic punishment signal with the regulation of mitochondrial pyruvate metabolism. This advancement will enable a more precise molecular distinction of various memory phases and a deeper comprehension of their formation in the future. 

      Weaknesses: 

      Apart from a few minor technical issues, such as the not entirely convincing visualisation of the localisation of a PKCδ reporter in the mitochondria, there are no major weaknesses. Likewise, the scientific classification of the results seems appropriate, although a somewhat more extensive discussion in relation to Drosophila would have been desirable.

      We are very grateful for this very positive appreciation of our work. Following this comment, we have revised our manuscript to bring more compelling evidence of the mitochondrial localization of the PKCδ reporter. We also developed the discussion of our results with respect to the Drosophila learning and memory literature.

      Reviewer #2 (Public Review):

      Summary 

      This study deepens the authors' former investigations of the mechanisms involved in gating the long-term consolidation of an associative memory (LTM) in Drosophila melanogaster. After having previously found that LTM consolidation 1. costs energy (Plaçais and Préat, Science 2013) provided through pyruvate metabolism (Plaçais et al., Nature Comm 2017) and 2. is gated by the increased tonic activity in a type of dopaminergic neurons ('MP1 neurons') following only the training protocol relevant for LTM, i.e. interspaced in time (Plaçais et al., Nature Neuro 2012), they here dig into the intra-cell signalling triggered by dopamine input and eventually responsible for the increased mitochondrial activity in Kenyon Cells. They identify a particular PKC, PKCδ, as a major molecular interface in this process and describe its translocation to mitochondria to promote pyruvate metabolism, specifically after spaced training. 

      Methodological approach 

      To that end, they use RNA interference against the isozyme PKCδ, in a time-controlled way and in the whole Kenyon cell population or in the subpopulation forming the α/β lobe. This knock-down decreased the total PKCδ mRNA level in the brain by ca. 30%, and is enough to decrease the flies' performance in LTM consolidation. Using Pyronic, a sensor for pyruvate for in vivo imaging, and pharmacological disruption of mitochondrial function, the authors then show that PKCδ knockdown prevents a high level of pyruvate from accumulating in the Kenyon cells at the time of LTM consolidation, pointing towards a role of PKCδ in promoting pyruvate metabolism. They further identify the PDH kinase PDK as a likely target for PKCδ since knocking down both PKCδ and PDK led to normal LTM performances, likely counterbalancing PKCδ knock-down alone. 

      To understand the timeline of PKCδ activation and to visualise its mitochondrial translocation in a subpart of the mushroom body lobes, they imported into the fruit fly the genetically-encoded FRET reporters of PKCδ, δCKAR, and mitochondria-δCKAR (Kajimoto et al 2010). They show that PKCδ is activated to the sensor's saturation only after spaced training, and not other types of training that are 'irrelevant' for LTM. Further, adding thermogenetic activation of dopaminergic neurons and RNA interference against Gq-coupled dopamine receptor to FRET imaging, they identify that a dopamine-triggered cascade is sufficient for the elevated PKCδ-activation. 

      Strengths and weaknesses 

      The authors use a combination of new fluorescent sensors and behavioral, imaging, and pharmacological protocols they already established to successfully identify the molecular players that bridge the requirement for spaced training/dopaminergic neurons MP1 oscillatory activity and the increased metabolic activity observed during long-term memory consolidation. 

      The study is dense in new exciting findings and each methodological step is carefully designed. Almost all possible experiments one could think of to make this link have been done in this study, with a few exceptions that do not prevent the essential conclusions from being drawn. 

      The discussion is well conducted, with interesting parallels with mammals, where the possibility that this process takes place as well is yet unknown. 

      Impact 

      Their findings should interest a large audience: 

      They discover and investigate a new function for PKCδ in regulating memory processes in neurons in conjunction with other physiological functions, making this molecule a potentially valid target for neuropathological conditions. They also provide new tools in drosophila to measure PKCδ activation in cells. They identify the major players for lifting the energetic limitations preventing the formation of a long-term memory. 

      We warmly thank Reviewer #2 for the enthusiastic assessment of our work. There was no specific point to address in the Public Review.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have a few comments that could help improve the paper and help the reader navigate the detailed analysis.

      (1) Perhaps the authors could add a sentence or two in the intro about the different PKC genes in Drosophila and whether they are expressed in the MB.

      We thank Reviewer #1 for this suggestion. We now describe in the introduction the various subfamilies of PKCs downstream of Gq signaling, the Drosophila members of those different PKC subfamilies, and their expression in the brain. 

      (2) Italicise Drosophila throughout the text.

      We have made this correction.

      (3) In Figure 1, you could change the scheme in Figure F-H and have the timeline always start after training. Then you could see that the training varies in time (perhaps provide the exact duration for each training protocol) and the test interval is constant. Why is it actually measured in a time window and not at an exact time?

      This is indeed a good suggestion to clarify the presentation of our results. We changed the timeline schemes in all the figures so that t=0 starts at the end of the conditioning. Indeed, each conditioning protocol has a different duration as represented on these timelines: one-cycle training lasts 5 min, 5x massed training has a duration of 20 min, and 5x spaced training takes 1 hour and 30 min to be completed, with its 15 min intertrial intervals. In vivo imaging experiments are performed during a certain time window after conditioning during which, according to our previous experience, the activity of MP1 dopamine neurons after spaced training remains constant (Plaçais et al., 2012). This offers the practical advantage that we can image several flies after a given training session, instead of having to perform many consecutive conditioning protocols.  

      (4) In Figure 2 you could show the massed training data from the supplement. This is very similar to what is shown in Figure 1. Are there also imaging experiments on massed training?

      The reason why massed training data were initially displayed in the supplementary data is that α/β neurons are known to be crucial for LTM formation but are not required for memory formed after massed training, so that the absence of effect was somewhat expected. Nonetheless, we performed δCKAR imaging in α/β neurons after 5x massed training and found that, as expected, PKCδ activity was not increased post-conditioning (Figure 2C). This experiment was performed in parallel with additional δCKAR imaging in α/β neurons after 5x spaced conditioning, as a positive control (these new data were added to Figure 2B). Following Reviewer #1’s suggestion, all data investigating the effect of PKCδ in α/β neurons are now displayed in Figure 2.

      (5) Figure 3: I am not sure if the blue curve in Figure A really represents an upregulated pyruvate flux compared to the control (mentioned in line 210). It may be the case initially, but it is clearly below the control after 40s. Why is that?

      This visual effect is due to the fact that PDBu injection in itself increases the pyruvate level in MB neurons (independently of its effect on PKCδ), before sodium azide injection. As a result, the baseline of the PDBu-treated flies is above that of the DMSO control flies when sodium azide is injected, so that the Pyronic sensor saturates more quickly and therefore reaches its plateau before the control when traces were normalized right before sodium azide injection. 

      That being said, the measure of the slope in itself following sodium azide injection is not affected by these differences, and is always measured between 10 and 70% of the plateau. 
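      The slope measurement described above can be sketched as follows. This is only a minimal illustration, not our analysis code: the function name, the choice to estimate the plateau as the mean of the last few samples, the use of the first sample as baseline, and the synthetic trace are all assumptions made for the example.

```python
def slope_between_fractions(times, values, low=0.10, high=0.70, plateau_points=5):
    """Least-squares slope of a rising trace, restricted to samples whose
    amplitude lies between `low` and `high` fractions of the plateau.

    Assumptions (hypothetical, for illustration only):
    - values[0] is the baseline at the time of injection;
    - the plateau is the mean of the last `plateau_points` samples.
    """
    if len(times) != len(values) or len(values) < plateau_points:
        raise ValueError("times and values must match and cover the plateau")

    baseline = values[0]
    plateau = sum(values[-plateau_points:]) / plateau_points
    amplitude = plateau - baseline
    if amplitude <= 0:
        raise ValueError("trace does not rise above baseline")

    lower = baseline + low * amplitude
    upper = baseline + high * amplitude

    # Keep only the samples inside the 10-70 % window of the plateau.
    window = [(t, v) for t, v in zip(times, values) if lower <= v <= upper]
    if len(window) < 2:
        raise ValueError("not enough samples in the fitting window")

    # Ordinary least-squares slope over the selected samples.
    n = len(window)
    mean_t = sum(t for t, _ in window) / n
    mean_v = sum(v for _, v in window) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in window)
    denominator = sum((t - mean_t) ** 2 for t, _ in window)
    return numerator / denominator


# Synthetic example: a linear rise over 10 s followed by a plateau.
example_times = list(range(21))
example_values = [min(t / 10, 1.0) for t in example_times]
print(slope_between_fractions(example_times, example_values))
```

      Because the fitting window is defined relative to each trace's own baseline and plateau, a shift in the starting level changes when the plateau is reached but not, in this sketch, the slope that is extracted.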

      Given this remark, and another comment from Reviewer #2 about this experiment, we removed panel 3A and now present only the complete recording of this experiment, which is displayed in Figure 3 – figure supplement 1C.

      (6) For me, the localisation of the mitochondrial reporter in the mitochondria is not clear. The image in the supplement is not sufficient to show this clearly. What is missing here is a co-staining in the same brain of UAS-mito-δCKAR and a mitochondrial marker to label the mitochondria and the reporter at the same time in the same animal.

      We agree with Reviewer #1’s remark and added new data to make this point more convincing. As suggested, we co-expressed mito-δCKAR with the mitochondrial reporter mito-DsRed in MB neurons (Lutas et al., 2012). We observed a clear colocalization of both signals by performing confocal imaging in the MB neuron somas, indicating that mito-δCKAR is indeed targeted to mitochondria (Figure 4 – figure supplement 1B and 2). 

      (7) Are there controls that the MB expression of the reporters in the flies does not influence the learning ability? In order to make statements about the physiology of the cells, it must also be shown that the cells still have normal activity and allow learning behaviour comparable to wild-type flies.

      This is indeed an important control that we added in the revised version. We tested the memory after 5x spaced, 5x massed and 1x training of flies expressing in the MB the various imaging probes used in our study (cyto-δCKAR, mito-δCKAR and Pyronic). Memory performance was similar to controls in all cases (Figure 1 – figure supplement 1E).  

      (8) Perhaps the authors could go into more detail on two points in the discussion and shorten the comprehensive comparison to the vertebrate system somewhat. It would be nice to know how the local transfer from the peduncle to the vertical lobus is supposed to take place. What is the mechanism here? Any suggestions from the literature? It would also be useful to mention the compartmentalisation of the MB and how the information can overcome these boundaries from the peduncle to the vertical lobe.

      We now elaborate on this question in the discussion (lines 368-386). To sum up, given that the compartmentalization of the MBs is anatomically defined by the presence of specific subsets of MBON and DAN cell types (forming different information-processing units), rather than by physical boundaries per se, we can consider two main hypotheses to explain the transfer of PKCδ activation from the peduncle to the lobes: passive diffusion of activated PKCδ, or mitochondrial motility that would displace PKCδ from its place of first activation. Indeed, we found that mitochondrial motility occurs upon 5x spaced conditioning for LTM formation (Pavlowsky et al. 2024).

      In principle, one could also consider that PKCδ could be activated in the lobes by a relaying neuron. The MVP2 neuron (aka MBON-γ1>pedc) presents dendrites facing MP1 and makes synapses with the α/β neurons at the level of the α and β lobes, which makes it a good candidate. Furthermore, as we show that PKCδ activation in the lobes requires DAMB (Figure 4C, Figure 5A-B, Figure 5 – figure supplement 1), one could imagine the following activation loop: MP1 activates the MB neurons via DAMB, that activate MVP2 at the level of the peduncle, which activates in turn the MB neurons at the level of the lobes. However, we did not retain this hypothesis, because MVP2 is GABAergic, which makes it highly unlikely to be able to activate a kinase like PKCδ.

      Regarding the comparative discussion with mammalian systems, we appreciate Reviewer #1’s remark that it may appear too detailed, but given that Reviewer #2 (public comment) highlighted the ‘interesting parallel with mammals’ in our discussion, we finally chose not to reduce this part in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Fig 1G: is there a decrease in PKCδ activation after mass training as compared to the control, indicating an inhibitory mechanism onto PKCδ following mass training? Or is this an artifact of the PDBu application procedure in the control group? 

      We thank Reviewer #2 for this careful comment. The dip in the time trace following PDBu application after massed training (Figure 1G) is indeed an artifact of the manual injection of the drug. We would like to emphasize, however, that what matters in determining PKCδ activity is the baseline level before PDBu application, after normalization to the final plateau, so variations around the injection time do not affect the result of the analysis. Moreover, in the revised version, we performed a similar series of experiments using an α/β neuron-specific driver (Figure 2C). In this series, injection artifacts were limited, and we reached the same conclusion as in Figure 1G: PKCδ activity is left unchanged by 5x massed conditioning.
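
      The reasoning above can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that the readout is the mean of the pre-injection samples divided by the mean of the last few samples of the recording (the final plateau). The function name, its parameters and the toy traces are hypothetical and are not taken from the authors' analysis pipeline.

```python
from statistics import mean


def normalized_baseline(trace, injection_index, plateau_points=5):
    """Return the pre-injection baseline as a fraction of the final plateau.

    trace           -- sequence of sensor readings over time (toy values here)
    injection_index -- index of the first sample recorded after drug injection
    plateau_points  -- number of final samples averaged to estimate the plateau
    """
    if not 0 < injection_index < len(trace):
        raise ValueError("injection_index must fall inside the trace")
    if not 0 < plateau_points <= len(trace) - injection_index:
        raise ValueError("plateau must lie entirely after the injection")
    baseline = mean(trace[:injection_index])   # samples before injection only
    plateau = mean(trace[-plateau_points:])    # final saturated level
    return baseline / plateau


# Two toy traces that differ only in the depth of the dip at injection time.
shallow_dip = [0.8] * 10 + [0.5] + [0.9] + [1.0] * 5
deep_dip = [0.8] * 10 + [0.2] + [0.9] + [1.0] * 5

ratio_shallow = normalized_baseline(shallow_dip, injection_index=10)
ratio_deep = normalized_baseline(deep_dip, injection_index=10)
```

      Because the samples around the injection enter neither the baseline window nor the plateau window, both toy traces give the same normalized value under this assumed definition.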

      Fig 3A: I suggest moving this panel in the supplement: I found it difficult to process the effect of PDBu that is unspecific to PKCδ and that leads to a different plateau because of a different baseline. It would be better explained in more detail in the supplement, especially given that the 3B panel can lead to a similar conclusion and does not have this specificity problem. Up to the authors.

      We thank Reviewer #2 for this feedback. We followed the suggestion and now only display the full recording of this experiment on Figure 3 – figure supplement 1C.

      Fig 3C: To go further, one wonders if knocking-down PDK would act as a switch for gating LTM formation, i.e. if done during a 1x training or a 5x massed training would it gate long-term consolidation?

      This is indeed an excellent suggestion. We performed this experiment and showed that in flies expressing the PDK RNAi in adult MB neurons, a single cycle of training was sufficient to induce long-term memory formation (Figure 3A), instead of the 5 spaced cycles normally required. This confirms the model we previously established in Plaçais et al. 2017, where long-term memory formation was observed upon PDK MB knock-down after 2 cycles of spaced training. This new result goes further in characterizing this facilitation effect, now showing that even a single cycle is sufficient. Altogether, these data show that mitochondrial metabolic activation is the critical gating step in long-term memory formation. Spaced training achieves this activation through PDK inhibition, mediated by PKCδ.

      What is the level of mRNA in this construct? I don't see a quantification, can you justify it?

      We thank Reviewer #2 for this remark. This PDK RNAi had been used in a previous work in pyruvate imaging experiment, where it successfully boosted mitochondrial pyruvate uptake. But indeed we had not validated it at the mRNA level. In the revised version of the present manuscript, we now confirm by RT-qPCR that the PDK RNAi efficiently downregulates PDK expression in neurons (Figure 3 – figure supplement 1A).

      Fig. 4C: Is PKCδ activation increase in Vertical lobe DAMB-dependent? One wonders, because MP1 may somehow activate other neurons that could reach this part of the Kenyon Cells. I do not see in the results what could disprove this possibility. The mechanism linking DAMB activation in the peduncle and PKCδ activation in the VL is mysterious, see also Fig. 5.

      This is a very sound remark. In the revised version we have checked whether PKCδ activation in the vertical lobes is also dependent on DAMB.  We performed thermogenetic activation of MP1 neurons and imaged mito-δCKAR signal in the vertical lobes upon DAMB MB knock-down. We found that as for the peduncle, DAMB was required for PKCδ mitochondrial activation (Figure 4C, right panel). This experiment was performed in parallel with similar measurements in flies that did not express DAMB RNAi, as a positive control (these new control data were added to the Figure 4C, left panel).

      This result supports a model where dopamine from MP1 neurons directly acts on Kenyon cells, even for PKCδ activation in the vertical lobes. Thus, this advocates for a diffusion of DAMB-activated PKCδ from the peduncle to the vertical lobes, either by passive diffusion or by mitochondrial motility - two hypotheses that we added in the discussion. 

      Fig. 5: If MP1 neurons release dopamine only to the peduncle, how do you expect PKCδ to be translocated to mitochondria all the way to the vertical lobe? Also is it specific to the vertical lobe and not found in the medial lobe?

      Investigating the spatial distribution of PKCδ is, once again, a very sound suggestion. We re-analyzed our dataset of the mito-δCKAR signal after spaced training for peduncle measurement, as the imaging plane also included the β lobe. We found that PKCδ is also activated at that level, and that its activation also depends on DAMB (Figure 5 – figure supplement 1). We also performed additional pyruvate measurements in the medial lobes, and observed that mitochondria pyruvate uptake presents the same extension in time in the medial lobes as in the vertical lobes when comparing spaced training (Figure 6 E-F and Figure 6 – figure supplement 1E-F) to 1x training (Figure 6A-B and Figure 6 – figure supplement 1C-D). Therefore, the metabolic action of PKCδ seems not to be restricted to the vertical lobes, but spreads across the whole axonal compartment.

      Altogether, these data suggest that activated PKCδ diffuses from its point of activation, the peduncle, where dopamine is released by MP1 and DAMB is activated, to both the vertical and medial lobes, either by passive diffusion or by taking advantage of the mitochondrial movement from the MB neuron somas to the axons that was shown to be triggered by spaced training (Pavlowsky et al. 2024). To further characterize the kinetics of PKCδ activation, we measured its activity using the mito-δCKAR sensor at 3 and 8 hours following spaced training. We found that while PKCδ was still active at 3 hours, it was back to its baseline activity level at 8 hours, both at the level of the peduncle and the vertical lobes (Figure 5 C-F). However, at 8 hours, pyruvate metabolism is still upregulated in the lobes, which indicates that an additional mechanism relays PKCδ action to maintain the high energy state of the MBs at later time points. As we propose in the revised discussion, the mitochondrial motility hypothesis makes sense here (Pavlowsky et al. 2024), as the progressive increase in the number of mitochondria in the lobes would be able to sustain high mitochondrial metabolism beyond PKCδ activation at 8 hours post-conditioning. This new result and its implications open exciting perspectives for future research on the different mitochondrial regulations occurring after spaced training, their organization over time and their interactions.

      Fig.7:  PDK written in yellow is almost invisible

      This has been changed.

    1. As is wont to happen in culture, while we’re appropriately punishing the Cosby Show patriarch for his horrific misdeeds, the women around him are also being made to pay, this time literally.

      It is unfortunate that the act of an individual person can permanently taint the work of hundreds, directly affecting those around him who weren't even involved in the crime. It is also unfortunate that in the process of trying to protect women, or any victims for that matter, we unintentionally harm them as well. In this case, it seems that the choice to pull the reruns of The Cosby Show was more of a publicity stunt than a legitimate attempt to protect the demographic most harmed by Bill Cosby's actions. They could simply have done something about his residual payments to prevent him from profiting from the show, which was not his work alone; he was just one of the many people who helped The Cosby Show become a reality. This is why it's important to think thoroughly about the consequences an action may have not just on the perpetrator, but also on the victims and other parties, directly or indirectly, involved.

    1. sexual violence “sex”, or by blaming the victim for the violence they experienced

      The impact of language, especially in news articles, is blatantly clear. We get a lot of our information of the outside world from the news, and the first source of information we see and consume from these news articles are the headlines. These titles often introduce bias, either exaggerating or downplaying the content to attract readers and generate revenue, which is why it's important to also read the content of the article, and other articles, to fully grasp the situation the article is reporting on.

      This is particularly problematic in cases of sexual violence, where articles frequently minimize the severity of the crime and may even favor the abuser. The distinction between "sex" and "sexual violence" hinges on one important element: consent. This difference is significant. It's not uncommon for me to see articles that cover rape without labeling it as rape (a 'recent' case I can think of is the mass rape trial in France). The connotations of the words used shape our perceptions (e.g. dislike, hate, detest, loathe: each evokes a different feeling even though they are often treated as synonyms), influencing how we judge the seriousness of these incidents.

    1. The player’s initial fear that they might need to act quickly to defend themselves from some lurking supernatural horror becomes transmuted, by the end of the story, into the inevitable realization that their character has already lost her chance to act,(p.131)has arrived too late to intervene in her sister’s story. All she can do now is understand it.

      Very symbolic of life. There is a famous quote "life is ten percent of what happens to you and ninety percent of how you react to it." We may think in Gone Home that the ten percent is happening. That something is happening to us. But, in reality we as the character are only reacting to what is already happening.

    2. Walter Benjamin’s portrait of the flâneur, the urban wanderer who walks without purpose other than keen observation through the city streets, and in whom “the joy of watching is triumphant” (1973): the connection between flâneurs and explorers of games has been noted by many games scholars (Kagen 2015; Carbo-Mascarell 2016). In games, walking connects to the adventure pillar of exploration, as well as the sense of immersive transportation and a focus on environmental storytelling: in adventure games specifically, it provides a space for thinking and reflecting, a necessary precursor to successfully overcoming obstacles.

      I find this first section introducing walking’s purpose in games and specifically as the base of “walking simulators” interesting because I had always viewed walking as a waste of time. I think it was an important thing to note that some people do feel this way, which has caused many games to include a “fast pass” that can be purchased or is a complete replacement for any walking. It’s especially interesting to look at how walking or the lack thereof can affect our “fun”, agency, and a sort of challenge. If we don’t have this break time to think and reflect, then it feels like it would be a lot harder to be able to overcome any obstacles we may face. I never understood the immersive power of walking through an environment for the player, but now that I think about it, having a “fast pass” model for the game feels like it would disconnect the player from the character they’re playing as. If we don’t get to experience the character’s entire journey, are we really in full control of the character? If we aren’t, how are we going to feel like we are the character themselves?

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The overall analysis and discovery of the common motif are important and exciting. Very few human/primate ribozymes have been published and this manuscript presents a relatively detailed analysis of two of them. The minimized domains appear to be some of the smallest known self-cleaving ribozymes.

      Strengths:

      The manuscript is rooted in deep mutational analysis of the OR4K15 and LINE1 and subsequently in modeling of a huge active site based on the closely-related core of the TS ribozyme. The experiments support the HTS findings and provide convincing evidence that the ribozymes are structurally related to the core of the TS ribozyme, which has not been found in primates prior to this work.

      Weaknesses:

      (1) Given that these two ribozymes have not been described outside of a single figure in a Science Supplement, it is important to show their locations in the human genome, present their sequence and structure conservation among various species, particularly primates, and test and discuss the activity of variants found in non-human organisms. Furthermore, OR4K15 exists in three copies on three separate chromosomes in the human genome, with slight variations in the ribozyme sequence. All three of these variants should be tested experimentally and their activity should be presented. A similar analysis should be presented for the naturally-occurring variants of the LINE1 ribozyme. These data are a rich source for comparison with the deep mutagenesis presented here. Inserting a figure (1) that would show the genomic locations, directions, and conservation of these ribozymes and discussing them in light of this new presentation would greatly improve the manuscript. As for the biological roles of known self-cleaving ribozymes in humans, there is a bioRxiv manuscript on the role of the CPEB3 ribozyme in mammalian memory formation (doi.org/10.1101/2023.06.07.543953), and an analysis of the CPEB3 functional conservation throughout mammals (Bendixsen et al. MBE 2021). Furthermore, the authors missed two papers that presented the discovery of human hammerhead ribozymes that reside in introns (by de la Peña and Breaker), which should also be cited. On the other hand, the Clec ribozyme was only found in rodents and not primates and is thus not a human ribozyme and should be noted as such.

      We thank this Reviewer for his/her input and acknowledgment of this work. To improve the manuscript, we have included the genomic locations in Figure 1A, Figure 6A and Figure 6C. We have also tested the activity of representative variants found in the human genome and discussed the activity of the variants in other primates. All suggested publications are now properly cited.

      Line 62-66: It has been shown that single nucleotide polymorphism (SNP) in CPEB3 ribozyme was associated with an enhanced self-cleavage activity along with a poorer episodic memory (14). Inhibition of the highly conserved CPEB3 ribozyme could strengthen hippocampal-dependent long-term memory (15, 16). However, little is known about the other human self-cleaving ribozymes.

      Line 474-501: Homology search of two TS-like ribozymes. To locate close homologs of the two TS-like ribozymes, we performed cmsearch based on a covariance model (38) built on the sequence and secondary structural profiles. In the human genome, we obtained 1154 and 4 homolog sequences for LINE-1-rbz and OR4K15-rbz, respectively. For OR4K15-rbz, there was an exact match located on the reverse strand of the exon of the OR4K15 gene (Figure 6A). The other 3 homologs of OR4K15-rbz belong to the same olfactory receptor family 4 subfamily K (Figure 6C). However, there was no exact match for LINE-1-rbz (Figure 6A). Interestingly, a total of 1154 LINE-1-rbz homologs were mapped to the LINE-1 retrotransposon according to the RepeatMasker (http://www.repeatmasker.org) annotation. Figure 6B shows the distribution of LINE-1-rbz homologs in different LINE-1 subfamilies in the human genome. Only three subfamilies, L1PA7, L1PA8 and L1P3 (L1PA7-9), can be considered abundant in LINE-1-rbz homologs (>100 homologs per family). The consensus sequences of all homologs obtained are shown in Figure 6D. To investigate the self-cleavage activity of these homologs, we mainly focused on the mismatches in the more conserved internal loops. The major differences between the 5 consensus sequences are the mismatches in the first internal loop. The widespread A12C substitution is found in the majority of LINE-1-rbz homologs; this substitution leads to a one-base-pair extension of the second stem (P2) but almost no activity (RA’: 0.03) based on our deep mutational scanning result. We then selected 3 LINE-1-rbz homologs without the A12C substitution for in vitro cleavage assays (Figure 6E). However, we did not observe significant cleavage activity, which might be caused by GU substitutions in the stem region. Of the 3 homologs of OR4K15-rbz, we found only one with pronounced self-cleavage activity (Figure 6F).
      In addition, we performed a similar bioinformatic search for the TS-like ribozymes in other primate genomes. Similarly, the majority (15 out of 18) of primate genomes have a large number of LINE-1 homologs (>500) and the remaining three have essentially none. However, there was no exact match. Only one homolog has a single mutation (U38C) in the genome assembly of Gibbon (Figure S15). The majority of these homologs have 3 or more mismatches (Figure S15). For OR4K15-rbz, all representative primate genomes contain at least one exact match of the OR4K15-rbz sequence.
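
      The substitution labels used in this passage (for example A12C or U38C) follow the usual pattern of reference base, 1-based position, variant base. A minimal Python sketch of how mismatches between a reference sequence and an equal-length homolog could be listed in that notation is given below. The function name and the toy sequences are hypothetical; they are not the ribozyme sequences from the study, and the sketch does not handle insertions or deletions, which a real comparison of cmsearch hits would need to align first.

```python
def list_substitutions(reference, homolog):
    """List point substitutions as '<ref base><1-based position><new base>'.

    Both sequences must already be aligned and of equal length
    (an assumption of this sketch; gaps are not supported).
    """
    if len(reference) != len(homolog):
        raise ValueError("sequences must have the same length")
    return [
        f"{ref_base}{position}{hom_base}"
        for position, (ref_base, hom_base)
        in enumerate(zip(reference, homolog), start=1)
        if ref_base != hom_base
    ]


# Toy RNA sequences, invented for illustration only.
toy_reference = "GGACUA"
toy_homolog = "GGCCUA"

substitutions = list_substitutions(toy_reference, toy_homolog)
mismatch_count = len(substitutions)
```

      Counting the entries of the returned list gives the number of mismatches, which is the quantity used above when homologs are described as having "3 or more mismatches".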

      Line 598-602: According to the bioinformatic analysis, there are some TS-like ribozymes (one LINE-1-rbz homolog in the Gibbon genome, and some OR4K15-rbz homologs) with in vitro cleavage activity in primate genomes. Unlike the more conserved CPEB3 ribozyme, which has a clear function, the function of the TS-like ribozymes is not clear, as they are not conserved, belong to pseudogenes, or are located on the reverse strand.

      (2) The authors present the story as a discovery of a new RNA catalytic motif. This is unfounded. As the authors point out, the catalytic domain is very similar to the Twister Sister (or "TS") ribozyme. In fact, there is no appreciable difference between these and TS ribozymes, except for the missing peripheral domains. For example, the env33 sequence in the Weinberg et al. 2015 NCB paper shows the same sequences in the catalytic core as the LINE1 ribozyme, making the LINE1 ribozyme a TS-like ribozyme in every way, except for the missing peripheral domains. Thus these are not new ribozymes and should not have a new name. A more appropriate name should be TS-like or TS-min ribozymes. Renaming the ribozymes to lanterns is misleading.

      Although we observed some differences in mutational effects, we agree with the reviewer that it is more appropriate to call them TS-like ribozymes. We have replaced all “lantern ribozyme” with “TS-like ribozyme” as suggested.

      (3) In light of 2) the story should be refocused on the fact the authors discovered that the OR4K15 and LINE1 are both TS-like ribozymes. That is very exciting and is the real contribution of this work to the field.

      We thank this Reviewer for their acknowledgement of this work. To improve the manuscript, we have re-named the ribozymes as suggested.

      (4) Given the slow self-scission of the OR4K15 and LINE1 ribozymes, the discussion of the minimal domains should be focused on the role of peripheral domains in full-length TS ribozymes. Peripheral domains have been shown to greatly speed up hammerhead, HDV, and hairpin ribozymes. This is an opportunity to show that the TS ribozymes can do the same and the authors should discuss the contribution of peripheral domains to the ribozyme structure and activity. There is extensive literature on the contribution of a tertiary contact on the speed of self-scission in hammerhead ribozymes, in hairpin ribozyme it's centered on the 4-way junction vs 2-way junction structure, and in HDVs the contribution is through the stability of the J1/2 region, where the stability of the peripheral domain can be directly translated to the catalytic enhancement of the ribozymes.

      We appreciate your question and the valuable suggestions provided. We have included the citations and discussion about the peripheral domains in other ribozymes.

      Line 570-576: Thus, a more sophisticated structure along with long-range interactions involving the SL4 region in the twister sister ribozyme must have helped to stabilize the catalytic region for the improved catalytic activity. Similarly, previous studies have demonstrated that peripheral regions of hammerhead (49), hairpin (50) and HDV (51, 52) ribozymes could greatly increase their self-cleavage activity. Given the importance of the peripheral regions, absence of this tertiary interaction in the TS-like ribozyme may not be able to fully stabilize the structural form generated from homology modelling.

      (5) The argument that these are the smallest self-cleaving ribozymes is debatable. Lünse et al (NAR 2017) found some very small hammerhead ribozymes that are smaller than those presented here, but the authors suggest only working as dimers. The human ribozymes described here should be analyzed for dimerization as well (e.g., by native gel analysis) particularly because the authors suggest that there are no peripheral domains that stabilize the fold. Furthermore, Riccitelli et al. (Biochemistry) minimized the HDV-like ribozymes and found some in metagenomic sequences that are about the same size as the ones presented here. Both of these papers should be cited and discussed.

      We apologize for any confusion caused by our previous statement. To clarify, we highlighted “35 and 31 nucleotides only” because the 46- and 47-nt sequences contain the variable hairpin loops, which are not important for the catalytic activity. When the conserved segments are compared, the TS-like ribozyme discussed in this paper is the shortest, with the simplest secondary structure. We have replaced the terms “smallest” and “shortest” with “simplest” in our manuscript. The title has been changed to “Minimal twister sister (TS)-like self-cleaving ribozymes in the human genome revealed by deep mutational scanning”. All the publications mentioned have been cited and discussed. Regarding possible dimerization, we did not find any evidence for it, but we defer this question to future detailed structural analysis.

      Line 605-608: Previous studies also have revealed some minimized forms of self-cleaving ribozymes, including hammerhead (19, 53) and HDV-like (54) ribozymes. However, when comparing the conserved segments, they (>= 36 nt) are not as short as the TS-like ribozymes (31 nt) found here.

      (6) The authors present homology modeling of the OR4K15 and LINE1 ribozymes based on the crystal structures of the TS ribozymes. This is another point that supports the fact that these are not new ribozyme motifs. Furthermore, the homology model should be carefully discussed as a model and not a structure. In many places in the text and the supplement, the models are presented as real structures. The wording should be changed to carefully state that these are models based on sequence similarity to TS ribozymes. Fig 3 would benefit from showing the corresponding structures of the TS ribozymes.

      We thank the reviewer for pointing these out, and we have fixed them. We have replaced all “lantern ribozyme” with “TS-like ribozyme” as suggested. The term “modelled structures” is now used to refer to the homology models. We have also included the TS ribozyme structure in Fig 3.

      Reviewer #2 (Public Review):

      Summary:

      This manuscript applies a mutational scanning analysis to identify the secondary structure of two previously suggested self-cleaving ribozyme candidates in the human genome. Through this analysis, minimal structured and conserved regions with imminent importance for the ribozyme's activity are suggested and further biochemical evidence for cleavage activity are presented. Additionally, the study reveals a close resemblance of these human ribozyme candidates to the known self-cleaving ribozyme class of twister sister RNAs. Despite the high conservation of the catalytic core between these RNAs, it is suggested that the human ribozyme examples constitute a new ribozyme class. Evidence for this however is not conclusive.

      Strengths:

      The deep mutational scanning performed in this study allowed the elucidation of important regions within the proposed LINE-1 and OR4K15 ribozyme sequences. Part of the ribozyme sequences could be assigned a secondary structure supported by covariation and highly conserved nucleotides were uncovered. This enabled the identification of LINE-1 and OR4K15 core regions that are in essence identical to previously described twister sister self-cleaving RNAs.

      Weaknesses:

      I am skeptical of the claim that the described catalytic RNAs are indeed a new ribozyme class. The studied LINE-1 and OR4K15 ribozymes share striking features with the known twister sister ribozyme class (e.g. Figure 3A) and where there are differences they could be explained by having tested only a partial sequence of the full RNA motif. It appears plausible, that not the entire "functional region" was captured and experimentally assessed by the authors.

      We thank this Reviewer for his/her input and acknowledgment of this work. Because a similar question was raised by Reviewer #1, we decided to name the ribozymes TS-like ribozymes. Regarding the entire regions, we conducted mutational scanning experiments on the original sequences at the beginning of this study. The relative activity distributions (Figure 1B, 1C) showed that only parts of the sequence contribute to the self-cleavage activity. That is why we decided to focus on those parts of the sequence afterwards.

      They identify three twister sister ribozymes by pattern-based similarity searches using RNA-Bob. Also comparing the consensus sequence of the relevant region in twister sister and the two ribozymes in this paper underlines the striking similarity between these RNAs. Given that the authors only assessed partial sequences of LINE-1 and OR4K15, I find it highly plausible that further accessory sequences have been missed that would clearly reveal that "lantern ribozymes" actually belong to the twister sister ribozyme class. This is also the reason I do not find the modeled structural data and biochemical data results convincing, as the differences observed could always be due to some accessory sequences and parts of the ribozyme structure that are missing.

      We thank the reviewer for raising this question. As we explained in our previous answer, we now call the ribozymes TS-like ribozymes. We also emphasize that the relative activity data of the original sequences indicate that the other parts did not contribute to the activity of the ribozyme. The original sequences provided in the Science paper (Salehi-Ashtiani et al. Science 2006) were generated by biochemical selection from a genomic library. That study did not investigate the contribution of each position to the self-cleavage activity.

      Highly conserved nucleotides in the catalytic core, the need for direct contacts to divalent metal ions for catalysis, the preference of Mn2+ or Mg2+ for cleavage, and the plateau in observed rate constants at ~100 mM Mg2+ are all characteristics that are identical between the proposed lantern ribozymes and the known twister sister class.

      The difference in cleavage speed between twister sister (~5 min-1) and proposed lantern ribozymes could be due to experimental set-up (true single-turnover kinetics?) or could be explained by testing LINE-1 or OR4K15 ribozymes without needed accessory sequences. In the case of the minimal hammerhead ribozyme, it has been previously observed that missing important tertiary contacts can lead to drastically reduced cleavage speeds.

      We thank the reviewer for this question. We now call the ribozymes TS-like ribozymes. As we explained in our previous answer, the relative activity data of the original sequences show that the other parts did not contribute to the activity of the ribozyme. Moreover, we have tested different enzyme-to-substrate ratios to achieve single-turnover kinetics (Figure S13). The difference in cleavage speed should be related to the absence of peripheral regions, which do not exist in the original sequences of the LINE-1 and OR4K15 ribozymes. We have included the publications and discussion about the peripheral domains in other ribozymes.

      Line 458-463: The kobs of LINE-1-core was ~0.05 min-1 when measured in 10mM MgCl2 and 100mM KCl at pH 7.5 (Figure S13). Furthermore, the single-stranded ribozymes exhibited lower kobs (~0.03 min-1 for LINE-1-rbz) (Figure S14) when comparing with the bimolecular constructs. This confirms that the stem loop region SL2 does not contribute much to the cleavage activity of the TS-like ribozymes.
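
      To give a sense of scale for these rate constants, the short Python sketch below converts an observed rate constant into a half-life and a predicted cleaved fraction. It assumes simple first-order (single-exponential) kinetics and, as a further assumption, a maximum cleavable fraction of 1.0; neither the model nor the function names are taken from the manuscript.

```python
import math


def half_life(k_obs):
    """Half-life in minutes for a first-order process with rate k_obs (min^-1)."""
    if k_obs <= 0:
        raise ValueError("k_obs must be positive")
    return math.log(2) / k_obs


def fraction_cleaved(k_obs, t, f_max=1.0):
    """Cleaved fraction at time t (min), assuming f(t) = f_max * (1 - exp(-k_obs * t))."""
    return f_max * (1.0 - math.exp(-k_obs * t))


# Approximate rate constants quoted in the passage above.
t_half_core = half_life(0.05)   # bimolecular LINE-1-core construct, about 13.9 min
t_half_rbz = half_life(0.03)    # single-stranded LINE-1-rbz, about 23.1 min
```

      Under these assumptions, half of the substrate is cleaved after roughly 14 minutes for the bimolecular construct and roughly 23 minutes for the single-stranded one, compared with well under a minute for a ribozyme cleaving at ~5 min-1.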

      Line 570-576: Thus, a more sophisticated structure along with long-range interactions involving the SL4 region in the twister sister ribozyme must have helped to stabilize the catalytic region for the improved catalytic activity. Similarly, previous studies have demonstrated that peripheral regions of hammerhead (49), hairpin (50) and HDV (51, 52) ribozymes could greatly increase their self-cleavage activity. Given the importance of the peripheral regions, absence of this tertiary interaction in the TS-like ribozyme may not be able to fully stabilize the structural form generated from homology modelling.

      Reviewer #2 (Recommendations For The Authors):

      Major points

      It would have made it easier to connect the comments to text passages if the submitted manuscript had page numbers or even line numbers.

      We thank the reviewer for pointing this out and we have already fixed it.

      In the introduction: "...using the same technique, we located the functional and base-pairing regions of..." The use of the adjective functional is imprecise. Base-paired regions are also important for the function, so what type of region is meant here? Conserved nucleotides?

      We thank the reviewer for pointing this out. We were referring to the regions that are essential for ribozyme activity, and we have now defined the term “functional region” in the introduction.

      Line 95: we located the regions essential for the catalytic activities (the functional regions) of LINE-1 and OR4K15 ribozymes in their original sequences.

      In their discussion, the authors mention the possible flaws in their 3D-modelling in the absence of Mg2+. Is it possible to include this divalent metal ion in the calculations?

      We thank the reviewer for this question. Currently, BriQ (Xiong et al. Nature Communications 2021), which we used for modeling, does not include divalent metal ions.

      Xiong, Peng, Ruibo Wu, Jian Zhan, and Yaoqi Zhou. 2021. “Pairing a High-Resolution Statistical Potential with a Nucleobase-Centric Sampling Algorithm for Improving RNA Model Refinement.” Nature Communications 12: 2777. doi:10.1038/s41467-021-23100-4.

      Abstract:

      It is claimed that ribozyme regions of 46 and 47 nt described in the manuscript resemble the shortest known self-cleaving ribozymes. This is not correct. In 1988, hammerhead ribozymes in newts were first discovered that are only 40 nt long.

      We apologize for any confusion caused by our previous statement. To clarify, we highlighted “35 and 31 nucleotides only” because the 46- and 47-nt regions contain variable hairpin loops that are not important for catalytic activity. Comparing the conserved segments, the TS-like ribozyme discussed in this paper is the shortest, with the simplest secondary structure. We have replaced the terms “smallest” and “shortest” with “simplest” in our manuscript. The title has been changed to “Minimal TS-like self-cleaving ribozyme revealed by deep mutational scanning”.

      The term "functional region" is, to my knowledge, not a set term when discussing ribozymes. Does it refer to the catalytic core, the cleavage site, the acid and base involved in cleavage, or all, or something else? Therefore, the term should be 1) defined upon its first use in the manuscript and 2) probably not be used in the abstract to avoid confusion to the reader.

      We apologize for any confusion caused by our previous statement. We have replaced the term “functional region” in the abstract and defined it in the introduction.

      Line 34-37: We found that the regions essential for ribozyme activities are made of two short segments, with a total of 35 and 31 nucleotides only. The discovery makes them the simplest known self-cleaving ribozymes. Moreover, the essential regions are circular permutated with two nearly identical catalytic internal loops, supported by two stems of different lengths.

      Line 95: we located the regions essential for the catalytic activities (the functional regions) of LINE-1 and OR4K15 ribozymes in their original sequences.

      The choice of the term "non-functional loop" in the abstract is a bit unfortunate. The loop might not be important for promoting ribozyme catalysis by directly providing, e.g. the acid or base, but it has important structural functions in the natural RNA as part of a hairpin structure.

      We thank the reviewer for pointing this out and we have re-phrased the sentences.

      Line 33-34: We found that the regions essential for ribozyme activities are made of two short segments, with a total of 35 and 31 nucleotides only.

      Line 283: Removing the peripheral loop regions (Figures 1B and 1C) allows us to recognize that the secondary structure of OR4K15-rbz is a circular permutated version of LINE-1-rbz.

      Results:

      Please briefly explain CODA and MC analysis when first mentioned in the results (Figure 1). The more detailed explanation of these terms for Figure 2 could be moved to this part of the results section (including explanations in the figure legend).

      We thank the reviewer for pointing this out and we included a brief explanation.

      Line 150-154: CODA employed Support Vector Regression (SVR) to establish an independent-mutation model and a naive Bayes classifier to separate bases paired from unpaired (26). Moreover, incorporating Monte-Carlo simulated annealing with an energy model and a CODA scoring term (CODA+MC) could further improve the coverage of the regions under-sampled by deep mutations.

      Please indicate the source of the human genomic DNA. Is it a patient sample, what type of tissue, or is it an immortalized cell line? It is not stated in the methods I believe.

      We thank the reviewer for pointing this out. According to the original Science paper (Salehi-Ashtiani et al. Science 2006), the human genomic DNA (isolated from whole blood) was purchased from Clontech (Cat. 6550-1). In our study, we directly employed the sequences provided in Figure S2 of the Science paper for gene synthesis. Thus, we think it is unnecessary to mention the source of genomic DNA in the methods section of our paper.  

      Please also refer to the methods section when the calculation of RA and RA' values is explained in the main text to avoid confusion.

      We thank the reviewer for pointing this out and we have fixed it.

      Line 207-208: Figure 2A shows the distribution of relative activity (RA’, measured in the second round of mutational scanning) (See Methods) of all single mutations

      For OR4K15 it is stated that the deep mutational scanning only revealed two short regions as important. However, there is another region between approx. 124-131 nt and possibly even at positions 47 and 52 (to ~55), that could contribute to effective RNA cleavage, especially given the library design flaws (see below) and the lower mutational coverage for OR4K15. A possible correlation of the mutations in these regions is even visible in the CODA+MC analysis shown in Figure 1D on the left. Why are these regions ignored in ongoing experiments?

      We thank the reviewer for this question. As shown in Table S1, although the double mutation coverage of OR4K15-ori was low (16.2 %), we got 97.6 % coverage of single mutations. The relative activity of these single mutations was enough to identify the conserved regions in this ribozyme. Mutations at the positions mentioned by the reviewer did not lead to large reductions in relative activity. Since the relative activity of the original sequence is 1, we presumed that only positions with average relative activity much lower than 1 might contribute to effective cleavage.
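
      The screening rule described above (compare the average relative activity at each position with the wild-type value of 1) can be sketched as follows. This is only an illustration: the function name, the input layout, and the 0.5 cutoff are hypothetical and are not taken from the manuscript.

```python
from collections import defaultdict


def low_activity_positions(ra_by_mutation, cutoff=0.5):
    """Return positions whose mean relative activity (RA) is below `cutoff`.

    `ra_by_mutation` maps (position, mutant_base) -> RA of that single mutant,
    where the original sequence has RA = 1 by definition.
    The cutoff value is an assumption chosen for illustration only.
    """
    per_position = defaultdict(list)
    for (position, _mutant_base), ra in ra_by_mutation.items():
        per_position[position].append(ra)
    return sorted(
        position
        for position, values in per_position.items()
        if sum(values) / len(values) < cutoff
    )
```

      Positions returned by such a filter would be candidates for the conserved region; positions whose mutants keep an average RA near 1 would not.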

      Regarding the corresponding correlation of mutations in CODA+MC, we consider these to be false positives generated by Monte Carlo simulated annealing (MC), because they lack support from the relative activity results.

      Have the authors performed experiments with their "functional regions" in comparison to the full-length RNA or partial truncations of the full-length RNA that included, in the case of OR4K15, nt 47-131? Also for LINE-1 another stem region was mentioned (positions 14-18 with 30-34) and two additional base pairs. Were they included in experiments not shown as part of this manuscript?

      We thank the reviewer for raising this question. We only compared the full-length sequence and partial truncations of the LINE-1 ribozyme. Since the secondary structure predicted from the OR4K15-ori data was almost the same as that of LINE-1, we did not perform deep mutagenesis on a partial truncation of OR4K15. However, the secondary structure of OR4K15 was confirmed by further biochemical experiments.

      Regarding the second question, the additional base pairs were generated by Monte Carlo simulated annealing (MC). They are considered as false positives because of low probabilities and lack of support from the deep mutational scanning results. The appearance of false positives is likely due to the imperfection of the experiment-based energy function employed in current MC simulated annealing. 

      Are there other examples in the literature, where error-prone PCR generates biases towards A/T nucleotides as observed here? Please cite!

      We thank the reviewer for pointing this out and we have included the corresponding citation.

      Line 161-162: The low mutation coverage for OR4K15-ori was due to the mutational bias (27, 28) of error-prone PCR (Supplementary Figures S1, S2, S3 and S4).

      Line 170-171: whose covariations are difficult to capture by error-prone PCR because of mutational biases (27, 28).

      The authors mention that their CODA analysis was based on the relative activities of 45,925 and 72,875 mutation variants. I cannot find these numbers in the supplementary tables. They are far fewer than the read numbers mentioned in Supplementary Table 2. How do these numbers (45,925 and 72,875) arise? Could the authors please briefly explain their selection process?

      We apologize for any confusion caused by our previous statement. Our CODA analysis only utilized variants with no more than 3 mutations. The number listed in the supplementary tables is the total number of the variants. To clarify, we have included a brief explanation for these numbers.

      Line 203-204: We performed the CODA analysis (26) based on the relative activities of 45,925 and 72,875 mutation variants (no more than 3 mutations) obtained for the original sequence and functional region of the LINE-1 ribozyme, respectively.
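
      The selection step mentioned here (keeping only variants with no more than 3 mutations) can be sketched as below. The sketch assumes substitution-only variants of the same length as the wild type and excludes the unmutated sequence; the function names and example sequences are hypothetical.

```python
def count_mutations(variant, wild_type):
    """Count substituted positions relative to the wild type."""
    if len(variant) != len(wild_type):
        raise ValueError("substitution-only comparison requires equal lengths")
    return sum(1 for v, w in zip(variant, wild_type) if v != w)


def select_variants(variants, wild_type, max_mutations=3):
    """Keep variants carrying between 1 and `max_mutations` substitutions."""
    return [
        variant
        for variant in variants
        if 1 <= count_mutations(variant, wild_type) <= max_mutations
    ]
```

      Applied to the full set of sequenced variants, a filter of this kind would explain why the number of variants entering the analysis is smaller than the total listed in the supplementary tables.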

      What are the reasons the authors assume their findings from LINE-1 can be used to directly infer the structure for OR4K15? (Third section in results, last paragraph)

      We apologize for any confusion caused by our previous statement. We meant to say that the consistency between LINE-1-rbz and LINE-1-ori results suggested that our method for inferring ribozyme structure was reliable. Thus, we employed the same method to infer the structure of the functional region of OR4K15. To clarify, we have re-phrased the sentence.   

      Line 259-261: The consistent result between LINE-1-rbz and LINE-1-ori suggested that reliable ribozyme structures could be inferred by deep mutational scanning. This allowed us to use OR4K15-ori to directly infer the final inferred secondary structure for the functional region of OR4K15.

      There are several occasions where the authors use the differences between the proposed lantern ribozymes and twister sister data as reasons to declare LINE-1 and OR4K15 a new ribozyme class. As mentioned previously, I am not convinced these differences in structure and biochemical results could not simply result from testing incomplete LINE-1 and OR4K15 sequences.

      We apologize for any confusion caused by our previous statement. Although we observed some differences in mutational effects, we agree with the reviewer that they are not sufficient to claim a new ribozyme class. We have replaced all instances of “lantern ribozyme” with “TS-like ribozyme”, as Reviewer 1 suggested.

      The authors state, that "the result confirmed that the stem loop SL2 region in LINE-1 and OR4K15 did not participate in the catalytic activity". To draw such a conclusion a kinetic comparison between a construct that contains SL2 and does not contain SL2 would be necessary. The given data does not suffice to come to this conclusion.

      We appreciate the reviewer for raising this question. To address this, we performed gel-based kinetic analysis of these two ribozymes (Figure S14).

      Line 458-462: The kobs of LINE-1-core under single-turnover condition was ~0.05 min-1 when measured in 10mM MgCl2 and 100mM KCl at pH 7.5 (Figure S13). Only a slightly lower value of  kobs (~0.03 min-1) was observed for LINE-1-rbz (Figure S14). This confirms that the stem loop region SL2 does not contribute to the cleavage activity of the TS-like ribozymes.

      Construct/Library design:

      The last 31 bp in the OR4K15 ribozyme template sequence are duplicated (Supplementary Table 4). Therefore, there are 2 M13 fwd binding sites and several possible primer annealing sites present in this template. This could explain the lower yield for the mutational analysis experiments. Did the authors observe double bands in their PCR and subsequent analysis? The experiments should probably be repeated with a template that does not contain this duplication. Alternatively, the authors should explain, why this template design was chosen for OR4K15.

      We apologize for this mistake during writing. Our construct design for OR4K15 contains only one M13F binding site. We thank the reviewer for pointing this out and we have fixed the error.

      Figure 5B: Where are the bands for the OR4K15 dC-substrate? They are not visible on the gel, so one has to assume there was no substrate added, although the legend indicates otherwise.

      Also this figure, please indicate here or in the methods section what kind of marker was used. In panels A and B, please label the marker lanes.

      We apologize for this mistake and we have repeated the experiment. The marker lane was removed to avoid confusion caused by the inappropriate DNA marker. 

      The authors investigated ribozyme cleavage speeds by measuring the observed rate constants under single-turnover conditions. To achieve single-turnover conditions enzyme has to be used in excess over substrate. Usually, the ratios reported in the literature range between 20:1 (from the authors citation list e.g.: for twister sister (Roth et al 2014) and hatchet (Li et al. 2015)) or even ~100:1 (for pistol: Harris et al 2015, or others https://www.sciencedirect.com/science/article/pii/S0014579305002061). Can the authors please share their experimental evidence that only 5:1 excess of enzyme over the substrate as used in their experiments truly creates single-turnover conditions?

      We greatly appreciate the reviewer raising this question. To address it, we performed kinetic analysis using different enzyme-to-substrate ratios (Figure S13). The kobs values differ little across ratios; the highest value, 0.048 min-1, was reached with a 100:1 excess of enzyme over substrate.

      Line 458-460: The kobs of LINE-1-core under single-turnover condition was ~0.05 min-1 when measured in 10mM MgCl2 and 100mM KCl at pH 7.5 (Figure S13).
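
      For readers unfamiliar with how an observed rate constant is obtained from a time course, a minimal sketch follows. It assumes the common single-exponential model F(t) = Fmax(1 - exp(-kobs t)) with a known plateau Fmax; the response does not state which fitting procedure was actually used, so the model, the function name, and the parameter names are assumptions.

```python
import math


def estimate_kobs(times_min, fraction_cleaved, f_max=1.0):
    """Estimate kobs (min^-1) from fraction cleaved versus time.

    Linearizes F(t) = f_max * (1 - exp(-kobs * t)) as
    ln(1 - F / f_max) = -kobs * t and fits a line through the origin
    by least squares. `f_max` (the plateau) is assumed to be known.
    """
    numerator = 0.0
    denominator = 0.0
    for t, f in zip(times_min, fraction_cleaved):
        if not 0.0 <= f < f_max:
            raise ValueError("fraction cleaved must lie in [0, f_max)")
        y = math.log(1.0 - f / f_max)
        numerator += t * y
        denominator += t * t
    if denominator == 0.0:
        raise ValueError("need at least one time point with t > 0")
    return -numerator / denominator
```

      Under true single-turnover conditions, the estimate should be insensitive to further increases in enzyme excess, which is the comparison made across the ratios in Figure S13.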

      Citations:

      In the introduction citation number 12 (Roth et al 2014) is mentioned with the CPEB3 ribozyme introduction. This is the wrong citation. Please also insert citations for OR4K15 and IGF1R and LINE-1 ribozyme in this sentence.

      We thank the reviewer for pointing this out and we now have fixed it.

      Also in the introduction, a hammerhead ribozyme in the 3' UTR of Clec2 genes is mentioned and reference 16 (Cervera et al 2014) is given, I think it should be reference 9 (Martick et al 2008)

      We thank the reviewer for pointing this out and we now have fixed it.

      In the results section it is stated that, "original sequences were generated from a randomly fragmented human genomic DNA selection based biochemical experiment" citing reference 12. This is the wrong reference, as I could not find that Roth et al 2014 describe the use of such a technique. The same sentence occurs in the introduction almost verbatim (see also minor points).

      We thank the reviewer for pointing this out and we now have fixed it.

      Minor points

      Headline:

      Either use caps for all nouns in the headline or write "self-cleaving ribozyme" uncapitalized

      We thank the reviewer for pointing this out and we now have fixed it.

      Abstract:

      1st sentence: in "the" human genome

      "Moreover, the above functional regions are..." - the word "above" could be deleted here

      "named as lantern for their shape"- it should be "its shape"

      "in term of sequence and secondary structure"- "in terms"

      "the nucleotides at the cleavage sites" - use singular, each ribozyme of this class has only one cleavage site

      We thank the reviewer for pointing these out and we now have fixed them.

      Introduction:

      Change to "to have dominated early life forms"

      Change to "found in the human genome"

      Please write species names in italics (D. melanogaster, B. mori)

      Please delete "hosting" from "...are in noncoding regions of the hosting genome"

      Please delete the sentence fragment/or turn it into a meaningful sentence: "Selection-based biochemical experiments (12).

      Change to "in terms of sequence and secondary structure, suggesting a more"

      Please reword the last sentence in the introduction to make clear what is referred to by "its", e.g. probably the homology model of lantern ribozyme generated from twister sister ribozymes?

      Please refer to the appropriate methods section when explaining the calculation of RA and RA'.

      We thank the reviewer for pointing these out and we now have fixed them.

      The last sentence of the second paragraph in the second section of the results states that the authors confirmed functional regions for LINE-1 and OR4K15, however, until that point the section only presents data on LINE-1. Therefore, OR4K15 should not be mentioned at the end of this paragraph.

      In response to the reviewer's suggestions, we have removed OR4K15 from this paragraph.

      Line 225-228: The consistency between base pairs inferred from deep mutational scanning of the original sequences and that of the identified functional regions confirmed the correct identification of functional regions for LINE-1 ribozyme.

      Change to "Both ribozymes have two stems (P1, P2), to internal loops ..."

      We thank the reviewer for pointing this out and we now have fixed it.

      The section naming the "functional regions" of LINE-1 and OR4K15 lantern ribozymes should be moved after the section in which the circular permutation is shown and explained. Therefore, the headline of section three should read "Consensus sequence of LINE-1 and OR4K15 ribozymes" or something along these lines.

      We thank the reviewer for pointing this out and we now have fixed it.

      Line 308-309: Given the identical lantern-shaped regions of the LINE-1-rbz and OR4K15-rbz ribozyme, we named them twister sister-like (TS-like) ribozymes.

      The statement on the difference between C8 in OR4K15 and U38 in LINE-1 should be further classified. As U38 is only 95% conserved. Is it a C in those other instances or do all other nucleotide possibilities occur? Is the high conservation in OR4K15 an "artifact" of the low mutation rate for this RNA in the deep mutational scanning?

      We thank the reviewer for this question. Yes, the high conservation in OR4K15 is an "artifact" of the low mutation rate for this RNA in the deep mutational scanning. That is why the RA’ value is more appropriate for describing the conservation level of each position. We also mention this in the manuscript:

      Line 287-288: The only mismatch U38C in L1 has the RA’ of 0.6, suggesting that the mismatch is not disruptive to the functional structure of the ribozyme.

      Section five, first paragraph: instead of "two-stranded LINE-1 core" use the term "bimolecular", as it is more commonly used.

      We thank the reviewer for pointing this out and we now have changed it.

      Figure caption 3 headline states "Homology modelled 3D structure..."but it also shows the secondary structures of LINE1, OR4K15 and twister sister examples.

      We thank the reviewer for pointing this out and we now have removed “3D”.

      In Figure 3C, we see a nucleobase labeled G37, however in the secondary structure and sequence and 3D structural model there is a C37 at this position. Please correct the labeling.

      We thank the reviewer for pointing this out and we now have fixed it.

      Section 7 "To address the above question..." please just repeat the question you want to address to avoid any confusion to the reader.

      We thank the reviewer for pointing these out and we have re-phrased this sentence.

      Line 364: Considering the high similarity of the internal loops, we further investigated the mutational effects on the internal loop L1s.

      Please rephrase the sentence "By comparison, mutations of C62 (...) at the cleavage site did not make a major change on the cleavage activity...", e.g. "did not lead to a major change" etc.

      Section 8, first paragraph: This result further confirms that the RNA cleavage in lantern...", please delete "further"

      Change to "analogous RNAs that lacked the 2' oxygen atom in the -1 nucleotide"

      Methods

      Change to "We counted the number of reads of the cleaved and uncleaved..."

      Change to "...to produce enough DNA template for in vitro transcription."

      Change to "The DNA template used for transcription was used..." (delete while)

      We thank the reviewer for pointing these out and we now have fixed them.

      Supplement

      All supplementary figures could use more detailed Figure legends. They should be self-explanatory.

      Fig S1/S2: how is "mutation rate" defined/calculated?

      We thank the reviewer for pointing this out and we now have added a short explanation. The mutation rate was calculated as the proportion of mutations observed at each position for the DNA-seq library.
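
      The definition given above (proportion of mutations observed at each position in the DNA-seq library) can be illustrated with the following sketch. It assumes reads that are already aligned to the wild type without insertions or deletions; the function name and example reads are hypothetical.

```python
def per_position_mutation_rate(reads, wild_type):
    """Fraction of reads carrying a substitution at each position.

    Reads whose length differs from the wild type are skipped, since this
    sketch handles substitutions only (an assumption for illustration).
    """
    counts = [0] * len(wild_type)
    used = 0
    for read in reads:
        if len(read) != len(wild_type):
            continue
        used += 1
        for i, (r, w) in enumerate(zip(read, wild_type)):
            if r != w:
                counts[i] += 1
    if used == 0:
        raise ValueError("no reads of the expected length")
    return [count / used for count in counts]
```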

      Fig S3/S4: axis label "fraction", fraction of what? How calculated?

      We thank the reviewer for pointing this out and we now have added a short explanation. The Y axis “fraction” represents the ratio of each mutation type observed in all variants.

      Fig S5: RA and RA' are mentioned in the main text and methods, but should be briefly explained again here, or it should be clearly referred to the methods. Also, the axis label could be read as average RA' divided by average RA. I assume that is not the case. I assume I am looking at RA' values for LINE-1 rbz and RA values for LINE-1-ori? Also, mention that only part of the full LINE-1-ori sequence is shown...

      We thank the reviewer for pointing this out and we have now added a short explanation. The Y axis represents RA’ for LINE-1-rbz, or RA for LINE-1-ori. The part shown is the overlap region between LINE-1-rbz and LINE-1-ori. We apologize for any confusion caused by our previous statement.

      Fig S9 the magenta for coloring of the scissile phosphate is hard to see and immediately make out.

      We thank the reviewer for pointing this out and we now have added a label to the scissile phosphate.

      Fig S10: Why do the authors only show one product band here? Instead of both cleavage fragments as in Figure 5?

      We thank the reviewer for this question. We purposely used two fluorophores (5’ 6-FAM, 3’ TAMRA) to show the two product bands in Figure 5. In Fig S10, a long incubation was used to distinguish catalysis-based self-cleavage from RNA degradation. This figure was prepared before we purchased the substrate used in Figure 5. The substrate strand used in Fig S10 has only one fluorophore modification (5’ 6-FAM), and the other product was too short to be visualized by SYBR Gold staining.

      Fig S13: please indicate meaning of colors in the legend (what is pink, blue, grey etc.)

      Please change to "RtcB ligase was used to capture the 3' fragment after cleavage...."

      We thank the reviewer for pointing this out and we now have fixed it.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Materials and Methods section:

      Cell gating and FACS sorting strategies need to be explained. There is no figure legend of supplementary figure 4 which is supposed to explain the gating strategy. Please detail the strategy for each cell types.

      Thank you for your suggestion. We have given a detailed description of the gating and FACS sorting strategies for the different liver cell types in supplementary figure 1. In addition, flow cytometry plots of CD45+Ly6C-CD64+F4/80+ KCs from Bmp9fl/flBmp10fl/flLrat Cre mice are also presented in supplementary figure 1.

      The genetic background of the different mouse strains and the age of the mice should be noted on each figure.

      All the mice used in our study are on a C57BL/6 background (Methods section). The age of the mice is given for each figure.

      The Mann Whitney test instead of the two-tailed Student's t-test should be used for the different statistical analyses. Why are the expression counts statistically analyzed by 2-tailed Student's t test as they were already identified as DE in RNAseq statistical analysis?

      Thank you for your suggestion. Statistical methods have been corrected in the revised manuscript.

      What is the age of the mice and how many are used for each bulk RNAseq?

      This information has been added on the corresponding figure legends.

      Figure 1:

      Figure 1a and c: The qPCR data would be much more interesting if presented as DDct and not as relative value as we do not see the mRNA levels of BMP9 and BMP10 in each Bmp9fl/flBmp10fl/flCre mouse. This would allow to compare the mRNA level of BMP9 versus BMP10. This should be changed in all figures.

      The presentation of the qPCR data in Figure 1a has been changed to allow comparison of the abundance of BMP9 versus BMP10 mRNA. Figure 1c shows only the expression of BMP10, so it is unnecessary to present those qPCR data as DDct. In our bulk RNA sequencing data of liver tissues, we found that BMP9 expression counts are higher than those of BMP10, in line with the data from BioGPS.
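
      As background to the DDct request, the sketch below shows the two quantities usually involved: abundance relative to a reference gene (2^-dCt), which permits comparing two transcripts such as BMP9 and BMP10, and fold change between conditions (2^-ddCt). It assumes roughly 100% amplification efficiency for every primer pair; the function names and Ct values are hypothetical and do not come from the study.

```python
def delta_ct(ct_target, ct_reference):
    """dCt: target Ct minus reference (housekeeping) gene Ct."""
    return ct_target - ct_reference


def abundance_vs_reference(ct_target, ct_reference):
    """Target abundance relative to the reference gene, 2 ** (-dCt)."""
    return 2.0 ** (-delta_ct(ct_target, ct_reference))


def fold_change(ct_target_sample, ct_ref_sample, ct_target_control, ct_ref_control):
    """Fold change of sample over control, 2 ** (-ddCt)."""
    dd_ct = delta_ct(ct_target_sample, ct_ref_sample) - delta_ct(
        ct_target_control, ct_ref_control
    )
    return 2.0 ** (-dd_ct)
```

      Reporting 2^-dCt for each transcript in the same sample is what makes the relative abundance of two different mRNAs visible, whereas values normalized to a control group only show the change within one transcript.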

      Figure 1e (IF) and f (FACS), the quantification of these data should be added as shown in Fig2d. What is the difference between Fig1e and Fig2d as they both seem to show the quantification of F4/80 in CTL versus Bmp9fl/flBmp10fl/flLratCre mice. Are the cells sorted in Fig1f and 1e and suppl Fig1b? if yes please precise the strategy. If they are not gated how can the authors obtain 93% of KC? The reference Tillet et al., JBC 2018 should be added in the discussion of figure 1 as it is the first description of BMP10 in HSC.

      The quantitative data for Figures 1e and 1f have been added in our revised manuscript. CLEC4F is a KC-specific marker that, among tissue-resident macrophages, is expressed exclusively on KCs. In our previous report (PMID: 34874921), we demonstrated that BMP9/10-ALK1 signaling induces the expression of CLEC4F. The data shown in Figure 1e reproduce this phenotype: upon loss of BMP9/10-ALK1 signaling, liver macrophages did not express CLEC4F. F4/80 in Figure 1e was used as an internal positive control. Fig2d shows the quantification of F4/80 and CD64, two pan-macrophage markers, which gives a more accurate measure of the number of liver macrophages, especially given that F4/80 mean fluorescence intensity was reduced in liver macrophages of Bmp9fl/flBmp10fl/flLrat Cre mice. Cells in Fig1f, 1e and suppl Fig1b were not sorted, and the flow cytometry plots of these cells were pre-gated on live CD45+Ly6C-CD64+F4/80+ liver macrophages. The reference Tillet et al., JBC 2018 has been added in our revised manuscript.

      Supplementary 4 should have a detailed figure legend and should appear before gating experiments. What cell subtype is used for each cell type gating. Please add the exact references of all the antibodies used and if they are fluorescently labeled antibodies. Why is the number of lymphocytes noted and how is it calculated? The gating strategy for the Bmp9fl/flBmp10fl/flLratCre mice should also be shown as the number of F4/80+ and Tim4+ cells are decreased.

      A detailed figure legend has been added to the original supplementary figure 4, which has been moved to supplementary figure 1 in our revised manuscript. The antibodies used in our study were also used in our previous report (PMID: 34874921) and by others (PMID: 31561945; PMID: 26813785). The “lymphocytes” label and count appear automatically on the plots when we analyze flow cytometry data, and do not mean that the selected cells are lymphocytes. To avoid misunderstanding, these words have been deleted. The gating strategy of CD45+Ly6C-CD64+F4/80+ liver macrophages for the Bmp9fl/flBmp10fl/flLrat Cre mice is shown in our revised manuscript (Supplementary Figure 1).

      Figure 2:

      Figure 2a: How many mice were used for bulk RNAseq at what age? Please describe the gating strategy for sorting liver macrophages. The PCA should be shown. The genes represented in Fig2c and cited in the text should be shown on the volcano plot and the heatmap (Timd4, Cdh5, Cd5l). A reference for these KC and monocytic markers should be added in the text.

      Control and Bmp9fl/flBmp10fl/flLrat Cre mice at the age of 8-10 weeks (n=3/group) were used for bulk RNAseq. This information has been added in Figure 2a legend. The PCA, Timd4 gene and references for these KC and monocytic markers have been shown in our revised manuscript according to your suggestion.

      Figure 2b: How are selected the genes represented in the heatmap? The top ones? If it is a KC signature the authors should give a reference for this signature.

      These genes were KC signature genes. The reference (PMID: 30076102) has been given in our revised manuscript.

      Fig2e: Please explain what is the Vav1 promoter and in which cells it will delete Alk1and Smad4? The authors also need to show that Alk1 and Smad4 are indeed deleted in these mice and in which cell subtype (EC and KC?). This is an important point as the authors conclude that other molecular mechanisms than Smad4 signaling may affect the phenotypes of liver macrophages in Bmp9fl/flBmp10fl/flLratCre.

      Cre recombinase in Vav1Cre mice is expressed at high levels in hematopoietic stem cells (PMID: 27185381). This strain is widely used to target all hematopoietic cells with high efficiency (PMID: 24857755). In our previous report (PMID: 34874921), we demonstrated that Alk1 (Supplemental Figure 6A) and Smad4 (Supplemental Figure 6G) were efficiently deleted in KCs from Alk1fl/flVav1Cre and Smad4fl/flVav1Cre mice, respectively. This sentence and reference have been added in our revised manuscript. Homozygous loss of ALK1 causes embryonic lethality due to aberrant angiogenesis (PMID: 28213819). EC-specific ALK1 knockout in the mouse, through deletion of the ALK1 gene from an Acvrl12loxP allele with the EC-specific L1-Cre line, results in postnatal lethality at P5, with mice exhibiting hemorrhaging in the brain, lung, and gastrointestinal tract (PMID: 19805914). In contrast, the Alk1fl/flVav1Cre mice generated in our lab did not show this phenotype or body weight loss, and still survived at the age of 16 weeks. Thus, we do not think that ECs are targeted by the Vav1Cre strain, at least in our experimental system.

      Supl Figure 3 (revised Supl Figure 4): The authors need to explain what cell types are affected by Csf1r-Cre and Clec4fDTR. Have the authors tried to perform a similar experiment in Bmp9fl/flBmp10fl/flLratCre? The legend of the Y axis is not clear, why is CD45+ used in the first bar graph while the other two graphs use F4/80+?

      We (PMID: 34874921) and others (PMID: 31587991; PMID: 31561945; PMID: 26813785) have demonstrated that Clec4f is specifically expressed on KCs, and thus only KCs are depleted in Clec4fDTR mice after DT injection. CSF1R, also known as macrophage colony-stimulating factor receptor (M-CSFR), is the receptor for the major monocyte/macrophage lineage differentiation factor CSF1. Thus, the Csf1r-Cre strain can target monocytes, monocyte-derived macrophages and tissue-resident macrophages, including those of the liver, spleen, intestine, heart, kidney, and muscle, with high efficiency (PMID: 29761406). We did not perform a similar experiment in Bmp9fl/flBmp10fl/flLrat Cre mice, as we have demonstrated that the differentiation of liver macrophages from Bmp9fl/flBmp10fl/flLrat Cre mice is inhibited. The other two graphs in Supl Figure 4C were obtained from Supl Figure 4B. Flow cytometry plots in Supl Figure 4B are pre-gated on CD45+Ly6C-CD64+F4/80+ liver macrophages, so it is appropriate to use F4/80+ as an internal control.

      Figure 3: Same remarks as in Figure 2. How many mice were used for bulk RNAseq, at what age? The PCA should be shown. How were selected the genes represented in the heatmap? The top ones? A reference should be given for the sinusoidal EC and the continuous EC signatures and large artery signature. Maf and Gata4 should be shown on the volcano plot. A quantification for CD34 IF (Fig3e) as well as for the quantification of the FACS data (Fig 3f) should be added.

      Control and Bmp9fl/flBmp10fl/flLrat Cre mice at the age of 8-10 weeks (n=3/group) were used for bulk RNAseq. The other revisions have been made as suggested.

      Figure 4: A quantification and statistical analysis of Prussian staining area and GS IF should be added not just number of mice which were affected.

      A quantification and statistical analysis of Prussian staining area and GS IF have been added.

      Minor points:

      Few spelling mistakes that should be checked.

      Figure 5a, some bar graphs are missing.

      Spelling mistakes and missing bar graphs in Figure 5a have been corrected.

      Reviewer #2 (Recommendations For The Authors):

      The authors should provide some additional information:

      - Did the single HSC-KO mice for either BMP9 or BMP10 already show partial phenotypes?

      We think that under steady state, the phenotype of KCs and ECs, described in our manuscript, in the livers of single HSC-KO mice for either BMP9 or BMP10 was not altered. However, we don’t know whether the role of BMP9 and BMP10 is still redundant in liver diseases or inflammation, which is worth further studying.

      - The authors should also stain Endomucin, Lyve1, CD32b on liver tissue to assess endothelial zonation/differentiation in addition to FACS analysis.

      In our revised manuscript, we performed immunostaining for Endomucin and Lyve1 and found increased expression of Endomucin and decreased expression of Lyve1 (Figure 3g), suggesting that endothelial zonation/differentiation was disrupted in the liver of Bmp9fl/flBmp10fl/flLrat Cre mice compared to their littermates. We did not stain for CD32b in the liver sections as there is no good antibody against mouse CD32b for frozen sections.

      - Did the authors assess BMP9/BMP10 effects individually and combined in vitro on KC and EC? Are these likely only direct effects or may they also involve each other (i.e. also cross talk between KC and EC in response to BMP9/10?). This could be assessed in co-culture models.

      Using ALK1 reporter mice, we demonstrated that KCs and liver ECs express ALK1. We and others have shown that in vitro stimulation with BMP9/BMP10 can induce the expression of ID1/ID3 and GATA4/Maf in KCs and ECs (PMID: 34874921; PMID: 35364013; PMID: 30964206), respectively. These results suggested that BMP9/BMP10 can directly function on KCs and ECs. Indeed, we are also interested in the crosstalk between KCs and ECs. However, an in vitro coculture system cannot mimic the interaction between KCs and ECs in the liver as these cells will lose their identity upon their isolation from the liver environment. Nevertheless, Bonnardel et al. applied Nichenet bioinformatic analysis to predict that liver ECs provide an anchoring site and Notch and CSF1 signals for KCs (PMID: 31561945). Of course, this prediction still needs experimental validation.

      - The abstract should be rephrased and more specific focus on BMP related intercellular crosstalk in the liver and its implications for liver health and disease. At the end of the abstract they should also emphasize for which specific fields/topics/diseases these findings are important.

      Thank you for your suggestion. The abstract has been rephrased, and we hope the revised version addresses your concerns.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this important study, Huffer et al posit that non-cold sensing members of the TRPM subfamily of ion channels (e.g., TRPM2, TRPM4, TRPM5) contain a binding pocket for icilin which overlaps with the one found in the cold-activated TRPM8 channel.

      The authors identify the residues involved in icilin binding by analyzing the existing TRPM8-icilin complex structures and then use their previously published approach of structure-based sequence comparison to compare the icilin binding residues in TRPM8 to other TRPM channels. This approach uncovered that the residues are conserved in a number of TRPM members: TRPM2, TRPM4, and TRPM5. The authors focus on TRPM4, with the rationale that it has the simplest activation properties (a single Ca2+-binding site). Electrophysiological studies show that icilin by itself does not activate TRPM4, but it strongly potentiates the Ca2+ activation of TRPM4, and introducing the A867G mutation (the mutation that renders avian TRPM8 sensitive to icilin) further increases the potentiating effects of the compound. Conversely, the mutation of a residue that likely directly interacts with icilin in the binding pocket, R901H, results in channels whose Ca2+ sensitivity is not potentiated by icilin.

      The data indicate that, just like in TRPV channels, the binding pockets and allosteric networks might be conserved in the TRPM subfamily.

      The data are convincing, and the authors employ good experimental controls.

      We appreciate the supportive feedback of this reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to study whether the cooling agent binding site in TRPM8, which is located between the S1-S4 and the TRP domain, is conserved within the TRPM family of ion channels. They specifically chose the TRPM4 channel as the model system, which is directly activated by intracellular Ca2+. Using electrophysiology, the authors characterized and compared the Ca2+ sensitivity and the voltage dependence of TRPM4 channels in the absence and presence of synthetic cooling agonist icilin. They also analyzed the mutational effects of residues (A867G and R901H; equivalent mutations in TRPM8 were shown involved in icilin sensitivity) on Ca2+ sensitivity and voltage-dependence of TRPM4 in the absence and presence of Ca2+. Based on the results as well as structure/sequence alignment, the authors concluded that icilin likely binds to the same pocket in TRPM4 and suggested that this cooling agonist binding pocket is conserved in TRPM channels.

      Strengths:

      The authors gave a very thorough introduction to the TRPM channels. They have nicely characterized the Ca2+ sensitivity and the voltage-dependence of TRPM4 channels and demonstrated icilin potentiates the Ca2+ sensitivity and diminishes the outward rectification of TRPM4. These results indicate icilin modulates TRPM4 activation by Ca2+.

      We appreciate the supportive feedback of this reviewer.

      Weaknesses:

      The reviewer has a few concerns. First, icilin alone (at 25µM) and in the absence of Ca2+ does not activate the TRPM4 channel. Have the authors titrated a wide range of icilin concentrations (without Ca2+ present) for TRPM4 activation? It raises the question of whether icilin is indeed an agonist for the TRPM4 channel. This has not been tested so it is unclear. One may argue that icilin needs Ca2+ as a co-factor for channel activation just like in the TRPM8 channel. This leads to the second concern, which is a complication in the experimental design and data interpretation. TRPM4 itself requires Ca2+ for activation to begin with, thus it is hard to dissect whether the current observed here for TRPM4 is activated by Ca2+ or by icilin plus its cofactor Ca2+. This is the difference between TRPM8 and TRPM4, as TRPM8 itself is not activated by Ca2+, thus TRPM8 activation is through icilin and Ca2+ acts as a prerequisite for icilin activation.

      We agree that the comparison between TRPM8 and TRPM4 is not perfect because TRPM4 requires Ca2+ for activation, but it is clear that the current activated by Ca2+ in the presence of icilin also involves icilin because it activates at lower Ca2+ concentrations and lower voltages. We have tested icilin at concentrations between 12.5 and 25 µM and at these concentrations icilin does not activate TRPM4 when applied alone, so we have no evidence that it is an agonist. Both of these concentrations are higher than those reported by Chuang et al. to be saturating for TRPM8 in the presence of Ca2+. We haven’t tested icilin at higher concentrations because we wanted to keep the final concentration of DMSO low enough to avoid any effects of the vehicle. We now emphasize this even more clearly in the revised manuscript.

      The results presented in this study are only sufficient to show that icilin modulates the Ca2+-dependent activation of TRPM4 and icilin at best may act as an allosteric modulator for TRPM4 function. One cannot conclude from the current work that icilin is an agonist or even specifically a cooling agonist for TRPM4. Icilin is a cooling agonist for TRPM8, but it does not mean that if icilin modulates TRPM4 activity then it serves as a cooling agonist for TRPM4.

      We agree with these comments, and we believe that the intent of our statements in the manuscript is completely in line with this perspective. We never refer to icilin as a cooling agent for TRPM4 but rather refer to the cooling agent binding pocket in TRPM8 and how that appears to be conserved and functions in TRPM4 to modulate opening of the channel. We have carefully gone through the manuscript to refer directly to icilin by name (rather than as a cooling agent) when referring to its actions on TRPM4 to make sure there is no confusion.

      For the mutation data on A867G, Figure 4A-B, left panels, it looks like A867G has stronger Ca2+ sensitivity compared to the WT in the absence of icilin and the onset of current activation is faster than the WT, or this is simply because the scales of the figure panels differ between A867G and the WT. Overall the mutagenesis data are weak to support the conclusion that icilin binds to the S1-S4 pocket. The authors need to mutate more residues that are involved in direct interaction with icilin based on the available structural information, including but not limited to residues equivalent to Y745 and H845 in human TRPM8.

      The A867G mutant does seem to promote opening by Ca2+ in the absence of icilin, and we now comment on this in the manuscript. Having said that, we have not carefully studied the concentration-dependence for activation by Ca2+ because at higher concentrations we see evidence of desensitization. We think Ca2+, icilin and depolarized voltages promote an open state of TRPM4 and the A867G does so as well.

      We respectfully disagree about the strength of the mutagenesis results presented in our manuscript. We present clear gain and loss of function for two mutants corresponding to influential residues within the cooling agent binding pocket of TRPM8. We agree that Y786 mutations would have been a valuable addition, and our plan was to include mutations of this residue. Unfortunately, both the Y786A and Y786H mutants exhibited rundown in response to repeated stimulation by Ca2+, making it challenging to obtain reliable results on their effects on modulation by icilin.

      The authors set out to study the conservation of the cooling agonist binding site in the TRPM family, but only tested a synthetic cooling agonist, icilin, on TRPM4. In order to draw a broad conclusion as the title and the discussion have claimed, the authors need to examine more cooling compounds, including the most well-known natural cooling agonist menthol, and other cooling agonists such as WS-12 and/or C3, and test their effects on several TRPM channels, not just TRPM4. With the current data, the authors need to significantly tone down the claim of a conserved cooling agonist binding pocket in the TRPM family.

      We would have liked to broaden the scope to other ligands that modulate TRPM8 and we agree that including those data would certainly reinforce our conclusions. However, the first author recently moved on to a new faculty position and extending our findings would require enlisting another member of the lab and taking away from their independent projects. We also do not agree that this is essential to support any of our conclusions. It is also important to keep in mind that icilin is a high-affinity ligand for TRPM8, such that weaker interactions with TRPM4 can still be readily observed. We think it is likely that lower affinity agonists like menthol might not have sufficient affinity to see activity in TRPM4. This scenario is not unlike our earlier experience with TRPV channels where we succeeded in engineering vanilloid sensitivity into TRPV2 and TRPV3 using the high affinity agonist resiniferatoxin (Zhang et al., 2016, eLife). In the case of TRPV2, another group had made the same quadruple mutant and failed to see activation by capsaicin even though resiniferatoxin also worked in their hands (see Fig. 2 in Yang et al., 2016, PNAS).

      On page 11, the authors suggest based on the current data, that TRPM2 and TRPM5 may also be sensitive to cooling agonists because the key residues are conserved. TRPM2 is the closest homolog to TRPM8 but is menthol-insensitive. There are studies that attempted to convert menthol sensitivity to TRPM2, for example, Bandell 2006 attempted to introduce S2 and TRP domains from TRPM8 into TRPM2 but failed to make TRPM2 a menthol-sensitive channel. The sequence conservation or structural similarity is not sufficient for the authors to suggest a shared cooling agonist sensitivity or even a common binding site in the TRPM2 and TRPM5 channels. Again, as pointed out above, the authors need to establish the actual activation of other TRPM channels by these agonists first, before proceeding to functionally probe whether other TRPM channels adopt a conserved agonist binding site.

      We are somewhat confused by these comments because we do not comment about whether cooling agents can activate TRPM2 or TRPM5. We simply analyzed the structures to make the point that the key residues in the cooling agent binding pocket of TRPM8 are conserved in these other TRPM channels. The Bandell paper is relevant, but it is also possible that they failed to uncover a relationship because they only used an agonist that has relatively low affinity for TRPM8. It would have been interesting to see what they might have found if they had used a high-affinity ligand like icilin instead of a low affinity ligand like menthol.

      Taken together, this current work presents data to show the modulatory effects of icilin on the Ca2+ dependent activation and voltage dependence of the TRPM4 channel.

      We agree.

      Reviewer #3 (Public Review):

      Summary:

      The family of transient receptor potential (TRP) channels are tetrameric cation selective channels that are modulated by a variety of stimuli, most notably temperature. In particular, the Transient receptor potential Melastatin subfamily member 8 (TRPM8) is activated by noxious cold and other cooling agents such as menthol and icilin and participates in cold somatosensation in humans. The abundance of TRP channel structural data that has been published in the past decade demonstrates clear architectural conservation within the ion channel family. This suggests the potential for unifying mechanisms of gating despite their varied modes of regulation, which are not yet understood. To address this question, the authors examine the 264 structures of TRP channels determined to date and observe a potential binding pocket for icilin in multiple members of the Melastatin subfamily, TRPM2, TRPM4, and TRPM5. Interestingly, none of the other Melastatin subfamily members had been shown to be sensitive to icilin apart from TRPM8. Each of these channels is activated by intracellular calcium (Ca2+) and a Ca2+ binding site neighbors the predicted pocket for icilin binding in all cryo-EM structures. The authors examined whether icilin could modulate the activation of TRPM4 in the presence of intracellular Ca2+. The addition of icilin enhances Ca2+-dependent activation of TRPM4, promotes channel opening at negative membrane potentials, and improves the kinetics of opening. Furthermore, mutagenesis of TRPM4 residues within the putative icilin binding pocket predicted to enhance or diminish TRPM4 activity elicit these behaviors. Overall, this study furthers our understanding of the Melastatin subfamily of TRP channel gating and demonstrates that a conserved binding pocket observed between TRPM4 and TRPM8 channel structures can function similarly to regulate channel gating.

      Strengths:

      This is a simple and elegant study capitalizing on a vast amount of high-resolution structural information from the TRP family of ion channels to identify a conserved binding pocket that was previously unknown in the Melastatin subfamily, which is interrogated by the authors through careful electrophysiology and mutagenesis studies.

      Weaknesses:

      No weaknesses were identified by this reviewer.

      We appreciate the supportive comments of the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I don't have any major asks, but a few questions did arise while reading your work.

      (1) You refer multiple times to the VSLD pocket as being "open to the cytoplasm". It is not clear if you are implying that compounds such as icilin access the pocket via the cytoplasm (e.g., permeate the membrane to the cytosol, and then enter the binding site?) Is there data to support this? Some clarification here would be helpful, and perhaps explain if there is any distinction between how calcium might enter the VSLD binding site vs hydrophobic compounds like icilin.

      This is an excellent point. Our reference to “open to the cytoplasm” was for Ca2+ ions and we have no evidence for how icilin enters the cooling agent binding pocket. We had tried to look for evidence that Ca2+ might trap icilin in TRPM4 but at the end of the day the results were not convincing enough to include in the manuscript. We have added data showing that icilin slows deactivation of TRPM4 after removing Ca2+, which is particularly evident in the A867G mutant, but this doesn’t inform on whether Ca2+ can trap icilin. We have added a statement about not knowing how icilin enters or leaves the cooling agent binding pocket in TRPM channels.

      (2) Icilin is referred to as a "cooling compound", but its cooling effects are dependent on its interactions with TRPM8. This might be something to clarify, as it might otherwise be understood that other TRPM channels that interact with icilin also mediate the sensing of cool temperatures.

      This is another excellent point and we have no reason to believe that icilin interacting with any TRPM channel other than TRPM8 mediates cooling sensations. We have added a statement to this effect in the discussion when considering actions of icilin that might be mediated by TRPM4 channels.

      Reviewer #2 (Recommendations For The Authors):

      (1) The title and statements in the results/discussion refer to icilin as a cooling agonist of TRPM4 and binds to a conserved "cooling agonist binding pocket", and the authors suggested a similar role and binding site for icilin in TRPM2 and TRPM5 channel. It is a too broad conclusion that is not fully supported by the current experimental data, which only shows icilin works as a modulator, not an agonist for TRPM4 channel. The authors should change the usage of cooling agonist or conserved cooling agonist binding pocket plus significantly tone down the conclusion of a conserved cooling agonist binding pocket, which is potentially misleading. Alternatively, if the authors insist on using cooling agonist in this context, they should establish the activation of TRPM4, TRPM2, and TRPM5 by icilin as the first step, because the current data only support icilin as a TRPM4 modulator but not an agonist.

      We respectfully don’t agree with this opinion. We show broad conservation of the cooling agent binding pocket in structures of many TRPM channels, and we chose one of them to test for a functional relationship. We think that the title accurately reflects the topic of the paper and does not specify the extent to which functional conservation has been demonstrated and we would like to keep it. The distinction between agonist and modulator is not even germane because icilin is not an agonist of TRPM8 either.

      (2) The manuscript will be strengthened if the authors test additional cooling compounds of TRPM8, including menthol, the menthol analog WS-12, and C3. More importantly, distinct from icilin, these three compounds do not depend on Ca2+ to activate the TRPM8 channel. Thus when testing these compounds on TRPM4, it may reduce the complication of the role of Ca2+, as TRPM4 channel itself requires Ca2+ for activation.

      We restate our response to this point in the public review…

      We would have liked to broaden the scope to other ligands that modulate TRPM8 and we agree that including those data would certainly reinforce our conclusions. However, the first author recently moved on to a new faculty position and extending our findings would require enlisting another member of the lab and taking away from their independent projects. We also do not agree that this is essential to support any of our conclusions. It is also important to keep in mind that icilin is a high-affinity ligand for TRPM8, such that weaker interactions with TRPM4 can still be readily observed. We think it is likely that lower affinity agonists like menthol might not have sufficient affinity to see activity in TRPM4. This scenario is not unlike our earlier experience with TRPV channels where we succeeded in engineering vanilloid sensitivity into TRPV2 and TRPV3 using the high affinity agonist resiniferatoxin (Zhang et al., 2016, eLife). In the case of TRPV2, another group had made the same quadruple mutant and failed to see activation by capsaicin even though resiniferatoxin also worked in their hands (see Fig. 2 in Yang et al., 2016, PNAS).

      (3) The manuscript will be strengthened if the authors test additional residues in the S1-S4 pocket that form direct interactions or are within interacting distances with icilin based on the cryo-EM structures.

      We restate our response to this point in the public review…

      We present clear gain and loss of function for two mutants corresponding to influential residues within the cooling agent binding pocket of TRPM8. We agree that Y786 mutations would have been a valuable addition and our plan was to include mutations of this residue. Unfortunately, both the Y786A and Y786H mutants exhibited rundown, making it challenging to obtain reliable results on their effects on modulation by icilin.

      Furthermore, the ambiguity in the icilin binding pose based on available TRPM8 structures complicates structure-based identification of the most important interacting residues in TRPM8, and we would have needed to functionally validate the effects of any novel mutations we identified in TRPM8 prior to testing them in TRPM4. Instead, we have based our mutagenesis on constructs that have been previously characterized to affect the sensitivity of TRPM8 to cooling agents. A systematic mutagenesis scan of TRPM8 residues predicted to interact differentially with icilin in the two different available binding poses would likely help clarify the true binding pose of icilin and would be an interesting future study.

      Reviewer #3 (Recommendations For The Authors):

      I enjoyed reading this manuscript. It was well-executed and written. It will be interesting to corroborate these findings with a cryo-EM structure of TRPM2, TRPM4, or TRPM5 in the presence of icilin.

      We agree and may pursue these in future studies. This would be particularly interesting given ambiguities in how icilin docks into TRPM8 in previously published structures.

      Minor comments/questions:

      Have the authors considered icilin accessibility to its binding pocket? In other words, could the presence of intracellular Ca2+ inhibit the accessibility of icilin to its binding pocket in TRPM4? It should be a straightforward experiment, I think it would be informative, and could further support the authors' conclusion of the location of the TRPM4 icilin binding pocket.

      We completely agree and we had tried to look for evidence that Ca2+ might trap icilin in TRPM4 but at the end of the day the results were not convincing enough to include in the manuscript. We have added data showing that icilin slows deactivation of TRPM4 after removing Ca2+, which is particularly evident in the A867G mutant, but this doesn’t inform on whether Ca2+ can trap icilin. We have added a statement about not knowing how icilin enters or leaves the cooling agent binding pocket in TRPM channels.

      Figures 7 and 8 are missing the 0 µM Ca2+ control trace in the presence of 25 µM icilin.

      All sample traces from Figures 7 and 8 are shown from a single cell for the sake of comparison (Likewise, the sample traces from Figures 3 and 4 come from a single cell, and the sample traces from Figures 5 and 6 come from a single cell). Unfortunately, we were unable to obtain data from an R901H mutant cell that contained all six conditions we wished to show, and there is no representative trace for 0 µM Ca2+ in the presence of 25 µM icilin for that cell.

      This is up to the discretion of the authors, but perhaps a better way to arrange the paper Figures would be to combine Figures 5-6 and Figures 7-8 and rearrange the data to place some in a supplementary figure (e.g. Figure 5-6 = Figure 5 and Figure 5 - Figure Supplement 1, Figure 7-8 = Figure 6 and Figure 6 - Figure Supplement 1).

      We carefully considered these suggestions and we appreciate the reviewers’ flexibility but would prefer to retain the original arrangement of data in the figures.

      Are there any mutations in the icilin binding pocket in TRPM4, and presumably TRPM2 and TRPM5, that are associated with human disease? This is a question that came to my mind and not one that needs to be addressed in the manuscript.

      This is an interesting point. There are quite a few disease-associated mutants within TRPM4 at positions corresponding to the cooling agent binding pocket in TRPM8. We could not see an appropriate place in the discussion where we could concisely bring this information in so we decided against commenting.

    1. I would and could never fully understand the specificity of pain caused by residential schools and the damage done to those who were taken and those who were left behind.

      I think this is something worth repeating. Any of us who have not been to residential schools may try to understand as best we can, educate ourselves, and read survivors' stories, but we will never truly relate to or completely understand the trauma that came with being there.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This study provides an incremental advance to the scavenger receptor field by reporting the crystal structures of the domains of SCARF1 that bind modified LDL such as oxidized LDL and acylated LDL. The crystal packing reveals a new interface for the homodimerization of SCARF1. The authors characterize SCARF1 binding to modified LDL using flow cytometry, ELISA, and fluorescent microscopy. They identify a positively charged surface on the structure that they predict will bind the LDLs, and they support this hypothesis with a number of mutant constructs in binding experiments.

      Strengths:

      The authors have crystallized domains of an understudied scavenger receptor and used the structure to identify a putative binding site for modified LDL particles. An especially interesting set of experiments is the SCARF1 and SCARF2 chimeras, where they confer binding of modified LDLs to SCARF2, a related protein that does not bind modified LDLs, and show that the key residues in SCARF1 are not conserved in SCARF2.

      Weaknesses:

      While the data largely support the conclusions, the figures describing the structure are cursory and do not provide enough detail to interpret the model or quality of the experimental X-ray structure data. Additionally, many of the flow cytometry experiments lack negative controls for non-specific LDL staining and controls for cell surface expression of the SCARF constructs. In several cases, the authors interpret single data points as increased or decreased affinity, but these statements need dose-response analysis to support them. These deficiencies should be readily addressable by the authors in the revision.

      The paper is a straightforward set of experiments that identify the likely binding site of modified LDL on SCARF1 but adds little in the way of explaining or predicting other binding interactions. That a positively charged surface on the protein could mediate binding to LDL particles is not particularly surprising. This paper would be of greater importance if the authors could explain the specificity of the binding of SCARF1 to the various lipoparticles that it does or does not bind. Incorporating these mutants into an assay for the biological role of SCARF1 would be powerful.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Wang and colleagues provided mechanistic insights into SCARF1 and its interactions with the lipoprotein ligands. The authors reported two crystal structures of the N-terminal fragments of SCARF1 ectodomain (ECD). On the basis of the structural analysis, the authors further investigated the interactions between SCARF1 and modified LDLs using cell-based assays and biochemical experiments. Together with the two structures and supporting data, this work provided new insights into the diverse mechanisms of scavenger receptors and especially the crucial role of SCARF1 in lipid metabolism.

      Strengths:

      The authors started by determining the crystal structures of two fragments of SCARF1 ECD. The superposition of the two high-resolution structures, together with the predicted model by AlphaFold, revealed that the ECD of SCARF1 adopts a long-curved conformation with multiple EGF-like domains arranged in tandem. Non-crystallographic and crystallographic two-fold symmetries were observed in crystals of f1 and f2 respectively, indicating the formation of SCARF1 homodimers. Structural analysis identified critical residues involved in dimerization, which were validated through mutational experiments. In addition, the authors conducted flow cytometry and confocal experiments to characterize cellular interactions of SCARF1 with lipoproteins. The results revealed the vital role of the 133-221aa region in the binding between SCARF1 and modified LDLs. Moreover, four arginine residues were identified as crucial for modified LDL recognition, highlighting the contribution of charge interactions in SCARF1-lipoprotein binding. The lipoprotein binding region is further validated by designing SCARF1/SCARF2 chimeric molecules. Interestingly, the interaction between SCARF1 and modified LDLs could be inhibited by teichoic acid, indicating potential overlap in or sharing of binding sites on SCARF1 ECD.

      The author employed a nice collection of techniques, namely crystallographic, SEC, DLS, flow cytometry, ELISA, and confocal imaging. The experiments are technically sound and the results are clearly written, with a few concerns as outlined below. Overall, this research represents an advancement in the mechanistic investigation of SCARF1 and its interaction with ligands. The role of scavenger receptors is critical in lipid homeostasis, making this work of interest to the eLife readership.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Wang et al. described the crystal structures of the N-terminal fragments of Scavenger receptor class F member 1 (SCARF1) ectodomains. SCARF1 recognizes modified LDLs, including acetylated LDL and oxidized LDL, and it plays an important role in both innate and adaptive immune responses. They characterized the dimerization of SCARF1 and the interaction of SCARF1 with modified lipoproteins by mutational and biochemical studies. The authors identified the critical residues for dimerization and demonstrated that SCARF1 may function as homodimers. They further characterized the interaction between SCARF1 and LDLs and identified the lipoprotein ligand recognition sites, the highly positively charged areas. Their data suggested that the teichoic acid inhibitors may interact with SCARF1 in the same areas as LDLs.

      Strengths:

      The crystal structures of SCARF1 were high quality. The authors performed extensive site-specific mutagenesis studies using soluble proteins for ELISA assays and surface-expressed proteins for flow cytometry.

      Weaknesses:

      (1) The schematic drawing of human SCARF1 and SCARF2 in Fig 1A did not show the differences between them. It would be useful to have a sequence alignment showing the polymorphic regions.

      The schematic drawing in Fig. 1A is intended to give a brief idea of the two molecules; a sequence alignment would take too much space in the figure. A careful alignment between SCARF1 and SCARF2 can be found in Ref. 24 (Ishii, et al., J Biol Chem, 2002. 277, 39696-702) and is also mentioned on p.4.

      (2) The description of structure determination was confusing. The f1 crystal structure was determined by SAD with Pt derivatives. Why did they need molecular replacement with a native data set? The f2 crystal structure was solved by molecular replacement using the structure of the f1 fragment. Why did they need to use EGF-like fragments predicted by AlphaFold as search models?

      The crystal structure of f1 was first determined by SAD using Pt derivatives, but soaking with Pt reduced the resolution of the crystals; we therefore used this structure as a search model for a native data set that had higher resolution for further refinement. For the structural determination of f2, molecular replacement using the f1 structure was not able to show the initial density of the extra region in f2 (residues 133-209), which was missing in f1. Therefore, the EGF-like domains of SCARF1 modeled by AlphaFold were applied as search models for this region (p.18).

      (3) It's interesting to observe that SCARA1 binds modified LDLs in a Ca2+-independent manner. The authors performed the binding assays between SCARF1 and modified LDLs in the presence of Ca2+ or EDTA on Page 9. However, EDTA is not an efficient Ca2+ chelator. The authors should have performed the binding assays in the presence of EGTA instead.

      The binding assays in the presence of EGTA are included in the revised manuscript (Fig. S7) (p.9), which also suggest that SCARA1 binds OxLDL in a Ca2+-independent manner.

      (4) The authors claimed that SCARF1Δ353-415, the deletion of a C-terminal region of the ectodomain, might change the conformation of the molecule and generate hinderance for the C-terminal regions. Why didn't SCARF1Δ222-353 have a similar effect? Could the deletion change the interaction between SCARF1 and the membrane? Is SCARF1Δ353-415 region hydrophobic?

      The truncation mutants were constructed to roughly locate the lipoprotein-binding region on SCARF1, and the overall results showed that the sites might be located in the region of residues 133-221. Mutant Δ222-353 may also affect the conformation, but it still bound OxLDL like the wild type, suggesting that the binding sites were retained in this mutant. Mutant Δ353-415 showed reduced binding, implying that the binding sites might be retained but binding was affected; we think this might be due to a conformational change that reduces the binding or accessibility of lipoproteins. Since this region is located closer to the membrane, it’s possible that the deletion changes the interaction with the membrane. In the AF model, the 353-415 region does not seem to be more hydrophobic than other regions (Fig. S2C).

      (5) What was the point of having Figure 8? Showing the SCARF1 homodimers could form two types of dimers on the membrane surface proposed? The authors didn't have any data to support that.

      Fig. 8 shows a potential model of the SCARF1 dimers on the cell surface by combining the structural information from crystals and AF predictions. The two dimers in the figure are identical but with different viewing angles. The lipoprotein binding sites are also indicated (Fig. 8).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      The authors need to show examples of the electron density for both structures.

      Electron density examples of the two structures are shown in Fig. S2A.

      Figure 1)

      The figure does not show enough details of the structure. The text mentions hydrogen-bond and disulfide bonds that stabilize the loops, these should be shown.

      Disulfide bonds of the two structures are shown in Fig. 1.

      Figure 2)

      D) The full gel should be shown.

      E) Rather than just relying on changes in gel filtration elution volumes, the authors do the appropriate experiment and measure the hydrodynamic radius of the WT and mutant ectodomains by DLS. However, they need to show plots of the size distribution, not just mean radial values, in order to show if the sample is monodisperse.

      The full gel and plots of DLS are shown in Fig. S3A-B.

      Figure 3)

      I have concerns about the rigor of the experiments in panels A-D. The authors include a non-transfected control but do not appear to have treated non-transfected cells with the lipoproteins to evaluate the specificity of binding. Every cell binding assay (flow or confocal) must show the data from non-transfected cells treated with each lipoprotein, as each lipoprotein species could have a unique non-specific binding pattern. The authors show these controls in Figure 6, but these controls are necessary in every experiment.

      In Fig. 3A, since several lipoproteins were included in the figure, we use non-transfected cells without lipoprotein treatment as a negative control. The OxLDL or AcLDL treated non-transfected cells were also used as negative controls and shown in Fig. 3B-C. LDL, HDL or OxHDL may have their own non-specific binding patterns, the treatment of LDL, HDL or OxHDL with the transfected cells all gave negative results (Fig. 3A and D).

      Cell-surface expression of the SCARF1 variants is a major concern. The constructs the authors use are tagged with a GFP on the cytosolic side. However, the Methods do not indicate if they gate on GFP+ transfected cells for analytical flow. Such gating may have been used because the staining experiments in Figures 3 and 4 show uniform cell populations, whereas the staining done with an anti-SCARF1 Ab in S4 shows most of the cells not expressing the protein on the surface. Please clarify.

      Data for the anti-SCARF1 Ab assay are gated for GFP in the revised Fig. S4, and the non-transfected cells are included as a control.

      The authors must demonstrate cell-surface staining with an epitope tag on the extracellular side and clarify if the analyzed cells are gated for surface expression. The anti-SCARF antibody used in S4 may not recognize the truncated or mutant SCARFs equally. Cell-surface expression in the flow experiments cannot be inferred from confocal experiments because the flow experiments have a larger quantitative range.

      Anti-SCARF1 antibody assay provides an estimation of the surface expression of the proteins. If the epitope of the antibody was mutated or removed in the mutants, most likely it would lose binding activity. Including an epitope tag on the ectodomain could be an option, but if truncation or mutation changes the conformation of the ectodomain, the accessibility of the epitope may also be affected, and addition of an extra sequence or domain, such as an epitope tag, may affect the surface expression of proteins sometimes.

      In several places, the authors infer increased or decreased affinity from mean fluorescent intensity values of a single concentration point without doing appropriate dose-curves. These experiments need to be done or else the mentions of changes in apparent affinities should be removed.

      We add a concentration for the WT interaction with OxLDL (Fig. S6, p.9) and the manuscript is also modified accordingly.

      Figure 7

      The concentration of teichoic acid used to inhibit modified LDL binding should be indicated and a dose-curve analysis should be done comparing teichoic acid to some non-inhibitory bacterial polymer.

      The concentration of teichoic acids used in the inhibition assays is 100 mg/ml (p.21). Unfortunately, we don’t have other bacterial polymers in the lab and are not sure about their potential inhibitory effects.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      (1) The SCARF1 ECD contains three N-linked glycosylation sites (N289, N382, N393). It remains unclear whether these modifications are involved in SCARF1 binding to modified LDLs. Is it possible to design some experiments to investigate the effect of N-glycans on the recognition of modified LDLs? In particular, N382 and N393 are included in 353-415aa and the truncation mutant of SCARF1Δ353-415aa resulted in reduced binding with OxLDL in Fig.3G. Or whether the reduced binding is only due to the potential conformational changes caused by the deletion of the C-terminal region of the ECD?

      A previous study regarding the N-glycans (N289, N382, N393) of SCARF1 (ref.17) has shown that they may affect the proteolytic resistance, ligand-binding affinity and subcellular localization of SCARF1, which is not quite surprising as lipoproteins are large particles, the N-glycans on the surface of SCARF1 could affect accessibility or affinity for lipoproteins. But the exact roles of each glycan could be difficult to clarify as they might also be involved in protein folding and trafficking.

      The reduction of the binding of OxLDL for the mutant SCARF1 Δ353-415aa may be due to the conformational change or the loss of the glycans or both.

      (2) The authors speculated that the dimeric form of SCARF1 may be more efficient in recognizing lipoproteins on the cell surface. Please highlight the critical region/sites for ligand binding in Figure 8 and discuss the structural basis of dimerization improving the binding.

      The binding sites for lipoproteins on SCARF1 are indicated in Fig. 8. According to our data, it might be possible the conformation of the dimeric form of SCARF1 makes it more accessible to the ligands on the cell surface as implied by flow cytometry (p.14-15), but still needs further evidence on this.

      (3) Could the two salt bridges (D61-K71, R76-D98) observed in f1 crystals be found in f2 crystals? They seemed to be a little far from the defined dimeric interface (F82, S88, Y94) and how important are these to SCARF1 dimerization?

      The two salt bridges observed in f1 crystal are not found in f2 crystal (distances are larger than 5.0 Å), suggesting they are not required for dimerization (p. 7-8), but may be helpful in some cases.

      (4) The monomeric mutants (S88A/Y94A, F82A/S88A/Y94A) exhibited opposite affinity trends to OxLDL in ELISA and flow cytometry. The authors proposed steric hinderance of the dimers coated onto the plates as the potential explanation for this observation. However, the method of ELISA stated that OxLDLs, instead of SCARF1 ECD, were coated onto the plates. So what's the underlying reason for the inconsistency in different assays?

      Thanks. ELISA was done by coating OxLDLs on the plates as described in the Methods. But still, a dimeric form of SCARF1 may only bind one OxLDL coated on the plates due to steric hinderance. We correct this on p.12.

      Minor points:

      (1) Figure 2D and Figure S3 - please label the molecular weight marker on the SEC traces to indicate the native size of various purified proteins.

      The elution volume of SEC not only reflects the molecular weight, but it’s also affected by the conformation or shape of protein. The ectodomain of SCARF1 has a long curved conformation, the elution volumes of the monomeric or dimeric forms of SCARF1 do not align well with the standard molecular weight marker and elute much earlier in SEC. We include the standard molecular weight marker in Fig. S3C-D.

      (2) Could the authors provide SEC profiles of f1 and f2 that were used in crystallographic study?

      The SEC profiles of f1 and f2 for crystallization are shown in Fig. S5 (p.6).

      (3) The legend of Figure 3A states that the NC in flow cytometry assay represents the non-transfected cells, but please confirm whether the NC in Fig. 3A-C corresponds to non-transfected cells or no lipoprotein.

      NC in Fig. 3A represents the non-transfected cells, and no lipoproteins were added in this case as several lipoproteins are included in Fig. 3A. The lipoprotein (OxLDL or AcLDL) treated non-transfected cells (NC) were shown in Fig. 3B-C as negative controls.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript authored by Stockner and colleagues delves into the molecular simulations of Na+ binding pathway and the ionic interactions at the two known sodium binding sites site 1 and site 2. They further identify a patch of two acidic residues in TM6 that seemingly populate the Na+ ions prior to entry into the vestibule. These results highlight the importance of studying the ion-entry pathways through computational approaches and the authors also validate some of their findings through experimental work. They observe that sodium site 1 binding is stabilized by the presence of the substrate in the s1 site and this is particularly vital as the GABA carboxylate is involved in coordinating the Na+ ion unlike other monoamine transporters and binding of sodium to the Na2 site stabilizes the conformation of the GAT1 by reducing flexibility among the helical bundles involved in alternating access.

      Strengths:

      The study displays results that are generally consistent with available information from experiments on SLC6 transporters particularly GAT1 and puts forth the importance of this added patch of residues in the extracellular vestibule that could be of importance to the ion permeation in SLC6 transporters. This is a nicely performed study and could be improved if the authors could comment on and fix the following queries.

      We thank the reviewer for the overall positive assessment of our work.

      Comments on revised version:

      The authors have satisfactorily addressed my comments and this has significantly improved the clarity of the manuscript.

      The only point that I would like to inquire about is the role of EL4 in modulating Na+ entry.

      In the simulations do the authors see no role of EL4 in controlling Na+ entry. It is particularly intriguing as some studies in the recent past displayed charged mutations in EL4 of dDAT, SERT and GAT1 as being detrimental for substrate entry/uptake. It would therefore be nice to add a small discussion if there is any role for EL4 in Na+ entry.

      In this study we focused on sodium binding to the sodium binding sites NA1 and NA2 and discovered that negatively charged residues at the beginning of TM6 contribute to sodium binding. Our data show less-than-average interactions of sodium ions with EL4. In particular, we also do not observe any prominent role for D355, which is the only negatively charged residue in EL4a. We attribute this effect to the presence of four positively charged residues (R69, Y76, K350, R351) surrounding D355 and to electrostatic repulsion by a local positive field, which is also visible in Figure 1k. Following the suggestion of the reviewer, we added a short statement to the last paragraph of the discussion.

      Reviewer #2 (Public Review):

      Summary

      Starting from an AlphaFold2 model of the outward-facing conformation of the GAT1 transporter, the authors primarily use state-of-the-art MD simulations to dissect the role of the two Na+ ions that are known to be co-transported with the substrate, GABA (and a cotransported Cl- ion). The simulations indicated that Na+ binding to OF GAT depends on the electrostatic environment. The authors identify an extracellular recruiting site including residues D281 and E283 which they hypothesized to increase transport by locally increasing the available Na+ concentration and thus increasing binding of Na+ to the canonical binding sites NA1 and NA2. The charge-neutralizing double mutant D281AE283A showed decreased binding in simulations. The authors performed GABA uptake experiments and whole-cell patch clamp experiments that taken together validated the hypothesis that the Na+ staging site is important for transport due to its role in pulling in Na+.

      Detailed analysis of the MD simulations indicated that Na+ binding to NA2 has multiple structural effects: The binding site becomes more compact (reminiscent of induced fit binding) and there is some evidence that it stabilizes the outward-facing conformation.

      Binding to NA1 appears to require the presence of the substrate, GABA, whose carboxylate moiety participates in Na+ binding; thus the simulations predict cooperativity between binding of GABA and Na+ binding to NA1.

      Strengths

      - MD simulations were used to propose a hypothesis (the existence of the staging Na+ site) and then tested with a mutant in simulations AND in experiments. This is an excellent use of simulations in combination with experiments.

      - A large number of repeat MD simulations are generally able to provide a consistent picture of Na+ binding. Simulations are performed according to current best practices and different analyses illuminate the details of the molecular process from different angles.

      - The role of GABA in cooperatively stabilizing Na+ binding to the NA1 site looks convincing and intriguing.

      We thank the reviewer for the overall positive assessment of our work.

      Weaknesses

      - Assessing the effects of Na+ binding on the large scale motions of the transporter is more speculative because the PCA does not clearly cover all of the conformational space and the use of an AlphaFold2 model may have introduced structural inconsistencies. For example, it is not clear if movements of the inner gate are due to a AF2 model that's not well packed or really a feature of the open outward conformation.

      We do not think that the results of the manuscript, and in particular the large-scale motions, are speculative or depend too much on the limitations of PCA. We only use PCA for Figure 6a-d,6g,h. Motions of SLC6 transporters (and of any other transporter) are much more complex than a single 2D PCA plot could ever capture. We therefore used PCA here only to identify the two motions with the largest amplitude, shown in Figure 6a-d, 6g,h.

      Given that all the ~13000 degrees of freedom of GAT1 contribute to conformational differences, a dimensionality reduction method like PCA can be very helpful for extracting dominant motions. Structure comparison showed that motions observed in PC1 captured a large portion of the motions of occlusion (Figure 6c,d) when compared to the full transition observed in the unfiltered trajectories (see Figure 6e,f). PCA therefore helps to extract these main motions.
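
      As a minimal illustration of how PCA isolates a dominant motion, the sketch below builds a synthetic three-coordinate "trajectory" with one large-amplitude motion plus noise, extracts the first principal component, and projects the frames onto it. This is not the authors' analysis pipeline: the function names, the toy data, and the use of power iteration in place of a full eigendecomposition are all assumptions made for illustration.

```python
import math
import random

def mean_center(frames):
    """Subtract the per-coordinate mean from every frame."""
    n, d = len(frames), len(frames[0])
    means = [sum(f[j] for f in frames) / n for j in range(d)]
    return [[f[j] - means[j] for j in range(d)] for f in frames]

def covariance(centered):
    """Sample covariance matrix of mean-centered frames."""
    n, d = len(centered), len(centered[0])
    return [[sum(f[i] * f[j] for f in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_pc(cov, iters=500):
    """Power iteration: largest eigenvalue and its unit eigenvector."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    return eigval, v

# Hypothetical toy trajectory: one slow, large-amplitude motion along
# `direction`, with small isotropic noise on every coordinate.
random.seed(0)
direction = [0.6, 0.8, 0.0]
frames = []
for t in range(400):
    amp = 5.0 * math.sin(2 * math.pi * t / 100)
    frames.append([amp * direction[k] + random.gauss(0, 0.2) for k in range(3)])

centered = mean_center(frames)
cov = covariance(centered)
eigval, pc1 = first_pc(cov)

# Fraction of the total variance captured by PC1.
explained = eigval / sum(cov[i][i] for i in range(3))

# "Filtered" trajectory: keep only the displacement along PC1.
filtered = [[sum(f[j] * pc1[j] for j in range(3)) * pc1[k] for k in range(3)]
            for f in centered]
```

      In this toy case PC1 recovers the imposed direction and carries nearly all of the variance, which mirrors the use of PCA as a filter for the largest-amplitude motions rather than as a complete description of the dynamics.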

      For completeness, we show a series of structures from the unfiltered trajectories in Figure 6e,f. In the overlay, the motion of occlusion is more difficult to observe, because it is convoluted with all other degrees of freedom. In Figure 6e,f, the structures are aligned with the maximum likelihood method theseus, while the coloring is based on the amplitudes measured by PCA to visualize the regions moving relative to each other with largest amplitude. All other structural measures, including the opening of the inner gate (Figure 6i-k), are direct measures of the raw trajectories.

      With respect to the question of the instability of the inner gate, we made similar observations for hSERT (please see DOI: 10.1038/s41467-023-44637-6) using the experimentally determined structure as the starting point. We find a weakening of the inner gate for sodium-free SERT and at intermediate or full occlusion of sodium- and serotonin-bound SERT. These previous data on SERT corroborate our finding and indicate that the effect could be a general feature of the SLC6 transporter family.

      Unfortunately, no outward-open structure of GAT1 was available for this study. AlphaFold2 models have limitations and we are well aware of these limitations, but AlphaFold2 can also make high-quality models, including small adjustments of backbone positions, if the sequence identity is high, as in the current project (43% sequence identity for the transmembrane region). For GAT1 (as described in the manuscript) we initially tested an hSERT-based model created with MODELLER. MODELLER works from the premise that the protein backbone does not change, or changes only very little, between the template protein and the target protein. These MODELLER-created models did not perform well because of a slight shift in the position of the backbone, which is a consequence of consistently smaller side chains in the bundle domain-scaffold domain interface of GAT1 as compared to SERT.

      In the simulations described in the manuscript (using the AlphaFold-created model) we observed that the overall structural and dynamic parameters, and in particular the observations at the inner gate, are very similar to the results described in our papers on sodium binding to SERT using experimental SERT structures. The differences in Na1 binding are explained in the manuscript and are contingent on the residue difference between D98 in SERT and the corresponding residue G65 in GAT1. This makes us confident about the quality of the obtained data. Please see DOI: 10.3390/cells11020255; DOI: 10.3389/fncel.2021.673782.

      - Quantitative analyses are difficult with the existing data; for example, the tICA "free energy" landscape is probably not converged because unbinding events haven't been observed.

      The tICA analysis is a Markov State Model approach, which relies on the convergence of transitions between a large number of microstates. Even a limited number of trajectories showing full sodium unbinding is not obligatory for a converged dataset, but the transitions between the microstates must be converged. Within the S1 we observe many transitions and very good convergence of the transition probabilities. We limit interpretation and discussion of the free energy data to this part of the free energy surface. The supporting information (Figure S5) reports on the quality of the tICA analysis. Flat lines at time lags larger than 40 ns are consistent with a converged model based on the data of the trajectories used for the analysis and, consistently, the Chapman-Kolmogorov tests also show minimal difference between estimates and predictions.
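
      The two convergence checks mentioned here, flat implied timescales and the Chapman-Kolmogorov test, can be illustrated on a toy two-state chain. The sketch below is an assumption-laden illustration and not the analysis behind Figure S5: the two-state model, the transition probabilities, and all function names are hypothetical, and a real analysis would use many microstates and dedicated MSM software.

```python
import math
import random

def transition_matrix(dtraj, lag, n_states=2):
    """Row-normalized transition counts between frames separated by `lag`."""
    counts = [[0] * n_states for _ in range(n_states)]
    for a, b in zip(dtraj[:-lag], dtraj[lag:]):
        counts[a][b] += 1
    return [[c / sum(row) for c in row] for row in counts]

def implied_timescale(T, lag):
    """Slowest implied timescale of a 2x2 row-stochastic matrix.

    Its eigenvalues are 1 and (trace - 1), so t = -lag / ln(trace - 1).
    """
    lam2 = T[0][0] + T[1][1] - 1.0
    return -lag / math.log(lam2)

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(T, k):
    """k-th power of T for integer k >= 1."""
    out = T
    for _ in range(k - 1):
        out = matmul(out, T)
    return out

# Hypothetical two-state chain with known one-step probabilities.
random.seed(1)
P = [[0.95, 0.05], [0.10, 0.90]]
state, dtraj = 0, []
for _ in range(100_000):
    dtraj.append(state)
    state = 0 if random.random() < P[state][0] else 1

# Implied timescales at several lag times; for Markovian data they should
# be roughly independent of the lag (a "flat line").
timescales = {lag: implied_timescale(transition_matrix(dtraj, lag), lag)
              for lag in (1, 2, 5)}

# Chapman-Kolmogorov test: the model estimated at lag 1, propagated 5 steps,
# should agree with the matrix estimated directly at lag 5.
T1 = transition_matrix(dtraj, 1)
predicted = matpow(T1, 5)
estimated = transition_matrix(dtraj, 5)
ck_error = max(abs(p - e) for prow, erow in zip(predicted, estimated)
               for p, e in zip(prow, erow))
```

      For the generating chain the second eigenvalue is 0.85, so the exact implied timescale is -1/ln(0.85), about 6.15 steps; the estimates at each lag should scatter around that value, and a small `ck_error` indicates that predictions and direct estimates agree.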

      We see about 40 binding events from the extracellular side to the S1, which seems insufficient for a converged quantification of sodium transiting from the extracellular side to the S1. We state this limitation of the dataset in the results section of the manuscript.

    1. What mother has not put a fussy, "hyperactive" child down to nap once too often in the day or in winter sent older children out to play in the "fresh air," even as their red fingers were near frozen, so as to enjoy uninterruptedly a cup of coffee and a long and much anticipated visit with a dear friend? Yes, we have all done these or similar things with (at least we like to think) little harm done.

      This refers to mothers needing personal time for themselves and needing to do things that may not benefit the child but allow the mother some time of peace and cause no harm to the child. This quote shows the complexity of maternal expectations and that mothers need to be able to have their own time when possible. An example of this is putting an overactive child down for a nap so that the mother has a brief time to herself. In such cases, little harm is done.

  3. Sep 2024
    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      In this manuscript, Day et al. present a high-throughput version of expansion microscopy to increase the throughput of this well-established super-resolution imaging technique. Through technical innovations in liquid handling with custom-fabricated tools and modifications to how the expandable hydrogels are polymerized, the authors show robust ~4-fold expansion of cultured cells in 96-well plates. They go on to show that HiExM can be used for applications such as drug screens by testing the effect of doxorubicin on human cardiomyocytes. Interestingly, the effects of this drug on changing DNA organization were only detectable by ExM, demonstrating the utility of HiExM for such studies. 

      Overall, this is a very well-written manuscript presenting an important technical advance that overcomes a major limitation of ExM - throughput. As a method, HiExM appears extremely useful, and the data generally support the conclusions. 

      Strengths: 

      Hi-ExM overcomes a major limitation of ExM by increasing the throughput and reducing the need for manual handling of gels. The authors do an excellent job of explaining each variation introduced to HiExM to make this work and thoroughly characterize the impressive expansion isotropy. The dox experiments are generally well-controlled and the comparison to an alternative stressor (H2O2) significantly strengthens the conclusions. 

      Weaknesses: 

      (1) Based on the exceedingly small volume of solution used to form the hydrogel in the well, there may be many unexpanded cells in the well and possibly underneath the expanded hydrogel at the end of this. How would this affect the image acquisition, analysis, and interpretation of HiExM data? 

      The hydrogel footprint covers approximately 5% of the surface within an individual well and only cells within this area are embedded in the polymerized hydrogel for subsequent processing steps. Cells that are outside of this footprint are not incorporated into the gel because these cells are digested by Proteinase K and washed away by the excess water exchange in the gel swelling step. Note that different cell types may require higher or lower concentrations of Proteinase K to adequately digest cells for expansion while maintaining fluorescence signal. Given the compatibility of HiExM with 96-well plates, this titration can be performed rapidly in a single experiment. Although cells outside of the hydrogel footprint are removed prior to imaging, we do occasionally observe Hoechst signal that appears to be underneath the gels. We believe this signal is likely from excess DNA from digested cells that was not fully washed out in the gel swelling step. This signal is both spatially and morphologically distinct from the nuclear signal of intact cells and it does not affect image acquisition, analysis, or data interpretation. 

      (2) It is unclear why the expansion factor is so variable between plates (e.g., Figure 2H). This should be discussed in more detail. 

      The variability in expansion factor across plates can likely be attributed to the small volume of gel solution (~250 nL) required for expansion within 96-well plates. Small variations in gel volume could impact gel polymerization compared to standard ExM gels. For example, gels in HiExM are more sensitive to evaporation because of the ~1000x reduced volume compared to standard expansion gel preparations, resulting in an increased air-liquid interface. Evaporation in HiExM gels would increase monomer and crosslinker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that variance is slightly increased between plates. These considerations are discussed in the revised manuscript.
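
      The concentrating effect of evaporation can be put into rough numbers. The following is a minimal sketch under stated assumptions: only the ~250 nL gel volume comes from the response, while the function name and the evaporative losses of 5, 25, and 50 nL are hypothetical values chosen for illustration.

```python
def concentration_factor(initial_volume_nl, evaporated_nl):
    """Fold-increase in solute concentration after solvent-only evaporation.

    Assumes monomer and crosslinker are non-volatile, so their amount is
    conserved while the volume shrinks: c_after / c_before = V0 / (V0 - dV).
    """
    remaining = initial_volume_nl - evaporated_nl
    if remaining <= 0:
        raise ValueError("evaporated volume must be smaller than the initial volume")
    return initial_volume_nl / remaining

GEL_VOLUME_NL = 250.0  # approximate HiExM gel volume quoted in the response

# Hypothetical evaporative losses (nL) and the resulting fold-concentration.
factors = {loss: concentration_factor(GEL_VOLUME_NL, loss) for loss in (5.0, 25.0, 50.0)}
```

      Under these assumptions, losing a tenth of a 250 nL droplet already raises monomer and crosslinker concentrations by about 11%, which illustrates why plate-to-plate differences in handling time or humidity could plausibly shift the expansion factor.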

      (3) The authors claim that CF dyes are more resistant to bleaching than other dyes. However, in Figure. S3, it appears that half of the CF dyes tested still show bleaching, and no data is shown supporting the claim that Alexa dyes bleach. It would be helpful to include data supporting the claim that Alexa dyes bleach more than CF dyes and the claim that CF dyes in general are resistant to bleaching should be modified to more accurately reflect the data shown. 

      We did not show data using Alexa dyes because these fluorophores are highly sensitive to photobleaching using Irgacure and thus we could not obtain images. In contrast, some CF dyes are more robust to bleaching in HiExM including CF488A, CF568, and CF633 dyes. We have recently adapted our protocol to PhotoExM chemistry which is compatible with a wider range of fluorophores as described by Günay et al. (2023) and as shown in Fig. S16.

      (4) Related to the above point, it appears that Figure S11 may be missing the figure legend. This makes it hard to understand how HiExM can use other photo-inducible polymerization methods and dyes other than CF dyes.

      We revised the legend for revised Fig. S11 (now Fig. S16) as follows: Example of a cell expanded in HiExM using Photo-ExM gel chemistry. Photo-ExM does not require an anoxic environment for gel deposition and polymerization, improving ease of use of HiExM. Mitochondria were stained with an Alexa 647 conjugated secondary antibody, demonstrating that HiExM is compatible with additional fluorophores when combined with Photo-ExM.

      (5) The use of automated high-content imaging is impressive. However, it is unclear to me how the increased search space across the extended planar area and focal depths in expanded samples is overcome. It would be helpful to explain this automated imaging strategy in more detail. 

      We imaged plates on the Opera Phenix using the PreciScan Acquisition Software in Harmony. In brief, each well is imaged at 5x magnification in the Hoechst channel to capture the full well at low resolution. Hoechst is used for this step given its signal brightness, ubiquity across established staining protocols, and spectral independence from most fluorophores commonly conjugated to secondary antibodies. Using this information, the microscope detects regions of interest (nuclei) based on criteria including size, brightness, circularity, etc. Finally, the positional information for each region is stored, and the microscope automatically images those regions at 63x magnification. The working distance for the objective used in this study is 600 µm which is sufficient to capture the entirety of expanded cells in the Z direction. This strategy minimizes off-target imaging and allows robust image acquisition even in cultures with lower seeding density. A detailed description of the automated imaging strategy is included in the methods section of the revised manuscript.

      (6) The general method of imaging pre- and post-expansion is not entirely clear to me. For example, on page 5 the authors state that pre-expansion imaging was done at the center of each gel. Is pre-expansion imaging done after the initial gel polymerization? If so, this would assume that the gelation itself has no effect on cell size and shape if these gelled but not yet expanded cells are used as the reference for calculating expansion factor and isotropy. 

      Pre-expansion imaging is performed after staining is complete, but prior to the application of AcX, which is the first step of the HiExM protocol. Following staining and imaging, plates can be sealed with parafilm and stored at 4˚C for up to a week prior to starting the expansion protocol. We typically image 61 fields of view at the center of the well plate (where the gel will be deposited) to obtain sufficient pre-expansion images as shown in Figure 2b (left). After pre-expansion imaging, we perform the HiExM protocol followed by image acquisition. We then tile all the images, as shown in Figure 2b, and compare tiled images from the same well pre- and post-expansion to manually identify the same cells. Comparisons of the pre- and post-expansion images of the same cell are used to calculate expansion factor and isotropy measurements as described. A detailed description of this process is included in the revised manuscript.
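
      One simple way to turn matched pre- and post-expansion images of the same cell into an expansion factor is to compare distances between matched landmarks. The sketch below assumes landmark coordinates in physical units (µm) and is not the analysis used in the manuscript; the function names and example coordinates are hypothetical. Because it uses distances only, it is unaffected by rotation of the gel within the well.

```python
import math
from itertools import combinations
from statistics import mean


def pairwise_distances(points):
    """Distances between all landmark pairs, in a fixed pair order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def expansion_factor(pre_points, post_points):
    """Mean post/pre distance ratio, plus the individual ratios."""
    pre = pairwise_distances(pre_points)
    post = pairwise_distances(post_points)
    ratios = [d_post / d_pre for d_pre, d_post in zip(pre, post)]
    return mean(ratios), ratios


# Hypothetical landmarks (in µm) on one cell; the post-expansion set is
# scaled 4x and rotated by 90 degrees to mimic a gel that turned in the well.
pre_points = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
post_points = [(0.0, 0.0), (0.0, 40.0), (-40.0, 0.0)]

factor, ratios = expansion_factor(pre_points, post_points)
print(round(factor, 3))                     # mean linear expansion factor
print(round(max(ratios) - min(ratios), 3))  # spread across landmark pairs
```

      A large spread between the per-pair ratios would hint at anisotropic expansion, although a full isotropy measurement would require a registration-based error analysis.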

      (7) In the dox experiments, are only 4 expanded nuclei analyzed? It is unclear in the Figure 3 legend what the replicates are because for the unexpanded cells, it says the number of nuclei but for expanded it only says n=4. If only 4 nuclei are analyzed, this does not play to the strengths of HiExM by having high throughput.

      We performed the doxorubicin titration assay across four different well plates (n=4). For each condition, the total number of expanded nuclei measured was 118, 111, 110, 113, and 77 for DMSO, 1nM, 10nM, 100nM, and 1µM, respectively. For SEM calculations, we included the number of independent experiments to avoid underestimating error. We revised the Fig. 3 legend to include these experimental details.
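
      The effect of the choice of n on the SEM can be illustrated with a small sketch. It assumes that per-nucleus values are first averaged within each plate and that the SEM is then taken over plate means; the response does not spell out this procedure, and the measurements below are made up.

```python
import math
from statistics import mean, stdev


def sem(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return stdev(values) / math.sqrt(len(values))


# Hypothetical per-nucleus measurements grouped by plate (independent experiment)
plates = {
    "plate_1": [1.0, 1.2, 1.1],
    "plate_2": [1.4, 1.5, 1.3],
    "plate_3": [0.9, 1.0, 0.8],
    "plate_4": [1.2, 1.3, 1.1],
}

plate_means = [mean(values) for values in plates.values()]
sem_by_plate = sem(plate_means)    # n = 4 independent experiments

all_nuclei = [x for values in plates.values() for x in values]
sem_by_nucleus = sem(all_nuclei)   # n = 12 nuclei treated as independent

print(round(sem_by_plate, 4), round(sem_by_nucleus, 4))
```

      Treating every nucleus as an independent sample shrinks the SEM because nuclei from the same plate share plate-level variation; using the number of independent experiments gives the more conservative estimate.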

      (8) I am not sure if the analysis of dox-treated cells is accurate for the overall phenotype because only a single slice at the midplane is analyzed. It would be helpful to show, at least in one or two example cases, that this trend of changing edge intensity occurs across the whole 3D nucleus.  

      For this analysis, the result is heavily dependent on the angle at which the edge of the nucleus intersects the image plane in the orthogonal view. For this reason, we opted to only use the optimal image plane for each nucleus. We repeated our analysis on an image using multiple optical sections to demonstrate this point. These new data are included as Fig. S11 of the revised manuscript.

      (9) It would be helpful to provide an actual benchmark of imaging speed or throughput to support the claims on page 8 that HiExM can be combined with autonomous imaging to capture thousands of cells a day. What is the highest throughput you have achieved so far?  

      The parameters that dictate imaging speed in HiExM include exposure time, z-stack height, and number of fluorophore channels. Depending on the signal intensity for a given channel, exposure times vary from 200 ms to 1000 ms. For z-stack height, we found that imaging 65 sections with 1 µm spacing allowed for robust identification of each region of interest in the 5x pre-scan. As an example, collecting images for a full well plate (e.g., 20 images per well with 4 channels) requires approximately 24 hours of autonomous image acquisition using the Opera Phenix. Depending on cell size, this process yields imaging data for 1200 cells (1 cell per field of view) to 6000 cells (5 cells per field of view). Different autonomous imagers, as well as improved staining techniques that increase signal-to-noise, can be expected to significantly decrease exposure times and reduce the number of z-sections needed for each region.
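
      The throughput figures quoted above can be cross-checked with simple arithmetic. In the sketch below, the total of 1200 fields per plate is taken from the "1 cell per field of view" case; the number of imaged wells is then an inference from the quoted numbers rather than a stated detail.

```python
HOURS_PER_PLATE = 24        # approximate acquisition time quoted above
IMAGES_PER_WELL = 20        # fields of view per well
FIELDS_PER_PLATE = 1200     # implied by 1200 cells at 1 cell per field of view

# Inferred, not stated: number of wells that account for 1200 fields
wells_imaged = FIELDS_PER_PLATE // IMAGES_PER_WELL

# Average time budget per field of view, all channels and z-sections included
seconds_per_field = HOURS_PER_PLATE * 3600 / FIELDS_PER_PLATE

# Cell yield for 1 and 5 cells per field of view
cells_low = FIELDS_PER_PLATE * 1
cells_high = FIELDS_PER_PLATE * 5

print(wells_imaged, seconds_per_field, cells_low, cells_high)
```

      Under these assumptions the average budget is a little over one minute per field, so shorter exposures or fewer z-sections translate directly into more cells per day.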

      Reviewer #2 (Public Review): 

      Summary: 

      In the present work, the authors present an engineering solution to sample preparation in 96-well plates for high-throughput super-resolution microscopy via Expansion Microscopy. This is not a trivial problem, as the well cannot be filled with the gel, which would prohibit the expansion of the gel. A device was engineered that can spot a small droplet of hydrogel solution and keep it in place as it polymerizes. It occupies only a small portion of space at the center of each well, the gel can expand into all directions, and imaging and staining can proceed by liquid handling robots and an automated microscope. 

      Strengths: 

      In contrast to Reference 8, the authors' system is compatible with standard 96-well imaging plates for high-throughput automated microscopy and automated liquid handling for most parts of the protocol. They thus provide a clear path towards high-throughput ExM and high-throughput super-resolution microscopy, which is a timely and important goal. 

      Weaknesses: 

      The assay they chose to demonstrate what high-throughput ExM could be useful for, is not very convincing. But for this reviewer that is not important. 

      We believe the data provide an example of the utility of HiExM that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.) by enabling easier sample processing and autonomous acquisition of thousands of nanoscale images in parallel. The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this work is to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM.

      Reviewer #3 (Public Review):

      Summary: 

      Day et al. introduced high-throughput expansion microscopy (HiExM), a method facilitating the simultaneous adaptation of expansion microscopy for cells cultured in a 96-well plate format. The distinctive features of this method include 1) the use of a specialized device for delivering a minimal amount (~230 nL) of gel solution to each well of a conventional 96-well plate, and 2) the application of the photochemical initiator, Irgacure 2959, to successfully form and expand the toroidal gel within each well.  

      Strengths: 

      This configuration eliminates the need for transferring gels to other dishes or wells, thereby enhancing the throughput and reproducibility of parallel expansion microscopy. This methodological uniqueness indicates the applicability of HiExM in detecting subtle cellular changes on a large scale. 

      Weaknesses: 

      To demonstrate the potential utility of HiExM in cell phenotyping, drug studies, and toxicology investigations, the authors treated hiPS-derived cardiomyocytes with a low dose of doxycycline (dox) and quantitatively assessed changes in nuclear morphology. However, this reviewer is not fully convinced of the validity of this specific application. Furthermore, some data about the effect of expansion require reconsideration. 

      The application we chose was intended as a methods proof-of-concept that could enable future deep biological investigations using HiExM. We believe the data provide an example of the utility of HiExM for collecting thousands of nanoscale images that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.). The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this experiment was to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of HiExM. 

      The variability in expansion factor across plates can likely be attributed to the small volume (~250 nL) deposited by the device posts. Small variations in gel volume could impact gel polymerization compared to standard ExM gels. For example, HiExM gels are more sensitive to evaporation due to an increased air-liquid-interface because they are ~1000x smaller than standard expansion gel preparations. Evaporation in HiExM gels likely increases monomer and cross linker concentrations, leading to variation in expansion factor across plates. We note that expansion factor is robust within well plates and that the expansion factor can be more variable between plates, likely due to differences in gel volumes and evaporation. Future iterations of the platform are expected to control for these environmental conditions. These differences are discussed in the revised manuscript.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Please include a scale bar in Figure 3a.

      A scale bar has been added to Figure 3a.

      (2) Please show the data related to nuclear volume after dox treatment.

      We have added a supplementary figure (Fig. S10) showing nuclear volume and sphericity for post-expansion nuclei as well as nuclear area and circularity for pre-expansion nuclei.

      (3) I think it would be extremely helpful for the method as a whole if analysis code and files for device fabrication were made publicly available rather than upon request.

      The analysis code has been included in the supplementary files as CM_Hoechst_Analysis_for publication.ipynb. Device design files are also available at the supplementary files link as hiExM_device.SLDPRT (96-well plate device) and MultiExM_24_July28_2022.SLDPRT (24-well plate device).

      (4) Some details are missing from the methods, such as the concentration of AcX used for HiExM, the concentration of antibodies, etc. Related, how long does the photopolymerization take? Just the 60 seconds that the UVA light is on?

      Additional protocol details are included in the methods section of the revised manuscript. The photopolymerization does only take 60 seconds.

      Reviewer #2 (Recommendations For The Authors):

      (1) The first three references are chosen a little strangely here. I suggest citing STED, SIM, and PALM/STORM from the original manuscripts here. Also, EM is technically not a super-resolution technique as it is within the resolution of electron beams. This reviewer would stay with light microscopy methods when discussing "super-resolution".

      We removed the reference to EM and added citations to the original publications for SIM, STED, and STORM.

      (2) The sentence after citation 4 is a little off in its meaning.

      We have edited the sentence to improve clarity.

      (3) It is highly useful and great that the authors include the observations on the effect of photopolymerization with Irgacure 2959 on dyes.

      (4) In the discussion, the authors could mention new high NA silicone oil objectives that may further optimise the resolution in their scheme.

      We added a sentence in the discussion to reflect this important point.

      (5) The files for the manufacture of the HiExM devices must be in the supplementary data rather than available on request.

      The Solidworks designs for the 96 and 24 well plate devices are included in the supplementary files as hiExM_device.SLDPRT and MultiExM_24_July28_2022.SLDPRT, respectively.

      (6) It would be useful if the authors could discuss their thoughts on the high throughput processing of expansion factors in the data analysis routine.

      We added details to the methods section describing how images are processed and analyzed.

      Reviewer #3 (Recommendations For The Authors):

      Major:

      (1) In the experiments depicted in Figure 3, the authors attempted cellular phenotyping using hiPSC-derived cardiomyocytes treated with doxorubicin (dox). They addressed that the relative intensity of Hoechst at the nuclear periphery increased solely in post-expansion images, although this trend is not clearly evidenced in the provided data (e.g., DMSO control vs. 1 nM dox, Figure 3b). Moreover, this observed phenomenon lacks clear biological significance and may not be suitable as a demonstration for proof-of-concept (POC) acquisition. It is crucial to delineate the biological processes linked with the specific enhancement of DNA binding dye signals in the nuclear periphery and how to rule out the possibility of heterogeneous redistribution of nuclear components rather than enhancing resolution. For instance, if this change can be associated with a biological process such as DNA damage, quantitative detection of the accumulated proteins related to DNA repair, or the specific histone marks, may be more suitable and less susceptible to heterogeneous expansion factors. Additionally, the authors noted the absence of significant changes in nuclear volume, yet the corresponding data was not presented. Moreover, the application insufficiently demonstrated the HiExM's scalable feature employing various well plates. If only acquiring images of dozens of nuclei (Figure 3 legend, p15), a single well per condition would suffice. Therefore, it is necessary to elucidate why this application necessitates a 96-well format for demonstration purposes. The potential experimental design should also incorporate the requirement for well-to-well replication and the acquisition of features at the individual well level, rather than at the single-cell level. Also, related to Figure S10, whether outer gradient slope, but not inner gradient slope, is linked to apoptosis (Page 8, Line 2-4) remains unclear in the H2O2-treated cells.

      We believe the data provide an example of the utility of HiExM that would benefit experiments that require many samples (e.g., conditions, replicates, timepoints, etc.) by enabling easier sample processing and autonomous acquisition of thousands of nanoscale images in parallel. The ability to generate large data sets also enables quantitative analysis of images with appropriate statistical power. The intention of this work is to provide a proof-of-concept example of the robustness, accessibility, and experimental design flexibility of the HiExM method. As discussed in the manuscript, dox treatment is associated with DNA damage, cellular stress, and apoptosis, effects commonly observed at high dox concentrations (>200 nM) in in vitro studies using conventional microscopy. Our data suggest that cardiomyocytes exhibit sensitivity to lower concentrations of dox than previously anticipated. Although direct evidence specifically linking dox to increased DNA condensation at the nuclear periphery is limited, the known pro-apoptotic effects of dox strongly suggest that our observations correlate with these changes. We have now included the data analysis on nuclear morphology in revised Fig. S10. We agree that deeper biological interpretation of the observed changes in Hoechst signal upon dox treatment (or other cellular stressors such as H2O2) using HiExM, and whether these changes are correlated with DNA damage or other cellular alterations, remains an exciting future direction to develop a more sensitive platform for assessing drug responses.

      For expanded samples, we performed the doxorubicin titration assay across four different well plates (n=4). For each condition, the total number of nuclei measured was 118, 111, 110, 113, and 77 for DMSO, 1nM, 10nM, 100nM, and 1µM, respectively. We apologize for the confusion with respect to the number of replicates and cells analyzed. For SEM calculations, we used the number of independent experiments to avoid underestimating error. 

      (2) In Figure 2b, do the orange arrows indicate the same cell with a unique shape in both the pre- and post-expansion images? Additionally, in Figure 3b, why do the pre- and post-expansion nuclei exhibit such different global shapes? Considering that the gel may freely rotate within the well during expansion, it raises doubts about whether one can identify cells with consistent shapes in both the pre- and post-expansion images. Furthermore, this reviewer observed a similar issue regarding reproducibility among different well plates, as shown in Figure 2h. The panel illustrates that different plates yielded distinct populations of gel sizes. The expansion factors provided in the figure legend (page 13) ranged from 3.5x to 5.1x across gels, indicating a relatively large variation in expansion size. What is the reason behind these variations, and how can they be minimized? These variations could become critical when considering large-scale screening across multiple plates.

      The orange arrow is intended to indicate the same cell with a unique shape in both the pre- and post-expansion images, albeit at a different orientation given that the gel is not fixed within the well. We agree that improved methods to identify the same cells pre- and post-expansion could facilitate error measurements. In the discussion of the revised manuscript, we now reference recent methods that could be combined with HiExM to automate and improve error and distortion detection. 

      Fig. 2 illustrates the ability of HiExM to achieve reproducible gel formation with minimal error within gels, wells, and across plates, measurements consistent with proExM. While uniform within gels, the expansion factor is somewhat variable between gels and plates. We attribute these differences primarily to the small size of the gels, making them vulnerable to the effects of evaporation between experiments. We note this variability should be taken into consideration for studies where absolute length measurements between plates are important for biological interpretation. Future iterations of the platform that allow precise delivery of gel volumes and that minimize environmental exposure are expected to improve the expansion factor reproducibility across plates to further enable the use of HiExM as a tool for high-throughput nanoscale imaging.

      Minor:

      (1) Considering the signal loss due to photobleaching and fluorophore dilution during expansion, protein imaging may occasionally lack the sensitivity required to detect subtle morphological changes in cellular machinery. This potential limitation should be addressed or discussed in the text.

      A sentence reflecting this point has been added to the manuscript.

      (2) On page 15, the figure legend for panel d states, "Heatmaps of nuclei in b showing..." However, it appears that the panel referred to in this sentence corresponds to panel c.

      The typo has been fixed.

      (3) The type of glass 96-well plate utilized in this study should be specified, as the quality of the product could impact the expansion results.

      The supplier and product number of the well plate used in our study have been added to the methods section.

      (4) In Figure S3, the raw pixel values of CF305 dye are exceptionally low. Is there a specific reason for the very low signals observed when using this dye?

      CF® 350 (305 was a typo) does not excite well at 405 nm, which is the excitation wavelength for the channel we used.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      Understanding large-scale neural activity remains a formidable challenge in neuroscience. While several methods have been proposed to discover the assemblies from such large-scale recordings, most previous studies do not explicitly model the temporal dynamics. This study is an attempt to uncover the temporal dynamics of assemblies using a tool that has been established in other domains.

      The authors previously introduced the compositional Restricted Boltzmann Machine (cRBM) to identify neuron assemblies in zebrafish brain activity. Building upon this, they now employ the Recurrent Temporal Restricted Boltzmann Machine (RTRBM) to elucidate the temporal dynamics within these assemblies. By introducing recurrent connections between hidden units, RTRBM could retrieve neural assemblies and their temporal dynamics from simulated and zebrafish brain data.

      Strengths:

      The RTRBM has been previously used in other domains. Training in the model has been already established. This study is an application of such a model to neuroscience. Overall, the paper is well-structured and the methodology is robust, the analysis is solid to support the authors' claim.

      Weaknesses:

      The overall degree of advance is very limited. The performance improvement by RTRBM compared to their cRBM is marginal, and insights into assembly dynamics are limited.

      (1) The biological insights from this method are constrained. Though the aim is to unravel neural ensemble dynamics, the paper lacks in-depth discussion on how this method enhances our understanding of zebrafish neural dynamics. For example, the dynamics of assemblies can be analyzed using various tools such as dimensionality reduction methods once we have identified them using cRBM. What information can we gain by knowing the effective recurrent connection between them? It would be more convincing to show this in real data.

      See below in the recommendations section.

      (2) Despite the increased complexity of RTRBM over cRBM, performance improvement is minimal. Accuracy enhancements, less than 1% in synthetic and zebrafish data, are underwhelming (Figure 2G and Figure 4B). Predictive performance evaluation on real neural activity would enhance model assessment. Including predicted and measured neural activity traces could aid readers in evaluating model efficacy.

      See below in the recommendations section.

      Recommendations:

      (1) The biological insights from this method are constrained. Though the aim is to unravel neural ensemble dynamics, the paper lacks in-depth discussion on how this method enhances our understanding of zebrafish neural dynamics. For example, the dynamics of assemblies can be analyzed using various tools such as dimensionality reduction methods once we have identified them using cRBM. What information can we gain by knowing the effective recurrent connection between them? It would be more convincing to show this in real data.

      We agree with the reviewer that our analysis does not explore the data far enough to reach the level of new biological insights. For practical reasons unrelated to the science, we cannot further explore the data in this direction at this point; however, funding permitting, we will pick up this question at a later stage. The only change we have made to the corresponding figure at the current stage was to adapt the thresholds, which better emphasizes the locality of the resulting clusters.

      (2) Despite the increased complexity of RTRBM over cRBM, performance improvement is minimal. Accuracy enhancements, less than 1% in synthetic and zebrafish data, are underwhelming (Figure 2G and Figure 4B). Predictive performance evaluation on real neural activity would enhance model assessment. Including predicted and measured neural activity traces could aid readers in evaluating model efficacy.

      We thank the reviewer kindly for the comments on the performance comparison between the two models. We would like to highlight that the small range of accuracy values for the predictive performance is due to both the sparsity and stochasticity of the simulated data, and is not reflective of the actual percentage in performance improvement. To this end, we have opted to use a rescaled metric that we call the normalised Mean Squared Error (nMSE), where the MSE is equal to 1 minus the accuracy, as the visible units take on binary values. This metric is also more in line with the normalised Log-Likelihood (nLLH) metric used in the cRBM paper in terms of interpretability. The figure shows that the RTRBM can significantly predict the state of the visible units in subsequent time-steps, whereas the cRBM captures the correct time-independent statistics but has no predictive power over time.
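
      The identity used above, MSE = 1 − accuracy, holds whenever both the predictions and the observations are binary, because each squared error is then either 0 or 1. The sketch below checks this on made-up vectors; it does not reproduce the normalisation that turns the MSE into the nMSE, which is not specified in this response.

```python
def accuracy(predicted, observed):
    """Fraction of units whose predicted state matches the observed state."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


def mse(predicted, observed):
    """Mean squared error between predicted and observed states."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


# Hypothetical sparse binary activity of 8 visible units at one time-step
observed = [0, 1, 0, 0, 1, 0, 0, 0]
predicted = [0, 1, 0, 1, 0, 0, 0, 0]

acc = accuracy(predicted, observed)
err = mse(predicted, observed)
print(acc, err, acc + err)
```

      With sparse data, a predictor that always outputs 0 already reaches a high accuracy, which is one reason raw accuracy differences between models look small.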

      We also thank the reviewer for pointing out that there is no predictive performance evaluation on the neural data. We chose to omit this for two reasons. First, it is clear from Fig. 2 that the (c)RBM has no temporal dependencies, meaning that the predictive performance is determined mostly by the average activity of the visible units. If this corresponds well with the actual mean activity per neuron, the nMSE will be around 0. This correspondence is already evaluated in the first panel of Fig. 3F. Second, as this is real data, we cannot estimate a lower bound on the MSE that is due to neural noise. Because of this, the scale of the predictive performance score will be arbitrary, making it difficult to quantitatively assess the difference in performance between both models.

      (3) The interpretation of the hidden real variable $r_t$ lacks clarity. Initially interpreted as the expectation of $\mathbf{h}_t$, its interpretation in Eq (8) appears different. Clarification on this link is warranted.

      We thank the reviewer kindly for the suggested clarification. However, we think the link between both values should already be sufficiently clear from the text in lines 469-470:

      “Importantly, instead of using binary hidden unit states 𝐡[𝑡−1], sampled from the expected real valued hidden states 𝐫[𝑡−1], the RTRBM propagates these real-valued hidden unit states directly.”

      In other words, both are indeed the same: one could sample a binary-valued 𝐡[𝑡−1] from the real-valued 𝐫[𝑡−1] through, e.g., a Bernoulli distribution, where 𝐫[𝑡−1] would thus indeed act as an expectation over 𝐡[𝑡−1]. However, the RTRBM formulation keeps the real-valued 𝐫[𝑡−1] to propagate the hidden-unit states to the next time-step. The motivation for this choice is further discussed in the original RTRBM paper (Sutskever et al. 2008).
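
      The distinction between the real-valued 𝐫 and a binary sample 𝐡 can be made concrete with a toy sketch. It assumes the standard RTRBM update from Sutskever et al. (2008), in which the expected hidden state is a sigmoid of the visible drive, the recurrent drive from the previous expected state, and a bias; the symbol names (W, U, b) and all parameter values are hypothetical and may differ from the manuscript's notation.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def expected_hidden(v_t, r_prev, W, U, b):
    """Return r_t = sigmoid(W v_t + U r_prev + b) as a list of floats."""
    r_t = []
    for j in range(len(b)):
        drive = b[j]
        drive += sum(W[j][i] * v_t[i] for i in range(len(v_t)))
        drive += sum(U[j][k] * r_prev[k] for k in range(len(r_prev)))
        r_t.append(sigmoid(drive))
    return r_t


def sample_binary(r, rng):
    """Draw h ~ Bernoulli(r) element-wise, so that E[h] = r."""
    return [1 if rng.random() < p else 0 for p in r]


# Hypothetical toy parameters: 3 visible units, 2 hidden units
W = [[1.0, -1.0, 0.5], [0.0, 2.0, -1.0]]   # visible-to-hidden weights
U = [[0.5, 0.0], [-0.5, 1.0]]              # hidden-to-hidden (recurrent) weights
b = [0.0, -1.0]                            # hidden biases

v_t = [1, 0, 1]        # binary visible state at time t
r_prev = [0.2, 0.9]    # real-valued expected hidden state at time t-1

r_t = expected_hidden(v_t, r_prev, W, U, b)   # real-valued, passed on to t+1
h_t = sample_binary(r_t, random.Random(0))    # binary sample, not propagated
print(r_t, h_t)
```

      Propagating r_t rather than h_t makes the recurrent drive deterministic given the visible sequence, which is the design choice motivated in the original RTRBM paper.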

      (4) In Figure 3 panel F, the discrepancy in x-axis scales between upper and lower panels requires clarification. Explanation regarding the difference and interpretation guidelines would enhance understanding.

      Thank you for pointing out the discrepancy in x-axis scales between the upper and lower panels of Figure 3F. The reason why these scales are different is that the activation functions in the two models differ in their range, and showing them on the same scale would not do justice to this difference. But we agree that this could be unclear for readers. Therefore we added an additional clarification for this discrepancy in line 215:

      “While a direct comparison of the hidden unit activations between the cRBM and the RTRBM is hindered by the inherent discrepancy in their activation functions (unbounded and bounded, respectively), the analysis of time-shifted moments reveals a stronger correlation for the RTRBM hidden units ($r_s = 0.92$, $p<\epsilon$) compared to the cRBM ($r_s = 0.88$, $p<\epsilon$)”

      (5) Assessing model performance at various down-sampling rates in zebrafish data analysis would provide insights into model robustness.

      We agree that we would have liked to assess this point in real data, to verify that this holds as well in the case of the zebrafish whole-brain data. The main reason why we did not choose to do this in this case is that we would only be able to further downsample the data. Current whole-brain data sets are collected at a few Hz (here 4 Hz, only 2 Hz in other datasets), which we consider to be likely slower than the actual interaction speed in neural systems, which is on the order of milliseconds between neurons, and on the order of ~100 ms (~10 Hz) between assemblies. By reducing the rate further, we therefore expect to see only a reduction in quality, which we considered less interesting than finding an optimum. Higher rates in light-sheet imaging are currently achievable only by imaging single planes (which defeats the goal of whole-brain recordings), but may be possible in the future when the limiting factors (focal plane stepping and imaging) are addressed. For completeness, we have now performed the down-sampling for the experimental data, which showed the expected decrease in performance. The results have been integrated into Figure 4.
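
      As a rough illustration of what further down-sampling does to a 4 Hz recording, the sketch below keeps every k-th frame. This is an assumption: the response does not state whether frames were subsampled or binned, and the function names and toy raster are hypothetical.

```python
def downsample(frames, factor):
    """Keep every `factor`-th time frame, starting from the first."""
    return frames[::factor]


def effective_rate(rate_hz, factor):
    """Sampling rate after keeping one frame out of every `factor`."""
    return rate_hz / factor


# Hypothetical binary raster: 8 time frames x 3 neurons, recorded at 4 Hz
raster = [
    [0, 1, 0], [0, 1, 0], [1, 0, 0], [1, 0, 0],
    [0, 0, 1], [0, 0, 1], [0, 1, 0], [0, 1, 0],
]

for factor in (1, 2, 4):
    reduced = downsample(raster, factor)
    print(factor, effective_rate(4.0, factor), len(reduced))
```

      At a factor of 4 the effective rate drops to 1 Hz, well below the ~10 Hz assembly time scale mentioned above, so a loss of predictive performance is the expected outcome.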

      Reviewer #2 (Public Review):

      Summary:

      In this work, the authors propose an extension to some of the last author's previous work, where a compositional restricted Boltzmann machine was considered as a generative model of neuron-assembly interaction. They augment this model by recurrent connections between the Boltzmann machine's hidden units, which allow them to explicitly account for temporal dynamics of the assembly activity. Since their model formulation does not allow the training towards a compositional phase (as in the previous model), they employ a transfer learning approach according to which they initialise their model with a weight matrix that was pre-trained using the earlier model so as to essentially start the actual training in a compositional phase. Finally, they test this model on synthetic and actual data of whole-brain light-sheet-microscopy recordings of spontaneous activity from the brain of larval zebrafish.

      Strengths:

      This work introduces a new model for neural assembly activity. Importantly, being able to capture temporal assembly dynamics is an interesting feature that goes beyond many existing models. While this work clearly focuses on the method (or the model) itself, it opens up an avenue for experimental research where it will be interesting to see if one can obtain any biologically meaningful insights considering these temporal dynamics when one is able to, for instance, relate them to development or behaviour.

      Weaknesses:

      For most of the work, the authors present their RTRBM model as an improvement over the earlier cRBM model. Yet, when considering synthetic data, they actually seem to compare with a "standard" RBM model. This seems odd considering the overall narrative, and it is not clear why they chose to do that. Also, in that case, was the RTRBM model initialised with the cRBM weight matrix?

      Thank you for raising the important point regarding the RTRBM comparison in the synthetic data section. Initially, we aimed to compare the performance of the cRBM with the cRTRBM. However, we encountered significant challenges in getting the RTRBM to reach the compositional phase. To ensure a fair and robust comparison, we opted to compare the RBM with the RTRBM.

      A few claims made throughout the work are slightly too enthusiastic and not really supported by the data shown. For instance, when the authors refer to the clusters shown in Figure 3D as "spatially localized", this seems like a stretch, specifically in view of clusters 1, 3, and 4.

      Thanks for pointing out this inaccuracy. When going back to the data/analyses to address the question about locality, we stumbled upon a minor bug in the implementation of the proportional thresholding, causing the threshold to be too low and therefore too many neurons to be considered.

      Fixing this bug reduces the number of neurons, thereby better showing the local structure of the clusters. Furthermore, if one were to lower the threshold within the hierarchical clustering, smaller and more localized clusters would appear. We deliberately chose to keep this threshold high to not overwhelm the reader with the number of identified clusters. We hope the reviewer agrees with these changes and that the spatial structure in the clusters presented is indeed rather localized.

      Moreover, when they describe the predictive performance of their model as "close to optimal" when the down-sampling factor coincided with the interaction time scale, it seems a bit exaggerated given that it was more or less as close to the upper bound as it was to the lower bound.

      We thank the reviewer for catching this error. Indeed, the best performing model does not lie very close to the estimated performance of an optimal model. The text has been updated to reflect this.

      When discussing the data statistics, the authors quote correlation values in the main text. However, these do not match the correlation values in the figure to which they seem to belong. Now, it seems that in the main text, they consider the Pearson correlation, whereas in the corresponding figure, it is the Spearman correlation. This is very confusing, and it is not really clear as to why the authors chose to do so.

      Thank you for identifying the discrepancy between the correlation values mentioned in the text and those presented in the figure. We updated the manuscript to match the correlation coefficient values in the figure with the correct values denoted in the text.
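
      To make the difference between the two coefficients concrete, the sketch below computes both on the same toy data using only the standard library. The data are hypothetical and unrelated to the manuscript; the point is only that Pearson and Spearman values can differ for the same pair of variables, so the text and the figure must report the same one.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation: linear association between raw values."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # monotonic but nonlinear relationship
print(pearson(x, y), spearman(x, y))
```

      For a monotonic but nonlinear relationship such as this one, the rank-based coefficient reaches its maximum while the Pearson coefficient stays below it.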

      Finally, when discussing the fact that the RTRBM model outperforms the cRBM model, the authors state it does so for different moments and in different numbers of cases (fish). It would be very interesting to know whether these are the same fish or always different fish.

      Thank you for pointing this out. Keeping track of the same fish across the different metrics makes sense. We updated the figure to include a color code for each individual fish. As it turns out, the same fish perform significantly better each time.

      Recommendations:

      Figure 1: While the schematic in A and D only shows 11 visible units ("neurons"), the weight matrices and the activity rasters in B and C and E and F suggest that there should be, in fact, 12 visible units. While not essential, I think it would be nice if these numbers would match up.

      Thank you for pointing out the inconsistency in the number of visible units depicted in Figure 1. We agree that this could have been confusing for readers. The figure has been updated accordingly. As you suggested, the schematic representation now accurately reflects the presence of 12 visible units in both the RBM and RTRBM models.

      Figure 3: Panel G is not referenced in the main text. Yet, I believe it should be somewhere in lines 225ff.

      Thank you for mentioning this. We added in line 233 a reference to figure 3 panel G to refer to the performance of the cRBM and RTRBM on the different fish.

      Line 637ff: The authors consider moments <v_i h_μ> and <v_i h_j>, and from the context, it seems they are not the same. However, it is not clear as to why because, judging from the notation, they should be the same.

      The second-order statistic <v_i h_j> on line 639 was indeed already mentioned and denoted as <v_i h_μ> on line 638. It has now been removed accordingly in the updated manuscript.

      I found the usage of U^ and U throughout the manuscript a bit confusing. As far as I understand, U^ is a learned representation of U. However, maybe the authors could make the distinction clearer.

      We understand that the usage of Û and U throughout the text may be confusing for the reader. We would like to point out that the distinction between these two variables is explained in line 142: “in addition to providing a close estimate (Û) to the true assembly connectivity matrix U”. Nevertheless, for added clarity, we have included additional mentions of the estimated nature of Û throughout the text in the updated manuscript.

      Equation 3: It would be great if the authors could provide some more explanation of how they arrived at the identities.

      These identities have previously been widely described in literature. For this reason, we decided not to include their derivation in our manuscript. However, for completeness, we kindly refer to:

      Goodfellow, I., Bengio, Y., & Courville, A. (2016). Chapter 20: Deep generative models [In Deep Learning]. MIT Press. https://www.deeplearningbook.org/contents/generative_models.html

      Typos:

      -  L. 196: "connectiivty" -> "connectivity"

      -  L. 197: Does it mean to say "very strong stronger"?

      -  L. 339: The reference to Dunn et al. (2016) should appear in parentheses.

      -  L. 504f: The colon should probably be followed by a full sentence.

      -  Eq. 2: In the first line, the potential V still appears, which should probably be changed to show the concrete form (-b * h) as in the second line.

      -  L. 351: Is there maybe a comma missing after "cRBM"?

      -  L. 271: Instead of "correlation", shouldn't it rather be "similarity"?

      -  L. 218: "Figure 3D" -> "Figure 3F"

      We thank the reviewer for pointing out these typos, which have all (except one) been fixed in the text. We do emphasize the potential V to show that there are alternative hidden unit potentials that can be chosen. For instance, the cRBM utilizes dReLu hidden unit potentials.

      Reviewer #3 (Public Review):

      With ever-growing datasets, it becomes more challenging to extract useful information from such a large amount of data. For that, developing better dimensionality reduction/clustering methods can be very important to make sense of analyzed data. This is especially true for neuroscience where new experimental advances allow the recording of an unprecedented number of neurons. Here the authors make a step to help with neuronal analyses by proposing a new method to identify groups of neurons with similar activity dynamics. I did not notice any obvious problems with data analyses here, however, the presented manuscript has a few weaknesses:

      (1) Because this manuscript is written as an extension of previous work by the same authors (van der Plas et al., eLife, 2023), fully understanding this paper requires first reading the previous paper, as the authors often refer to their previous work for details. Similarly, to understand the functional significance of the neuronal assemblies identified here, one needs to look at the previous paper.

      We agree that the present Research Advance has been written in a way that builds on our previous publication. It was our impression that this was the intention of the Research Advance format, as spelled out in its announcement "eLife has introduced an innovative new type of article – the Research Advance – that invites the authors of any eLife paper to present significant additions to their original research". In the previous formatting guidelines from eLife, this was more evident, with a strong limitation on the number of figures and words; however, the present, more liberal guidelines also place an emphasis on the relation to the previous article. We have nonetheless tried in several places to fill in details that might simplify the reading experience.

      (2) The problem of discovering clusters in data with temporal dynamics is not unique to neuroscience. Therefore, the authors should also discuss other previously proposed methods and how they compare to the RTRBM method presented here. Similarly, there are other methods using neural networks for discovering clusters (assemblies) (e.g. t-SNE: van der Maaten & Hinton 2008, Hippocluster: Chalmers et al. 2023, etc), which should be discussed to give better background information for the readers.

      The clustering methods suggested by the reviewer do not include modeling any time dependence, which is the crucial advance presented here by the introduction of the RTRBM, in extending the (c)RBM. In our previous publication on the cRBM (van der Plas et al., eLife, 2023), this comparison was part of the discussion, although it focussed on a different set of methods. While clustering methods like t-SNE, UMAP and others certainly have their value in scientific analysis, we think it might mislead the reader to suggest that they achieve the same task as an RTRBM, which adds the crucial dimension of temporal dependence.

      (3) The above point to better describe other methods is especially important because the performance of the method presented here is not that much better than previous work. For example, RTRBM outperforms the cRBM only on ~4 out of 8 fish datasets. Moreover, as the authors nicely described in the Limitations section, this method currently can only work on a single time scale and clusters have to be estimated first with the previous cRBM method. Thus, having an overview of other methods which could be used for similar analyses would be helpful.

      We think that the perception that the RTRBM performs only slightly better is based on a misinterpretation of the performance measure, which we have tried to address (see comments above) in this rebuttal and the manuscript. In addition, we would like to emphasize that the RTRBM yields improved structural estimates, as shown on the simulated data (the structure is still modified by the RTRBM and only seeded by the cRBM's output). This is important even in cases where the performance is comparable (which can be the case if the RBM absorbs temporal dependencies of assemblies into a modified structure of assemblies). We have clarified this now in the discussion.

      Recommendations:

      (1) Line 181: it is not explained how a reconstruction error is defined.

      Dear reviewer, thanks for pointing this out. A definition of the (mean square) reconstruction error is added in this line.
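
      For concreteness, a mean-square reconstruction error can be sketched as the average squared difference between the observed visible activity and the model's reconstruction. This is the generic definition, stated here as an assumption; the exact normalization used in the manuscript (over neurons, time points, or both) is not given in this response, and the function name and toy arrays are hypothetical.

```python
def mean_square_reconstruction_error(data, reconstruction):
    """Average of (v - v_hat)^2 over all neurons and time points.

    `data` and `reconstruction` are equally shaped lists of rows
    (one row per time point, one entry per neuron).
    """
    if len(data) != len(reconstruction):
        raise ValueError("shape mismatch")
    total, count = 0.0, 0
    for row, row_hat in zip(data, reconstruction):
        if len(row) != len(row_hat):
            raise ValueError("shape mismatch")
        for v, v_hat in zip(row, row_hat):
            total += (v - v_hat) ** 2
            count += 1
    return total / count


# Hypothetical binary activity (2 time points x 2 neurons) and its reconstruction.
v = [[1, 0], [0, 1]]
v_hat = [[1, 0], [0, 0]]
print(mean_square_reconstruction_error(v, v_hat))
```

      Here one of the four entries is reconstructed incorrectly, so the error equals one quarter.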

      (2) How was the number of hidden neurons chosen and how does it affect performance?

      Thank you for pointing this out. Because we use transfer learning, the number of hidden units used for the RTRBM is given by the number of hidden units used for training the cRBM. In further research, when the RTRBM operates in the compositional phase, we can exploit a grid search over a set of hyperparameters to determine the optimal number of hidden units and other parameters.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife Assessment 

      This study is a detailed investigation of how chromatin structure influences replication origin function in yeast ribosomal DNA, with focus on the role of the histone deacetylase Sir2 and the chromatin remodeler Fun30. Convincing evidence shows that Sir2 does not affect origin licensing but rather affects local transcription and nucleosome positioning which correlates with increased origin firing. However, the evidence remains incomplete as the methods employed do not rigorously establish a key aspect of the mechanism, fully address some alternative models, or sufficiently relate to prior results. Overall, this is a valuable advance for the field that could be improved to establish a more robust paradigm. 

      We have added extensive new results to the manuscript that, we believe, address all three criticisms above, namely that the methods employed do not (1) rigorously establish a key aspect of the mechanism; (2) fully address some alternative models; or (3) sufficiently relate to prior results.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This paper presents a mechanistic study of rDNA origin regulation in yeast by SIR2. Each of the ~180 tandemly repeated rDNA gene copies contains a potential replication origin. Early, efficient initiation of these origins is suppressed by Sir2, reducing competition with origins distributed throughout the genome for rate-limiting initiation factors. Previous studies by these authors showed that SIR2 deletion advances replication timing of rDNA origins by a complex mechanism of transcriptional de-repression of a local PolII promoter causing licensed origin proteins (MCM complexes) to re-localize (slide along the DNA) to a different (and altered) chromatin environment. In this study, they identify a chromatin remodeler, FUN30, that suppresses the sir2∆ effect, and remarkably, results in a contraction of the rDNA to about one-quarter its normal length/number of repeats, implicating replication defects of the rDNA. Through examination of replication timing, MCM occupancy and nucleosome occupancy on the chromatin in sir2, fun30, and double mutants, they propose a model where nucleosome position relative to the licensed origin (MCM complexes) intrinsically determines origin timing/efficiency. While their interpretations of the data are largely reasonable and can be interpreted to support their model, a key weakness is the connection between Mcm ChEC signal disappearance and origin firing.

      Criticism: The reviewer expressed concern about the connection between Mcm ChEC signal disappearance and origin firing.

      To further support our claim that the disappearance of the MCM signal in our ChEC datasets reflects origin firing, we now present additional data using the well-established method of MCM Chromatin IP (ChIP).

      (1) New Supporting Evidence:  ChIP at genome-wide origins. In Figure 5 figure supplement 2, we demonstrate that the Mcm2 ChIP signal in cells released into hydroxyurea (HU) is significantly reduced at early origins compared to late origins, which mirrors the pattern observed with the MCM2 ChEC signal. This reduction in the ChIP signal at early origins supports the interpretation that the MCM signal disappearance is associated with origin firing.

      (2) New Supporting Evidence: ChIP at rDNA Origins. Our ChIP analysis also shows that the disappearance of the MCM signal at rDNA origins in sir2Δ cells released into HU is accompanied by signal accumulation at the replication fork barrier (RFB), indicative of stalled replication forks at this location (Figure 5 figure supplement 3). This pattern is consistent with the initiation of replication at these origins and fork stalling at the RFB.

      (3) New supporting evidence:  2D gels with quantification. Furthermore, additional 2D gel electrophoresis results provide ample independent evidence of rDNA origin firing in HU in sir2Δ mutants and suppression of origin firing in sir2 fun30 cells. These new data include 1) quantification of 2D gels in Figure 4D and 2) new 2D gels presented in Figure 4C as described below in greater detail. Collectively, these results demonstrate that rDNA origins fire prematurely in HU in sir2 cells and that firing is suppressed by FUN30 deletion. These additional data reinforce our model and support the association between MCM signal disappearance and replication initiation.

      While the cyclical chromatin association-dissociation of MCM proteins with potential origin sequences may be generally interpreted as licensing followed by firing, dissociation may also result from passive replication and as shown here, displacement by transcription and/or chromatin remodeling.

      The reviewer raised a concern that the cyclical chromatin association-dissociation of MCM proteins could be interpreted as licensing followed by firing, but might also result from passive replication or displacement by transcription and chromatin remodeling.

      Addressing Alternative Explanations:

      (1) Selective Disappearance of MCM Complexes: While transcription and passive replication can indeed cause the MCM-ChEC signal to disappear, these processes cannot selectively cause the disappearance of the displaced MCM complex without also affecting the non-displaced MCM complex. Specifically, RNA polymerase transcribing C-pro would first need to dislodge the normally positioned MCM complex before reaching the displaced complex, which is not observed in our data.

      (2) Role of FUN30 Deletion:  FUN30 deletion results in increased C-pro transcription and reduced disappearance of the displaced MCM complex. This observation supports our model, as transcription alone would not selectively affect the displaced MCM complex while leaving the normally positioned MCM complex unaffected.

      (3) Licensing Restrictions: It is crucial to note that continuous replenishment of displaced MCMs with newly loaded MCMs is not possible in our experimental conditions, as the cells are in S phase and licensing is restricted to G1. This temporal restriction further supports our interpretation that the disappearance of the MCM signal reflects origin firing rather than alternative processes.

      In summary, while alternative explanations such as transcription and passive replication could potentially account for MCM signal disappearance, our data indicate that these processes cannot selectively affect the displaced MCM complex without impacting the non-displaced complex. The selective disappearance observed in our experiments, along with the effects of FUN30 deletion and the temporal constraints on MCM loading, strongly support our interpretation that the disappearance of the MCM signal reflects origin firing.

      Moreover, linking its disappearance from chromatin in the ChEC method with such precise resolution needs to be validated against an independent method to determine the initiation site(s). Differences in rDNA copy number and relative transcription levels also are not directly accounted for, obscuring a clearer interpretation of the results. 

      The reviewer raised concerns about the need to validate the disappearance of MCM from chromatin observed using the ChEC method against an independent method to determine initiation sites. Additionally, they pointed out that differences in rDNA copy number and relative transcription levels are not directly accounted for, which may obscure the interpretation of the results.

      (1) Reduced rDNA Copy Number promotes Early Replication: Copy number reduction of the magnitude caused by deletion of both SIR2 and FUN30 is not expected to suppress early rDNA replication in sir2, but rather to exacerbate it. Specifically, deletion of SIR2 and FUN30 causes the rDNA to shrink to approximately 35 copies. Kwan et al., 2023 (PMID: 36842087) have shown that a reduction in rDNA copy number to 35 copies results in a dramatic acceleration of rDNA replication in a SIR2+ strain. Therefore, the effect of rDNA size on replication timing reinforces our conclusion that deletion of FUN30 suppresses rDNA replication.

      (2) New 2D Gels in sir2 and sir2 fun30 strains with equal number of rDNA repeats: To directly address the concern regarding differences in the number of rDNA repeats, we have included new 2D gel analyses in the revised manuscript. By using a fob1 background, we were able to equalize the repeat number between the sir2 and sir2 fun30 strains (Figure 4E). The 2D gels conclusively show that the suppression of rDNA origin firing upon FUN30 deletion is independent of both rDNA size and FOB1.

      Nevertheless, this paper makes a valuable advance with the finding of Fun30 involvement, which substantially reduces rDNA repeat number in sir2∆ background. The model they develop is compelling and I am inclined to agree, but I think the evidence on this specific point is purely correlative and a better method is needed to address the initiation site question. The authors deserve credit for their efforts to elucidate our obscure understanding of the intricacies of chromatin regulation. At a minimum, I suggest their conclusions on these points of concern should be softened and caveats discussed. Statistical analysis is lacking for some claims. 

      Strengths are the identification of FUN30 as suppressor, examination of specific mutants of FUN30 to distinguish likely functional involvement. Use of multiple methods to analyze replication and protein occupancies on chromatin. Development of a coherent model. 

      Weaknesses are failure to address copy number as a variable; insufficient validation of ChEC method relationship to exact initiation locus; lack of statistical analysis in some cases. 

      With regard to "insufficient validation of ChEC method relationship to exact initiation locus": The two potential initiation sites that one would monitor (non-displaced and displaced) are separated by less than 150 base pairs, and other techniques simply do not have the resolution necessary to distinguish such differences. Indeed, our new ChIP results presented in Figure 5 figure supplement 3 clearly demonstrate that while the resolution of ChIP is adequate to detect the reduction of MCM signal at the replication initiation site and its relocation to the RFB (~2 kb away), it lacks the resolution required to differentiate closely spaced MCM complexes.

      Furthermore, as we suggest in the manuscript, our results are consistent with a model in which it is only the displaced MCM complex that is activated, whether in sir2 or WT. If no genotype-dependent difference in initiation sites is even expected, it would be hard to interpret even the most precise replication-based assays.

      We appreciate the reviewer pointing out that some statistical analyses were lacking: we have added statistical analysis for the 2D gels (Figures 4D and 4E), the EdU incorporation experiments in Figure 4F, and the disappearance of the MCM ChEC and ChIP signals upon release of cells into HU (Figure 5 figure supplement 1 and figure supplement 2).

      Additional background and discussion for public review: 

      This paper broadly addresses the mechanism(s) that regulate replication origin firing in different chromatin contexts. The rDNA origin is present in each of ~180 tandem repeats of the rDNA sequence, representing a high potential origin density per length of DNA (9.1kb repeat unit). However, the average origin efficiency of rDNA origins is relatively low (~20% in wild-type cells), which reduces the replication load on the overall genome by reducing competition with origins throughout the genome for limiting replication initiation factors. Deletion of histone deacetylase SIR2, which silences PolII transcription within the rDNA, results in increased early activation of the rDNA origins (and reduced rate of overall genome replication). Previous work by the authors showed that MCM complexes loaded onto the rDNA origins (origin licensing) were laterally displaced (sliding) along the rDNA, away from a well-positioned nucleosome on one side. The authors' major hypothesis throughout this work is that the new MCM location(s) are intrinsically more efficient configurations for origin firing. The authors identify a chromatin remodeling enzyme, FUN30, whose deletion appears to suppress the earlier activation of rDNA origins in sir2∆ cells. Indeed, it appears that the reduction of rDNA origin activity in sir2∆ fun30∆ cells is severe enough to result in a substantial reduction in the rDNA array repeat length (number of repeats); the reduced rDNA length presumably facilitates its more stable replication and maintenance.

      Analysis of replication by 2D gels is marginally convincing; using 2D gels for this purpose is very challenging and tricky to quantify.

      We address this criticism by carefully quantifying the 2D gel results, using the single rARS signal to normalize the bubble arc, as discussed below.

      The more quantitative analysis by EdU incorporation is more convincing of the suppression of the earlier replication caused by SIR2 deletion. 

      We have also added quantification of EdU results to strengthen our arguments.  

      To address the mechanism of suppression, they analyze MCM positioning using ChEC, which in G1 cells shows partial displacement of MCM from normal position A to positions B and C in sir2∆ cells and similar but more complete displacement away from A to positions B and C in sir2 fun30 cells. During S-phase in the presence of hydroxyurea, which slows replication progression considerably (and blocks later origin firing), MCM signals redistribute, which is interpreted to represent origin firing and bidirectional movement of MCMs (only one direction is shown), some of which accumulate near the replication fork barrier, consistent with their interpretation. They observe that MCMs displaced (in G1) to sites B or C in sir2∆ cells disappear more rapidly during S-phase, whereas a similar dynamic is not observed in sir2∆fun30∆. This is the main basis for their conclusion that the B and C sites are more permissive than A. While this may be the simplest interpretation, there are limitations with this assay that undermine a rigorous conclusion (additional points below). The main problem is that we know the MCM complexes are mobile, so disappearance may reflect displacement by other means, including transcription, which is high in the sir2∆ background. Indeed, the double mutant has a greater level of transcription per repeat unit, which might explain more displacement from A in G1. Thus, displacement might not always represent origin firing. Because the sir2 background profoundly changes transcription, and the double mutant has a much smaller array length associated with higher transcription, how can we rule out greater accessibility at site A, for example in sir2∆, leading to more firing, which is suppressed in sir2 fun30 due to greater MCM displacement away from A?

      I think the critical missing data to solidly support their conclusions is a definitive determination of the site(s) of initiation using a more direct method, such as strand-specific sequencing of EdU or nascent strand analysis. More direct comparisons of the strains with lower copy number are also needed to rule out this facet. As discussed in detail below, copy number reduction is known to suppress at least part of the sir2∆ effect, so this looms over the interpretations. I think they are probably correct in their overall model based on the simplest interpretation of the data, but I think it remains to be rigorously established. I think they should soften their conclusions in this respect.

      Please see discussion below about these issues.

      Reviewer #2 (Public Review): 

      Summary: 

      In this manuscript, the authors follow up on their previous work showing that in the absence of the Sir2 deacetylase the MCM replicative helicase at the rDNA spacer region is repositioned to a region of low nucleosome occupancy. Here they show that the repositioned displaced MCMs have increased firing propensity relative to non-displaced MCMs. In addition, they show that activation of the repositioned MCMs and low nucleosome occupancy in the adjacent region depend on the chromatin remodeling activity of Fun30. 

      Strengths: 

      The paper provides new information on the role of a conserved chromatin remodeling protein in the regulation of origin firing and in addition provides evidence that not all loaded MCMs fire and that origin firing is regulated at a step downstream of MCM loading. 

      Weaknesses: 

      The relationship between the authors' results and prior work on the role of Sir2 (and Fob1) in regulation of rDNA recombination and copy number maintenance is not explored, making it difficult to place the results in a broader context. Sir2 has previously been shown to be recruited by Fob1, which is also required for DSB formation and recombination-mediated changes in rDNA copy number. Are the changes that the authors observe specifically in fun30 sir2 cells related to this pathway? Is Fob1 required for the reduced rDNA copy number in fun30 sir2 double mutant cells?

      We have conducted additional studies in the fob1 background to address how FOB1 and the replication fork barrier (RFB) influence the kinetics of rDNA size reduction upon FUN30 deletion (Figure 2 - figure supplement 2), rDNA replication timing (Figure 2 - figure supplement 3), and rDNA origin firing using 2D gels (Figure 4C).

      Strains lacking SIR2 exhibit unstable rDNA size, and FOB1 deletion stabilizes rDNA size in a sir2 background (and otherwise). Similarly, we found that FOB1 deletion influences the kinetics of rDNA size reduction in sir2 fun30 cells. Specifically, we were able to generate a fob1 sir2 fun30 strain with more than 150 copies. Nonetheless, and consistent with our model, this strain still exhibited delayed rDNA replication timing (Figure 2 - figure supplement 3), and its rDNA still shrank upon continuous culture (Figure 2 - figure supplement 2). These results demonstrate that, although FOB1 affects the kinetics of rDNA size reduction in sir2 fun30 strains, neither the reduced rDNA array size nor the delayed replication timing upon FUN30 deletion depends on FOB1.

      The use of the fob1 background allowed us to compare the activation of rDNA origins in sir2 and sir2 fun30 strains with equally short rDNA sizes. 2D gels demonstrate robust and reproducible suppression of rDNA origin activity upon deletion of FUN30 in sir2 fob1 strains with 35 rDNA copies (Figure 4C). These results indicate that the main effect we are interested in—FUN30-induced reduction in origin firing—is independent of both FOB1 and rDNA size.

      Our additional studies conclusively show that the FUN30-induced reduction in rDNA origin firing is independent of both FOB1 and rDNA size. These findings provide important insights into the mechanisms regulating rDNA copy number maintenance, placing our results within the broader context of existing knowledge on Sir2 and Fob1 functions.

      Reviewer #3 (Public Review): 

      Summary: 

      Heterochromatin is characterized by low transcription activity and late replication timing, both dependent on the NAD-dependent protein deacetylase Sir2, the founding member of the sirtuins. This manuscript addresses the mechanism by which Sir2 delays replication timing at the rDNA in budding yeast. Previous work from the same laboratory (Foss et al. PLoS Genetics 15, e1008138) showed that Sir2 represses transcription-dependent displacement of the Mcm helicase in the rDNA. In this manuscript, the authors show convincingly that the repositioned Mcms fire earlier and that this early firing partly depends on the ATPase activity of the nucleosome remodeler Fun30. Using read-depth analysis of sorted G1/S cells, fun30 was the only chromatin remodeler mutant that somewhat delayed replication timing in sir2 mutants, while nhp10, chd1, isw1, htl1, swr1, isw2, and irc3 had no effect. The conclusion was corroborated with orthogonal assays including two-dimensional gel electrophoresis and analysis of EdU incorporation at early origins. Using an insightful analysis with an Mcm-MNase fusion (Mcm-ChEC), the authors show that the repositioned Mcms in sir2 mutants fire earlier than the Mcm at the normal position in wild type. This early firing at the repositioned Mcms is partially suppressed by Fun30. In addition, the authors show Fun30 affects nucleosome occupancy at the sites of the repositioned Mcm, providing a plausible mechanism for the effect of Fun30 on Mcm firing at that position. However, the results from the MNase-seq and ChEC-seq assays are not fully congruent for the fun30 single mutant. Overall, the results support the conclusions, providing a much better mechanistic understanding of how Sir2 affects replication timing at rDNA.

      The observation that the MNase-seq plot in the fun30 mutant shows a large signal at the +3 nucleosome and a somewhat smaller one at position +2, while the ChEC-seq plot exhibits negligible signals, is indeed an important point of consideration. This discrepancy arises because most of the MCM in the fun30 mutant remains at its original site, where it abuts the +1 nucleosome. As a result, the MCM-MNase fusion protein fails to reach and “light up” the +3 nucleosome, which is, nonetheless, well-visualized with exogenous MNase. The paucity of displaced MCMs, which are responsible for cutting the +2 nucleosome, explains the discrepancy in the +2 nucleosome signal between the exogenous MNase and ChEC datasets in the fun30 mutant.

      Despite this apparent discrepancy, the overall results support our conclusions and provide a much better mechanistic understanding of how Sir2 affects replication timing at rDNA. The MNase-seq data reflect nucleosome positioning and chromatin structure, while the ChEC-seq data specifically highlight the locations where MCM is bound and active.

      Strengths 

      (1) The data clearly show that the repositioned Mcm helicase fires earlier than the Mcm in the wild type position. 

      (2) The study identifies a specific role for Fun30 in replication timing and an effect on nucleosome occupancy around the newly positioned Mcm helicase in sir2 cells. 

      Weaknesses 

      (1) It is unclear which strains were used in each experiment. 

      (2) The relevance of the fun30 phospho-site mutant (S20AS28A) is unclear. 

      We appreciate the reviewer pointing out places in which our manuscript omitted key pieces of information (items 1 and 3); we have included the strain numbers in our revision. With regard to point 2, we had written:

      Fun30 is also known to play a role in the DNA damage response; specifically, phosphorylation of Fun30 on S20 and S28 by CDK1 targets Fun30 to sites of DNA damage, where it promotes DNA resection (Chen et al. 2016; Bantele et al. 2017). To determine whether the replication phenotype that we observed might be a consequence of Fun30's role in the DNA damage response, we tested non-phosphorylatable mutants for the ability to suppress early replication of the rDNA in sir2; these mutations had no effect on the replication phenotype (Figure 2B), arguing against a primary role for Fun30 in DNA damage repair that somehow manifests itself in replication. 

      (3) For some experiments (Figs. 3, 4, 6) it is unclear whether the data are reproducible and the differences significant. Information about the number of independent experiments and quantitation is lacking. This affects the interpretation, as fun30 seems to affect the +3 nucleosome much more than let on in the description. 

      We have provided replicas and quantitation for the results in these figures.

      These comprise a replica ChEC Southern blot with quantification (Figure 3 figure supplement 1), quantification and replicas for the 2D gels in Figure 4, and replicas for nucleosome occupancy (Figure 6 supplement 1).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Fig. 3-Examination of MCM occupancy at the rDNA ARS region using a variation of ChEC.

      Presumably these are G1-arrested cells, but this does not seem to be stated. Please confirm.

      The 2D gel results are not very convincing of their conclusions. We are asked to compare bubble to fork arcs at 30 minutes, but this is not feasible. It is the authors' job to quantify the data from multiple replicates, but no quantification is given. After much careful examination, comparing the relative intensities of ascending bubble and Y-arcs, I think I can accept that 4A shows highest early efficiency for sir2 over WT and fun30, which are similar to each other, and lowest for sir2 fun30, at 60 and 90 min.

      In the revision we provide a careful quantification of the 2D gels in Figure 4. To assess rDNA origin activity, we normalized the bubble arc during the HU time course to the single-rARS signal, which appears as a large 24.4 kb NheI fragment originating from the rightmost rDNA repeat (see Figures 4A and 4B). The description of the quantification in the text is provided below.

      “Prior to separation on 2D gels, DNA was digested with NheI, which releases a 4.7 kb rARS-containing linear DNA fragment at the internal rDNA repeats (1N) and a much larger, 24.5 kb single-rARS-containing fragment originating from the rightmost repeat. In 2D gels, active origins generate replication bubble arc signals, whereas passive replication of an origin appears as a y-arc. Having a signal emanating from a single ARS-containing fragment simplifies the comparison of rDNA origin activity in strains with different numbers of rDNA repeats, such as in sir2 vs sir2 fun30 mutants. Origin activity is expressed as a ratio of the bubble to the single-ARS signal, effectively measuring the number of active rDNA origins per cell at a given time point.

      As seen previously (Foss et al. 2019), deletion of SIR2 increased the number of activated rDNA origins, while deletion of FUN30 suppressed this effect. When analyzed in aggregate at 20, 30, 60 and 90 minutes following release into HU, the average number of activated rDNA origins in the sir2 mutant was increased 6.3-fold compared to that in WT (5.0±2.3 in sir2 vs 0.8±0.4 in WT, p<0.05 by 2-tailed t-test), and the increased number was reduced upon FUN30 deletion (1.3±0.7 in sir2 fun30, p<0.05 by 2-tailed t-test vs sir2, NS for comparison to WT).”
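
      The normalization described in the quoted passage can be sketched as a short calculation. This is a minimal illustration, not the authors' analysis code: the function name `origin_activity` and the band intensities passed to it are hypothetical, and only the group means (5.0 for sir2, 0.8 for WT) are taken from the response.

```python
def origin_activity(bubble_signal: float, single_ars_signal: float) -> float:
    """Bubble-arc signal normalized to the single-rARS fragment signal."""
    if single_ars_signal <= 0:
        raise ValueError("single-ARS signal must be positive")
    return bubble_signal / single_ars_signal


# Hypothetical band intensities (arbitrary units), for illustration only.
example_ratio = origin_activity(bubble_signal=1200.0, single_ars_signal=400.0)

# Fold change computed from the group means reported in the response.
mean_sir2, mean_wt = 5.0, 0.8
fold_change = mean_sir2 / mean_wt

print(f"example ratio: {example_ratio:.1f}")
print(f"sir2 / WT fold change: {fold_change:.2f}")
```

      With the rounded means as given, the ratio is 6.25, which agrees with the reported 6.3-fold increase within rounding.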

      However, for part 4B, they state (p. 11) that deletion of FUN30 in a SIR2 background had no perceptible effect (on ARS305) but I think the data appear otherwise: the FUN30 cells show more Y-arc than WT.

      We now provide the assessment of ARS305 activity in HU cells as a ratio of bubble-arc to 1N signal. The reviewer is right that FUN30 has a more robust bubble arc signal compared to WT.

      However, after normalization to 1N this difference did not appear significant (3.7 vs 5.1). Overall, the analysis of the activity of ARS305 origins demonstrates a reciprocity with the activity of rDNA origins in each of the four genotypes. Furthermore, this observation is confirmed in our EdU-based analysis of 111 genomic origins, with statistical analysis showing a very high level of significance (see below).

      Ultimately, analysis of unsynchronized cells would give unambiguous results about origin efficiency. In this regard I note that analysis of rDNA origin firing by 2D gels with HU versus asynchronous gives different results in WT versus sir2∆, with no difference in unsynchronized cells (He et al. 2022). It would be interesting to test the strains here unsynchronized, though copy number size would still be a variable to address.

      Origin activity in log cultures is typically assessed by comparing replication initiation within an origin, presenting as a bubble arc, to passively replicated DNA (Y-arc). However, such an analysis at tandemly arrayed origins, such as rDNA, is not feasible, as both active and passive replication are the result of activation of the same origins. This explains the lack of difference between WT and sir2 cells previously reported (He et al. 2022), which we have also observed. Differences in activation of rDNA origins in WT vs sir2 cells are clearly reflected in HU experiments, as was the case in the earlier report (He et al. 2022).

      To address the issue of differences in copy number between sir2 and sir2 fun30 cells, we have now done experiments in a fob1 background where we can equalize the copy number between the two genotypes. These 2D gels are presented in Figure 4C. We address this issue in the revised manuscript as follows:

      “The overall impact of FUN30 deletion on rDNA origin activity in a sir2 background is expected to be a composite of two opposing effects: a suppression of rDNA origin activation and increased rDNA origin activation due to reduced rDNA size (Kwan et al. 2023). To evaluate the effect of FUN30 on rDNA origin activation independently of rDNA size, we generated an isogenic set of strains in a fob1 background, all of which contain 35 copies of the rDNA repeat. (Deletion of FOB1 is necessary to stabilize rDNA copy number.) Comparing rDNA origin activity in sir2 versus sir2 fun30 genotypes, we observed a robust and reproducible reduction in rDNA origin activity upon FUN30 deletion. This finding confirms that FUN30 deletion suppresses rDNA origin firing in a sir2 background independently of both rDNA size and FOB1 status.”

      -EdU analysis is more convincing regarding relative effects on genome versus rDNA, however, again, the effect of reduced rDNA array size in the sir2 fun30 cells may also be the proximal cause of the reduced effect on genome (early origins) replication rather than a direct effect on origin efficiency. No statistic provided to support that fun30 suppresses sir2 for rDNA activity. 

      This comment raises three distinct, but related, issues: 

      First, the reviewer is asking whether the reduced rDNA size, of the magnitude we observed in sir2 fun30 cells, could by itself be responsible for increased origin activity elsewhere in the genome, just because there is less rDNA that needs to be replicated. As noted earlier, Kwan et al. (2023) examined the effect of rDNA size reduction and observed: 1) a marked increase in rDNA origin activity and 2) a reciprocal reduction in origin activity elsewhere in the genome. This counterintuitive finding suggests that a smaller rDNA size exerts more competition for limited replication resources compared to a larger rDNA size. In light of this, our findings with FUN30 deletion become even more compelling. The suppression of rDNA firing upon FUN30 deletion is so significant that it overrides the expected effects of rDNA size reduction.

      Second, the reviewer points out our lack of statistical analysis to support our contention that fun30 suppresses sir2 with regard to rDNA origin activity. We have now addressed this issue as well, by quantifying 2D gel signals, as described above in the text that begins with "Prior to separation on 2D gels, DNA was digested with NheI ...". 

      Third, we have now provided a statistical analysis to support our conclusion that EdU-based analysis of activity of 111 early origins shows suppression upon deletion of SIR2 that is largely reversed by additional deletion of FUN30. 

      "Deletion of FUN30 in a sir2 background partially restored EdU incorporation at early origins, concomitant with reduced EdU incorporation at rDNA origins. In particular, the median value of log10 of read depths at 111 early origins, as the data are shown in Figure 4F, dropped from 6.5 for wild type to 6.2 for sir2 but then returned almost to wild type levels (6.4) in sir2 fun30. The p value obtained by Student's t test, comparing the drop in 111 origins from wild type to sir2 with that from wild type to sir2 fun30, was highly significant (<< 10⁻¹⁶). In contrast, FUN30 deletion in the WT background did not reduce EdU incorporation at genomic origins (median 6.6). These findings highlight that FUN30 deletion-induced suppression of rDNA origins in sir2 is accompanied by the activation of genomic origins."
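
      To make the log-scale medians in the quoted passage easier to read, the differences can be converted to linear-scale ratios. This is a minimal sketch under two assumptions: that all four quoted values (including the 6.6 for fun30) are medians of log10 read depth over the same 111 origins, and that the helper name `fold_from_log10` is purely illustrative. Because 111 is odd, the median of the log10 values equals the log10 of the median, so each ratio below is a ratio of median read depths.

```python
def fold_from_log10(log10_a: float, log10_b: float) -> float:
    """Linear-scale ratio a/b given log10(a) and log10(b)."""
    return 10 ** (log10_a - log10_b)


# Median log10 read depths at the 111 early origins, as quoted in the response.
medians = {"WT": 6.5, "sir2": 6.2, "sir2 fun30": 6.4, "fun30": 6.6}

for genotype, value in medians.items():
    ratio = fold_from_log10(value, medians["WT"])
    print(f"{genotype}: {ratio:.2f} x WT median")
```

      On this reading, the sir2 median is roughly half the wild-type median, while sir2 fun30 recovers to about 0.8 of it.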

      Use loss of Mcm-ChEC signal as proxy for origin firing. Reasonably convincing that decrease correlates with origin firing on a one-to-one basis (Fig. 5B), though no statistic given. 

      We provide the statistical analysis in Figure 5-figure supplement 1.

      However, there is no demonstration of ability to observe this correlation with fine resolution as needed for the claims here. It seems equally possible that sir2 deletion causes more firing by repositioning MCMs to a better location or that the prior location, which still contains substantial MCM, becomes more permissive. The MCM signal appears to be mobile, so perhaps the role of FUN30 is to prevent the mobility of MCM away from the original site in WT cells; note that significantly less Mcm signal is at the original position in sir2 fun30. No accumulation of MCM occurs near the RFB in WT (and fun30) cells. I understand that origin firing is lower in WT, but this raises concerns about sensitivity and dynamic range of this assay and that MCM positions may reflect transcription versus replication.

      Please see the section above labeled "Addressing Alternative Explanations".  

      Is Fig 6A Y-axis correctly labeled? I understand this figure to represent MNase-seq reads; is there any Mcm2-ChEC-seq in part A? 

      We have corrected the labeling. Figure 6A represents MNase-seq reads. Thank you for pointing this out.

      I understand part B to represent nucleosome-sized fragments released by Mcm2-ChEC interpreted to be nucleosomes. But could they be large fragments potentially containing adjacent MCM-double hexamers?  

      Our representation of ChEC-seq data in Figure 1 supplement 1, where we can see the entire spectrum of fragment sizes, demonstrates two distinct populations of fragments: nucleosome size and MCM-size fragments.

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions for the authors to consider: 

      (1) The authors make a good case for the importance of replication balance between rDNA and euchromatin in ensuring that the genome is replicated in a timely fashion. This seems to be clearly regulated by Sir2. However, Sir2 also affects rDNA copy number and suppresses unequal cross over events, which are stimulated by Fob1. Does Fun30 suppress Fob1-dependent recombination events in sir2∆ cells?

      It is unclear why FUN30 only affects rDNA repeat copy number in sir2 cells. Why doesn't Fun30 reduce copy number in wild-type cells? 

      Deletion of SIR2 causes rightward repositioning of MCMs to a position where they are more prone to fire, as shown by our HU ChEC datasets, in which the repositioned MCMs are more prone to activation than the non-repositioned ones. FUN30 deletion suppresses activation of these activation-prone, repositioned MCMs, as shown by HU ChEC. This suppression of rDNA origin activation in sir2 cells causes rDNA to shrink. In fun30 single mutants, due to the paucity of repositioned MCMs, we do not observe significant suppression of rDNA origin firing, and consequently, there is no reduction in rDNA size in fun30 cells.

      (2) The authors use Mcm-MNase to map the location of the MCM helicase. Can these results be confirmed using the more standard and direct ChIP assay to examine changes in MCM localization?

      We carried out the suggested MCM ChIP experiments and present these results in Figure 5 supplement 2 and supplement 3. These ChIP data demonstrate that:

      (1) MCM signal disappears preferentially at early origins compared to late origins, as seen in our ChEC results.

      (2) The disappearance of ChEC signal at rDNA origins in the sir2 mutant is accompanied by signal accumulation at the RFB, consistent with fork stalling at the RFB and mirroring the results we obtained by ChEC. While these results indicate that ChIP has adequate resolution to detect MCM repositioning at the 2 kb scale, its resolution was insufficient for fine-scale discrimination of repositioned and non-repositioned MCMs.

      In this regard, the specific role of Fun30 in regulation of MCM firing at rDNA is interesting. 

      Does Fun30 localize to the ARS region of rDNA? How is Fun30 specifically recruited to rDNA?  

      We carried out ChIP for Fun30 and observed, similarly to previous reports (Durand-Dubief et al. 2012), a wide distribution of Fun30 throughout the genome and at rDNA. We have elected not to include these results in the current manuscript.

      (3) The 2D gels in Figure 4 are difficult to interpret. The bubble to arc ratios in fun30∆ seem different from both wild-type and sir2∆. It may be helpful to the reader to quantify the bubble to arc ratios. fun30∆ also seems to be affecting ARS305 by itself.

      We provide quantification of the 2D gels in Figure 4.

      (4) Figure 5. 

      (4.1) For examining origin firing based on the disappearance of the Mcm-MNase reads, is HU arrest necessary? HU may be causing indirect effects due to replication fork stalling. In principle, the authors should be able to perform this analysis without HU, since their cells are released from synchronized arrest in G1 (and at least for the first cell cycle should proceed synchronously on to S phase). In addition, validation of Mcm-ChEC results using ChIP for one of the subunits of the MCM complex would increase confidence in the results. 

      The HU arrest allows us to examine early events in DNA replication at much finer spatial and temporal resolution than would be possible without it.

      We have now used Mcm2 ChIP to confirm that the signal disappears at the MCM loading site in HU in sir2 cells as discussed above (Figure 5 figure supplement 3). However, the resolution is inadequate to discriminate non-repositioned vs repositioned MCMs.

      (4.2) The non-displaced Mcm-ChEC signal in sir2∆ seems like it's decreasing more than in wild-type cells. Explain. It would be helpful to quantify these results by integrating the area under each peak (or based on read numbers). It looks like one of the displaced Mcm signals (the one more distal from the non-displaced) is changing at a similar rate to the non-displaced.

      Integrating the area under each Mcm-ChEC peak or using read numbers is superfluous for the following reasons: (1) The rectangular appearance of the peaks in Figure 5 clearly reflects signal intensity, making additional numerical integration redundant. (2) The visual differences between wild-type and sir2∆ cells are distinct and sufficient for drawing conclusions without further quantification. (3) Keeping the analysis straightforward avoids unnecessary complexity and maintains clarity.

      (4.3) Can the authors explain why fun30∆ seems to be suppressing only one of the 2 displaced Mcms from firing?

      We speculate that the local environment is more conducive to firing of one of the two displaced MCMs, but we do not understand why.

      (5) Figure 6. Why would the deletion of SIR2, a silencing factor, result in increased nucleosome occupancy at rDNA?

      If we understand correctly, the reviewer is referring to a small increase in +2 and +3 signal in sir2 compared to the WT. In WT G1 cells, there is a single MCM between the +1 and +3 nucleosomes. This space cannot accommodate a +2 nucleosome in G1 cells because MCM is loaded at that position in most cells (in G2 cells, however, this space is occupied by a nucleosome; Foss et al. 2019). MCM repositioning in the sir2 mutant would displace MCM from this location, making it possible for this space to be occupied by a nucleosome.

      The changes in nucleosome density seem modest. Also, nucleosome density is similarly increased in sir2∆ and fun30∆ cells, but sir2∆ has a dramatic effect on origin firing while fun30∆ does not. Explain.

      We believe that the FUN30 status makes most of the difference for firing of displaced MCMs.

      Since there are few displaced MCMs in SIR2 cells, there is no large impact on origin firing. Furthermore, the rDNA already fires late in WT cells, so a further delay upon FUN30 deletion could be more difficult to detect.

      (6) Discussion. At rDNA Sir2 may simply act by deacetylating nucleosomes and decreasing their mobility. This is unrelated to compaction which is usually only invoked regarding the activities of the full SIR complex (Sir2/3/4) at telomeres and the mating type locus. The arguments regarding polymerase size, compaction etc may not be relevant to the main point since although the budding yeast Sir2 participates in heterochromatin formation at the mating type loci and telomeres, at rDNA it may act locally near its recruitment site at the RFB. 

      This is a valid point. We have added this sentence to the discussion to highlight the differences between silencing at rDNA and silencing at the silent mating loci and telomeres, which is SIR-complex dependent.

      “Steric arguments such as these are even less compelling when made for rDNA than for the silent mating type loci and telomeres, because chromatin compaction has been studied mostly in the context of the complete Sir complex (Sir1-4). In contrast, Sir1, 3, and 4 are not present at the rDNA.”

      Minor 

      It would be interesting to see if deletion of any histone acetyltransferases acts in a similar way to Fun30 to reduce rDNA copy number in sir2∆ cells.

      Thank you for this suggestion.

      Reviewer #3 (Recommendations For The Authors): 

      (1) The design of Figure 3 could be improved. A scheme could help understand the assay without flipping back to Figure 1. The numbers below the gel bands need definition. 

      We have included the scheme describing the restriction and MCM-MNase cut sites and the location of the probe for the Southern blot.

      (2) The design of Figure 4 could be improved by adding a scheme to help interpret the 2d gel picture. The figure also lacks quantitation. Are the results reproducible and the differences significant? 

      We have added the scheme, quantification and statistics in Figure 4.

      (3) Please list in each figure legend the exact strains from Table S1 which were used. 

      We have included the strain numbers in the Figure legend.

      Durand-Dubief M, Will WR, Petrini E, Theodorou D, Harris RR, Crawford MR, Paszkiewicz K, Krueger F, Correra RM, Vetter AT et al. 2012. SWI/SNF-like chromatin remodeling factor Fun30 supports point centromere function in S. cerevisiae. PLoS Genet 8: e1002974.

      Foss EJ, Gatbonton-Schwager T, Thiesen AH, Taylor E, Soriano R, Lao U, MacAlpine DM, Bedalov A. 2019. Sir2 suppresses transcription-mediated displacement of Mcm2-7 replicative helicases at the ribosomal DNA repeats. PLoS Genet 15: e1008138.

      He Y, Petrie MV, Zhang H, Peace JM, Aparicio OM. 2022. Rpd3 regulates single-copy origins independently of the rDNA array by opposing Fkh1-mediated origin stimulation. Proc Natl Acad Sci U S A 119: e2212134119.

      Kwan EX, Alvino GM, Lynch KL, Levan PF, Amemiya HM, Wang XS, Johnson SA, Sanchez JC, Miller MA, Croy M et al. 2023. Ribosomal DNA replication time coordinates completion of genome replication and anaphase in yeast. Cell Rep 42: 112161.

    1. “I will explain,” he said, “and that you may comprehend all clearly, we will first retrace the course of your meditations, from the moment in which I spoke to you until that of the rencontre{j} with the fruiterer in question. The larger links of the chain run thus — Chantilly, Orion, Dr. Nichol,{k} (16) Epicurus, Stereotomy, the street stones, the fruiterer.” There are few persons who have not, at some period of their lives, amused themselves in retracing the steps by which particular conclusions of their own minds have been attained. The occupation is often full of interest; and he who attempts it for the first time is{l} astonished by the apparently illimitable distance and incoherence between the starting-point and the goal.(17) What, then, must have been my amazement when I heard the Frenchman speak what he had just spoken, and when I could not help acknowledging that he had spoken the truth. He continued: “We had been talking of horses, if I remember aright, just before leaving the Rue C———. This was the last subject we discussed. As we crossed into this street, a fruiterer, with a large basket upon his head, brushing quickly past us, thrust you upon a pile of paving-stones collected at a spot where the causeway is undergoing repair. You stepped upon one of the loose fragments, slipped, slightly strained your ankle, appeared vexed or sulky, muttered a few words, turned to look{m} at the pile, and then proceeded in silence. I was not particularly attentive to what you did; but observation has become with me, of late, a species of necessity. 
“You kept your eyes upon the ground — glancing, with a petulant expression, at the holes and ruts in the pavement, (so that I saw you were still thinking of the stones,) until we reached the little alley called Lamartine,(18) which has been paved, by way of [page 536:] experiment, with the overlapping and riveted blocks.(19) Here your countenance brightened up, and, perceiving your lips move, I could not doubt that you murmured{n} the{oo} word ‘stereotomy,’ a term very affectedly applied to this species of pavement.{oo} I knew that you could not {pp}say to yourself ‘stereotomy’ without{pp}, being brought to think of atomies, and thus of the theories of Epicurus;(20) and since{q} when we discussed this subject not very long ago, I mentioned to you how singularly, yet with how little notice, the vague guesses of that noble Greek had met with confirmation in the late nebular cosmogony, I felt that you could not avoid casting your eyes upward{r} to the great nebula{s} in Orion,(21) and I certainly expected that you would do so. You did look up; and I was now{t} assured that I had correctly followed your steps. But in that bitter tirade upon Chantilly, which appeared in yesterday's ‘Musée,’ the satirist, making some disgraceful allusions to the cobbler's change of name upon assuming the buskin, quoted a{u} Latin line{v} about which{w} we have often conversed. I mean the line {xx}Perdidit antiquum litera prima sonum{xx} I had told you that this was in reference to Orion, formerly written Urion; and, from certain pungencies connected with this explanation, I was aware that you could not have forgotten it.(22) It was clear, therefore, that you would not fail to combine the two ideas of Orion and Chantilly. That you did combine them I saw by the character of the smile which passed over your lips. You thought of the poor cobbler's immolation. So far, you had been stooping in your gait; but now I saw you draw yourself up to your full height. 
I was then sure that you reflected upon the diminutive figure of Chantilly. At this point I interrupted your meditations to remark [page 537:] that as, in fact, he was a very little fellow — that Chantilly — he would do better at the Théâtre des Variétés.”{y}

      I'm surprised that Poe, as the pioneer of detective literature, can come up with such a deliberate and coherent process of thinking.

    2. “I will explain,” he said, “and that you may comprehend all clearly, we will first retrace the course of your meditations, from the moment in which I spoke to you until that of the rencontre{j} with the fruiterer in question. The larger links of the chain run thus — Chantilly, Orion, Dr. Nichol,{k} (16) Epicurus, Stereotomy, the street stones, the fruiterer.” There are few persons who have not, at some period of their lives, amused themselves in retracing the steps by which particular conclusions of their own minds have been attained. The occupation is often full of interest; and he who attempts it for the first time is{l} astonished by the apparently illimitable distance and incoherence between the starting-point and the goal.(17) What, then, must have been my amazement when I heard the Frenchman speak what he had just spoken, and when I could not help acknowledging that he had spoken the truth. He continued: “We had been talking of horses, if I remember aright, just before leaving the Rue C———. This was the last subject we discussed. As we crossed into this street, a fruiterer, with a large basket upon his head, brushing quickly past us, thrust you upon a pile of paving-stones collected at a spot where the causeway is undergoing repair. You stepped upon one of the loose fragments, slipped, slightly strained your ankle, appeared vexed or sulky, muttered a few words, turned to look{m} at the pile, and then proceeded in silence. I was not particularly attentive to what you did; but observation has become with me, of late, a species of necessity. 
“You kept your eyes upon the ground — glancing, with a petulant expression, at the holes and ruts in the pavement, (so that I saw you were still thinking of the stones,) until we reached the little alley called Lamartine,(18) which has been paved, by way of [page 536:] experiment, with the overlapping and riveted blocks.(19) Here your countenance brightened up, and, perceiving your lips move, I could not doubt that you murmured{n} the{oo} word ‘stereotomy,’ a term very affectedly applied to this species of pavement.{oo} I knew that you could not {pp}say to yourself ‘stereotomy’ without{pp}, being brought to think of atomies, and thus of the theories of Epicurus;(20) and since{q} when we discussed this subject not very long ago, I mentioned to you how singularly, yet with how little notice, the vague guesses of that noble Greek had met with confirmation in the late nebular cosmogony, I felt that you could not avoid casting your eyes upward{r} to the great nebula{s} in Orion,(21) and I certainly expected that you would do so. You did look up; and I was now{t} assured that I had correctly followed your steps. But in that bitter tirade upon Chantilly, which appeared in yesterday's ‘Musée,’ the satirist, making some disgraceful allusions to the cobbler's change of name upon assuming the buskin, quoted a{u} Latin line{v} about which{w} we have often conversed. I mean the line {xx}Perdidit antiquum litera prima sonum{xx} I had told you that this was in reference to Orion, formerly written Urion; and, from certain pungencies connected with this explanation, I was aware that you could not have forgotten it.(22) It was clear, therefore, that you would not fail to combine the two ideas of Orion and Chantilly. That you did combine them I saw by the character of the smile which passed over your lips. You thought of the poor cobbler's immolation. So far, you had been stooping in your gait; but now I saw you draw yourself up to your full height. 
I was then sure that you reflected upon the diminutive figure of Chantilly. At this point I interrupted your meditations to remark [page 537:] that as, in fact, he was a very little fellow — that Chantilly — he would do better at the Théâtre des Variétés.”{y}

      I know that the author wants to create an image of Dupin as a detective who is good at reasoning; however, I wondered, how could he link all these details together and never miss one action or facial expression from our narrator? If the author had cut some of the details, would it be more convincing to most people? Since most of us could barely do that, we might not be able to think of it and resonate with it.

    1. By now, I think, we critics understand science fiction’s social role as a site for attempting to predict, premediate, resist, and even control the future.

      Science fiction isn't always about terrifying the audience with frightening scenarios; it is also a tool that can be used to predict the future and, hopefully, change it. These stories may open readers' minds to what is actually happening around them and prompt them to act now, before they encounter scenarios like the ones they read about in science fiction.

    1. language model

      When watching the video "How ChatGPT Works Technically | ChatGPT Architecture" I found it fascinating to learn that words are represented by numbers, as they are easier for the model to process. This gives cause to question just how reliable these models are, as one slight misspelling can skew the results completely. For instance, if I were chatting with a friend via text about my plans for the holidays and they told me they were "going home to visit their parents" and I responded "Yes. I think I will go home to visit my pants too." They would easily be able to deduce my intended statement by referencing the context of our conversation. AI models fail to offer fluid thinking in these situations.
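
      The idea that words are represented by numbers can be illustrated with a toy example. This is a minimal sketch, not how ChatGPT actually works: real models split text into subword tokens and map them to learned vectors, whereas the hypothetical helpers `build_vocab` and `encode` below simply assign each whole word an integer ID.

```python
def build_vocab(sentences: list[str]) -> dict[str, int]:
    """Assign each distinct lowercase word an integer ID in first-seen order."""
    vocab: dict[str, int] = {}
    for sentence in sentences:
        for word in sentence.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def encode(sentence: str, vocab: dict[str, int]) -> list[int]:
    """Replace each word in the sentence with its integer ID."""
    return [vocab[word] for word in sentence.lower().split()]


intended = "i will go home to visit my parents"
typed = "i will go home to visit my pants"
vocab = build_vocab([intended, typed])

print(encode(intended, vocab))
print(encode(typed, vocab))
```

      In this toy encoding the two sentences differ in a single position, the last ID. Whether a model recovers the intended word depends on how it uses the surrounding context, which this sketch does not model.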

      I would like to learn more about what the "constraints" of an AI model mean. In my own experience in manufacturing, constraints represent where we are falling short or what may be holding us back. Does the word mean the same thing in AI, or does it simply mean the rules or conditions within which the AI model exists?

    1. Author response:

      The following is the authors’ response to the previous reviews.

      We are grateful to all three reviewers and editors for their critical comments and suggestions.

      Reviewer #2 (Recommendations For The Authors):

      The authors responded satisfactorily to all my comments and suggestions.

      We thank the reviewer for his time and feedback.

      Reviewer #3 (Recommendations For The Authors):

      Comments for authors:

      The authors have addressed most of the reviewer's concerns. Although no additional data were included to strengthen the manuscript, they have clarified some relevant points, and the manuscript has been updated accordingly. In my view, the current manuscript is well-written and mostly straightforward.

      We thank the reviewer for his time and suggestions. Addressing them has improved the quality of our manuscript.

      After a second revision, I just have a few minor comments (mostly editorial) that should be easy to address.

      (1) Page 16: "The dominant presence of the GRIK1-1 gene was also reported in retinal Off bipolar cells..." Please include reference(s).

      We have now cited the following reference:

      Lindstrom, S.H., Ryan, D.G., Shi, J., DeVries, S.H., 2014. Kainate receptor subunit diversity underlying response diversity in retinal Off bipolar cells. J. Physiol. 592, 1457–1477. https://doi.org/10.1113/jphysiol.2013.265033

      (2) Page 18: "Based on our functional assays, the splice seems to affect the interaction between the receptor and auxiliary proteins". Please remove or tone down this statement; the current data do not support this claim.

      We have revised the sentence as follows: “Based on our functional assays, the splice may possibly affect the interaction between the receptor and auxiliary proteins.”

      (3) Page 24: "cultures ... at 0.5 µg/mL were transfected". In the current context, it is not clear what you mean with 0.5 µg/mL. Please check and correct.

      Thanks for pointing out this error. We have corrected it.

      (4) Page 30. He et al. reference is repeated.

      Thanks. We have fixed it now.

      (5) Figure 3, Panel C: Please incorporate the EC50 value for the red trace into the figure; it appears to be a different data set and, consequently, a different fitting compared with Figure 2C.

      The GluK1-1a data set (red trace) is identical to that in Figure 2c, though it may appear different due to the scale of the X and Y axis. As suggested, we have now included the EC50 value for this data set in Figure 3, panel C.

      (6) Figure legend 4: Please check two minor issues here:

      (a) "Bar graphs... with or without Neto1 protein..." This statement is apparently wrong; Figure 4 does not show the effect of Neto1.

      (b) "The wild type GluK1 splice variant data is the same as from Figure 1.." I think the authors mean Figure 2A instead of Fig. 1. Please check.

      Thanks for pointing out the error. We have fixed the same in the revised manuscript.

      (7) Please check and correct spelling/wording issues in the text. Here are some examples:

      (a) Page 9 " Figure 3G - I, Table2.." (There is no Panel I). 

      Fixed.

      (b) Page 16 "... and is involved in various pathophysiology..." 

      We have revised the sentence as “… and is involved in various pathophysiological conditions”

      (c) Page 19 "The constructs used for this study were HEK293 WT mammalian cells were seeded on..." 

      Fixed. Thanks.

      (d) Page 23 "The immunoblots were probed..." Please check the whole paragraph and correct the issues.

      Fixed. Thanks.

      (e) Page 27 "initially, 1,97,908 particles were picked". Check the value; the same issue occurs in Fig.6 table supplement 1. 

      Thanks. We have now modified the sentence to clarify that for GluK1-1aEM ND-SYM, initially, 1,97,908 particles were picked and subjected to multiple rounds of clean-up using 2D and 3D classification. Finally, 24,531 particles were used for the final 3D reconstruction and refinement.

      (f) Legend Figure 2: Remove "(F)" from the legend. 

      Thanks. Fixed.

      (g) Legend Figure 2-Sup.1: Check/correct spelling issues. 

      Thanks. Fixed.

      (h) Figure 5-figure supplement 1: There is a mistake in panel B: "GFP" label is shown for Gluk1 and Neto2, but the authors mention that the pull-down was done with Anti-His antibodies. Please correct.

      Thanks. The pull-down experiments were done with anti-His for both the blots presented in panels A and B as mentioned in both the figures (right side panels of both A and B). However, for the GluK1 and Neto2 pull downs (panel B), the blots were probed with anti-GFP antibody which would detect both the receptor (as the receptor has both GFP-His8) and Neto2-GFP at their respective sizes. This has been indicated in the figure panel B.

      (8) Related to the point-by-point document:

      Major concern 2: Interpreting the effect of mutants on the regulation by Neto proteins requires knowing how the mutant is affecting the channel properties without Neto. In my view, if the data showing the K368/375/379/382H376-E mutant without Neto is missing (in this case due to low current amplitude), then, the pink bars in Fig. 5 should be removed from the figure. 

      We thank the reviewer for raising this interesting point and agree that it would be valuable to characterize the channel properties of all the mutants individually. However, as mentioned earlier, the functions of some mutant receptors are only rescued, or reliable, measurable currents are detected, when they are co-expressed with Neto proteins. We still believe that comparing wild-type and mutant receptors co-expressed with Neto proteins provides important insights, and therefore, we would like to retain the K368/375/379/382H376-E mutant data in the figure.

      Major concern 4: Figure 6-figure Supplement 8 is not mentioned in the manuscript. It would help to include a proper description in the Results section similar to the answer included in the point-by-point document.

      Figure 6-figure supplement 8 has already been cited on page 15. We have also cited Figure 6-figure supplement 9 on the same page and have added the following sentences to the text:

      “A superimposition of GluK1-1aEM (detergent-solubilized or reconstituted in nanodiscs) and GluK1-2a (PDB:7LVT) showed an overall conservation of the structures in the desensitized state. No significant movements were observed at both the ATD and LBD layers of GluK1-1a with respect to GluK1-2a (Figure 6; Figure 6-figure supplement 9).”

      Major concern 5: The ramp/recovery protocol was not included properly in the manuscript; please include the time of the ramp pulse and the time used for the recovery period.

      Detailed ramp and recovery protocols are included in the methods section. The time used for the recovery period was variable and was tuned according to the recovery kinetics. All figures in which representative traces are shown include a scale bar indicating the time period of agonist application.

      Minor concern 1: The proposed change was not included in the manuscript; check page 7.

      Thanks for highlighting this error. We have now changed it in the revised manuscript.

      Minor concern 10: The manuscript was not corrected as indicated. Please check.

      Thanks. We have now modified the sentence as follows: “…..a reduction was observed for K375/379/382H376-E receptors (1.17 ± 0.28 P=0.3733) compared to wild-type although differences do not reach statistical significance”

      Minor concern 14: The figure was not corrected as indicated. Please check.

      Thanks for highlighting this error. We have now changed it in the revised manuscript.

      Minor concern 19: I suggest including this briefly in the Discussion section.

      Thanks for the suggestion. We have included the following sentence in the discussion:

      “The differences in observations could be due to variations in experimental conditions, such as the constructs and recording conditions used.”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weaknesses:

      Given that all mutants tested showed the same degree of activation by PEG400, it seemed possible that PEG400 might be an allosteric activator of WNK1/3 through direct binding interactions. Perhaps PEG400 eliminates CWN1/2 waters by inducing conformational changes so that water loss is an effect not a cause of activation. To address this it would be helpful to comment on whether new electron densities appeared in the X-ray structure of WNK1/SA/PEG400 that might reflect PEG400 interactions with chains A or B.

      We re-evaluated the WNK1/SA/PEG400 electron density looking for non-protein densities larger than water. No new densities were found. However, we do observe a PEG400-destabilizing effect using differential scanning fluorimetry, and have included these data in Figure 2. We conclude that the effects on the water structure and destabilization are due to demands on solvent.

      We have included in the second paragraph of the introduction references to primary literature that advance similar arguments to explain osmolyte induced effects on activity.

      Specifically, Colombo MF, Rau DC, Parsegian VA (1992) Protein solvation in allosteric regulation: a water effect on hemoglobin. Science 256: 655-659 and LiCata VJ, Allewell NM (1997) Functionally linked hydration changes in Escherichia coli aspartate transcarbamylase and its catalytic subunit. Biochemistry 36: 10161-10167.

      It would also be helpful to discuss any experiments that might have been done in previous work to examine the direct binding of glycerol and other osmolytes to WNKs.

      We did not observe PEG400 in WNK1/SA/PEG400 despite effects on the space group and subunit packing. On the other hand, glycerol was observed in WNK1/SA, which was cryoprotected in glycerol (PDB file 6CN9). We have highlighted these differences in the second section of the results. A thorough analysis on the effects of various osmolytes on WNK structure, stability, and activity is a potential future direction.

      The study would benefit from a deeper discussion about how to reconcile the different effects of mutations. For example, wouldn't most or all of the mutations be expected to disrupt the water network, and relieve the proposed autoinhibition? This seemed especially true for some of the residues, like Y420(Y346), D353(D279), and K310(K236), which based on Fig 3 appeared to interact with waters that were removed by PEG400.

      The manuscript has been updated with new data and better discussion of this point. Given the inconsistencies in the effects of mutation in static light scattering (SLS), we addressed the possibility that the reducing agent was not constant across experiments. In a repeated study, including reducing agent (1 mM TCEP), we obtained results on mutant mass more similar to wild-type than in the original experiment. An exception was that two of the mutants were much more monomeric than wild-type. It follows that the network CWN1 stabilizes the inactive dimer. The reduced activity of some of the mutants probably reflects the position of CWN1 and the AL-CL Cluster in the active site, such that mutants can affect substrate binding or catalysis. This is now better discussed both in the data and discussion sections.

      Mutants have a tendency to have complex effects on activity and structure. It was satisfying to find any activating mutants. We point out that we have been careful to present all of our data including mutants that are not easily explained by our models.

      Alternatively, perhaps the waters in CWN2 are more important for maintaining the autoinhibited structure. This possibility would be useful to discuss, and perhaps comment on what may be known about the energetic contributions of bound water towards stabilizing dimers.

      This research focused on the most salient unique feature of WNK1: CWN1. We also identified CWN2. Mutational analysis of CWN2 can’t be done without disrupting the dimer interface, greatly complicating data interpretation.

      It would also be useful to comment on why aggregation of E319Q/A (E314) shouldn't inhibit kinase activity instead of activating it.

      On re-collecting the SLS data in the presence of reducing agent, we saw reduced aggregation. WNK3/D279N and WNK3/E314Q were more monomeric, especially at the higher protein concentration used. WNK3/E314Q is one of the more active mutants.

      The X-ray work was done entirely with WNK1 while the mutational work was done entirely with WNK3. Therefore, a simple explanation for the disconnect between structure and mutations might be that WNK1 and WNK3 differ enough that predictions from the structure of one are not applicable to mutations of the other. It would be helpful to describe past work comparing the structure and regulation of WNK1 and WNK3 that support the assumption of their interchangeability.

      We have responded directly to this concern. We introduced our most interesting amino acid replacement WNK3/E314A into WNK1, making WNK1/E388A. Similar trends in chloride inhibition and mutational activation were observed in WNK1 as in WNK3. This supports the assumption of interchangeability of WNK1 and WNK3 we invoked for practical reasons. As expected, the overall activity of WNK1 is lower than WNK3. Overall, the lower activity limited data collection. However, the lower activity did allow us to fit the chloride inhibition data to a kinetic model for WNK1. Panels on WNK1 activity, mutation, and chloride inhibition were added to Figure 5 and to Supplemental data (Table S6).

      Reviewer #2 (Public Review):

      Strengths:

      The most interesting result presented here is that P1 crystals of WNK1 convert to P21 in the presence of PEG400 and still diffract (rather than being destroyed as the crystal contacts change, as one would expect). All of the assays for activity and osmolyte sensing are carried out well.

      Thank you. We have emphasized this point in the Results section with the word “remarkably”.

      Weaknesses:

      The rationale for using WNK3 for the mutagenesis study is that it is more sensitive to osmotic pressure than WNK1. I think that WNK1 would have been a better platform because of the direct correlation to the structural work leading to the hypothesis being tested. All of the crystallographic work is WNK1; it is not logical to jump to WNK3 without other practical considerations.

      This point is addressed in the last comment to Reviewer 1. We added autophosphorylation assay data on our most interesting mutant (WNK3/E314A) in WNK1 (WNK1/E388A). Conversely, we have crystallographic data on uWNK3 (on uWNK3/E314A collected to 3.3Å). These new data justify the assumption of interchangeability of results obtained for uWNK1 and uWNK3.

      Osmolyte sensing was tested by measuring ATP consumption as a function of PEG400 (Figure 6). Data for the subset of mutants analyzed by this assay showed increasing activity. It is not clear why the same collection of mutant proteins analyzed in the experiments of Figure 5 was not also measured for osmolyte sensing in Figure 6.

      These data are now more complete, having been now collected for all of the WNK3 mutants (now Figure 7).

      The last set of data presented uses light scattering to test whether the WNK3 mutant proteins exhibit quaternary structural changes consistent with the monomer/dimer hypothesis. If they did, one would expect a higher degree of monomer for those that are activated by mutation, and a lower amount of monomer (like wt) for those that are not. Instead, one of the mutant proteins that showed the most chloride inhibition (Y346F) had a quaternary structure similar to the wt protein, and others have similar monomer/dimer mixtures but distinct chloride inhibition profiles (K307A and M301A). I don't see how the light scattering data contribute to this story other than to refute the hypothesis by showing a lack of correlation between quaternary structure, water binding, and activity. This is another reason why the disconnect between WNK1 and WNK3 could be a problem. All of the detailed structural work with WNK1 must be assumed with WNK3; perhaps the light scattering data are contradicting this assumption?

      As noted above, on re-collecting the SLS data in the presence of reducing agent, we saw reduced aggregation and more consistency with our model. Thus, we now feel it is a useful contribution to the manuscript. The table in Supplemental data has been updated.

      Reviewer #1 (Recommendations For The Authors):

      Fig 3D in the PDF manuscript seemed distorted - waters were cut off. Also Fig 2D would benefit from showing the whole molecule, instead of cutting off the top and bottom of the kinase domain.

      We suspect this is a data transfer problem, since we don’t see these truncations.

      Both Figures 2 and 3 have been changed, addressing these concerns and adding new differential scanning fluorimetry data as discussed in reply to Reviewer 1. Figure 2 was simplified by eliminating Figures 2A-2C and replacing them with a new Figure 2B, the superposition of WNK1/SA/PEG400 (PDB 9D3F) and WNK1/SA (PDB 6CN9).

      In Figure 3, we added a panel highlighting the volume change around CWN1 in presence of PEG400 (Figure 3C). Hopefully, inappropriate cropping has been eliminated.

      Line 162: Y314F should be Y346F.

      This has been corrected. Thank you.

      Lines 211-213 - these two sentences do not seem to logically go together: "Two hyper-active mutants were discovered, WNK3/E314A, and WNK3/E314Q. These mutants are straightforward to interpret based on our model: the mutated residues support and stabilize inactive dimeric WNK."

      An extensive rewrite has been conducted to address the difference in activity between the higher activity mutants versus less active mutants, now discussed in two paragraphs, and two Figures, Figure 5 and 6. The SLS data, re-collected with more reducing agent, has given more consistent results (Supplemental), making the discussion more straightforward (discussed above).

      Reviewer #2 (Recommendations For The Authors)

      I think WNK1 would be a better platform for mutagenesis than WNK3. Or minimally the authors should better justify the switch to WNK3 from WNK1. Analyze the same set of mutants in Figure 5 into Figure 6.

      Again, we have added assay data on uWNK1/E388A, and structural data on uWNK3/E314A.

      I would analyze the same set of mutants in Figures 5 and 6.

      We have analyzed all of the WNK3 mutants in the ADP-Glo assays (Figure 7).

      Will the P21 crystal form grow independently in PEG400?

      Attempts to crystallize WNK1/SA or WNK3/SA or other constructs in PEG400 have been unsuccessful.

      I would also add some context about the role of water in allosteric mechanisms. I know there is a long history in hemoglobin in which specific waters have been associated with the T and R states such as that by Marcio Colombo. There is a relatively recent article in J. Chem. Phys. that would provide good context: Leitner et al., J. Chem. Phys. 152, 240901 (2020).

      Thank you. Good call.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This fundamental study uses a creative experimental system to directly test Ohno's hypothesis, which describes how and why new genes might evolve by duplication of existing ones. In agreement with existing criticism of Ohno's original idea, the authors present compelling evidence that having two gene copies does not speed up the evolution of a new function as posited by Ohno, but instead leads to the rapid inactivation of one of the copies through the accumulation of mostly deleterious mutations. These findings will be of broad interest to evolutionary biologists and geneticists.

      We thank the editors and the reviewers for their positive feedback concerning our experimental system and for the constructive feedback on how to further improve the manuscript. We have now addressed the reviewer’s comments in a revised version.

      Reviewer #1 (Public Review):

      Overview:

      The authors construct a pair of E. coli populations that differ by a single gene duplication in a selectable fluorescent protein. They then evolve the two populations under differing selective regimes to assess whether the end result of the selective process is a "better" phenotype when starting with duplicated copies. Importantly, their starting duplicated population is structured to avoid the duplication-amplification process often seen in bacterial artificial evolution experiments. They find that while duplication increases robustness and speed of adaptation, it does not result in more highly adapted final states, in contrast to Ohno's hypothesis.

      Major comments:

      This is an excellent study with a very elegant experimental setup that allows a precise examination of the role of duplication in functional evolution, exclusive of other potential mechanisms. My main concern is to clarify some of the arguments relating to Ohno's hypothesis.

      I think my main confusion on first reading the manuscript was in the precise definition of Ohno's hypothesis. I think this confusion was mine and not the authors, but it is likely common and could be addressed.

      Most evolutionary biologists think of gene duplication as making neofunctionalization "easier" by providing functional redundancy and a larger mutational target, such that the evolutionary process of neofunctionalization is faster (as the authors observed). In this framework, the final evolved state might not differ when selection is applied to duplicated copies or a single-copy gene. Ohno's hypothesis, by contrast, argues that there generally exist adaptive conflicts between the ancestral function and the "desired" novel function, such that strong selection on a single-copy gene cannot produce the evolutionary optima that selection on two copies would. This idea is hinted at in the quotation from Ohno in paragraph 2 of the introduction. However, the sentences that follow I don't think reinforce this concept well enough and lead to some confusion.

      With that definition in mind, I agree with the authors' conclusion that these data do not support Ohno's hypothesis. My quibble would be that what is actually shown here is that adaptive conflict in function is not universal: there are cases where a single gene can be optimized for multiple functions just as well as duplicated copies. I do not think the authors have, however, refuted the possibility that such adaptive conflicts are nonetheless a significant barrier to evolutionary innovation in the absence of gene duplication generally. Perhaps just a sentence or two to this effect might be appropriate.

      We fully agree with the reviewer that trade-offs might play an important role in the evolution of single copy and of duplicated genes, depending on the gene and on the selection regime. And while trade-offs are not likely to play a big role in the selection regime we discuss in detail in the main text (evolution towards more green), they probably are important for at least one of our selection regimes. In fact, we state so in the following passage of the discussion. In addition, we have now added a sentence that acknowledges the importance of trade-offs for evolution in the absence of gene duplication:

      “A single gene encoding such a protein suffers from an adaptive conflict between the two activities. Gene duplication may provide an escape from this adaptive conflict, because each duplicate may specialize on one activity14, 15. For coGFP, a trade-off likely exists for fluorescence in these two colors, because improvement of green fluorescence entails a loss of blue fluorescence during evolution (Figure S8 and Figure S16). We therefore expected that during selection for both green and blue fluorescence, one cogfp copy in double-copy populations would “specialize” on green fluorescence whereas the other copy would specialize on blue fluorescence. However, when we analyzed individual population members with two active gene copies we could not find any such specialization (Figure S21). Moreover, the identified key mutations at positions 147 and 162 have a very low frequency (<1%) in these populations (Figure S15). Future experiments with different selection strategies might reveal the reasons for this observation and the conditions under which such a specialization can occur.”

      I also think the authors need to clarify their approach to normalizing fluorescence between the two populations to control for the higher relative protein expression of the population with a duplicated gene. Since each population was independently selected with the highest fluorescing 60% (or less) of the cells selected, I think this normalization is appropriate. Of course, if the two populations were to compete against each other, this dosage advantage of the duplicates would itself be a selective benefit. Even as it is, the dosage advantage should be a source of purifying selection on the duplication, and perhaps this should be noted.

      The reviewer is correct. To be able to follow the evolutionary trajectories of the different constructs, the populations were treated separately. The gates were adjusted for each library separately to select for the top 60, 1 or 0.01% of cells and the gates for the double-copy populations were set to slightly higher fluorescence, reflected in the higher fluorescence of these populations in Figure 3A. Indeed, if individuals in these populations were to compete against each other, the double-copy populations would have a benefit due to the dosage advantage. However, as we already pointed out in the manuscript, we did not see any additional advantage beyond the increased gene dosage provided by the second copy (Figure 3B). To discuss this issue in more detail, we have now added the following text to the discussion:

      “It is worth noting that we evolved each of our single- and double-copy populations separately and in parallel to follow their individual evolutionary trajectories. In a natural population, individuals with one or two copies might occur in the same population and compete against each other. In this situation any dosage advantage of a duplicate gene would itself entail selective benefit. Our approach allowed us to find out if gene duplication facilitates phenotypic evolution beyond any such gene dosage effect. At least for the specific genes, selection pressures, and mutation rates we used, the data suggest that it does not.”
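      The separate gating described in this response can be illustrated with a short sketch. Everything in it is an assumption made for illustration: the simulated fluorescence values, the population sizes, and the `top_fraction_gate` helper are hypothetical and are not the authors' sorting software or data. It only shows that applying the same top-fraction rule to each population on its own places the double-copy gate at a higher fluorescence when that population is brighter overall.

```python
import random


def top_fraction_gate(fluorescence, fraction):
    """Return the cells whose fluorescence lies in the top `fraction` of one population."""
    n_keep = max(1, round(len(fluorescence) * fraction))
    return sorted(fluorescence, reverse=True)[:n_keep]


random.seed(0)
# Simulated (hypothetical) fluorescence values; the double-copy population is
# made brighter to mimic a gene dosage effect.
single_copy = [random.gauss(100, 15) for _ in range(10_000)]
double_copy = [random.gauss(200, 30) for _ in range(10_000)]

# Gate each population separately at the top 60%, 1% and 0.01%.
for name, population in [("single", single_copy), ("double", double_copy)]:
    for fraction in (0.60, 0.01, 0.0001):
        kept = top_fraction_gate(population, fraction)
        print(name, fraction, len(kept), round(min(kept), 1))
```

      Because each gate is defined relative to its own population, cells are compared only with others carrying the same number of gene copies, which is the situation the added discussion text describes.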

      Finally, I am slightly curious about the nature of the adaptations that are evolving. The authors primarily discuss a few amino-acid changing mutations that seem to fix early in the experiment. Looking at Figure 3, it however, appears that the populations are still evolving late in the experiment, and so presumably other changes are occurring later on. Do the authors believe that perhaps expression changes to increase protein levels are driving these later changes?

      Figure S15 shows that some mutations are indeed still increasing in frequency during late evolutionary rounds, in particular S2L, V141L and V205L. We have measured the emission spectra of these mutants (Figure S16), and these mutations increase fluorescence both in green and blue. It is therefore likely that these mutations, similar to L98M, increase protein expression, solubility, or thermal stability, as suggested by the reviewer. We now clarify this matter in a new passage of the results:

      “Like L98M, the additional mutations S2I, V141I and V25L also occurred in all selection regimes, but they reached lower frequencies than L98M during the 5 generations of the experiment. We hypothesized that mutations observed in all selection regimes do not derive their benefit from increasing the intensity of any one fluorescent color. Instead, they may increase protein expression, solubility, or thermal stability.”

      Reviewer #2 (Public Review):

      Summary:

      Drawing from tools of synthetic biology, Mihajlovic et al. use a cleverly designed experimental system to dissect Ohno's hypothesis, which describes the evolution of functional novelty on the gene-level through the process of duplication & divergence.

      Ohno's original idea posits that the redundancy gained from having two copies of the same gene allows one of them to freely evolve a new function. To directly test this, the authors make use of a fluorescent protein with two emission maxima, which allows them to apply different selection regimes (e.g. selection for green AND blue, or, for green NOT blue). To achieve this feat without being distracted by more complex evolutionary dynamics caused by the frequent recombination between duplicates, the authors employ a well-controlled synthetic system to prevent recombination: Duplicates are placed on a plasmid as indirect repeats in a recombination-deficient strain of E. coli. The authors implement their directed evolution approach through in vitro mutagenesis and selection using fluorescence-activated cell sorting. Their in-depth analysis of evolved mutants in single-copy versus double-copy genotypes provides clear evidence for Ohno's postulate that redundant copies experience relaxed purifying selection. In contrast to Ohno's original postulate, however, the authors go on to show that this does not in fact lead to more rapid phenotypic evolution, but rather, the rapid inactivation of one of the copies.

      Strengths:

      This paper contributes with great experimental detail to an area where the literature predominantly leans on genomics data. Through the use of a carefully designed, well-controlled synthetic system the authors are able to directly determine the phenotype & genotype of all individuals in their evolving populations and compare differences between genotypes with a single or double copy of coGFP. With it they find clear evidence for what critics of Ohno's original model have termed "Ohno's dilemma", the rapid non-functionalization by predominantly deleterious mutations.

      Including an expressed but non-functional coGFP in (phenotypically) single copy genotypes provides an especially thoughtful control that allows determining a baseline dN/dS ratio in the absence of selection. All in all the study is an exciting example of how the clever use of synthetic biology can lead to new insights.

      Weaknesses:

      The major weakness of the study is tied to its biggest strength (as often in experimental biology there is a trade-off between 'resolution' and 'realism').

      The paper ignores an important component of the evolutionary process in favour of an in-depth characterization of how two vs one copy evolve. Specifically, by employing a recombination-deficient strain and constructing their duplicates as inverted repeats their experimental design completely abolishes recombination between the two copies.

      This is problematic for two reasons:

      i) In nature, new duplicates do not arise as inverted, but rather as direct (tandem) repeats and - as the authors correctly point out - these are very unstable, due to the fact that repeated DNA is prone to recA-dependent homologous recombination (which arises orders of magnitude more frequently than point mutations).

      ii) This instability often leads to further amplification of the duplicates under dosage selection both in the lab and in the wild (e.g. Andersson & Hughes, Annu. Rev. Genet. 2009), and would presumably also be an outcome under the current experimental set-up if it was not prevented from happening?

      So in sum, recombination between duplicate genes is not merely a nuisance in experiments, but occurring at extremely high frequencies in nature (such that the authors needed to devise a clever engineering solution to abolish it), and is often observed in evolving populations, be it in the laboratory or the wild.

      The manuscript sells controlling of copy number as a strength. And clearly, without it, the same insights could not be gained. However, if the basis for the very process of what Ohno's model describes is prevented from happening for the process to be studied, then, for reasons of clarity and context this needs pointing out, especially, to readers less familiar with the principles of molecular evolution.

      Connected to this, there are several places in the introduction and the discussion where I feel that the existing literature, in particular models put forward since Ohno that invoke dosage selection (such as IAD) end up being slightly misrepresented.

      My point is best exemplified in line 1 of Discussion: "To test Ohno's hypothesis and to distinguish its predictions from those of competing hypotheses, it is necessary to maintain a constant and stable copy number of duplicated genes during experimental evolution."

      We understand the reviewer’s position and fully agree that we needed to clarify better what our experiments aimed to achieve. To this end, we rewrote the beginning of the discussion to read:

      “Our aim was to study whether gene duplication can affect mutational robustness and phenotypic evolution beyond any effect of increased gene dosage provided by multiple gene copies. To this end, we needed to maintain a constant and stable copy number of duplicated genes during experimental evolution.”

      I think this statement is simply not true and might be misleading. To take the exaggerated position of a devil's advocate, the goal of evolutionary biology should be to find out how evolution actually proceeds in nature most of the time, rather than creating laboratory systems that manage to recapitulate influential ideas.

      On this point, we respectfully disagree. To ask questions like ours, laboratory experiments that are highly controlled albeit possibly “unnatural” can be essential. And we would argue that our experiments do not merely aim to “recapitulate” an influential idea but to validate it and potentially refute it, as we did for our study system. Validating theory is an essential aspect of experimental science. Textbooks in biology and beyond are rife with examples.

      While fixing copy number may be a necessary step to understand how one copy evolves if a second one is present, it seems that if Ohno's hypothesis only works out in recA-deficient bacterial strains and on engineered inverted repeats, then Ohno might have missed one crucial aspect of how paralogs evolve. The mentioned competing hypotheses have been put forward to (a) address Ohno's dilemma (which the present study beautifully demonstrates exists under their experimental conditions) and (b) to reflect a commonly observed evolutionary process in bacteria (dosage gain in response to selection, e.g. a classic way of gaining antibiotic resistance). Fixing the copy number allowed the authors to show which predictions of Ohno's model hold up and which don't (under these specific conditions). But they do so without even preventing the processes described by alternative models from happening, so the experimental system is hardly appropriate to distinguish between Ohno & alternatives. Therefore, I think it could be made clearer that the experimental system is great to look at certain aspects of Ohno's hypothesis in detail, but it can only inform us about a universe without recombination.

      (1)  Citing the works by ref 8, 26, 27 to merely state that "in some copies were gained and some were lost (ref 6, ref 25)" makes it seem as if fixing at 2 copies is some sort of sensible average. Yet ref 6 (Dhar et al.) specifically states that dosage is the most important response. Moreover, in this study gene copies are lost, but plasmid copies are gained instead. In Holloway et al. 2007 (ref 25), the 2 copies resided on different plasmids, so entirely different underlying molecular genetics might be at work (high cost of plasmid maintenance, and competitive binding on both proteins onto the respective (off)-target, where either way selection favored a single copy, so a different situation altogether). In both cited studies, fixing the copy would have prohibited learning something about the process of duplication & divergence.

      Hence this statement seems to distract the readers from the main message, which seems to be that preventing recombination experimentally allows following the divergence of each copy and studying a response that does not involve dosage increase.

      (2)   "These studies highlighted the importance of gene duplication in providing fast adaptation under changing environmental conditions but they focused on the importance of gene dosage." I think this constructs a false dichotomy. Instead, these studies pointed out that dosage (and with it, selection for dosage) is an important part of the equation that might have been missed by Ohno.

      Your points are well taken. To clarify the insights from previous experiments and the aims of our experiments we rewrote this passage in question as follows.

      “These studies underline the importance of gene duplication in providing fast adaptation under changing environmental conditions. In some studies one copy was lost [6, 25], while in others, additional copies were gained [8, 26, 27]. Together these studies highlight that gene dosage and selection for dosage can play an important role during the evolution of duplicated genes [6, 8, 25-28].

      These studies also raise the question whether gene duplication can provide an advantage beyond its effects on gene dosage. To find out it is necessary to study the evolution of gene duplicates while keeping the copy number of the duplicated gene exactly at two. This is challenging because gene duplication causes recombinational instability and high variability in copy number. No previous experimental studies were designed to control copy number. Here, we present an experimental system that allowed us to keep the copy number fixed at one or two genes, and to follow the evolution of each gene copy in the absence of any dosage increase.”

      (3)  "Such models are also easier to test experimentally, because they do not require precise control of gene copy number. The necessary tests can even benefit from massive gene amplifications [8]. Although Ohno's hypothesis is more difficult to test experimentally (...)" - again, I feel the wording is slightly misleading. The point is not that IAD is easier to test and Ohno's model is harder to test in laboratory experiments, rather, experiments (and some more limited observations of naturally evolving populations) seem to suggest that in reality evolution proceeds (more often?) according to IAD rather than Ohno's neofunctionalization hypothesis. However, as the authors point out, it will be exciting to see their clever experimental system used to test other genes and conditions to get a more comprehensive understanding of what gene and selection parameter values would overcome Ohno's dilemma.

      We agree and in response rewrote the paragraph in question to read:

      “The challenge that a duplicated gene copy must remain free of frequent deleterious mutations long enough to acquire beneficial mutations that provide a new selectable phenotype is known as Ohno’s dilemma [13]. Our experiments confirm that this challenge is highly relevant for post-duplication evolution. Other models such as the innovation-amplification-divergence (IAD) model [8, 13] postulate that this dilemma can be resolved through an increase in gene dosage that allows latent pre-duplication phenotypes to come under the influence of selection. To distinguish between the effects of gene dosage and other benefits of gene duplication, we prevented recombination and gene amplification to prevent copy number increases beyond two copies. We are aware that our experimental design does not reflect how evolution may occur in the wild. However, this design allowed us to study evolutionary forces separately that are otherwise difficult to disentangle.”

      Finally, we also made two changes in the abstract (highlighted in red) to take your feedback into account.

      Reviewer #2 (Recommendations For The Authors):

      The paper is very well written, with a lot of emphasis put on explaining every step and every finding. It was a joy to read.

      Thanks!

      Full stop missing in line 5 of abstract.

      Corrected.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Public Review):

      Summary: Wilmes and colleagues present a computational model of a cortical circuit for predictive processing which tackles the issue of how to learn predictions when different levels of uncertainty are present for the predicted sensory stimulus. When a predicted sensory outcome is highly variable, deviations from the average expected stimulus should evoke prediction errors that have less impact on updating the prediction of the mean stimulus. In the presented model, layer 2/3 pyramidal neurons represent either positive or negative prediction errors, SST neurons mediate the subtractive comparison between prediction and sensory input, and PV neurons represent the expected variance of sensory outcomes. PVs therefore can control the learning rate by divisively inhibiting prediction error neurons such that they are activated less, and exert less influence on updating predictions, under conditions of high uncertainty.

      Strengths: The presented model is a very nice solution to altering the learning rate in a modality and context-specific way according to expected uncertainty and, importantly, the model makes clear, experimentally testable predictions for interneuron and pyramidal neuron activity. This is therefore an important piece of modelling work for those working on cortical and/or predictive processing and learning. The model is largely well-grounded in what we know of the cortical circuit.

      Weaknesses: Currently, the model has not been challenged with experimental data, presumably because data from an adequate paradigm is not yet available. I therefore only have minor comments regarding the biological plausibility of the model:

      Beyond the fact that some papers show SSTs mediate subtractive inhibition and PVs mediate divisive inhibition, the selection of interneuron types for the different roles could be argued further, given existing knowledge of their properties. For instance, is a high PV baseline firing rate, or broad sensory tuning that is often interpreted as a ’pooling’ of pyramidal inputs, compatible with or predicted by the model?

      Thank you for this nice suggestion. We added a section to the discussion expanding on this: “The model predicts that the divisive interneuron type, which we here suggest to be the PVs, receives a representation of the stimulus as an input. PVs could be pooling the inputs from stimulus-responsive layer 2/3 neurons to estimate uncertainty. The more the stimulus varies, the larger the variability of the pyramidal neuron responses and, hence, the variability of the PV activity. The broader sensory tuning of PVs (Cottam et al. 2013) is in line with the model insofar as uncertainty modulation could be more general than the specific feature, which is more likely for low-level features processed in primary sensory cortices. PVs were shown to connect more to pyramidal cells with similar feature-tuning (Znamenskiy et al. 2024); this would be in line with the model, as uncertainty modulation should be feature-related. In our model, some SSTs deliver the prediction to the positive prediction error neurons. SSTs are already known to be involved in spatial prediction, as they underlie the effect of surround suppression (Adesnik et al. 2012), in which SSTs suppress the local activity dependent on a predictive surround.”

      On a related note, SSTs are thought to primarily target the apical dendrite, while PVs mediate perisomatic inhibition, so the different roles of the interneurons in the model make sense, particularly for negative PE neurons, where a top-down excitatory predicted mean is first subtractively compared with the sensory input, s, prior to division by the variance. However, sensory input is typically thought of as arising ’bottom-up’, via layer 4, so the model may match the circuit anatomy less in the case of positive PE neurons, where the diagram shows ’s’ arising in a top-down manner. Do the authors have a justification for this choice?

      We agree that ‘s’ is a bottom-up input and should have been clearer that we do not consider ‘s’ to be a top-down input like the prediction. We hence adjusted the figure correspondingly and added a few clarifying sentences to the manuscript. The reviewer, however, raises an important point that is not talked about enough: if the bottom-up input ‘s’ comes from L4, how can it be compared in a subtractive manner with the top-down prediction arriving in the superficial layers? In Attinger et al. it was shown that the visual stimulus had subtractive effects on SST neurons. The axonal fibers delivering the stimulus information are hence likely to arrive in the vicinity of the apical dendrites, where SSTs target pyramidal cells. Hence, those axons delivering stimulus information could also target the apical dendrites of pyramidal cells. As the reviewer probably had in mind, L4 input tends to arrive in the somatic layer. However, there are also stimulus-responsive cells in layer 2/3, such that the stimulus information does not need to come directly from L4; it could be relayed via those stimulus-responsive layer 2/3 cells. It has been shown that L2/3→L3 axons are mostly located in the upper basal dendrites and the apical oblique dendrites, above the input from L4 (Petreanu et al. The subcellular organization of neocortical excitatory connections). Hence, stimulus information could arrive on the apical dendrites, and be subtractively modulated by SSTs. We would also like to note that the model does not take into account the precise dendritic location of the inputs. The model only assumes that the difference between stimulus and prediction is calculated before the divisive modulation by the variance.

      In cortical circuits, assuming a 2:8 ratio of inhibitory to excitatory neurons, there are at least 10 pyramidal neurons to each SST and PV neuron. Pyramidal neurons are also typically much more selective about the type of sensory stimuli they respond to compared to these interneuron classes (e.g., Kerlin et al., 2012, Neuron). A nice feature of the proposed model is that the same interneurons can provide predictions of the mean and variance of the stimulus in a predictor-dependent manner. However, in a scenario where you have two types of sensory stimulus to predict (e.g., two different whiskers stimulated), with pyramidal neurons selective for prediction errors in one or the other, what does the model predict? Would you need specific SST and PV circuits for each type of predicted stimulus?

      If we understand correctly, this would be a scenario in which the same context (e.g., sound) is predicting two types of sensory stimulus. In that case, one may need specific SST and PV circuits for the different error neurons selective for prediction errors in these stimuli, depending on how different the predictions are for the two stimuli, as we elaborate in the following. The reviewer is raising an important point here, which is why we added a section to the discussion elaborating on it.

      We think that there is a reason why interneurons are less selective than pyramidal cells and that this is also a feature in prediction error circuits. Similarly-tuned cells are more connected to each other, because they tend to be activated together as the stimuli they encode tend to be present in the environment together. Also, error neurons selective to nearby whiskers are more likely to receive similar stimulus information, and hence similar predictions. Hence, because nearby whiskers are more likely to be deflected similarly, a circuit structure may have emerged during development such that neurons selective for prediction errors of nearby whiskers receive inputs from the same inhibitory interneurons. In that case, the same SST and PV cells could innervate those different neurons. If, however, the sensory stimuli to be predicted are very different, such that their representations are likely to be located far away from each other, then it also makes sense that the predictions for those stimuli are more diverse, and hence the error neurons selective to these are unlikely to be innervated by the same interneurons.

      We added a shorter version of this to the discussion: “The lower selectivity of interneurons in comparison to pyramidal cells could be a feature in prediction error circuits. Error neurons selective to similar stimuli are more likely to receive similar stimulus information, and hence similar predictions. Therefore, a circuit structure may have developed such that prediction error neurons with similar selectivity may receive inputs from the same inhibitory interneurons.”

      Reviewer 2 (Public Review):

      Summary: This computational modeling study addresses the observation that variable observations are interpreted differently depending on how much uncertainty an agent expects from its environment. That is, the same mismatch between a stimulus and an expected stimulus would be less significant, and specifically would represent a smaller prediction error, in an environment with a high degree of variability than in one where observations have historically been similar to each other. The authors show that if two different classes of inhibitory interneurons, the PV and SST cells, (1) encode different aspects of a stimulus distribution and (2) act in different (divisive vs. subtractive) ways, and if (3) synaptic weights evolve in a way that causes the impact of certain inputs to balance the firing rates of the targets of those inputs, then pyramidal neurons in layer 2/3 of canonical cortical circuits can indeed encode uncertainty-modulated prediction errors. To achieve this result, SST neurons learn to represent the mean of a stimulus distribution and PV neurons its variance.

      The impact of uncertainty on prediction errors is an understudied topic, and this study provides an intriguing and elegant new framework for how this impact could be achieved and what effects it could produce. The ideas here differ from past proposals about how neuronal firing represents uncertainty. The developed theory is accompanied by several predictions for future experimental testing, including the existence of different forms of coding by different subclasses of PV interneurons, which target different sets of SST interneurons (as well as pyramidal cells). The authors are able to point to some experimental observations that are at least consistent with their computational results. The simulations shown demonstrate that if we accept its assumptions, then the authors’ theory works very well: SSTs learn to represent the mean of a stimulus distribution, PVs learn to estimate its variance, firing rates of other model neurons scale as they should, and the level of uncertainty automatically tunes the learning rate, so that variable observations are less impactful in a high uncertainty setting.

      Strengths: The ideas in this work are novel and elegant, and they are instantiated in a progression of simulations that demonstrate the behavior of the circuit. The framework used by the authors is biologically plausible and matches some known biological data. The results attained, as well as the assumptions that go into the theory, provide several predictions for future experimental testing.

      Weaknesses: Overall, I found this manuscript to be frustrating to read and to try to understand in detail, especially the Results section from the UPE/Figure 4 part to the end and parts of the Methods section. I don’t think the main ideas are so complicated, and it should be possible to provide a much clearer presentation.

      For me, one source of confusion is the comparison across Figure 1EF, Figure 2A, Figure 3A, Figure 4AB, and Figure 5A. All of these are meant to be schematics of the same circuit (although with an extra neuron in Figure 5), yet other than Figures 1EF and 4AB, no two are the same! There should be a clear, consistent schematic used, with identical labeling of input sources, neuron types, etc. across all of these panels.

      We changed all figures to make them more consistent and pointed out that we consider subparts of the circuit.

      The flow of the Results section overall is clear until the “Calculation of the UPE in Layer 2/3 error neurons” and Figure 4, where I find that things become significantly more confusing. The mention of NMDA and calcium spikes comes out of the blue, and it’s not clear to me how this fits into the authors’ theory. Moreover: Why would this property of pyramidal cells cause the PV firing rate to increase as stated? The authors refer to one set of weights (from SSTs to UPE) needing to match two targets (weights from s to UPE and weights from mean representation to UPE); how can one set of weights match two targets? Why do the authors mention “out-of-distribution detection’ here when that property is not explored later in the paper? (see also below for other comments on Figure 4)

      We agree that the introduction of NMDA and calcium spikes was too short and understand that it was confusing. We therefore modified and expanded the section. To answer the two specific questions: First, Why would this property of pyramidal cells cause the PV firing rate to increase as stated? This property of pyramidal cells does not cause the PV firing rate to increase. When for example in positive error neurons, the mean input increases, then the PVs receive higher stimulus input on average, which is not compensated by the inhibitory prediction (which is still at the old mean), such that the PV firing rate increases. Due to the nonlinear integration in PVs, the firing rate can increase a lot and inhibit the error neurons strongly. If the error neurons integrate the difference nonlinearly, they compensate for the increased inhibition by PVs. In Figure 5, we show that a circuit in which error neurons exhibit a dendritic nonlinearity matches an idealised circuit in which the PVs perfectly represent the variance. We modified the text to clarify this.

      Second, how can one set of weights match two targets? In our model, one set of weights does not need to match two targets. We apologise that this was written in such a confusing way. In positive error neurons, the inhibitory weights from the SSTs need to match the excitatory weights from the stimulus, and in negative error neurons, the inhibitory weights from the SSTs need to match the excitatory weights from the prediction. The weights in positive and negative circuits do not need to be the same. So, on a particular error neuron, the inhibition needs to match the excitation to maintain EI balance. Given experimental evidence for EI balance and heterosynaptic plasticity, we think that this constraint is biologically achievable. The inhibitory and excitatory synapses that need to match are targeting the same postsynaptic neuron and could hence have access to their postsynaptic effect. We modified the text to be clearer. Finally, we removed the mention of out-of-distribution detection; see our reply below.
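
      As a toy illustration of the weight-matching constraint described in this reply (a sketch under assumed values, not the manuscript's model code), the rectified drive onto a positive error neuron can be written as excitation from the stimulus minus inhibition from the SSTs carrying the prediction. The function name, weights, and input values below are hypothetical.

```python
def positive_error_drive(stimulus, prediction, w_exc, w_inh):
    """Rectified net input to a positive error neuron in a toy rate model.

    w_exc: assumed excitatory weight from the stimulus.
    w_inh: assumed inhibitory weight from the SSTs relaying the prediction.
    """
    return max(0.0, w_exc * stimulus - w_inh * prediction)

# Stimulus equals prediction, so an ideal error neuron should stay silent.
matched = positive_error_drive(stimulus=3.0, prediction=3.0, w_exc=1.0, w_inh=1.0)

# Same inputs, but inhibition is weaker than excitation (EI balance broken).
mismatched = positive_error_drive(stimulus=3.0, prediction=3.0, w_exc=1.0, w_inh=0.7)

print(matched, mismatched)
```

      With matched weights the neuron reports no error for a correctly predicted stimulus; with unmatched weights it responds even though stimulus and prediction agree, which is why the constraint applies per postsynaptic neuron rather than across the positive and negative circuits.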

      Coming back to one of the points in the previous paragraph: How realistic is this exact matching of weights, as well as the weight matching that the theory requires in terms of the weights from the SSTs to the PVs and the weights from the stimuli to the PVs? This point should receive significant elaboration in the discussion, with biological evidence provided. I would not advocate for the authors’ uncertainty prediction theory, despite its elegant aspects, without some evidence that this weight matching occurs in the brain. Also, the authors point out on page 3 that unlike their theory, “...SSTs can also have divisive effects, and PVs can have subtractive effects, dependent on circuit and postsynaptic properties”. This should be revisited in the Discussion, and the authors should explain why these effects are not problematic for their theory. In a similar vein, this work assumes the existence of two different populations of SST neurons with distinct UPE (pyramidal) targets. The Discussion doesn’t say much about any evidence for this assumption, which should be more thoroughly discussed and justified.

      These are very important points. We agree that the biological plausibility of the model’s predictions should be discussed and have hence expanded the discussion with three new paragraphs:

      To enable the comparison between predictions and sensory information via subtractive inhibition, we pointed out that the weights of those inputs on the postsynaptic neuron need to match. This essentially means that there needs to be a balance of excitatory and inhibitory inputs. Such an EI balance has been observed experimentally (Tan and Wehr, 2009), and it has previously been suggested that error responses are the result of breaking this EI balance (Hertäg and Sprekeler, 2020; Barry and Gerstner, 2024). Heterosynaptic plasticity is a possible mechanism to achieve EI balance (Field et al. 2020). For example, spike pairing in pre- and postsynaptic neurons induces long-term potentiation at co-activated excitatory and inhibitory synapses with the degree of inhibitory potentiation depending on the evoked excitation (D’amour and Froemke, 2015), which can normalise EI balance (Field et al. 2020).

      In the model we propose, SSTs should be subtractive and PVs divisive. However, SSTs can also be divisive, and PVs subtractive dependent on circuit and postsynaptic properties (Seybold et al. 2015, Lee et al. 2012, Dorsett et al. 2021). This does not necessarily contradict our model, as circuits in which SSTs are divisive and PVs subtractive could implement a different function, as not all pyramidal cells are error neurons. Hence, our model suggests that error neurons which can calculate UPEs should have similar physiological properties to the layer 2/3 cells observed in the study by Wilson et al. 2012.

      Our model further posits the existence of two distinct subtypes of SSTs in positive and negative error circuits. Indeed, there are many different subtypes of SSTs. SST is expressed by a large population of interneurons, which can be further subdivided. There is e.g. a type called SST44, which was shown to specifically respond when the animal corrects a movement (Green et al. 2023). Our proposal is hence aligned with the observation of functionally specialised subtypes of SSTs.

      Finally, I think this is a paper that would have been clearer if the equations had been interspersed within the results. Within the given format, I think the authors should include many more references to the Methods section, with specific equation numbers, where they are relevant throughout the Results section. The lack of clarity is certainly made worse by the current state of the Methods section, where there is far too much repetition and poor ordering of material throughout.

      We implemented the reviewer’s detailed and helpful suggestions on how to improve the ordering and other aspects of the methods section and now either intersperse the equations within the results or refer to the relevant equation number from the Methods section within the Results section.

      Reviewer 3 (Public Review):

      Summary: The authors proposed a normative principle for how the brain’s internal estimate of an observed sensory variable should be updated during each individual observation. In particular, they propose that the update size should be inversely proportional to the variance of the variable. They then proposed a microcircuit model of how such an update can be implemented, in particular incorporating two types of interneurons and their subtractive and divisive inhibition onto pyramidal neurons. One type should represent the estimated mean while another represents the estimated variance. The authors used simulations to show that the model works as expected.

      Strengths: The paper addresses two important issues: how uncertainty is represented and used in the brain, and the role of inhibitory neurons in neural computation. The proposed circuit and learning rules are simple enough to be plausible. They also work well for the designated purposes. The paper is also well-written and easy to follow.

      Weaknesses: I have concerns with two aspects of this work.

      (1) The optimality analysis leading to Eq (1) appears simplistic. The learning setting the authors describe (estimating the mean of a stationary Gaussian variable from a stream of observations) is a very basic problem in online learning/streaming algorithm literature. In this setting, the real “optimal” estimate is simply the arithmetic average of all samples seen so far. This can be implemented in an online manner with $\hat{\mu}_t = \hat{\mu}_{t-1} + (s_t - \hat{\mu}_{t-1})/t$. This is optimal in the sense that the estimator is always the maximum likelihood estimator given the samples seen up to time t. On the other hand, doing gradient descent only converges towards the MLE estimator after a large number of updates. Another critique is that while Eq (1) assumes an estimator of the mean ($\hat{\mu}$), it assumes that the variance is already known. However, in the actual model, the variance also needs to be estimated, and a more sophisticated analysis thus needs to take into account the uncertainty of the variance estimate and so on. Finally, the idea that the update should be inverse to the variance is connected to the well-established idea in neuroscience that more evidence should be integrated over when uncertainty is high. For example, in models of two-alternative forced choices it is known to be optimal to have a longer reaction time when the evidence is noisier.
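
      The online update mentioned in point (1) can be checked with a few lines of code. The following is a minimal sketch with an assumed Gaussian stream (mean 5, standard deviation 2, fixed seed); it only illustrates that the incremental rule reproduces the arithmetic mean and is not taken from the manuscript.

```python
import random

def running_mean(samples):
    """Incremental mean: mu_t = mu_{t-1} + (s_t - mu_{t-1}) / t."""
    mu = 0.0
    for t, s in enumerate(samples, start=1):
        mu += (s - mu) / t
    return mu

rng = random.Random(0)                                 # fixed seed for repeatability
samples = [rng.gauss(5.0, 2.0) for _ in range(1000)]   # assumed toy stream

batch_mean = sum(samples) / len(samples)   # arithmetic average of all samples
online_mean = running_mean(samples)        # same quantity, computed one sample at a time

print(f"batch:  {batch_mean:.6f}")
print(f"online: {online_mean:.6f}")
```

      Up to floating-point rounding the two values coincide, since the incremental rule is an algebraic rearrangement of the arithmetic mean.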

      We agree with the reviewer that the simple example we gave was not ideal, as it could have been solved much more elegantly without gradient descent. And the reviewer correctly pointed out that our solution was not even optimal. We now present a better example in Figure 7, where the mean of the Gaussian variable is not stationary. Indeed, we did not intend to assume that the Gaussian variable is stationary, as we had in mind that the environment can change and hence also the Gaussian variable. If the mean is constant over time, it is indeed optimal to use the arithmetic mean. However, if the mean changes after many samples, then the maximum likelihood estimator model would be very slow to adapt to the new mean, because t is large and each new stimulus only has a small impact on the estimate. If the mean changes, uncertainty modulation may be useful: if the variance was small before, and the mean changes, then the resulting big error will influence the change in the estimate much more, such that we can more quickly learn the new mean. A combination of the two mechanisms would probably be ideal. We use gradient descent here, because not all optimisation problems the brain needs to solve are that simple. The problem with converging only after a large number of updates is a general problem of the algorithm. Here, we propose how the brain could estimate uncertainty to achieve the uncertainty modulation observed in behavioural studies of inference and learning tasks. To give a more complex example, we present in a new Figure 8 how a hierarchy of UPE circuits can be used for uncertainty-based integration of prior and sensory information, similar to Bayes-optimal integration.
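
      The argument about a changing mean can be illustrated with a small simulation. This is a hedged sketch rather than the simulation behind Figure 7: the stream, the base learning rate `eta`, and the assumption that the variance is already known are all illustrative choices.

```python
import random

def track(samples, step):
    """Return the running estimate after each sample; step(t) sets the update size."""
    mu, trace = 0.0, []
    for t, s in enumerate(samples, start=1):
        mu += step(t) * (s - mu)
        trace.append(mu)
    return trace

rng = random.Random(1)
sigma2 = 0.25  # assumed already-learned variance (standard deviation 0.5)

# Assumed toy stream: mean 0 for 1000 samples, then the mean jumps to 5.
stream = [rng.gauss(0.0, 0.5) for _ in range(1000)]
stream += [rng.gauss(5.0, 0.5) for _ in range(100)]

eta = 0.02  # hypothetical base learning rate
sample_mean = track(stream, lambda t: 1.0 / t)      # arithmetic mean (1/t updates)
modulated = track(stream, lambda t: eta / sigma2)   # error scaled by 1/variance

print(f"arithmetic mean 100 samples after the shift: {sample_mean[-1]:.2f}")
print(f"variance-scaled estimate after the shift:    {modulated[-1]:.2f}")
```

      With these assumed values the arithmetic mean is still close to the old mean 100 samples after the shift, because each new sample is weighted by roughly 1/1000, whereas the variance-scaled estimate has moved to the new mean.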

      Yes, indeed, there is well-known behavioural evidence, and we would like to thank the reviewer for pointing out this connection to two-alternative forced choice tasks. We now cite this work. Our contribution is not on the already established computational or algorithmic level, but the proposal of a neural implementation of how uncertainty could modulate learning. The variance indeed needs to be estimated for optimal mean updating. That means that in the beginning, there will be non-optimal updating until the variance is learned. However, once the variance is learned, mean-updating can use the learned variance. There may be few variance contexts but many means to be learned, such that variance contexts can be reused. In any case, this is a problem on the algorithmic level, and not so much on the implementational level we are concerned with.

      (2) While the incorporation of different inhibitory cell types into the model is appreciated, it appears to me that the computation performed by the circuit is not novel. Essentially the model implements a running average of the mean and a running average of the variance, and gates updates to the mean with the inverse variance estimate. I am not sure about how much new insight the proposed model adds to our understanding of cortical microcircuits.
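
      The computation summarised in point (2) can be written down compactly. The sketch below is an algorithmic caricature with assumed learning rates and toy Gaussian streams; it does not model the interneuron circuit, and the function and parameter names are hypothetical.

```python
import random

def gated_estimator(samples, eta_mu=0.05, eta_var=0.01, var0=1.0):
    """Running mean and variance, with mean updates gated by the inverse variance."""
    mu, var = 0.0, var0
    for s in samples:
        err = s - mu
        mu += eta_mu * err / var           # same error, smaller step when var is large
        var += eta_var * (err ** 2 - var)  # running average of the squared error
    return mu, var

rng = random.Random(2)
# Two assumed contexts with the same mean (1.0) but different variability.
low_noise = [rng.gauss(1.0, 0.5) for _ in range(5000)]
high_noise = [rng.gauss(1.0, 2.0) for _ in range(5000)]

mu_low, var_low = gated_estimator(low_noise)
mu_high, var_high = gated_estimator(high_noise)

# Effective step applied to a unit prediction error in each context.
step_low = 0.05 / var_low
step_high = 0.05 / var_high

print(f"low-noise context:  mean {mu_low:.2f}, variance {var_low:.2f}, step {step_low:.3f}")
print(f"high-noise context: mean {mu_high:.2f}, variance {var_high:.2f}, step {step_high:.3f}")
```

      In the high-variance context the learned variance is larger, so the same prediction error moves the mean estimate by a smaller amount, which is the gating the reviewer describes.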

      We here suggest an implementation for how uncertainty could modulate learning via influencing prediction error computation. Our model can explain how humans could estimate uncertainty and weight prior versus sensory information accordingly. The focus of our work was not to design a better algorithm for mean and variance estimation, but rather to investigate how specialised prediction error circuits in the brain can implement these operations to provide new experimental hypotheses and predictions.
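
      On the algorithmic level, the computation summarised by the reviewer (a running average of the mean, a running average of the variance, and mean updates gated by the inverse variance estimate) can be sketched as follows. This is a hedged illustration only: the function name, the learning rates `eta_mu` and `eta_var`, and the initial values are assumptions, and the sketch says nothing about how the cell types of the circuit implement these operations.

```python
import random


def learn_mean_and_variance(samples, eta_mu=0.01, eta_var=0.01, mu0=0.0, var0=1.0):
    """Online estimates of mean and variance; the mean update is gated by 1 / variance."""
    mu, var = mu0, var0
    for s in samples:
        error = s - mu
        mu += eta_mu * error / var         # large estimated variance -> small update
        var += eta_var * (error**2 - var)  # running average of the squared error
    return mu, var


rng = random.Random(1)
# Hypothetical stimulus distribution: mean 2.0, standard deviation 0.5.
samples = [rng.gauss(2.0, 0.5) for _ in range(5000)]
mu_hat, var_hat = learn_mean_and_variance(samples)
print(f"estimated mean: {mu_hat:.2f}, estimated variance: {var_hat:.2f}")
```

      Early in the run the variance estimate is still far from its final value, so the mean is updated non-optimally; once the variance estimate has settled, it sets the effective step size of the mean update.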

      Reviewer 1 (Recommendations For The Authors):

      Clarity and conciseness are a strength of this manuscript, but a more comprehensive explanation could improve the reader’s understanding in some instances. This includes the NMDA-based nonlinearity of pyramidal neuron activation - I am a little unclear exactly what problem this solves and how (alongside the significance of 5D and E).

      We agree that the introduction of the NMDA-based nonlinearity was too short and understand that it was confusing. We therefore modified and expanded the section, where we introduce the dendritic nonlinearity of the error neurons.

      Page 5: I think there is a ’positive’ and ’negative’ missing from the following sentence: ’the weights from the SSTs to the UPE neurons need to match the weights from the stimulus s to the UPE neuron and from the mean representation to the UPE neuron, respectively.’

      Thanks for pointing that out! We changed the sentence to make it clearer: “To ensure a comparison between the stimulus and the prediction, the inhibition from the SSTs needs to match the excitation it is compared to in the UPE neurons: In the positive PE circuit, the weights from the SSTs representing the prediction to the UPE neurons need to match the weights from the stimulus s to the UPE neurons. In the negative PE circuit, the weights from SSTs representing the stimulus to the negative UPE neurons need to match the weights from the mean representation to the UPE neurons, respectively.”

      Reviewer 2 (Recommendations For The Authors):

      Related to the first point above: I don’t feel that the authors adequately explained what the “s” and “a” information (e.g., in Figures 2A, 3A) represent, where they are coming from, what neurons they impact and in what way (and I believe Fig. 3A is missing one “a” label). I think they should elaborate more fully on these key, foundational details for their theory. To me, the idea of starting from the PV, SST, and pyramidal circuit, and then suddenly introducing the extra R neuron in Figure 5, just adds confusion. If the R neuron is meant to be the source, in practice, of certain inputs to some of the other cell types, then I think that should be included in the circuit from the start. Perhaps a good idea would be to start with two schematics, one in the form of Figure 5A (but with additional labeling for PV, SST) and one like Figure 1EF (but with auditory inputs as well), with a clear indication that the latter is meant to represent a preliminary, reduced form of the former that will be used in some initial tests of the performance of the PV, SST, UPE part of the circuit. Related to the Methods, I also can give a list of some specific complaints (in latex):

      (1) φ, φ_PV are used in equations (10), (11), so they should be defined there, not many equations later.

      Thank you, we changed that.

      (2) β, 1 − β appear without justification or explanation in (11). That is finally defined and derived several pages later.

      Thank you, we now define it right at the beginning.

      (3) Equations (10)-(12) should be immediately followed by information about plasticity, rather than deferring that.

      That’s a great idea. We changed it. Now the synaptic dynamics are explained together with the firing rate dynamics.

      (4) After the rate equations (10)-(12) and weight change equations (23)-(25) are presented, the same equations are simply repeated in the “Explanation of the synaptic dynamics” subsection.

      We agree that this was suboptimal. We moved the explanation of the synaptic dynamics up and removed the repetition.

      (5) In the circuit model (13)-(19), it’s not clear why r_R shows up in the SST+ and PV− equations vs. r_s in PV+ and SST−. Moreover, r_s is not even defined! Also, I don’t see why w_PV+,R shows up in the equation for r_PV−.

      We added more explanation to the Methods section as to why the neurons receive these inputs and renamed r_s to s, which is defined. The “+” in w_PV+,R was a typo. Thank you for spotting that.

      (6) The authors should only number those equations that they will reference by number. Even more importantly, there are many numbers such as (20), (26), (32), (39) that are just floating there without referring to an equation at all.

      Thank you for spotting that. We corrected this.

      (7) The authors fail to specify what r_a is in Figure 8. Moreover, it seems strange to me that w_PV,a approaches σ rather than w_PV,a r_a approaching σ, since φ_PV is a function of w_PV,a r_a.

      You are right, w_PV,a r_a should approach σ, but since r_a is either 1 or 0 to indicate the presence or absence of the cue, and only w_PV,a is plastic and changing, w_PV,a approaches σ.

      (8) I don’t understand the rationale for the authors to introduce equation (30) when they already had plasticity equations earlier. What is the relation of (30), (31) to (24)?

      It is the same equation. In (30) we introduce simpler symbols for a better overview of the equations. (31) is equal to (30), with r_PV replaced by its steady state.

      (9) η is omitted from (33) - it won’t affect the final result but should be there.

      We fixed this.

      I have many additional specific comments and suggestions, some related to errors that really should have been caught before manuscript submission. I will present these based on the order in which they arise in the manuscript.

      (1) In the abstract, the mention of layer 2/3 comes out of nowhere. Why this layer specifically? Is this meant to be an abstract/general cortical circuit model or to relate to a specific brain area? (Also watch for several minor grammatical issues in the abstract and later.)

      Thank you for pointing this out. We now mention that the observed error neurons can be found in layer 2/3 of diverse brain areas. It is meant to be a general cortical circuit model independent of brain area.

      (2) In par. 2 of the introduction, I find sentences 3-4 to be confusing and vague. Please rewrite what is meant more directly and clearly.

      We tried to improve those sentences.

      (3) Results subtitle 1: “suggests” → “suggest”

      Thank you.

      (4) Be careful to use math font whenever variables, such as a and N, are referenced (e.g., use of a instead of a bottom pg. 2).

      We agree and checked the entire manuscript.

      (5) Ref. to Fig. 1B bottom pg. 2 should be Fig. 1CD. The panel order in the figure should then be changed to match how it is referenced.

      We fixed it and matched the ordering of the text with the ordering of the figure.

      (6) Fig. 2C and 3E captions mention std but this is not shown in the figures - should be added.

      It is there, it is just very small.

      (7) Please clarify the relation of Figure 2C to 2F, and Figure 3F to 3H.

      We colour-coded the points in 2F that correspond to the bars in 2C. We did the same for 3F and 3H.

      (8) Figures 3E,3F appear to be identical except for the y-axis label and inclusion of std in 3F. Either more explanation is needed of how these relate or one should be cut.

      The difference is that 3E shows the activity of PVs based on only the sound cue in the absence of a whisker stimulus. And 3F shows the activity of PVs based on both the sound cue and whisker stimuli. We state this more clearly now.

      (9) Bottom of pg. 4: clarify that a quadratic φ_PV is a model assumption, not derived from results in the figure.

      We added that we assume this.

      (10) When k is referenced in the caption of Figure 4, the reader has no idea what it is. More substantially, most panels of Figure 4 are not referenced in the paper. I don’t understand what point the authors are trying to make here with much of this figure. Indeed, since the claim is that the uncertainty prediction should be based on division by σ², why aren’t the numerical values for UPE rates much larger, since σ gets so small? The authors also fail to give enough details about the simulations done to obtain these plots; presumably these are after some sort of (unspecified) convergence, and in response to some sort of (unspecified) stimulus? Coming back to k, I don’t understand why k > 2 is used in addition to k = 2. The text mentions – even italicizes – “out-of-distribution detection”, but this is never mentioned elsewhere in the paper and seems to be outside the true scope of the work (and not demonstrated in Figure 4). Sticking with k = 2 would also allow authors to simply use (·)^k below (10), rather than the awkward positive part function that they have used now.

      We now introduce the equation for the error neurons in Eq. 3 within the text, such that k is introduced before the caption. The equation also explains why the numerical values do not become much larger: divisive inhibition, unlike mathematical division, cannot amplify a neuron’s response, which is what dividing by a value below 1 would amount to. To ensure this, we add 1 to the denominator.
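
      The effect of the added 1 can be shown with a small sketch. The exact form of Eq. 3 is not reproduced in this response, so the function below is an assumption pieced together from the description: a rectified error raised to the power k and divided by (1 + σ²). The name `upe_rate` is hypothetical.

```python
def upe_rate(s, mu, sigma2, k=2):
    """Assumed positive-UPE rate: rectified error to the power k, divided by (1 + sigma2)."""
    error = max(s - mu, 0.0)  # positive-part function
    return error**k / (1.0 + sigma2)


# As the variance shrinks, the rate approaches the undivided error term
# instead of growing without bound as it would with division by sigma2 alone.
for sigma2 in (1.0, 0.1, 0.001):
    print(f"sigma2 = {sigma2}: rate = {upe_rate(2.0, 1.0, sigma2):.3f}")
```

      Under this assumed form, the denominator is never smaller than 1, so inhibition can only reduce the response and a small σ does not produce very large UPE rates.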

      We show the error neuron responses to stimuli deviating from the learned mean after learning the mean and variance. The deviation is indicated either on the x-axis or in the legend depending on the plot. We now more explicitly state that these plots are obtained after learning the mean and the variance.

      We removed the mention of “out-of-distribution detection”, as a detailed treatment would indeed be outside the scope.

      (11) Page 5, please clarify what is meant by “weights from the sound...”. You have introduced mathematical notation - use it so that you can be precise.

      We added the mathematical notation, thank you!

      (12) Figure 5D: legend has 5 entries but the figure panel only plots 4 quantities.

      The SST firing rate was below the R firing rate. We hence omitted the SST firing rate and its legend.

      (13) Figure 5: I don’t understand what point is being made about NMDA spikes. The text for Figure 5 refers to NMDA spikes in Figure 4, but nothing was said about NMDA spikes in the text for Figure 4 nor shown in Figure 4 itself.

      We were referring to the nonlinearity in the activation function of UPEs in Figure 4. We changed the text to clarify this point.

      (14) Figure 6: It is too difficult to distinguish the black and purple curves even on a large monitor. Also, the authors fail to define what they mean by “MM” and also do not define the quantities Y+ and Y− that they show. Another confusing aspect is that the model has PV+ and PV− neurons, so why doesn’t the figure?

      Thank you for the comment. We changed the colour for better visibility, replaced the Upsilons with UPE (we changed the notation at some point and forgot to change it in the figure), and defined MM, which is the mismatch stimulus that causes error activity. We did not distinguish between PV+ and PV− in the plot as their activity is the same on average. We plotted the activity of the PV+. We now mention that we show the activity of PV+ as the representative.

      (15) Also Figure 6: The authors do not make it clear in the text whether these are simulation results or cartoons. If the latter, please replace this with actual simulation results.

      They are actual simulation results. We clarified this in the text.

      (16) This work assumes the existence of two different populations of SST neurons with distinct UPE (pyramidal) targets. The Discussion doesn’t say much about any evidence for this assumption, which should be more thoroughly discussed and justified.

      We now discuss this in more detail in the discussion as mentioned in our response to the public review.

      (17) Par. 2 of the discussion refers to “Bayesian” and “Bayes-optimal” several times. Nothing was said earlier in the paper about a Bayesian framework for these results and it’s not clear what the authors mean by referring to Bayes here. This paragraph needs editing so that it clearly relates to the material of the results section and its implications.

      We added an additional results section (the last section with Figure 8) on integrating prior and sensory information based on their uncertainties, which is also the case for Bayes-optimal integration, and show that our model can reproduce the central tendency effect, which is a hallmark of Bayes-optimal behaviour.
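
      For reference, the standard Bayes-optimal combination of a Gaussian prior and a Gaussian sensory observation weights each by its inverse variance. The sketch below shows this textbook rule and the resulting central tendency effect; it is not the UPE circuit hierarchy of Figure 8, and the function name and numerical values are hypothetical.

```python
def integrate(prior_mean, prior_var, stimulus, sensory_var):
    """Precision-weighted average of a Gaussian prior and a Gaussian observation."""
    w_sensory = (1.0 / sensory_var) / (1.0 / sensory_var + 1.0 / prior_var)
    return w_sensory * stimulus + (1.0 - w_sensory) * prior_mean


prior_mean, prior_var = 5.0, 1.0  # hypothetical prior
for stimulus in (3.0, 5.0, 7.0):
    reliable = integrate(prior_mean, prior_var, stimulus, sensory_var=0.25)
    noisy = integrate(prior_mean, prior_var, stimulus, sensory_var=4.0)
    print(f"stimulus {stimulus}: reliable -> {reliable:.2f}, noisy -> {noisy:.2f}")
```

      Estimates are pulled towards the prior mean, and the pull is stronger when the sensory input is more uncertain, which is the central tendency effect.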

      Reviewer 3 (Recommendations For The Authors):

      See public review. I think the gradient-descent type of update the authors do in Equation (1) could be more useful in a more complicated learning scenario where the MLE has no closed form and has to be computed with gradient-based algorithms.

      We responded in detail to your points in our point-by-point response to the public review.

    1. Author response:

      Reviewer #1 (Public review):

      This manuscript from Schwintek and coworkers describes a system in which gas flow across a small channel (10^-4-10^-3 m scale) enables the accumulation of reactants and convective flow. The authors go on to show that this can be used to perform PCR as a model of prebiotic replication.

      Strengths:

      The manuscript nicely extends the authors' prior work in thermophoresis and convection to gas flows. The demonstration of nucleic acid replication is an exciting one, and an enzyme-catalyzed proof-of-concept is a great first step towards a novel geochemical scenario for prebiotic replication reactions and other prebiotic chemistry.

      The manuscript nicely combines theory and experiment, which generally agree well with one another, and it convincingly shows that accumulation can be achieved with gas flows and that it can also be utilized in the same system for what one hopes is a precursor to a model prebiotic reaction. This continues efforts from Braun and Mast over the last 10-15 years extending a phenomenon that was appreciated by physicists and perhaps underappreciated in prebiotic chemistry to increasingly chemically relevant systems and, here, a pilot experiment with a simple biochemical system as a prebiotic model.

      I think this is exciting work and will be of broad interest to the prebiotic chemistry community.

      Weaknesses:

      The manuscript states: "The micro scale gas-water evaporation interface consisted of a 1.5 mm wide and 250 µm thick channel that carried an upward pure water flow of 4 nl/s ≈ 10 µm/s perpendicular to an air flow of about 250 ml/min ≈ 10 m/s." This was a bit confusing on first read because Figure 2 appears to show a larger channel - based on the scale bar, it appears to be about 2 mm across on the short axis and 5 mm across on the long axis. From reading the methods, one understands the thickness is associated with the Teflon, but the 1.5 mm dimension is still a bit confusing (and what is the dimension in the long axis?) It is a little hard to tell which portion (perhaps all?) of the image is the channel. This is because discontinuities are present on the left and right sides of the experimental panels (consistent with the image showing material beyond the channel), but not the simulated panels. Based on the authors' description of the apparatus (sapphire/CNC machined Teflon/sapphire) it sounds like the geometry is well-known to them. Clarifying what is going on here (and perhaps supplying the source images for the machined Teflon) would be helpful.

      We understand. We will update the figures to better show the dimensions of the experimental chamber. We will also add a more complete figure in the supplementary information. Part of the complexity of the chamber, however, stems from the fact that the same chamber design has also been used to create defined temperature gradients. These are not needed here, so the chamber is more complex than this experiment requires.
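
      The quoted flow velocities can be checked against the quoted channel dimensions with a short calculation. The sketch assumes a rectangular cross-section of 1.5 mm × 250 µm for the water flow and, as a further assumption not stated in the text, the same cross-section for the air flow.

```python
def mean_velocity(volume_flow_m3_per_s, width_m, thickness_m):
    """Mean flow velocity through a rectangular cross-section."""
    return volume_flow_m3_per_s / (width_m * thickness_m)


WIDTH = 1.5e-3       # m, channel width quoted in the manuscript
THICKNESS = 250e-6   # m, channel thickness quoted in the manuscript

water_flow = 4e-12        # 4 nl/s expressed in m^3/s
air_flow = 250e-6 / 60.0  # 250 ml/min expressed in m^3/s

water_velocity = mean_velocity(water_flow, WIDTH, THICKNESS)
# Assumption: the air stream passes through a cross-section of the same size.
air_velocity = mean_velocity(air_flow, WIDTH, THICKNESS)

print(f"water: {water_velocity * 1e6:.1f} µm/s")
print(f"air:   {air_velocity:.1f} m/s")
```

      Both results (about 10.7 µm/s and 11.1 m/s) agree with the rounded values of 10 µm/s and 10 m/s given in the manuscript, which suggests that the 1.5 mm × 250 µm figures describe the flow cross-section.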

      The data shown in Figure 2d nicely shows nonrandom residuals (for experimental values vs. simulated) that are most pronounced at t~12 m and t~40-60m. It seems like this is (1) because some symmetry-breaking occurs that isn't accounted for by the model, and perhaps (2) because of the fact that these data are n=1. I think discussing what's going on with (1) would greatly improve the paper, and performing additional replicates to address (2) would be very informative and enhance the paper. Perhaps the negative and positive residuals would change sign in some, but not all, additional replicates?

      To address this, we will show two more replicates of the experiment and include them in Figure 2.

      We are seeing two effects when we compare fluorescence measurements of the experiments.

      Firstly, degassing of water causes the formation of air-bubbles, which are then transported upwards to the interface, disrupting fluorescence measurements. This, however, mostly occurs in experiments with elevated temperatures for PCR reactions, such as displayed in Figure 4.

      Secondly, due to the high surface tension of water, the interface is quite flexible. As the inflow and evaporation work to balance each other, the shape of the interface adjusts, leading to alterations in the circular flow fields below.

      Thus the conditions, while overall being in steady state, show some fluctuations. The strong dependence on interface shape is also seen in the simulation. However, modeling a dynamic interface shape is not so easy to accomplish, so we had to stick to one geometry setting. Again here, the added movies of two more experiments should clarify this issue.

      The authors will most likely be familiar with the work of Victor Ugaz and colleagues, in which they demonstrated Rayleigh-Bénard-driven PCR in convection cells (10.1126/science.298.5594.793, 10.1002/anie.200700306). Not including some discussion of this work is an unfortunate oversight, and addressing it would significantly improve the manuscript and provide some valuable context to readers. Something of particular interest would be their observation that wide circular cells gave chaotic temperature profiles relative to narrow ones and that these improved PCR amplification (10.1002/anie.201004217). I think contextualizing the results shown here in light of this paper would be helpful.

      Thanks for pointing this out and reminding us. We apologize. We agree that the chaotic trajectories within Rayleigh-Bénard convection cells lead to temperature oscillations similar to the salt variations in our gas-flux system. Although the convection-driven PCR in Rayleigh-Bénard cells is not isothermal like our system, it provides a useful point of comparison and context for understanding environments that can support full replication cycles. We will add a section that compares the approaches, outlines the history of convective PCR, and relates it to the new isothermal implementation.

      Again, it appears n=1 is shown for Figure 4a-c - the source of the title claim of the paper - and showing some replicates and perhaps discussing them in the context of prior work would enhance the manuscript.

      We thank the reviewer for bringing this to our attention. We will now include the two additional repeats for the data shown in Figure 4c, while the repeats of the PAGE measurements are already displayed in Supplementary Fig. IX.2. Initially, we chose not to show the repeats in Figure 4c due to the dynamic and variable nature of the system. These variations are primarily caused by differences at the water-air interface, attributed to the high surface tension of water. Additionally, the stochastic formation of air bubbles in the inflow, despite our best efforts to avoid them, led to fluctuations in the fluorescence measurements across experiments. These bubbles cause a significant drop in fluorescence in a region of interest (ROI) until the area is refilled with the sample.

      Unlike our RNA-focused experiments, PCR requires high temperatures and degassing a PCR master mix effectively is challenging in this context. While we believe our chamber design is sufficiently gas-tight to prevent air from diffusing in, the high surface-to-volume ratio in microfluidics makes degassing highly effective, particularly at elevated temperatures. We anticipate that switching to RNA experiments at lower temperatures will mitigate this issue, which is also relevant in a prebiotic context.

      The reviewer’s comments are valid and prompt us to fully display these aspects of the system. We will now include these repeats in Figure 4c to give readers a deeper understanding of the experiment's dynamics. Additionally, we will provide videos of all three repeats, allowing readers to better grasp the nature of the fluctuations in SYBR Green fluorescence depicted in Figure 4c.

      I think some caution is warranted in interpreting the PCR results because a primer-dimer would be of essentially the same length as the product. It appears as though the experiment has worked as described, but it's very difficult to be certain of this given this limitation. Doing the PCR with a significantly longer amplicon would be ideal, or alternately discussing this possible limitation would be helpful to the readers in managing expectations.

      This is a good point and should be discussed more in the manuscript. Our gel electrophoresis is capable of distinguishing between the replication product and primer dimers. We know this because we optimized the primer and template sequences to minimize primer dimers and to make them distinguishable from the desired 61mer product. In addition, none of the experiments performed without an added template strand showed any band in the vicinity of the product band after 4 h of reaction, in contrast to the experiments with template, which is a strong argument against the presence of primer dimers.

      Reviewer #2 (Public review):

      Schwintek et al. investigated whether a geological setting of a rock pore with water inflow on one end and gas passing over the opening of the pore on the other end could create a non-equilibrium system that sustains nucleic acid reactions under mild conditions. The evaporation of water as the gas passes over it concentrates the solutes at the boundary of evaporation, while the gas flux induces momentum transfer that creates currents in the water that push the concentrated molecules back into the bulk solution. This leads to the creation of steady-state regions of differential salt and macromolecule concentrations that can be used to manipulate nucleic acids. First, the authors showed that fluorescent bead behavior in this system closely matched their fluid dynamic simulations. With that validation in hand, the authors next showed that fluorescently labeled DNA behaved according to their theory as well. Using these insights, the authors performed a FRET experiment that clearly demonstrated the hybridization of two DNA strands as they passed through the high Mg++ concentration zone, and, conversely, the dissociation of the strands as they passed through the low Mg++ concentration zone. This isothermal hybridization and dissociation of DNA strands allowed the authors to perform an isothermal DNA amplification using a DNA polymerase enzyme. Crucially, the isothermal DNA amplification required the presence of the gas flux and could not be recapitulated using a system that was at equilibrium. These experiments advance our understanding of the geological settings that could support nucleic acid reactions that were key to the origin of life.

      The presented data compellingly supports the conclusions made by the authors. To increase the relevance of the work for the origin of life field, the following experiments are suggested:

      (1) While the central premise of this work is that RNA degradation presents a risk for strand separation strategies relying on elevated temperatures, all of the work is performed using DNA as the nucleic acid model. I understand the convenience of using DNA, especially in the latter replication experiment, but I think that at least the FRET experiments could be performed using RNA instead of DNA.

      We understand the request only partially. The modification introduced by the two dye molecules of the FRET probe, which allow salt concentrations to be probed via melting, is of course much larger than the change of the backbone from RNA to DNA. This is why we used the much more stable DNA construct, which can also be manufactured at lower cost and in much higher purity, even with the modifications. We think that the melting temperature characteristics of RNA and DNA in this range are known well enough that DNA can be used instead of RNA for probing the salt concentration in our flow cycling.

      Only under extreme conditions of pH and salt, and especially under alkaline conditions, is RNA degradation through transesterification at least several orders of magnitude faster than the spontaneous degradative mechanisms acting upon DNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.]. The work presented in this article is, however, focussed on the hybridization dynamics of nucleic acids. Here, RNA and DNA share similar properties regarding the formation of double strands and their respective melting temperatures. While RNA has been shown to form more stable duplex structures exhibiting higher melting temperatures compared to DNA [Dimitrov, R. A., & Zuker, M. (2004). Prediction of hybridization and melting for double-stranded nucleic acids. Biophysical Journal, 87(1), 215-226.], the general impact of changes in salt, temperature and pH [Mariani, A., Bonfio, C., Johnson, C. M., & Sutherland, J. D. (2018). pH-Driven RNA strand separation under prebiotically plausible conditions. Biochemistry, 57(45), 6382-6386.] on respective melting temperatures follows the same trend for both nucleic acid types. The diffusive properties of RNA and DNA are also very similar [Baaske, P., Weinert, F. M., Duhr, S., Lemke, K. H., Russell, M. J., & Braun, D. (2007). Extreme accumulation of nucleotides in simulated hydrothermal pore systems. Proceedings of the National Academy of Sciences, 104(22), 9346-9351.].

      Since this work is a proof of principle for the discussed environment being able to host nucleic acid replication, we aimed to avoid second-order effects such as degradation by hydrolysis by using DNA as a proxy polymer. This enabled us to focus on the physical effects of the environment on local salt and nucleic acid concentration. The experiments performed with FRET are used to visualize local salt concentration changes and their impact on the melting temperature of dissolved nucleic acids. While performing these experiments with RNA would without doubt cover a broader application within the field of origin of life, we aimed at a step-by-step, proof-of-principle approach, especially since the environmental phenomena studied here have not been previously investigated in the origin-of-life context. Incorporating RNA-related complexity into this system should, however, be addressed in future studies. This will likely require modifications to the experimental boundary conditions, such as adjusting pH, temperature, and salt concentration, to account for the greater duplex stability of RNA. For instance, lowering the pH would reduce the RNA melting temperature [Ianeselli, A., Atienza, M., Kudella, P. W., Gerland, U., Mast, C. B., & Braun, D. (2022). Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA. Nature Physics, 18(5), 579-585.].

      (2) Additionally, showing that RNA does not degrade under the conditions employed by the authors (I am particularly worried about the high Mg++ zones created by the flux) would further strengthen the already very strong and compelling work.

      Based on literature values for hydrolysis rates of RNA [Li, Y., & Breaker, R. R. (1999). Kinetics of RNA degradation by specific base catalysis of transesterification involving the 2′-hydroxyl group. Journal of the American Chemical Society, 121(23), 5364-5372.], we estimate RNA to have a half-life of multiple months under the conditions deployed in the FRET experiment (high-concentration zones contain <1 mM Mg2+). Additionally, dsRNA is multiple orders of magnitude more stable than ssRNA with regard to degradation through hydrolysis [Zhang, K., Hodge, J., Chatterjee, A., Moon, T. S., & Parker, K. M. (2021). Duplex structure of double-stranded RNA provides stability against hydrolysis relative to single-stranded RNA. Environmental Science & Technology, 55(12), 8045-8053.], improving RNA stability especially in zones of high FRET signal. Furthermore, at the neutral pH deployed in this work, RNA does not readily degrade. In previous work from our lab [Salditt, A., Karr, L., Salibi, E., Le Vay, K., Braun, D., & Mutschler, H. (2023). Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment. Nature Communications, 14(1), 1495.], we showed that the lifetime of RNA under conditions reaching 40 mM Mg2+ at the air-water interface at 45°C was sufficient to support ribozymatically mediated ligation reactions in experiments lasting multiple hours.

      With that in mind, gaining insight into the median Mg2+ concentration across multiple averaged nucleic acid trajectories in our system (see Fig. 3c&d) and numerically convoluting this with hydrolysis dynamics from literature would be highly valuable. We anticipate that longer residence times in trajectories distant from the interface will improve RNA stability compared to a system with uniformly high Mg2+ concentrations.

      (3) Finally, I am curious whether the authors have considered designing a simulation or experiment that uses the imidazole- or 2′,3′-cyclic phosphate-activated ribonucleotides. For instance, a fully paired RNA duplex and a fluorescently-labeled primer could be incubated in the presence of activated ribonucleotides +/- flux and subsequently analyzed by gel electrophoresis to determine how much primer extension has occurred. The reason for this suggestion is that, due to the slow kinetics of chemical primer extension, the reannealing of the fully complementary strands as they pass through the high Mg++ zone, which is required for primer extension, may outcompete the primer extension reaction. In the case of the DNA polymerase, the enzymatic catalysis likely outcompetes the reannealing, but this may not recapitulate the uncatalyzed chemical reaction.

      This is certainly on our to-do list. Our current focus is on templated ligation rather than templated polymerization, and we are working hard to implement an RNA-only, enzyme-free ligation chain reaction based on more optimized parameters for templated ligation from 2′,3′-cyclic phosphate activation, which were just published [High-Fidelity RNA Copying via 2′,3′-Cyclic Phosphate Ligation, Adriana C. Serrão, Sreekar Wunnava, Avinash V. Dass, Lennard Ufer, Philipp Schwintek, Christof B. Mast, and Dieter Braun, JACS doi.org/10.1021/jacs.3c10813 (2024)]. However, we would first try this at an air-water interface, which was shown to work with RNA in a temperature gradient [Ribozyme-mediated RNA synthesis and replication in a model Hadean microenvironment, Annalena Salditt, Leonie Karr, Elia Salibi, Kristian Le Vay, Dieter Braun & Hannes Mutschler, Nature Communications doi.org/10.1038/s41467-023-37206-4 (2023)], before making the jump to the isothermal setting we describe here. We therefore understand the question, but in the past it has proven good practice to first get to know a setting with PCR and then move to RNA.

      Reviewer #2 (Recommendations for the authors):

      (1) Could the authors comment on the likelihood of the geological environments where the water inflow velocity equals the evaporation velocity?

      This is an important point to mention in the manuscript, thank you for pointing that out. To produce a defined experiment, we pushed the water out with a syringe pump, regulated so that the flow rate matched the evaporation. We imagine that a real system would self-regulate the inflow of the water column through a more complex geometry of the gas flow, automatically matching the evaporation with the reflow of water. The interface would move closer to the gas flux or recede from it, depending on whether the inflow exceeds or falls short of the evaporation rate. As the interface moves closer, evaporation speeds up, while moving away slows it down. This dynamic process stabilizes the system, with surface tension ultimately fixing the interface in place.
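      The self-regulation argument can be illustrated with a toy model. This is a minimal sketch, not the experimental system: the evaporation law, its parameters, and the function names are hypothetical, and we assume that an excess of inflow moves the interface towards the gas flux. Here x is the distance of the interface from the gas flux, and evaporation is assumed to fall off with x.

```python
def evaporation_rate(x, e_max=1.0, length_scale=0.5):
    """Hypothetical evaporation rate that decreases with distance x from the gas flux."""
    return e_max / (1.0 + x / length_scale)

def simulate_interface(x0, inflow, dt=0.01, steps=20_000):
    """Euler integration of dx/dt = evaporation(x) - inflow.

    If inflow exceeds evaporation, x shrinks (interface approaches the gas flux)
    and evaporation rises; if inflow falls short, x grows and evaporation drops.
    """
    x = x0
    for _ in range(steps):
        x += (evaporation_rate(x) - inflow) * dt
        x = max(x, 0.0)  # the interface cannot pass the gas channel
    return x

# Two different starting positions relax to the same steady-state position.
x_from_near = simulate_interface(x0=0.1, inflow=0.5)
x_from_far = simulate_interface(x0=2.0, inflow=0.5)
print(x_from_near, x_from_far)
```

      In this toy model the steady state is the position at which evaporation equals inflow, which for the assumed law is x* = length_scale * (e_max / inflow - 1). Surface tension and channel geometry are not modeled.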

      We have already seen a bit of this dynamic in the experiments, but have so far not found a geometry within our 2-dimensional, constant-thickness chamber that makes it work for a longer time. Very likely a 3-dimensional reservoir of water with lower frictional forces would be able to do this, but this would require a full redesign towards multi-thickness microfluidics. The more we think about it, the more we envisage implementing the next version of the experiment with a real porous volcanic rock inside a humidity chamber that simulates a full 6h prebiotic day. We would then lose much of the reproducibility of the experiment, but would likely gain a way for recondensation of water by dew on a cold morning to refill the water reservoirs in the rocks. Apologies for digressing towards future experiments.

      (2) Could the authors speculate on using gases other than ambient air to provide the flux and possibly even chemical energy? For example, using carbonyl sulfide or vaporized methyl isocyanide could drive amino acid and nucleotide activation, respectively, at the gas-water interface.

      This is an interesting prospect for future work with this system. We thought also about introducing ammonia for pH control and possible reactions. We were amazed in the past that having CO2 instead of air had a profound impact on the replication and the strand separation [Water cycles in a Hadean CO2 atmosphere drive the evolution of long DNA, Alan Ianeselli, Miguel Atienza, Patrick Kudella, Ulrich Gerland, Christof Mast & Dieter Braun, Nature Physics doi.org/10.1038/s41567-022-01516-z (2022)]. So going more in this direction absolutely makes sense and as it acts mostly on the length-selectively accumulated molecules at the interface, only the selected molecules will be affected, which adds to the selection pressure of early evolutionary scenarios.

      Of course, in the manuscript, we use ambient air as a proxy for any gas, focusing primarily on the energy introduced through momentum transfer and evaporation. We speculate that soluble gases could establish chemical gradients, such as pH or redox potential, from the bulk solution to the interface, similar to the Mg2+ accumulation shown in Figure 3c. The nature of these gradients would depend on each gas's solubility and diffusivity. We have already observed such effects in thermal gradients [Keil, L. M., Möller, F. M., Kieß, M., Kudella, P. W., & Mast, C. B. (2017). Proton gradients and pH oscillations emerge from heat flow at the microscale. Nature Communications, 8(1), 1897.], and finding similar behavior in an isothermal environment would be a significant discovery.

      (3) Line 162: Instead of "risk," I suggest using "rate".

      Oh well - thanks for pointing this out! Will be changed.

      (4) Using FRET of a DNA duplex as an indicator of salt concentration is a decent proxy, but a more direct measurement of salt concentration would provide further merit to the explicit statement that it is the salt concentration that is changing in the system and not another hidden parameter.

      Directly observing salt concentration using microscopy is a difficult task. While there are dyes that change their fluorescence depending on the local Na+ or Mg2+ concentration, they do not operate differentially, i.e. by taking a ratio between two color channels. Only such a ratiometric readout avoids artifacts from the dye molecules themselves being accumulated by the non-equilibrium setting. We were able to do this for pH in the past, but did not find comparable optical salt sensors. This is the reason we ended up with a FRET pair, with the advantage that we directly probe the strand separation that we are interested in anyhow. Using such a dye in future work would, however, without a doubt enhance the understanding of not only this system, but also our thermal gradient environments.

      (5) Figure 3a: Could the authors add information on "Dried DNA" to the caption? I am assuming this is the DNA that dried off on the sides of the vessel but cannot be sure.

      Thanks to the reviewer for pointing this out. This is correct and we will describe this better in the revised manuscript.

      (6) Figure 4b and c: How reproducible is this data? Have the authors performed this reaction multiple independent times? If so, this data should be added to the manuscript.

      The gel electrophoresis experiments were performed in triplicate, and the data are shown in full in the supplementary information. The data in c are hard to reproduce, as the interface is not static and ROI measurements are therefore difficult to average over repeats. Including the data from the independent repeats will, however, give the reader insight into some of the experimental difficulties, such as air bubbles that form from degassing as the liquid heats up and travel upwards to the interface, disrupting the ongoing fluorescence measurements.

      (7) Line 256: "shielding from harmful UV" statement only applies to RNA oligomers as UV light may actually be beneficial for earlier steps during ribonucleoside synthesis. I suggest rephrasing to "shielding nucleic acid oligomers from UV damage.".

      Will be adjusted as mentioned.

      (8) The final paragraph in the Results and Discussion section would flow better if placed in the Conclusion section.

      This is a good point and we will merge results and discussion closer together.

      (9) Line 262, "...of early Life" is slightly overstating the conclusions of the study. I suggest rephrasing to "...of nucleic acids that could have supported early life."

      This is a fair comment. We thank the reviewer for his detailed analysis of the manuscript!

      (10) In references, some of the journal names are in sentence case while others are in title case (see references 23 and 26 for example).

      Thanks - this will be fixed.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Response to Reviewer Comments:


      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      *Glaucoma-associated optineurin mutations increase transmitophagy in vertebrate optic nerve.

      Summary In Jeong et al., the authors perform live imaging of the X. laevis optic nerve to track neuronal mitochondrial movement and expulsion in an intact nervous system. The authors observe similar mitochondrial dynamics in vivo as previously described in other systems. They find that stationary mitochondria are more likely to be associated with OPTN, suggestive of mitochondria undergoing mitophagy. Forced expression of OPTN mutations results in a larger pool of stationary mitochondria that colocalize with LC3B and OPTN. Finally, the authors argue that extra-axonal mitochondria are observed more frequently in OPTN mutants, suggesting that mutations in OPTN that are associated with disease can lead to an increase in the expulsion of mitochondria through exopher-like structures.

      Major Findings and impact: • The authors establish that mitochondria dynamics can be tracked in the X. laevis optic nerve. • OPTN mutations increase the stationary pool of mitochondria and likely result in increased rates of mitophagy. • Exopher-like structures containing mitochondria and LC3 can be expelled from the optic nerve and increase in the presence of OPTN mutations. These structures were observed in a living system and have interesting implications in the context of disease.

      Concerns: • The authors state in their results that the secreted blebs are exophers. While these initial observations are consistent with exophers, additional data are needed to strengthen this claim. For example: what are the sizes of secreted vesicles? Do all express LC3? How frequently do these occur? From where are they expelling? Alternatively, the discussion of exophers could be moved to the discussion.*

      We agree that calling the axon shedding intermediates “exophers” was an overreach on our part. While we believe that in all probability time will demonstrate this to be the case, reviewers are correct in stating that putting our work in the context of exophers is best left to the discussion. We have removed all mention of exophers from the results and graphical abstract and now use the term only once in the discussion. We do provide detail as to the frequency of the structures, what fraction contain mitochondria, and morphological parameters of the contained mitochondria. And while all of these new data support them being exophers, the point remains that the use of the nomenclature “exopher” in the results section was inappropriate.

      • Quantifications in sparse labeling experiments seem quite surprising and concerns related to these findings should be addressed. As the authors used LC3b expression to represent axonal volume, the authors should demonstrate that this is the case using an axonal fill or membrane marker in both the wt and E50K conditions. This is important as it is unclear whether LC3b expression is consistent between the wild type and the E50K conditions. Lower expression of LC3b in E50K could account for the large changes in axonal width that seem to be observed and could confound the measured amount of expelled mitochondria.*
      • *

      We agree that using EGFP-LC3b as a “cell fill” was problematic in a situation where the interventions likely perturb autophagy/mitophagy and therefore might have also perturbed LC3b. We do provide some axon width and LC3b-EGFP intensity data for a partial dataset that had been imaged side-by-side, showing that expression of LC3b is not different in the two conditions. We also provide independent measures of extra-axonal mitochondria based on a membrane-GFP reporter. While in principle there would be value to repeat the studies of Wt vs. E50K in the context of the membrane-GFP reporter, these experiments would involve new constructs and new breedings, and would likely take months to years to complete.

        • Could large amounts of exogenous mitochondria in explant experiments be from cells that died during the plantation?*

      The concern that some of the exogenous mitochondria signal might derive from degenerating axons is one that we worry much about, and not only in the transplantation experiments. In our sparse labeling experiments we do occasionally see axons undergoing Wallerian degeneration, but it is rare and does not appear to be more common with expression of the mutated OPTN, at least not at the stage after transgene expression at which the analyses were performed. We do provide new data showing that expression of E50K OPTN does not compromise vision at the time that the experiments were carried out, ruling out that extra-axonal mitochondria are the result of large-scale degeneration. However, from other data we know that axon loss would likely need to be very extensive to manifest itself as functional vision loss in our behavioral assay, so milder axon loss contributing some noise to the measures cannot be excluded. The point raised is well taken, and we now include a sentence in the discussion acknowledging that some of the signal outside of axons could have been due to degenerating axons. We still contend that our documentation of shedding intermediates supports the view that many of the axonal mitochondria outside of axons were shed from otherwise intact axons.

      Suggested experiments/quantifications: • In OPTN/MITO/LC3b trafficking experiments, does flux/number of events change? Representative kymograph in Figure 2D seems to show far more OPTN-positive mitochondria which is opposite of what is shown in Figure 2C.

      Multiple reviewers rightfully point out that we did not carry out the flux experiments which would be necessary to make definitive statements regarding the amount of mitophagy. New experiments show that inhibiting lysosomal activity through chloroquine does increase the amount of astrocytic autophagosomes not yet acidified as expected, and that they contain axonal mitochondria signal, supporting the idea that astrocytes are involved in the degradation of axonal mitochondria. However, they did not show changes in the amount of stopped mitochondria, supporting the view that the co-localization of OPTN and mitochondria in axons is not conventional autophagy. This is a very important point that affects the interpretation of our results, and we thank reviewers for suggesting this experiment.

      • Demonstrate that axonal width measured with LC3B is representative of axonal fill/membrane marker in wt and E50K. Axonal area appears to change, is this accurate? This appears to be the case for both figure 3 and figure 4.*

      Addressed above.

      • Raw images in addition to the reconstruction would be beneficial.*

      We now include raw images beside the reconstruction at the first use of reconstructions.

      • Further characterization of exopher-like structures.**

      * Addressed above.

      ***Referees cross-commenting**

      I agree with the concerns of the other reviewers, and perhaps was over-optimistic about a timeline for revision. However, I do think the work is worth the effort, and I hope to see a revised manuscript published somewhere, as these observations are novel

      Reviewer #1 (Significance (Required)):

      This work reports potentially novel biology, and thus will be of interest to the field. The strength of the study is that it is an initial description of this biology, rather than a complete analysis. The work raises many more questions than it answers, and much further work on this topic is required to support these initial findings, but the manuscript will likely be of interest to many. Revisions are required to improve the rigor and clarity of the work, but following these revisions we recommend publication to facilitate follow-up work.*

      We fully agree that our study raises far more questions than it answers. We believe that the revisions made to address reviewer comments go a long way to improve the rigor and clarity of the work. We hope that the reviewers agree and deem the changes sufficient.

      *Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      Summary: This article studied transmitophagy in xenopus optic nerves in the context of overexpressing glaucoma-associated optineurin mutations. Using a series of labeling, imaging and transplantation techniques, the authors found that overexpressing mutated optineurins stops mitochondria movements and potentially induces transmitophagy, and that astrocytes are responsible for taking up the extra-axonal mitochondria. Below are my comments on this article.

      Major comments: 1. Identifying extra-axonal mitochondria is key to this research. In Figure 3, the authors used EGFP-LC3B as a marker for RGC boundaries. However, it is unconvincing how perfect LC3B is as a cell membrane marker. Particularly in the case of OPTN E50K OE, it seems that the optic nerve is thinner than the WT condition, which makes the quantification of extra-axonal OPTN less convincing. The authors should detect extra-axonal mitochondria with an RGC membrane marker or cytosolic marker. In addition, in Figure 3, the extra-axonal mitochondria seem to localize mostly on the dorsal surface. Why is there such a polarity?*

      As stated above, we acknowledge that the use of LC3b as both an autophagosome marker and a cell fill was somewhat problematic. We now provide additional experiments ruling out differences in LC3b expression or axon thickness in our sparse axon labeling experiments, and ruling out that E50K affects the thickness of the optic nerve. In addition, we provide new data using a bona fide membrane marker together with transgenic labeling of RGC mitochondria, which also show, under the “baseline state”, extensive mitochondria signal outside the axons on the surface of the optic nerve (New Fig. 6A and new Suppl Fig. 3D). All the new data are consistent with the previous data and support the view that using LC3b could potentially have been problematic, for the reasons the reviewers state, but in practice was not.

      The reviewer observes that the E50K optic nerve appears thinner; this is not a consistent difference in optic nerves across the experimental groups. The images we show are always near the mean values for the quantitative results presented, and we would rather not include prettier nerves that are not representative of the whole datasets.

      As for why the extra-axonal mitochondria localize mostly to the dorsal surface, this remains undetermined. There are dorsoventral differences in the optic nerve established during development, as developmental Sonic hedgehog signaling emanating from the midline appears to affect dorsoventral aspects of the optic nerve differentially. Early axon loss in humans and in some models of glaucoma does show a dorsal bias, and the optic nerve lymphatic structures reported in mice may also be preferentially dorsal. However, it is not known whether any of these observations are connected, so we did not want to speculate beyond what the data say. We do now explicitly mention the dorsoventral difference in the discussion, and state why we think it may be worth further study.

      • The experiment in Figure 5 is very important as it gives direct evidence of transmitophagy. However, one caveat is that the mitotracker injection is done after the transplantation. If in rare cases the dye is leaky after injection and is taken up by astrocytes directly, then the conclusion that mitochondria from RGCs are phagocytosed by astrocytes will be flawed. The authors should either use a transgene in the donor to label mitochondria or inject mitotracker into the donor before the transplantation and repeat the experiments. In addition, in Figure 5E, what is the large membranous structure inside the highlighted astrocyte? Is it associated with phagocytosis?*

      We fully agree that MitoTracker is an imperfect tool, both for the reason stated here that the dye may get into the astrocytes directly (or may label astrocyte mitochondria after it is released from degrading RGC mitochondria) and, as stated by reviewer 3, because it requires healthy mitochondria for labeling. For this reason, we have added new datasets that rely on RGC mitochondria labeling not by MitoTracker but through a genetic reporter. As to the identity of the conspicuous structure shown inside the astrocytes, it remains an open question. We are avidly pursuing which astrocytic organelles are involved through additional transgenic reporters and correlated light-EM studies, but those are complicated experiments that are beyond the scope of the current manuscript.

      • This research is entirely based on overexpression of OPTN. Since overexpressing WT OPTN does seem to affect mito trafficking (Figure S2G, and the description in the manuscript is often inconsistent with this result), it is unclear what the increased stalled mitochondria really mean when overexpressing mutated OPTN. Similarly, the authors examined extra-axonal mitochondria in Figures 3 and 4 all in overexpressing conditions, and made the connection that increased stalled mitochondria lead to transmitophagy. However, this conclusion will be better supported by using mutant animals rather than overexpression. The authors should consider using OPTN mutant xenopus if available or using CRISPR to introduce the specific mutations and repeat mitochondria trafficking and transmitophagy.*

      • *

      We thank this reviewer for pointing out an important detail that we failed to highlight, namely that transgenic overexpression of Wt OPTN (and/or Wt LC3B) does have a small but significant effect on mitochondria trafficking. Interestingly, it affects just the speed of retrogradely transported mitochondria, which, based on the elegant work of Holzbaur and colleagues, include mitochondria destined for degradation. We now acknowledge more explicitly that, since our studies involve expression of OPTN and LC3b transgenes (fluorophore-tagged human genes, no less), some caution should be exercised to avoid overinterpreting the results. Nonetheless, since we show that expression of Wt OPTN behaves similarly to expression of a mitochondria reporter (Tom20-mCherry) in not affecting either stopped mitochondria or extra-axonal mitochondria, we believe that our results still stand. We also now mention the effect of Wt OPTN on retrograde mitochondria movement. We have embarked on OPTN loss-of-function studies and have some founder animals carrying CRISPR-generated mutations. However, these experiments will take additional time and, based on the results in mammals, may or may not show any measurable effects in our assays, not only because of possible redundancy with the other damaged-mitochondria adaptors that we mention in the introduction, but also because the mutations that affect the shedding process (as well as cause glaucoma) are thought to be gain-of-function mutations. We decided not to dwell on these complexities in the discussion, as the discussion was previously quite extensive and is now even more so with the added discussion of how our studies relate to those of exophers.

      • On Page 12, the authors claim that even overexpressing WT OPTN causes extra-axonal mitochondria in the optic nerve. However, there is no control condition without OE to support this conclusion. It is thus unclear to what extent extra-axonal mitochondria occur at baseline and how many extra-axonal mitochondria can be induced by overexpression. The authors should include, in Figure 3 and 4, controls without overexpression.*

      We acknowledge that our language was confusing and somewhat misleading on this point. With the caveat mentioned above that WT OPTN expression does perturb the system somewhat (by increasing the speed of mitochondria retrograde transport, perhaps by increasing the proportion of retrograde moving mitochondria tagged for degradation), we still contend that the state observed after WT OPTN expression is close to the “baseline” state. In support of that, in the new data included in response to the LC3b concern, we observe plentiful shedding events in the absence of any OPTN or LC3b transgenes. Indeed, what may be the most surprising finding of our studies is that in the absence of any significant perturbation of OPTN, there is already a large fraction of axonal mitochondria that are outside of axons and inside of astrocytes, which is consistent with what we previously observed in the optic nerve head of mice; however, the current studies provide much more rigorous quantification of the process and live imaging of intermediates, but also provide for an intervention that increases the process. While there are many more questions to answer, we do believe our studies contribute mechanistic insights.

      • A technical question regarding kymographs: Based on Figure 2C, it looks that OPTN and LC3B labeling are pretty diffuse in axons and this makes sense since they may only be associated with damaged mitos. But this raises a question about how accurate the kymograph assay is. It may significantly underestimate the fraction of OPTN/LC3B that is stationary since they appeared diffused on the kymograph. This may explain why the percentage of stationary OPTN/LC3B is so small when the authors OE WT OPTN in Figure 2E and 2E', compared to the percentage of moving mitochondria shown in Figure 1E.*

      We fully agree that the kymograph studies likely underestimate the amounts of stationary mitochondria for the reasons stated. However, we interpret the discrepancy between Figure 1E and Figures 2E and 2E’ differently. We believe that the values for stopped mitochondria in the sparse labeling experiments are actually more accurate, as the values for stopped mitochondria in the whole-nerve experiments likely include not only mitochondria stopped within the axons, but also mitochondria recently shed by those or nearby axons, which are perceived to be in axons due to limitations of imaging resolution. In the discussion we now make very explicit that all the measures we provide should be interpreted as estimates, as every experiment relies on assumptions and is subject to technical limitations.

      Minor: 1. Figure 2E and 2E' do not agree with the text on page 7 and page 8. Not only F178A, but also H486R and D474N have no effect on OPTN trafficking. The authors should make their conclusions more accurate.

      F178A was the only mutation that had no effect on either OPTN or LC3b in either the F0 or the F1 experiments. However, we agree that our language should have been clearer, and we have now made our description of the results (and conclusions) more accurate.

      • Figure S2E-F: why does OE of mutated OPTN in F1s but not in F0s reduce trafficking speed compared to WT?*

      We do not know the reason for this discrepancy. Though it does not wholly agree with the rest of the story, we felt it important to include all relevant data, not only those that perfectly fit our interpretation. One possible reason may be that the F1 data derive from a single integration event. This is why we place more trust in the F0 data, which derive from multiple integrations in what are essentially outbred animals, and why we present the F0 data as the primary results where possible.

      * In movie 5, fusion of exopher with other structures is not clear and also the GFP signal does not disappear, which is in contrast to the statement in the text that the GFP signal is quenched in acidified environment. To confirm that LC3B leaves RGC axons in exophers, the authors should consider switching the fluorophores and examine LC3B localization during exopher formation.*

      This too is a valid point, and we have amended our description of these results. While swapping fluorophores between OPTN and LC3b is a highly worthy experiment, for technical reasons it likely would take many months to carry out just because of how involved it is to make the relevant constructs (recombineering details provided in the methods section).

      • In figure 6, to better show exopher formation and the pinching-off step, the authors should consider labeling the membrane and mitochondria instead of using the LC3B and OPTN marker.*

      This arguably was the biggest weakness of our initial submission, and we now provide new experiments using a bona fide membrane marker. We have not yet captured a pinching-off event with these better reporters, but that is not surprising given how rare such events are, which we now quantify. Indeed, a membrane reporter and a mitochondria transgene in sparsely labeled axons are the ideal tools for determining the frequency of these structures and what fraction contain mitochondria, data which we now provide.

      ***Referees cross-commenting**

      Generally agree with the criticisms voiced by the other reviewers; in aggregate the reviews indicate the manuscript needs more than just a quick fix.

      Reviewer #2 (Significance (Required)):

      Previous literature has already described the transmitophagy process in the optic nerve. The significance of this paper lies in the observation that overexpressing glaucoma-associated OPTN mutants can induce increased transmitophagy through astrocytes, which points to a potential role of OPTN in glaucoma. A highlight of this paper is the use of correlated light SBEM to directly show transmitophagy in astrocytes. However, the significance of this paper may be limited for the following reasons: 1. everything is based on overexpression of mutated OPTN, which makes it hard to translate the results to real disease conditions; 2. The consequence of increased transmitophagy on RGC survival or visual functions is unclear.

      *

      While we agree that much of the paper is based on OPTN overexpression, we did have experiments, and now provide more, that were not based on OPTN overexpression. Some of these still involve expression of a different transgene (Tom20-mCherry) that might in principle perturb the system, though we show that expression of Tom20-mCherry does not affect mitochondria movement parameters as measured by MitoTracker. As to “the consequence of increased transmitophagy”, we do now provide data showing that there is no vision loss suggestive of axon loss or severe dysfunction at the time that the imaging studies were carried out. Whether longer-term expression of these OPTN transgenes leads to axon degeneration and visual dysfunction is the subject of ongoing studies, but those studies involve extensive characterizations and controls that are beyond what could be included in this study.

      *Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary In this work, Jeong et al describe the effect of Optineurin (OPTN) mutations in the transcellular degradation of retinal ganglion cell (RGC) mitochondria by astrocytes at the Optic Nerve (ON), a process previously described by this group and referred to as "transmitophagy" (Davis et al 2014). Here, the authors use the Xenopus laevis animal model to image the optic nerve of animals carrying different OPTN mutations associated to disease or with compromised function and explore its effect in mitochondria dynamics at the RGC axons. They find that OPTN mutants lead to increased stationary mitochondria in the nerve and affect their co-localization with mitophagy-related markers, suggesting alterations in this pathway. Finally, they found that mitochondria co-localizing with OPTN can be found in the periphery of the ON under different conditions and this is particularly increased in the glaucoma-associated E50K mutation. These extracellular mitochondria are transferred in vesicles to astrocytes, as they previously described in mice (Davis 2014), where they are presumably degraded.

      Major comments - OPTN levels at a given time point cannot be used as readout for mitophagy level/flux. Both OPTN and LC3b are degraded upon fusion with acidic compartment (i.e. lysosomes, PMID: 33783320, 33634751) and that is the reason why the field of autophagy /mitophagy blocks lysosomal activity to measure autophagy/mitophagy flux (PMID: 33634751). In this document, the authors claim that there are low levels of mitophagy in RGC axons at baseline and increased levels of mitophagy in glaucoma-associated perturbations just based on increased presence of OPTN+ mitochondria in this condition. This could also be interpreted as an accumulation of non-degraded defective mitochondria due to a mitophagy block in neurons carrying the glaucoma-associated mutation, which is the opposite of what they propose. If the authors want to evaluate mitophagy levels in this system, mitophagy/autophagy flux experiments should be performed.*

      In response to reviewers, we now include a “lysosome inhibition” experiment, using chloroquine at doses modestly above those used in aquaculture as an anti-parasitic. Of the various chemical means of inhibiting lysosome activity that we tested, it was the only one that did not adversely affect the animals. We know the chloroquine intervention works because we see the expected increase in autophagosomes using the standard LC3b-tandem reporter, and in those unacidified astrocytic autophagosomes we do indeed find axonal mitochondria signal. However, since the amount of mitochondria signal there is small relative to the total amount of axonal mitochondria in the astrocytes, we do not feel it would be appropriate to make mechanistic claims, for example claiming this to be related to LC3b associated phagocytosis; much more work would be needed to make that claim. However, we were surprised to find no alteration in either stopped mitochondria in axons or axonal mitochondria material within the astrocytes. There are technical reasons why this result might be difficult to interpret, but now having done it (as we should have before), we are even more careful in describing the process as transcellular degradation rather than transmitophagy. We elaborate further on this point in the next response.

      - I find inappropriate the use of the term "transmitophagy". Although this term transmits very well the message that the authors try to strength, the term "mitophagy" refers to the specific elimination of mitochondria through autophagy (PMID: 21179058). There are many reasons why I think that "transmitophagy" is not adequate to describe this phenomena but I will just refer to these three: First, authors do not provide data showing that this mechanism is specific for mitochondria as they have never checked for the presence of other type of cargo in the vesicles produced by RGCs. If these are related to exophers as they suggest in the document, is very probable that they contain other type of cargo; Second, if the final destiny for those particles is the acidic compartment of astrocytes, this process may have nothing to do with autophagy/mitophagy and just share some molecular mediators with those pathways; Third, they should explore if other canonical mitophagy molecular mediators (i.e. Parkin/Pink) are regulating the production or the mitochondria recruitment to this extracellular particles.

      We too struggle with our own “transmitophagy” term, for the very reasons stated. To address this concern, we now refer to the process as “transcellular degradation of mitochondria”, which is how we described it initially in mice as well. We do present new data that show that while the majority of axonal outpocketings contain mitochondria, not all do. This suggests that the others may contain other cargo, which supports the view that what we are dealing with in axons are indeed exophers. And yet, since what we measure is mitochondria, we think it most appropriate to describe the process narrowly and not extrapolate to other types of exophers. We agree that what we originally discovered in mice, and now live image and perturb in frog, may not be “autophagy” according to the strict definition of the term, but rather a process that uses some of the same molecular machinery, which, given the evolutionary link between autophagy and phagocytosis, should be no surprise. Terminology can be tricky, and we thank the reviewer for calling us out on this point. We now use the term “transmitophagy” only once, in the discussion section making the link between our work and the emerging field of exopher biology, and use that occasion to elaborate the point that the more descriptive term “transcellular degradation of mitochondria” is more appropriate in our case.

      - In several experiments, authors use Mitotracker instead of genetic tools to quantify the amount of mitochondria co-localizing with OPTN (Fig2, Fig3) or being transferred to astrocytes (Fig4). A problem here is that Mitotracker needs the mitochondria to be active at the time of injection in order to label them (PMID: 21807856) and it has a clear effect in mitochondria dynamics in their setting, as pointed by the authors. Since most mitochondria transferred to astrocytes would be presumably damaged and not able to import Mitotracker, I am concern about how this is affecting their quantifications and the conclusions.

      We agree. The use of Mitotracker to label the RGC mitochondria can be problematic for the reasons stated by reviewers 1 and 3. Indeed, our opinion is that many of the studies out there that claim to demonstrate transfer of mitochondria between cells likely are just showing the transfer of the dye rather than the mitochondria. While the previous submission included a number of controls to address this concern, we now provide multiple new experiments that measure the transfer of mitochondria through a transgene rather than Mitotracker. The provided experiments use a new Tom20-mCherry transgene which is highly specific to mitochondria due to the use of an SOD2 UTR. We have similar data using RGC-expressed Mito-mCherry and Mito-EGFP-mCherry (using the commonly used Cox8 mitochondria matrix targeting sequence); we do not include such data because we find the provided data sufficiently compelling, and the story is already sufficiently long and complicated.

      - Some conclusions are based on single images with no quantifications or statistics. This is the case for: 1) Page 6) "Most of the mCherry and Mitotracker objects colocalized with each other both in the merged images (Fig. S1C) and kymographs (Fig. S1D), indicating that the mitochondria-targeted transgene and Mitotracker similarly label the RGC axonal mitochondria".

      That is a fair comment. After reanalyzing the original dataset used, it would be very difficult to quantify that statement, largely because the Tom20-mCherry expression was relatively weak in those particular animals. We are confident that we could generate a new dataset to provide support for this statement, but instead chose to just provide side-by-side movies of mitochondria labeled by Mitotracker or the Tom20-mCherry transgenes, which we believe is far more compelling than any quantification we could provide.

      2) Page 8) "In the nerves labeled by Mitotracker, visual inspection of the raw images (Fig. 2C) and the derived kymographs (Fig. 2D) showed that OPTN and the Mitotracker labeled mitochondria often co-localized, particularly in the stopped populations, and more so in the animals expressing E50K OPTN, further suggesting that at least a fraction of the stopped LC3b, OPTN and mitochondria might represent mitophagy occurring in the axons".

      While we have made a minor change to this sentence, we feel that it is appropriate given that it serves just as a justification to carry out the quantitative studies that follow. We would not have quantified the process had it not been obvious to the eye. However, we do not interpret the results as supporting that mitophagy occurs in axons, for the reasons explained above.

      3) Page 14) "We also observed similar axonal dystrophies and exopher-like structures in E50K OPTN under similar imaging settings, but with 2-min intervals and additional Mitotracker labeling (Mov. 6), demonstrating that these structures not only contain OPTN but also mitochondria or mitochondria remnants". Image in video is not clear and there is not quantification for OPTN or OPTN+ mitochondria.

      We have removed Mov. 6.

      Minor comments

      • In Figures showing the reconstruction of OPTN+ mitochondria outside nerve (Fig.3 and Fig.4), those seem to be present only in one lateral of the nerve. Is this process polarized in any way (i.e. faced to astrocytes) or is the result of a technical issue (i.e. difference in laser penetration for blue vs Yellow lasers)? I think it will be important to include this in the discussion.

      This was also pointed out by reviewer 1, and we agree that it is worth including in the discussion, which we now do. While we do not believe it to be a light penetration issue (based on fluorescence intensities and apparent spatial resolution), we also do not yet have an explanation. Having studied dorsoventral differences in the visual pathway both during my graduate and post-doctoral years, I am very interested in this asymmetry, and we have some theories that might explain it, mentioned above. The asymmetry is obvious and thus we think it would have been inappropriate not to show it, but it would also be inappropriate to be overly speculative.

      - In Pag.13 authors claim "OPTN and mitochondria leave RGC axons in the form of exophers". After "exophers" were coined by the Driscoll lab in 2017, too few people has adopted this terminology and the molecular machinery involved in this process is still under research. It is clear that the particles described here share some similarities with exophers like size (in the range of microns) and cargo (mitochondria), but you have not demonstrated if they share the same origin or are part of the same phenomena. For that reason, I recommend to be more cautious with this statement and point these limitations in the discussion. Additionally, since Exophers are not a consensus or well defined particles, authors should include an introductory paragraph at the beginning of this section for readers to understand what they are talking about.

      We wholly agree with all points. We now have moved all mention of exophers to just the discussion.

      - Exophers described by Monica Driscoll and Andres Hidalgo laboratories are presented as "garbage bags" that help cells to stay fit through elimination of unwanted material. If the extracellular vesicles presented here are part of the same mechanism and potentially beneficial for the RGCs, why are they increased in OPTN mutants? Is it part of RGCs response to a proteomic stress generated by malfunctioning OPTN? I think that is critical to understand this to figure out the relevance of your findings.

      Our personal opinion is that the OPTN mutants most likely lead to stress focally in the axons, thus triggering exopher generation. We are carrying out additional experiments to determine whether too much exopher generation or their insufficient degradation by astrocytes might be deleterious (by causing inflammation). However, those are big stories that would not stand on their own were we not able to first rigorously demonstrate that certain OPTN mutants increase exopher generation, which I believe our study demonstrates, albeit now without calling them exophers.

      - Related to Fig.5G, authors say "The soma of the astrocytes were located at the optic nerve periphery but had processes that extended deep into the parenchyma". This is very interesting and opens the possibility that many mitochondria are directly transferred to astrocytes through that processes instead of the lateral of the nerve, meaning that your quantifications of "transmitophagy" may be underestimated.

      We also agree with this. Our limited optical resolution, and limitations intrinsic to carrying out quantifications with Imaris software, are likely the main reasons for the discrepancy between the whole nerve and sparse-labelled-axon estimates of how much axonal material is outside of axons. Our view is that most of the transcellular degradation occurs within fine astrocyte processes, that only when material fails to be degraded in these fine processes do significant amounts accumulate in the cell body (optic nerve periphery), and that in the cell body additional or different degradative pathways are utilized. Experiments using various transgenes and correlated EM, as well as perturbation experiments, are ongoing in an attempt to firmly establish what organelles are used in processes versus soma. However, we believe that such studies are well beyond the scope of this manuscript.

      - Reference to Fig. S2G is missing.

      Now mentioned twice. Thank you.

      - I cannot find in Fig.5 E-I legends what are the cells/structures labelled in Green and Red.

      Thank you.

      Referees cross-commenting

      In agreement with my colleagues, I think that a revision is needed to support some important points of the paper. The work is interesting and I think it deserves a chance for revision. Having that said, I am not familiar with the breeding and experimental times when working with Xenopus but, considering the amount of work requested, it may require more than 3 months to have the work done.

      Reviewer #3 (Significance (Required)):

      Until not very long ago, it was thought that mitochondria could not cross cell barriers. In recent years however, there has been an explosion in the number of works showing mitochondria transfer between different cell types in vivo. This may happen either as an organelle donation to improve energy production or as a quality control mechanism to get rid of damaged mitochondria, as it is the case in this work. The laboratory of Nicholas Marsh-Armstrong was pioneer in this field with a foundational work in 2014 where they show how RGC-derived mitochondria are captured and eliminated by astrocytes in mice (PMID: 24979790). This work was particularly relevant because it proposed for the first time that mitochondrial degradation can occur in RGC axons far from the cell soma, and surrogated in a different cell type, something that changed completely the view of how quality control is maintained in neurons and other cell types.

      In the present study, Jeong and collaborators explore how Glaucoma-associated Optineurin mutations affect this process, which is of potential interest for the broad cell biologist community due to its possible implications in other tissues and cell types (OPTN is broadly expressed), but especially for those researchers interested in neurobiology, quality control mechanisms and mitochondria biology. Since some OPTN mutations studied here cause disease, they are also relevant for the clinic.

      This work provides a thorough characterization of how relevant Optineurin mutations affect mitochondria dynamics in RGCs and their transference to astrocytes, as fairly claimed in the title. However, the mechanism by which they result in pathology is not either explored or carefully discussed, making this a descriptive work with no much conceptual insight. In addition, conclusions are often not unambiguously stated and the results part contains a lot of large sentences and unnecessary technical data that hinders reading and difficult the transmission of the key messages.

      Even if it stands as a descriptive work, the physiological and clinical relevance of these findings is not clear. There are some claims related with mitophagy activity that may require more sophisticated experiments (mitophagy flux with lysosomal inhibitors). Please see comments above. A critical point to understand the relevance of this work would be to demonstrate if alterations in transmitophagy are either causing or involved in the disease generated by these OPTN mutations in any way, or just a correlative phenomenon. To help authors contextualize my point of view, my field of expertise includes cell biology, imaging, quality control pathways, mitochondria biology and phagocytosis, among others. I am not familiar with Xenopus Laevis genetics or the limitations to work with this animal model.

      We appreciate both the compliments and the critiques. To a fault, we would rather undersell than oversell. We are actively pursuing the possibility that dysregulation of this process is disease causing, and not just for glaucoma. However, those studies will not stand without a strong foundation, which we believe this study provides.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.


      Referee #3

      Evidence, reproducibility and clarity

      Summary

      In this work, Jeong et al describe the effect of Optineurin (OPTN) mutations in the transcellular degradation of retinal ganglion cell (RGC) mitochondria by astrocytes at the Optic Nerve (ON), a process previously described this group and referred as "transmitophagy" (Davis et al 2014). Here, authors use Xenopus laevis animal model to image the optic nerve of animals carrying different OPTN mutations associated to disease or with compromised function and explore its effect in mitochondria dynamics at the RGC axons. They find that OPTN mutants lead to increased stationary mitochondria in the nerve and affect their co-localization with mitophagy-related markers, suggesting alterations in this pathway. Finally, they found that mitochondria co-localizing with OPTN can be found in the periphery of the ON under different conditions and this is particularly increased in glaucoma-associated E50K mutation. This extracellular mitochondria are transferred in vesicles to astrocytes, as they previously described in mice (Davis 2014), where they are presumably degraded.

      Major comments

      • OPTN levels at a given time point cannot be used as readout for mitophagy level/flux. Both OPTN and LC3b are degraded upon fusion with acidic compartment (i.e. lysosomes, PMID: 33783320, 33634751) and that is the reason why the field of autophagy /mitophagy blocks lysosomal activity to measure autophagy/mitophagy flux (PMID: 33634751). In this document, authors claim that there is low levels of mitophagy in RGC axons at baseline and increased levels of mitophagy in glaucoma associated perturbations just based on increased presence of OPTN+ mitochondria in this condition. This could be also interpreted as an accumulation of non-degraded defective mitochondria due to a mitophagy block in neurons carrying the glaucoma associated mutation, which is the opposite of what they propose. If authors want to evaluate mitophagy levels in this system, mitophagy/autophagy flux experiments should be performed.
      • I find inappropriate the use of the term "transmitophagy". Although this term transmits very well the message that the authors try to strength, the term "mitophagy" refers to the specific elimination of mitochondria through autophagy (PMID: 21179058). There are many reasons why I think that "transmitophagy" is not adequate to describe this phenomena but I will just refer to these three: First, authors do not provide data showing that this mechanism is specific for mitochondria as they have never checked for the presence of other type of cargo in the vesicles produced by RGCs. If these are related to exophers as they suggest in the document, is very probable that they contain other type of cargo; Second, if the final destiny for those particles is the acidic compartment of astrocytes, this process may have nothing to do with autophagy/mitophagy and just share some molecular mediators with those pathways; Third, they should explore if other canonical mitophagy molecular mediators (i.e. Parkin/Pink) are regulating the production or the mitochondria recruitment to this extracellular particles.
      • In several experiments, authors use Mitotracker instead of genetic tools to quantify the amount of mitochondria co-localizing with OPTN (Fig2, Fig3) or being transferred to astrocytes (Fig4). A problem here is that Mitotracker needs the mitochondria to be active at the time of injection in order to label them (PMID: 21807856) and it has a clear effect in mitochondria dynamics in their setting, as pointed by the authors. Since most mitochondria transferred to astrocytes would be presumably damaged and not able to import Mitotracker, I am concern about how this is affecting their quantifications and the conclusions.
      • Some conclusions are based on single images with no quantifications or statistics. This is the case for:
        1. Page 6) "Most of the mCherry and Mitotracker objects colocalized with each other both in the merged images (Fig. S1C) and kymographs (Fig. S1D), indicating that the mitochondria-targeted transgene and Mitotracker similarly label the RGC axonal mitochondria".
        2. Page 8) "In the nerves labeled by Mitotracker, visual inspection of the raw images (Fig. 2C) and the derived kymographs (Fig. 2D) showed that OPTN and the Mitotracker labeled mitochondria often co-localized, particularly in the stopped populations, and more so in the animals expressing E50K OPTN, further suggesting that at least a fraction of the stopped LC3b, OPTN and mitochondria might represent mitophagy occurring in the axons".
        3. Page 14) "We also observed similar axonal dystrophies and exopher-like structures in E50K OPTN under similar imaging settings, but with 2-min intervals and additional Mitotracker labeling (Mov. 6), demonstrating that these structures not only contain OPTN but also mitochondria or mitochondria remnants". Image in video is not clear and there is not quantification for OPTN or OPTN+ mitochondria.

      Minor comments

      • In Figures showing the reconstruction of OPTN+ mitochondria outside nerve (Fig.3 and Fig.4), those seem to be present only in one lateral of the nerve. Is this process polarized in any way (i.e. faced to astrocytes) or is the result of a technical issue (i.e. difference in laser penetration for blue vs Yellow lasers)? I think it will be important to include this in the discussion.
      • In Pag.13 authors claim "OPTN and mitochondria leave RGC axons in the form of exophers". After "exophers" were coined by the Driscoll lab in 2017, too few people has adopted this terminology and the molecular machinery involved in this process is still under research. It is clear that the particles described here share some similarities with exophers like size (in the range of microns) and cargo (mitochondria), but you have not demonstrated if they share the same origin or are part of the same phenomena. For that reason, I recommend to be more cautious with this statement and point these limitations in the discussion. Additionally, since Exophers are not a consensus or well defined particles, authors should include an introductory paragraph at the beginning of this section for readers to understand what they are talking about.
      • Exophers described by Monica Driscoll and Andres Hidalgo laboratories are presented as "garbage bags" that help cells to stay fit through elimination of unwanted material. If the extracellular vesicles presented here are part of the same mechanism and potentially beneficial for the RGCs, why are they increased in OPTN mutants? Is it part of RGCs response to a proteomic stress generated by malfunctioning OPTN? I think that is critical to understand this to figure out the relevance of your findings.
      • Related to Fig.5G, authors say "The soma of the astrocytes were located at the optic nerve periphery but had processes that extended deep into the parenchyma". This is very interesting and opens the possibility that many mitochondria are directly transferred to astrocytes through that processes instead of the lateral of the nerve, meaning that your quantifications of "transmitophagy" may be underestimated.
      • Reference to Fig. S2G is missing.
      • I cannot find in Fig.5 E-I legends what are the cells/structures labelled in Green and Red.

      Referees cross-commenting

      In agreement with my colleagues, I think that a revision is needed to support some important points of the paper. The work is interesting and I think it deserves a chance for revision. Having that said, I am not familiar with the breeding and experimental times when working with Xenopus but, considering the amount of work requested, it may require more than 3 months to have the work done.

      Significance

      Until not very long ago, it was thought that mitochondria could not cross cell barriers. In recent years however, there has been an explosion in the number of works showing mitochondria transfer between different cell types in vivo. This may happen either as an organelle donation to improve energy production or as a quality control mechanism to get rid of damaged mitochondria, as it is the case in this work. The laboratory of Nicholas Marsh-Armstrong was pioneer in this field with a foundational work in 2014 where they show how RGC-derived mitochondria are captured and eliminated by astrocytes in mice (PMID: 24979790). This work was particularly relevant because it proposed for the first time that mitochondrial degradation can occur in RGC axons far from the cell soma, and surrogated in a different cell type, something that changed completely the view of how quality control is maintained in neurons and other cell types.

      In the present study, Jeong and collaborators explore how Glaucoma-associated Optineurin mutations affect this process, which is of potential interest for the broad cell biologist community due to its possible implications in other tissues and cell types (OPTN is broadly expressed), but especially for those researchers interested in neurobiology, quality control mechanisms and mitochondria biology. Since some OPTN mutations studied here cause disease, they are also relevant for the clinic.

      This work provides a thorough characterization of how relevant Optineurin mutations affect mitochondria dynamics in RGCs and their transference to astrocytes, as fairly claimed in the title. However, the mechanism by which they result in pathology is not either explored or carefully discussed, making this a descriptive work with no much conceptual insight. In addition, conclusions are often not unambiguously stated and the results part contains a lot of large sentences and unnecessary technical data that hinders reading and difficult the transmission of the key messages.

      Even if it stands as a descriptive work, the physiological and clinical relevance of these findings is not clear. There are some claims related with mitophagy activity that may require more sophisticated experiments (mitophagy flux with lysosomal inhibitors). Please see comments above. A critical point to understand the relevance of this work would be to demonstrate if alterations in transmitophagy are either causing or involved in the disease generated by these OPTN mutations in any way, or just a correlative phenomenon. To help authors contextualize my point of view, my field of expertise includes cell biology, imaging, quality control pathways, mitochondria biology and phagocytosis, among others. I am not familiar with Xenopus Laevis genetics or the limitations to work with this animal model.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      SUMO proteins are processed and then conjugated to other proteins via a C-terminal di-glycine motif. In contrast, the N-terminus of some SUMO proteins (SUMO2/3) contains lysine residues that are important for the formation of SUMO chains. Using NMR studies, the N-terminus of SUMO was previously reported to be flexible (Bayer et al., 1998). The authors are investigating the role of the flexible (referred to as intrinsically disordered) N-terminus of several SUMO proteins. They report their findings and modeling data that this intrinsically disordered N-terminus of SUMO1 (and the C. elegans Smo1) regulates the interaction of SUMO with SUMO interacting motifs (SIMs).

      Strengths:

      Among the strongest experimental data suggesting that the N-terminus plays an inhibitory function are their observations that

      (1) SUMO1∆N19 binds more efficiently to SIM-containing Usp25, Tdp2, and RanBp2,

      (2) SUMO1∆N19 shows improved sumoylation of Usp25,

      (3) changing negatively-charged residues, ED11,12KK in the SUMO1 N-terminus increased the interaction and sumoylation with/of USP25.

      The paper is very well-organized, clearly written, and the experimental data are of high quality. There is good evidence that the N-terminus of SUMO1 plays a role in regulating its binding and conjugation to SIM-containing proteins. Therefore, the authors are presenting a new twist in the ever-evolving saga of SUMO, SIMs, and sumoylation.

      Weaknesses:

      Much has been learned about SUMO through structure-function analyses and this study is another excellent example. I would like to suggest that the authors take some extra time to place their findings into the context of previous SUMO structure-function analyses. Furthermore, it would be fitting to place their finding of a potential role of N-terminally truncated Smo1 into the context of the many prior findings that have been made with regard to the C. elegans SUMO field. Finally, regarding their data modeling/simulation, there are questions regarding the data comparisons and whether manipulations of the N-terminus also have an effect on the 70/80 region of the core.

      We thank the reviewer for insightful and constructive comments to improve our manuscript. We have now placed our findings in the context of previous structure-function analyses at several occasions, details of which can be found in our replies to the detailed comments.

      We are also placing the C. elegans data into context of previously published findings on the various functions of SMO-1 in controlling development and maintaining genomic stability (lines 510ff). Finally, we addressed all questions and suggestions regarding comparison of MD simulation and NMR data, and addressed the question whether mutations in the N-terminus affected the 70/80 region. We have now clarified in the manuscript that the sum of MD and NMR data does not allow a clear-cut conclusion on the 70/80 interactions. 

      Reviewer #2 (Public Review):

      Summary:

      This very interesting study originated from a serendipitous observation that the deletion of the disordered N-terminal tail of human SUMO1 enhances its binding to its interaction partners. This suggested that the N terminus of SUMO1 might be an intrinsic competitive inhibitor of SUMO-interacting motif (SIM) binding to SUMO1. Subsequent experiments support this mechanism, showing that in humans it is specific to SUMO1 and does not extend to SUMO2 or SUMO3 (except, perhaps, when the N terminus of SUMO2 becomes phosphorylated, as the authors intriguingly suggest - and partially demonstrate). The auto-inhibition of SUMO1 via its N-terminal tail apparently explains the lower binding of SUMO1 compared to SUMO2 to some SIMs and lower SIM-dependent SUMOylation of some substrates with SUMO1 compared to SUMO2, thus adding an important element to the puzzle of SUMO paralogue preference. In line with this explanation, N-terminally truncated SUMO1 was equally efficient to SUMO2 in the studied cases. The inhibitory role of SUMO1's N terminus appears conserved in other species including S. cerevisiae and C. elegans, both of which contain only one SUMO. The study also elucidates the molecular mechanism by which the disordered N-terminal region of SUMO1 can exert this auto-inhibitory effect. This appears to depend on the transient, very highly dynamic physical interaction between the N terminus and the surroundings of the SIM-binding groove based mostly on electrostatic interactions between acidic residues in the N terminus and basic residues around the groove.

      Strengths:

      A key strength of this study is the interplay of different techniques, including biochemical experiments, NMR, molecular dynamics simulations, and, at the end, in vivo experiments. The experiments performed with these different techniques inform each other in a productive way and strengthen each others' conclusions. A further strength is the detailed and clear text, which patiently introduces, describes, and discusses the study. Finally, in terms of the message, the study has a clear, mechanistic message of fundamental importance for various aspects of the SUMO field, and also more generally for protein biochemists interested in the functional importance of intrinsically disordered regions.

      Weaknesses:

      Some of the authors' conclusions are similar to those from a recent study by Lussier-Price et al. (NAR, 2022), the two studies likely representing independent inquiries into a similar topic. I don't see it as a weakness by itself (on the contrary), but it seems like a lost opportunity not to discuss at more length the congruence between these two studies in the discussion (Lussier-Price is only very briefly cited). Another point that can be raised concerns the wording of conclusions from molecular dynamics. The use of molecular dynamics simulations in this study has been rigorous and fruitful - indeed, it can be a model for such studies. Nonetheless, parameters derived from molecular dynamics simulations, including kon and koff values, could be more clearly described as coming from simulations and not experiments. Lastly, some of the conclusions - such as enhanced binding to SIM-containing proteins upon N-terminal deletion - could be additionally addressed with a biophysical technique (e.g. ITC) that is more quantitative than gel-based pull-down assays - but I don't think it is a must.

      Thank you very much for pointing towards the study of Lussier-Price. We now point out congruent findings in more detail in the discussion.

      We also thank the reviewer for the advice to present and discuss the MD findings more clearly, and more explicitly specify which parameters were obtained from MD. We have made changes throughout the Results and Discussion sections.

      We agree that it would be a nice addition to use ITC measurements as a more quantitative method to assess differences in binding affinities upon deletion of the SUMO N-terminus. We tried to measure affinities between SUMO and SIM-containing binding partners by ITC, but in our hands this failed. In the study of Lussier-Price et al., the authors were able to measure differences in SIM binding upon deleting the N-terminus, but only when they used phosphorylated SIM peptides. Follow-up studies, e.g., on the effect of SUMO’s N-terminal modifications, should certainly include more quantitative measurements such as ITC; however, these studies will have to be picked up by others. The main PI Frauke Melchior and most contributing authors have moved on to new challenges.

      Reviewing Editor (Recommendations For The Authors):

      Both reviewers agreed that your manuscript presents novel results and the key findings including the self-inhibitory role of the N-terminal tail of SUMO proteins in their interaction with SIM are overall well supported by the data. The reviewers also provided constructive suggestions. They pointed out that some simulation results are not clear, which could be strengthened by control analysis and by toning down the related descriptions. In addition, Reviewer 2 suggested that the conclusions from the current biochemical and simulation studies could be further reinforced by more quantitative binding measurements. We hope that these points can be addressed in the revision.

      We thank both reviewers for their insightful and constructive comments and the appreciative tone. In our replies above and below we address most of the raised concerns.

      We strongly recommend the change of the current title. eLife advises that the authors avoid unfamiliar abbreviations or acronyms, or spell out in full or provide a brief explanation for any acronyms in the title.

      We changed the title to “The intrinsically disordered N-terminus of SUMO1 is an intramolecular inhibitor of SUMO1 interactions” to avoid acronyms in the title.

      Reviewer #1 (Recommendations For The Authors):

      Major:

      Lines 190-262: The authors use NMR experiments and all-atom molecular dynamics (MD) simulations. They state that this approach reveals a highly dynamic interaction of the SUMO1 N-terminus with the core and that the SIM binding groove and the 70/80 region are temporarily occupied by the SUMO1 N-terminus (Fig. 3C). After comparing SUMO1, Smt3, SUMO2, and Smo1 by this approach they state that the most striking differences exist for the interaction with the SIM-binding groove, while interactions with the 70/80 region are rather comparable.

      The authors then compare the average binding time data of Figure 3C, D, E, F in Figure 3G.

      It is not clear which data points are included in the bar graphs of Figure 3G and how the individual data points (there are maybe 8 shown in each bar) correspond to the data shown in 3C, D, E, and F or if they are iterations (n?) of the modeled data. This should be clarified. Also, for comparison, the authors should also graph the average data of the 70/80 region.

      We clarified the data shown in Figure 3G as well as 3C-F, and how they relate to each other. Indeed, Figure 3G shows 8 data points for 8 trajectories, and their average. Figure 3C-F are based on the same 8 trajectories, in this case broken down per residue of the protein. The average data of the 70/80 region does not show any significant differences across the proteins, as is already clearly visible from panels 3C-F.
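
      The relationship between a per-trajectory summary (one point per trajectory plus their average) and a per-residue breakdown of the same trajectories can be illustrated with a minimal sketch. This is not the authors' analysis code: the boolean contact array, its shape, the function name `occupancy_summary`, and the groove residue indices are all hypothetical and chosen only for illustration.

```python
import numpy as np

def occupancy_summary(contacts, groove_residues):
    """Summarize one contact array in two ways.

    contacts: bool array of shape (n_traj, n_frames, n_res); True where the
    N-terminus contacts a core residue in that frame (hypothetical input).
    groove_residues: indices treated as the SIM-binding groove (assumed).
    """
    contacts = np.asarray(contacts, dtype=bool)
    # Per-residue view: fraction of all frames, pooled over trajectories,
    # in which each residue is contacted.
    per_residue = contacts.mean(axis=(0, 1))
    # Per-trajectory view: fraction of frames in which any groove residue
    # is contacted, giving one value per trajectory.
    groove_bound = contacts[:, :, groove_residues].any(axis=2)
    per_trajectory = groove_bound.mean(axis=1)
    # The bar height is then the mean over trajectories.
    return per_residue, per_trajectory, per_trajectory.mean()

# Toy input: 2 trajectories, 4 frames, 3 residues (not real data).
toy = np.zeros((2, 4, 3), dtype=bool)
toy[0, 0, 1] = True   # trajectory 0, frame 0 touches residue 1
toy[0, 1, 2] = True   # trajectory 0, frame 1 touches residue 2
toy[1, :, 0] = True   # trajectory 1 touches only residue 0
per_res, per_traj, avg = occupancy_summary(toy, groove_residues=[1, 2])
```

      Both views are computed from the same array, which is why the individual points in a bar graph and the per-residue panels can be derived from identical trajectories.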

      Line 322: More concerning, in Figure 5, the authors model how the ED11,12KK mutations disrupt the interaction between the N-terminus and the SIM-binding groove and state that this mutation leaves interactions with the 70/80 region largely untouched. Again, it is not clear which data points are included in the bar graphs 5D and 5G and how many iterations were run. Furthermore, the data of 5B, C (SUMO1) and 5E, F (smo1) do show clear differences between the WT and mutants affecting both the SIM binding groove and the 70/80 region. The double mutation clearly seems to affect the 70/80 region when comparing 5B, C (SUMO1) and 5E, F (smo1), but this result is not mentioned. Indeed, the authors state that the double mutants leave the interactions with the 70/80 region largely untouched, but this is not borne out by the data presented.

      We improved the clarity of the legend of Figure 5 as suggested. We also thank the reviewer for the comment on the changes in the 70/80 region, to which we now explicitly point the reader in the corresponding Results section. We, however, refrain from drawing conclusions from the MD in this case, as this change is not supported by the NMR measurements (Fig 5a). Charge-charge interactions in the charge-rich double mutants might be overstabilized in the MD simulations, a known problem for canonical force fields such as the one used here, even though it was tailored for IDPs. We now cite a corresponding reference. Another potential explanation for why the CMPs do not reflect this change upon mutation could be a pronounced fuzziness in this region, which, however, is not apparent from the simulations. We would therefore not overinterpret these differences in the 70/80 region. Our key conclusion is the loss of interactions with the SIM-binding groove, and thus of cis-inhibition, upon mutation, which is supported by both MD and NMR.

      341: In their N-termini substitution experiments, the authors show that the SUMO1 core that carries the SUMO2 N-terminus (S2N-S1C) binds USP25 more efficiently than wt SUMO1. However, the SUMO1 core that carries the SUMO2 N-terminus is also reduced in its interaction with Usp25. This is concerning as the SUMO2 N-terminus was not predicted to interfere with SIM binding.

      We were excited to see that the inhibitory potential could be partially transplanted by swapping the N-termini of SUMO1 and SUMO2 demonstrating that some important determinants are contained within the N-terminal tail of SUMO proteins. However, the observed effects were partial indicating that also other determinants contribute and that we do not yet understand all aspects. Obviously, the SUMO1 and SUMO2 cores are similar (also in the area comprising the SIM binding groove) but not identical, and as the inhibition arises from dynamic interactions of the N-terminus with the SIM binding area, differences in the SUMO cores and in residues flanking SUMO’s N-terminus are likely to influence the inhibitory potential as well.

      Blue bars in 3G, 5D, and 6A look surprisingly similar down to the individual data points - does that mean that the same SUMO1 WT data was recycled for these different experiments? This is concerning to me.

      The data displayed in the figures listed above are derived from in silico simulations and indeed display the same data set for the case of SUMO1 WT repeatedly, as we also state in the figure legends (we had done so for 5D “(identical to Fig. 3C)”, and now added the same comment to 6A, thanks for pointing this out). We show the SUMO1 WT data again to facilitate comparing the different SUMO variants in MD simulations.

      Line 352 and 496: The authors used phosphomimetic mutants to assess the effect of SUMO2 N-term phosphorylation on interaction with Usp25. The data suggest a mild phenotype (6G), which is borne out by the quantification in 6H. In contrast, the effect of an array of modifications for SUMO1 (Figures 6A - C) was solely analyzed by MD simulation. If possible, this data should be confirmed, at least by using a phosphomimetic at the Ser9 position of SUMO1. Alternatively, a caveat explaining the need to confirm these predictions by actual experiments should be added to the text.

      Already now we state in “Limitations of the study” that “While our MD simulations and in vitro studies with selected mutants point in this direction, we have not been able to generate quantitatively acetylated and/or phosphorylated SUMO variants to test this hypothesis.”

      We agree that the hypothesis needs experimental validation. Phosphomimetic amino acids can be a useful tool in some cases but fail to mimic a phosphate group in other cases. In the past we tested whether replacing Ser9 with a potentially phospho-mimicking amino acid (Glu) would further diminish binding of SIM-containing proteins compared to the already strongly reduced binding to wt SUMO1, but the effect was too mild to yield a significant difference, at least in our assay. Whether this is due to a failure of Glu to mimic phosphorylation of Ser9, to the limited sensitivity of our pulldown assay combined with the challenge of detecting inhibition relative to an already inhibited state, or to a flaw in our hypothesis, we have not been able to clarify so far. We therefore now also added a sentence to the paragraph introducing phosphoSer9 MD simulations (now line 367) stating that this hypothesis needs to be tested experimentally.

      Minor:

      Line 110: the authors should include references for their summary statement that "A defining feature of SUMO proteins is the intrinsically disordered N-terminus, whose function is only partly understood." Also cite in line 119.

      Thank you, we now included some references.

      Line 75: Please indicate early on that the N-terminus of some SUMO proteins contains lysines for the formation of SUMO chains. Please list them.

      We now list, which of the SUMO proteins used in this study contain lysine residues in their N-termini.

      Line 113: Please cite studies that elucidated the sumoylation of lysines in the N-terminus of SUMO2/3 proteins.

      Thank you, we now included some references.

      Line 153: The authors should include additional references on Smt3 structure function analyses to provide better context. One important detail, for example, is the important finding that Yeast SUMO (Smt3) deletion can be complemented by hsSUMO1 but not hsSUMO2 and hsSUMO3. Additionally, in yeast the entire Smt3 N-terminus can be deleted without detectable effects on growth, underscoring the enigmatic role of the N-terminus (Newman et al., 2017). Caveat also applies to line 266.

      Thank you, we now included some additional information and references around line 153 and below.

      164: The hypothesis that the SUMO1 N-terminus interferes with SIM binding groove ignores the previous observation that deletion of the SUMO2 N-terminus does not have an effect on binding (in vitro). While this is addressed later, the authors should clarify this e.g. by stating "a unique feature of the SUMO1 N-terminus".

      We now explicitly mention that this feature appears to be unique to SUMO1.

      374 and 499: The authors should discuss the caveat that the deletion of the N-terminus of Smt3 does not have a phenotype in yeast in vivo (Newman et al., 2017).

      We now discuss that Smt3’s N-terminus can be deleted without detectable phenotype, both in the results as well as in “Limitations of the study”.

      Line 367: I feel this is overstated and I do not see any evidence that post translation modifications of the SUMO core plays a role. Therefore, I suggest: Our data and modeling are consistent with an interpretation that the N-termini of human and C. elegans SUMO1 proteins are inhibitory and that other SUMO N-termini may acquire such a function upon posttranslational modification of the N-terminus.

      We agree that this is pure speculation and therefore restrict our hypothesis to modifications of the N-terminus.

      Line 374 ff: Since Smo-∆N12 increases sumoylation (Fig. 2I), it is likely that the in vivo defect is due to over-sumoylation in C. elegans. The authors should discuss this possibility and quote appropriate literature e.g.: Rytinki et al., Overexpression of SUMO perturbs the growth and development of Caenorhabditis elegans. Cell Mol Life Sci. 2011 Oct;68(19):3219-32. PMID: 21253676.

      In our study, we employ in vitro SUMOylation as a means to assess the SIM binding capability in an in-solution assay. For this, we use USP25 as a specific substrate known to depend on a SIM for its SUMOylation. We cannot exclude that some specific substrates depending on this same mechanism for their modification may be upregulated in modification also in the Smo-1∆N12 worms. In vivo however, the majority of SUMO substrates is not subject to SIM-dependent SUMOylation. We now added a control experiment showing that we neither observe significantly increased SUMO levels nor upregulated steady state levels of SUMOylation in these worms (Supplemental figure 8).

      The phenotypes shown in the paper by Rytinki et al. do not resemble those of the smo-1∆N12 mutants. Rather, we observed a specific defect in the meiotic germ cells at the pachytene stage causing increased apoptosis. Moreover, we show by western blot analysis that there is no global over-sumoylation occurring in smo-1∆N12 mutants (Fig. S8). Together, our data point to a germline-specific function of the SMO-1 N-terminus in maintaining genome stability (lines 510ff).

      Reviewer #2 (Recommendations For The Authors):

      Page 2 - "Small Ubiquitin-related modifiers of the SUMO family regulate thousands of proteins in eukaryotic cells" - The authors could consider a more precise statement, e.g. that SUMO modifiers have been detected on thousands of proteins and their regulatory effect on many proteins has been demonstrated.

      To be a bit more precise, the sentence now reads: “Ubiquitin-related proteins of the SUMO family are reversibly attached to thousands of proteins”. The summary has a word limit, hence we did not expand further at this place.

      Page 4 - "Both events require SUMO-binding motifs (reviewed, e.g. in 7 ." - The end bracket is missing. Also, isn't it too strong a statement that paralogue specificity always requires a SIM? I don't know all the literature sufficiently well, but the authors could double-check if it is correct to say that paralogue-specific SUMOylation always depends on a SIM.

      Thank you, we added the missing bracket. We agree that it would not be correct to say that paralogue-specificity always depends on a SIM. One alternative example is Dpp9, which shows a clear preference for SUMO1 without owning a SIM. Instead, Dpp9 harbors an alternative SUMO-binding motif, the E67-interacting loop, with a strong paralogue-preference (Pilla et al., 2012). We never intended to imply that a SIM is required for paralogue preference and we also rather generically wrote “SUMO binding motif” instead of “SIM”. However, in the subsequent paragraph about SUMO binding motifs we only go into details of SIMs as one of three classes of SUMO binding motifs not even mentioning the alternative classes. To make this more obvious, we now list the two other known classes of SUMO binding motifs hoping that it will shed the correct light onto our previous statement about paralogue preference.

      Page 4 - In the nice discussion of different types of SIMs, the authors could consider mentioning also the special case of TDP2, which is used later by them as a model binding protein. This could provide an occasion to explain what the unusual "split SIM", mentioned on page 6, but not discussed, is, and what its relation to a normal SIM is. Also, it can perhaps be mentioned that TDP2 contacts SUMO2 not only through the two hydrophobic elements contiguous in space that mimic a SIM but also through a slightly larger interface around these regions on the surface of a folded domain.

      Thank you for pointing this out. In the introduction, we extended our section on SUMO binding and now also included TDP2’s “split SIM”.

      Page 11-12 - In the section "Interaction between SUMO's disordered N-termini and the SIM binding groove is highly dynamic" (and corresponding figures), it should be stated that the discussed kinetic parameters are derived from molecular dynamics simulations and not experimental measurements. It was not very clear to me. This also applies to this sentence on page 17: "First, we observed a very fast (ns) rate of the binding/unbinding process", which in its current form suggests direct observation rather than simulation.

      We thank the reviewer for pointing this out, and in fact, Rev #1 made the same comment. We specified now clearly that the rates were calculated from MD simulations, in the Results and Discussion sections (on page 11-12 and 18 (previously 17)).
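
      For readers unfamiliar with how binding and unbinding rates can be obtained from a simulation rather than from an experiment, one common approach is to estimate them from dwell times in a bound/unbound time series. The sketch below illustrates that general idea under stated assumptions; it is not the authors' procedure, and the function names, the per-frame boolean input, and the frame spacing `dt` are hypothetical.

```python
import numpy as np

def dwell_times(bound, dt):
    """Split a per-frame bound/unbound series into dwell times.

    bound: 1D sequence of booleans, True when the tail occupies the groove
    in that frame (hypothetical input). dt: time between frames.
    Returns (bound_dwells, unbound_dwells) in the units of dt.
    """
    bound = np.asarray(bound, dtype=bool)
    # Frame indices at which the state differs from the previous frame.
    change = np.flatnonzero(bound[1:] != bound[:-1]) + 1
    edges = np.concatenate(([0], change, [bound.size]))
    lengths = np.diff(edges) * dt
    states = bound[edges[:-1]]  # state of each contiguous segment
    return lengths[states], lengths[~states]

def rates_from_series(bound, dt):
    """Estimate first-order rates as inverse mean dwell times.

    Simplification: the first and last segments are truncated by the
    trajectory boundaries but are included here for brevity. The series
    must contain at least one bound and one unbound segment.
    """
    bound_dwells, unbound_dwells = dwell_times(bound, dt)
    k_off = 1.0 / bound_dwells.mean()    # leaving the bound state
    k_on = 1.0 / unbound_dwells.mean()   # entering the bound state
    return k_on, k_off

# Toy series (not simulation data), one frame per time unit.
series = [0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 1, 1]
k_on, k_off = rates_from_series(series, dt=1.0)
```

      Because the interaction is intramolecular, both rates in this sketch are first-order, with units of inverse time set by `dt`.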

      Page 16 - The authors could briefly mention that this relatively long disordered N-terminal tail is a specific feature of SUMO proteins that distinguishes them from ubiquitin. I guess it is obvious to people from the SUMO field, but I don't think it is explicitly stated anywhere in the text and it could be interesting for readers who are less familiar with SUMO/ubiquitin differences.

      Thank you, we added a short half-sentence pointing out this difference.

      Page 17 - "The N-terminal region remains fully disordered in the bound state and is thus a classic example of intrinsic disorder irrespective of the binding state." - it could be added to this sentence that this is suggested by molecular dynamics simulations and not directly observed.

      We added the information that this finding is based on the MD simulations.

      Page 18 - "(e.g., 41,53 or flanking the SIM binding groove24,42" - the end bracket is missing.

      Thanks, we added it.

      Page 19 - "Our analysis in C. elegans (Fig. 7) suggests that this N-terminal function is particularly important in DNA damage response, a pathway that is strongly dependent on the SUMO system." - this brief description of the in vivo data seems to overgeneralise them a little bit. Perhaps one can describe what was observed with slightly more nuance.

      See changes on p.19, lines 510ff.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility and clarity

      Summary:

      This manuscript by Xu, Hörner, Schüle and colleagues is an RNA-seq study focusing on the characterization of axonal transcriptomes from human iPSC-derived cortical neurons. The authors have differentiated iPSC into neurons, cultured them in microfluidic devices and isolated axonal RNA, comparing this to corresponding cell soma transcriptomes. Second, axonal transcriptomes are compared between wild type and Kif1c knockout axons to determine Kif1c-dependently localized transcripts. Characterization of the latter allows the authors to suggest that differentially expressed transcripts in Kif1c-KO axons can be mRNAs relevant for motor neuron degeneration owing to Kif1c mutations in hereditary spastic paraplegias.

      Major comments: Overall, this manuscript reads like work in (early) progress. This manuscript provides an interesting dataset, but needs substantial additional experimental and/or bioinformatic work to merit publication. The technical complexity of the steps that have led to obtaining axonal transcriptomes can be appreciated, and the soundness of generating these data is beyond doubt. However, the study stops at the point of generating axonal transcriptomes from wild type and Kif1c axons. No follow-up experiments are performed to study genes of interest found in RNA-seq. This could be compensated by in-depth bioinformatic analysis (e.g. comparisons with the many different datasets known in the field), but this is clearly lacking as well. The results section only contains minimal bioinformatic analysis and nothing else. Introduction and discussion are clearly written and are in good dialogue with the existing body of work. To improve the manuscript, at minimum these two aspects should be addressed: 1. Characterization of the iPSC-derived neurons is missing (immunostaining with neuronal markers, e.g. Tau, MAP2, exclusion of glial markers, and lack of stem cell markers) 2. Validation of candidates of interest (e.g. FISH analysis in axons vs somata, Kif1c vs wt). Very specific requests from the review are useless at this point, as the authors should have the liberty to focus.

      Thank you for the review of our manuscript. We appreciate your recognition of the technical complexity involved in generating axonal transcriptomes and the clarity of our introduction and discussion sections.

      Characterization of iPSC-derived neurons: We acknowledge the importance of immunostaining with neuronal markers to ensure the purity of our neuronal population. We included this characterization in our revised manuscript and added it into the results and methods section of the paper (Supplementary Figure S1). Additionally, we included RT-qPCR analysis that confirmed the presence of cortical markers and added these to the results and method section of the paper (Supplementary Figure S2).

      Additional bioinformatic work: We agree that additional bioinformatic work will greatly benefit this paper. Therefore, we compared our datasets to all additional datasets that we were able to retrieve. This was added to the main text (results and discussion) and supplementary material (Supplementary Figure S5 and S6). We believe this strengthens the merit of our paper, and adds a lot of new unpublished information to the manuscript.

      Validation of candidates of interest: We understand the necessity of validating our RNA-seq findings through experimental approaches such as FISH analysis and comparisons between KIF1C knockout and wild-type neurons. While we appreciate the comment and agree on the importance of high-resolution RNA FISH, we believe it is beyond the scope of this manuscript due to the considerable complexity of these experiments in human iPSC-derived cortical neurons. We will focus on incorporating this aspect into future studies and added a corresponding statement outlining the limitations of our study in the discussion stressing the importance of this.

      Minor comments: 1. Details of RNA seq technicalities are redundant in the results section, e.g. "Our RNA-seq pipeline encompassed read quality control (QC), RNA-seq mapping, and gene quantification" (p. 7) is a trivial description - this and similar details should be skipped or described in methods.

      We will ensure that technical details are appropriately placed in the methods section and avoid redundancy in the results. Technical details included in the results section have been moved to the methods.

      1. Fig1A: Y axis should start from 0

      We adjusted Figure 1A to start the Y-axis from 0.

      1. Too much interpretational voice in figure legends (e.g. see Fig. 1, "PC1 clearly distinguishes the soma (blue)")

      We revised the interpretational voice in the figure legends to maintain objectivity.

      1. PCA analysis seems redundant in Fig. 2C

      We removed the PCA analysis in Fig. 2A (2C corresponds to Gene ontology term enrichment analysis).

      1. Subheading "Human motor axons show a unique transcription factor profile" is misleading - you are not dealing with iPSC-derived motoneurons (Isl-1 positive), but cortical neurons (again, no marker information provided to assess this!)

      The subheading "Human motor axons show a unique transcription factor profile" was adjusted. Furthermore, validation of neuronal identity has been added to the supplementary figures (Supplementary Figure S1 and S2), as well as main text and methods section.

      1. Fig. 3: Simply comparing the top expressed factors in axonal samples is not informative - overall high expression of a certain transcript likely makes it easier for it to be picked up in the axonal compartment. Axon/soma ratios would perhaps be more appropriate.

      After careful consideration, we decided that we will not change the data presentation in Figure 3. Our aim in this figure was not to compare axon and soma but to see highly expressed transcripts in the axon, regardless of whether they are highly expressed in the soma as well. We think that looking at transcripts present in the axon can give information about axonal function, that we might lose when we only consider transcripts that are upregulated compared to the soma. The fact that 25 out of 50 transcription factor RNAs detected in the axon are actually specific to the axons supports this point of view. The comparison between transcripts expressed in axon and soma are presented in Figure 2.
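
      The two ways of looking at the data discussed in this exchange, ranking transcripts by axonal abundance versus ranking them by axon/soma ratio, can be contrasted in a short sketch. It is a generic illustration only: the gene names, expression values, pseudocount, and function names are hypothetical and are not taken from the manuscript.

```python
import math

def rank_by_axonal_abundance(axon, top_n):
    """Return the top_n genes by axonal expression alone."""
    return sorted(axon, key=axon.get, reverse=True)[:top_n]

def axon_soma_log2_ratio(axon, soma, pseudocount=1.0):
    """Return log2((axon + pc) / (soma + pc)) for each axonal gene.

    The pseudocount (an assumed value) avoids division by zero for
    transcripts that are absent from the soma.
    """
    return {
        gene: math.log2((value + pseudocount) / (soma.get(gene, 0.0) + pseudocount))
        for gene, value in axon.items()
    }

# Hypothetical expression values in arbitrary units (not real data).
axon = {"geneA": 99.0, "geneB": 7.0, "geneC": 3.0}
soma = {"geneA": 399.0, "geneB": 0.0, "geneC": 3.0}

top = rank_by_axonal_abundance(axon, top_n=2)
ratios = axon_soma_log2_ratio(axon, soma)
```

      In this toy input, the most abundant axonal transcript is depleted relative to the soma, whereas a less abundant transcript is strongly enriched, so the two rankings answer different questions.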

      1. Figure 4 (KIF1C modulates the axonal transcriptome): you should show also data for the same genes in the soma, axonal data only is misleading (is overall expression changed?)

      We appreciate your suggestion. This data was already included in Supplementary Figure S6 (now Supplementary Figure S9). To make this easier to find, we've added a section to the results part to more clearly state how transcript expression changes in the soma.

      Significance

      Axonal transcriptomes have been studied since early 2010s by a number of groups and several datasets exist from different model systems. The authors know these studies well, address their findings and cite them appropriately. Is the dataset in this manuscript novel? Does it contribute to the field? Several axonal transcriptomes have been characterized in thorough studies, and even in the specific niche (human IPS-derived motoneurons) a point of reference exists - as the authors themselves point out, it is the Nijssen 2018 study. With appropriate presentation and follow-up experiments this material could have merit as a replication study.

      Audience: specialized

      We appreciate the reviewer's suggestion to clarify the differences between our findings and previously published data. In response, we have added a dedicated section to the discussion, where we provide a more detailed comparison of our results with existing research. This includes an in-depth examination of the methodologies, experimental conditions, and biological contexts that may explain the observed discrepancies (e.g., variations in methods, neuronal types, and disease contexts). As prior studies primarily focused on mouse-derived neurons, we have included a new section in both the results (Supplementary Figure S6) and the discussion to highlight the limited overlap in gene expression between the axons of mouse- and human-derived neurons. Furthermore, previous studies on human-derived cells either investigated i3 neurons (induced by transcription factors but not fully representative of human-derived CNS-resident neurons) or neurons of the peripheral nervous system (lower motor neurons). In contrast, our study focuses on human-derived CNS-resident cortical neurons (Supplementary Figure S1, S2; comparison shown in Supplementary Figure S5), emphasizing the greater translatability of our findings.

      Moreover, we have expanded our bioinformatic analyses and compared our dataset with additional datasets to further substantiate our conclusions (Supplementary Figure S5, S6).

      We believe that these revisions significantly enhance the clarity, quality, and impact of our manuscript. We sincerely thank the reviewer for their constructive feedback.

      Reviewer #2

      Evidence, reproducibility and clarity

      This study seeks to identify the axonal transcriptome by RNA-sequencing of iPSC-derived cortical neuron axons. This is achieved by comparing RNA expression between the axonal and soma compartments using a microfluidic system. The specific expression of axon-specific RNAs in the axonal compartment validates the specificity of the approach. Some unique RNAs, including TF-specific RNAs, are identified. Furthermore, this study compared the KIF1C-knockout neurons (which model hereditary spastic paraplegia characterized by axonal degeneration) with wildtype (WT) control neurons, which led to the identification of specific down-regulated RNAs involved in axonal development and guidance, neurotransmission, and synaptic formation.

      The data of this study are interesting and clearly presented. The major concerns are the lack of characterization of the neuron identities and of an examination of functional deficits in the KIF1C-knockout neurons. For example: 1) Do these neurons express layer V/VI markers at the protein level, and what is the proportion of positive neurons (efficiency of cortical neuron differentiation)? 2) What are the phenotypic changes in the KIF1C-knockout neurons; are there changes in axonal growth or transport? 3) Day 58 was selected for collecting RNA for the sequencing study: how was this time point selected? And are there phenotypic differences between the WT and knockout neurons at this time point?

      We appreciate the favorable review of our manuscript and the insightful comments:

      Characterization of neuron identities: We agree on the importance of validating neuron identities and included protein-level characterization of layer V/VI markers and efficiency of cortical neuron differentiation in our revised manuscript: We conducted immunohistochemical staining for layer V/VI and other neuronal markers, as well as qRT-PCR to validate the identity of the neurons, ensuring a comprehensive characterization of our neuronal population.

      Functional deficits in KIF1C-knockout neurons: We have conducted phenotypic examinations of the neurons but did not observe gross differences in differentiation, axon growth or axon length. We added a corresponding statement to the results section. Neurons were harvested at DAI 58 because at this time we achieved a nearly confluent chamber that yielded enough material for in-depth RNA-sequencing. We did not observe phenotypic differences between wt and KIF1C-KO neurons at this time point. We added a statement to the method section outlining this.

      Some minor comments: 1. The protein levels of some critical factors need to be validated.

      We validated neuronal identities on qRT-PCR level (Supplementary Figure S2). While we understand the necessity of validating our RNA-seq findings on protein level, we believe it is beyond the scope of this manuscript. However, we will focus on incorporating this aspect into future studies and added a corresponding statement outlining the limitations of our study in the discussion stressing the importance of this.

      2. Figure 4C: for the listed genes, statistical analyses between the WT and knockout groups are required.

      In Figure 4C we only included differentially expressed genes with a p-value We added a corresponding statement in the main text and figure legend.

      3. Page 15, the 5th to last sentence: "nucleus nucleus" (repeat)

      The repeat word on page 15 was deleted.

      4. The sequencing data requires public links to the deposited library.

      We will provide public links to the deposited library for the sequencing data once the data is submitted to a journal (depending on journal guidelines).

      Significance

      The strength of this study is the combination of iPSC differentiation, gene editing (KIF1C knockout iPSC) and a microfluidic system. This allows the identification of specific axonal transcriptomes. Moreover, the comparisons of control and KIF1C knockout neurons in both the axon and soma compartments enable the identification of RNAs and pathways affected by the loss of KIF1C.

      The limitation is the lack of functional assessment of the iPSC-derived neurons, especially phenotypic changes in the KIF1C-knockout neurons. Only one time point is selected for comparing the WT and KIF1C knockout neurons, and the relationship between this time point and disease phenotypes is unclear.

      This study will be of interest to researchers from both basic and translational fields, and in the fields of stem cells, neuroscience, neurology and genetics.

      My expertise includes stem cells, iPSC modeling, motor neuron diseases, and nerve degeneration.

      We appreciate the favorable significance statement and believe addressing these points will strengthen the scientific rigor and impact of our study. Thank you for your valuable feedback.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Using microfluidics chambers and RNA sequencing (RNA-seq) of axons from iPSC-derived human cortical neurons, authors use RNA profiling to investigate the RNAs present in the soma and axons and the impact of KIF1C molecular motor downregulation (KIF1CKO) on the axonal transcriptome. The rationale is that mutations in KIF1C are associated with an autosomal recessive form of hereditary spastic paraplegia, and KIF1C is implicated in the long-range directional transport of APC-dependent mRNAs and RNA-dependent transport of the exon junction complex into neurites.

      Employing a well-defined RNA-seq pipeline for analysis, they obtained RNA sequences particular to axonal samples, outperforming previous studies. They detected over 16,000 genes in the soma (which includes axons) and RNA for more than 5,000 genes in axons. A comparison of the list of axonal genes revealed a strong correlation with previous publications, but they detected more genes overall. They identified transcripts enriched in axons compared to somas, notably those for ribosomal and mitochondrial proteins. Indeed, they observed enrichment for ribosomal subunits, respiratory chain complexes, ion transport, and mRNA splicing.

      The study also found that human axons exhibit a unique RNA transcription profile of transcription factors (TFs), with TFs such as GTF3A and ATF4 predominant in axons. At the same time, CREB3 was highly expressed in the soma.

      Upon analyzing the soma and axon transcriptomes from KIF1CKO cultures, they identified 189 differentially regulated transcripts: 89 downregulated and 100 upregulated in the KIF1CKO condition. Some of these transcripts are critical for synaptic growth and neurotransmission. Notably, only two targets of APC-target RNAs were downregulated, contrary to their expectation. Their data indicates that KIF1C downregulation significantly alters the axonal transcriptome landscape.

      Reviewer #3 (Significance (Required)):

      The study is well-performed and informative, particularly for researchers interested in the local translation of axonal proteins and the axonal transcriptome. However, the authors did not validate their findings for any transcripts and did not perform any functional assays, so the manuscript lacks mechanistic insight. Interestingly, GTF3A is a transcription factor that stimulates polymerase III transcription of ribosomal proteins, and mRNAs for ribosomal proteins are enriched in human axons. Maybe there is an interesting story there.

      We appreciate the favorable significance statement and the valuable feedback. We have conducted phenotypic examinations of the neurons but did not observe gross differences in differentiation, axon growth or axon length. We added a corresponding statement to the results section. While we understand the necessity of validating our RNA-seq findings at the protein level, we believe this is beyond the scope of this manuscript. However, we will focus on incorporating this aspect into future studies and have added a statement to the discussion that outlines this limitation of our study and stresses its importance.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This important research uses an elegant combination of protein-protein biochemistry, genetics, and microscopy to demonstrate that the novel bacterial protein FipA is required for polar flagella synthesis and binds to FlhF in multiple bacterial species. This manuscript is convincing, providing evidence for the early stages of flagellar synthesis at a cell pole; however, the protein biochemistry is incomplete and would benefit from additional rigorous experiments. This paper could be of significant interest to microbiologists studying bacterial motility, appendages, and cellular biology.

      We are very grateful for the very positive and helpful evaluation.

      Joint Public Review:

      Bacteria exhibit species-specific numbers and localization patterns of flagella. How specificity in number and pattern is achieved in Gamma-proteobacteria needs to be better understood but often depends on a soluble GTPase called FlhF. Here, the authors take an unbiased protein-pulldown approach with FlhF, resulting in identifying the protein FipA in V. parahaemolyticus. They convincingly demonstrate that FipA interacts genetically and biochemically with previously known spatial regulators HubP and FlhF. FipA is a membrane protein with a cytoplasmic DUF2802; it co-localizes to the flagellated pole with HubP and FlhF. The DUF2802 mediates the interaction between FipA and FlhF, and this interaction is required for FipA function. Altogether, the authors show that FipA likely facilitates the recruitment of FlhF to the membrane at the cell pole together with the known recruitment factor HubP. This finding is crucial in understanding the mechanism of polar localization. The authors show that FipA co-occurs with FlhF in the genomes of bacteria with polarly-localized flagella and study the role of FipA in three of these organisms: V. parahaemolyticus, S. putrefaciens, and P. putida. In each case, they show that FipA contributes to FlhF polar localization, flagellar assembly, flagellar patterning, and motility, though the details differ among the species. By comparing the role of FipA in polar flagellum assembly in three different species, they discover that, while FipA is required in all three systems, evolution has brought different nuances that open avenues for further discoveries.

      Strengths:

      The discovery of a novel factor for polar flagellum development. The solid nature and flow of the experimental work.

      The authors perform a comprehensive analysis of FipA, including phenotyping of mutants, protein localization, localization dependence, and domains of FipA necessary for each. Moreover, they perform a time-series analysis indicating that FipA localizes to the cell pole likely before, or at least coincident with, flagellar assembly. They also show that the role of FipA appears to differ between organisms in detail, but the overarching idea that it is a flagellar assembly/localization factor remains convincing.

      The work is well-executed, relying on bacterial genetics, cell biology, and protein interaction studies. The analysis is deep, beginning with discovering a new and conserved factor, then the molecular dissection of the protein, and finally, probing localization and interaction determinants. Finally, the authors show that these determinants are important for function; they perform these studies in parallel in three model systems.

      Weaknesses:

      The comparative analysis in the different organisms was, on balance, a weakness. Mixing the data for the organisms together made the text difficult to read and took away key points from the results. The individual details crowded out the model in its current form. Indeed, because some of the phenotypes and localization dependencies differ between model systems, the comparison is challenging for the reader. The authors could more clearly state what these differences mean, why they arise, and (in the discussion) how they might relate to the organism's lifestyle.

      More experiments would be needed to fully analyze the effects of interacting proteins on individual protein stability; this absence slightly detracted from the conclusions.

      We have tried our best to improve the manuscript according to the insightful suggestions of the reviewers. Please find our answers to the raised issues below.

      Reviewer #1 (Recommendations For The Authors):

      We are very grateful to this reviewer for the very positive evaluation and the great suggestions to improve the manuscript.

      I think there is value to the comparative analysis but how to present it in such a way that the key similarities and differences stand out is the challenge. Perhaps a table that compares the three datasets is sufficient. Or tell the story of V. parahaemolyticus first to establish the model, followed by comparative analysis of the other two organisms highlighting differences and relegating similarities to supplemental?

      We agree that our previous presentation of the comparative analysis made it very hard to follow the major findings and the general role(s) of FipA, and we are very grateful for the suggestions on how to improve this. We have decided to change the presentation as the reviewer recommended. We used V. parahaemolyticus as a "lead model" to describe the role of FipA, and we then compared the major findings to the other two species. We hope that the story is now easier to follow.

      This is not something that needs to be addressed in the text but I wanted to bring the protein SwrB to the authors' attention which may further expand FipA relevance. Bacillus subtilis uses FlhFG to somehow pattern flagella in a peritrichous arrangement and there are a number of striking similarities, in my opinion, between FipA and SwrB. The two proteins have very similar domain architecture/topology, both proteins promote flagellar assembly, and the genetic neighborhood/operon organization is uncannily similar. There are other more minor similarities dependent on the organism in this paper.

      Phillips, Kearns. 2021. Molecular and cell biological analysis of SwrB in Bacillus subtilis. J Bacteriol 203:e0022721

      Phillips, Kearns. 2015. Functional activation of the flagellar type III secretion export apparatus. PLoS Genet 11:e1005443.

      We thank this reviewer for pointing out these intriguing similarities. For this study we have decided to concentrate exclusively on polarly flagellated bacteria. FlhF and FlhG are also present in B. subtilis, where they play a role in organizing flagellation, but we feel that this would be out of scope for this manuscript.

      Reviewer #2 (Recommendations For The Authors):

      We would like to thank this reviewer for the very positive evaluation and for pointing out several issues to strengthen the story.

      Figure 3A data are problematic since everything is too small to visualize. Since these are functional GFP fusions (or mCherry for 2E data), why are they not presented in color?

      Again - why are color figures not used to help the reader in Fig 4A and 5F & 5G to confirm what is asserted?

      Again, it is difficult to see the images presented. It is asserted that FipA is recruited to the cell pole after cell division and before flagellum assembly, but one has to take their word for it.

      We fully agree that in some cases the localization pattern is hard to see on the micrographs presented. We have, therefore, provided enlarged micrographs in the supplemental part in which the fluorescent foci within the cells are easier to see. With respect to presentation in color, we found that this did not improve the visibility of localizations and have therefore decided to use the grayscale images.

      Here, what is missing are turnover assays. Do FipA, FlhF, and HubP all co-localize as complex or is the absence of one leading to the protein turnover of other partners? I think this needs to be sorted out before final conclusions can be made.

      Thanks for raising this important point. We have now provided western analysis, which demonstrates that FipA and FlhF are produced and stable in the absence of the other partners (see Supplemental Figure 5). The stability of HubP, a general polar marker that is required for more than flagellation, was not determined.

      Minor comments:

      Line 58: change "around" to "in timing with"

      Line 79: what "signal" is transferred from the C-ring to the MS-ring? Are they not fully connected such that rotation involves the entire structure - C-ring-MS-ring-Rod-Hook-Filament? Is it not the change in the relationship to the stator complex where the signal is transferred?

      Line 85: change "counting" to "control of flagellar numbers per cell"

      Line 110: change "is (co-)responsible for recruiting" to "facilitates recruitment of"

      Thanks for pointing this out. We have adjusted the wording according to the reviewer’s suggestions.

      Given that motility phenotypes vary on individual plates (volumes and dryness vary), why in Figure 2C are the motility assays for fipA and flhF mutants of P. putida done on different plates?

      For better visualisation, we have rearranged the spreading halos for the figure. All strain spreading comparisons on soft agar were always conducted on the same plate due to the reasons this reviewer mentioned.

      Reviewer #3 (Recommendations For The Authors):

      We thank this reviewer for the very positive evaluation and the great suggestions.

      One possibility is to describe first all the results relating to FipA in Vibrio and then add the result sections at the end to illustrate the differences between Vibrio and Shewanella, and then Vibrio and Pseudomonas. This may make it easier to follow for the reader.

      We agree that our previous presentation of the comparative analysis made it very hard to follow the major findings and the general role(s) of FipA, and we are very grateful for the suggestions on how to improve this. We have decided to change the presentation as the reviewer recommended. We used V. parahaemolyticus as a "lead model" to describe the role of FipA, and we then compared the major findings to the other two species. We hope that the story is now easier to follow.

      I would have liked to see some TEM analysis of flagella in fipA/hubP double mutants strains and was also wondering if FipA/FlhF/HubP colocalization had been studied in E. coli when all proteins are expressed together, at least with two bearing fluorescent tags.

      Thanks for these great suggestions. In this study, we have concentrated on the localization of FlhF by FipA and HubP. HubP has multiple functions in the cell and may also affect flagellar synthesis to some extent in a species-specific fashion. Therefore, any findings would have to be discussed very carefully, so we have decided to leave that out for the time being.

      With respect to FipA/HubP/FlhF production in a heterologous host such as E. coli, this has been partly done (without FipA) in a second, parallel story (see reference to Dornes et al (2024) in this manuscript). Rebuilding larger parts of the system in a heterologous host is currently being done in an independent study. Therefore, we have decided not to include this here.

      From the Reviewing Editor:

      We are grateful for the fair handling of the review process, for the positive evaluation and for the helpful hints.

      The microscopy was inconsistent (DIC versus phase) for unclear reasons. Did using different microscopes impact the ability to acquire low-intensity fluorescence signals? Please add a sentence in the Methods section to clarify.

      We are sorry for this inconsistency. As the imaging was carried out by different labs (in part before the projects were joined), each lab used its preferred microscopy settings. We have added an explanatory sentence to the Methods section.

      Also, some subcellular fluorescence localizations were not visible in the selected images (e.g., Figures 3 and 5). The reader had to rely on the authors' statements and analyses. The conclusions could be more robust with fluorescence measurements across the cell body for a subset of cells. The authors could provide this data analysis in the Supplemental; this measurement would more clearly show an accumulation of fluorescence at the cell pole, particularly in low-intensity images.

      We fully agree that in some cases the localization pattern is hard to see on the micrographs presented. Unfortunately, the signal is often not sufficiently strong to provide proper demographs. We have, therefore, provided enlarged micrographs in the supplemental part in which the fluorescent foci within the cells are easier to see.

    1. Author response:

      We sincerely thank the reviewers for their thoughtful, critical, and constructive comments, which will help us in further exploring the mechanisms by which LDH regulates glycolysis, the tricarboxylic acid cycle, and oxidative phosphorylation in future studies. The following are our responses to the reviewers' comments.

      Reviewer #1 (Public Review):

      Summary:

      Zeng et al. have investigated the impact of inhibiting lactate dehydrogenase (LDH) on glycolysis and the tricarboxylic acid cycle. LDH is the terminal enzyme of aerobic glycolysis or fermentation that converts pyruvate and NADH to lactate and NAD+ and is essential for the fermentation pathway as it recycles NAD+ needed by upstream glyceraldehyde-3-phosphate dehydrogenase. As the authors point out in the introduction, multiple published reports have shown that inhibition of LDH in cancer cells typically leads to a switch from fermentative ATP production to respiratory ATP production (i.e., glucose uptake and lactate secretion are decreased, and oxygen consumption is increased). The presumed logic of this metabolic rearrangement is that when glycolytic ATP production is inhibited due to LDH inhibition, the cell switches to producing more ATP using respiration. This observation is similar to the well-established Crabtree and Pasteur effects, where cells switch between fermentation and respiration due to the availability of glucose and oxygen. Unexpectedly, the authors observed that inhibition of LDH led to inhibition of respiration and not activation as previously observed. The authors perform rigorous measurements of glycolysis and TCA cycle activity, demonstrating that under their experimental conditions, respiration is indeed inhibited. Given the large body of work reporting the opposite result, it is difficult to reconcile the reasons for the discrepancy. In this reviewer's opinion, a reason for the discrepancy may be that the authors performed their measurements 6 hours after inhibiting LDH. Six hours is a very long time for assessing the direct impact of a perturbation on metabolic pathway activity, which is regulated on a timescale of seconds to minutes. 
The observed effects are likely the result of a combination of many downstream responses that happen within 6 hours of inhibiting LDH that causes a large decrease in ATP production, inhibition of cell proliferation, and likely a range of stress responses, including gene expression changes.

      Strengths:

      The regulation of metabolic pathways is incompletely understood, and more research is needed, such as the one conducted here. The authors performed an impressive set of measurements of metabolite levels in response to inhibition of LDH using a combination of rigorous approaches.

      Weaknesses:

      Glycolysis, TCA cycle, and respiration are regulated on a timescale of seconds to minutes. The main weakness of this study is the long drug treatment time of 6 hours, which was chosen for all the experiments. In this reviewer's opinion, if the goal was to investigate the direct impact of LDH inhibition on glycolysis and the TCA cycle, most of the experiments should have been performed immediately after or within minutes of LDH inhibition. After 6 hours of inhibiting LDH and ATP production, cells undergo a whole range of responses, and most of the observed effects are likely indirect due to the many downstream effects of LDH and ATP production inhibition, such as decreased cell proliferation, decreased energy demand, activation of stress response pathways, etc.

      We appreciate the reviewer’s critical comments. The central question is whether the inhibition of LDH induces a transient perturbation in glycolysis, the TCA cycle, and OXPHOS, or whether it leads to a shift to a new steady state. We argue that it is a transition between two steady states; specifically, GNE-140 treatment drives metabolism from one steady state to another.

      Before conducting the experiment, we performed a time course experiment, measuring glucose consumption and lactate production in cells treated with GNE-140. The results demonstrated very good linearity, indicating that the glycolytic rate remained constant and thus confirming that glycolysis was at steady state. Given the tight coupling between glycolysis, the TCA cycle, and OXPHOS, we infer that the TCA cycle and OXPHOS were also at steady state. However, this inference requires further confirmation.
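
      The linearity argument can be illustrated with a minimal sketch: if cumulative glucose consumption rises linearly with time, the slope of a straight-line fit is the (constant) consumption rate and R² measures how well linearity holds. The time points and values below are hypothetical and are not data from the study; the function name `fit_line` is likewise illustrative.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept, with R^2."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Hypothetical time course: hours after adding the inhibitor and
# cumulative glucose consumed (arbitrary units). Not data from the study.
hours = [0, 1, 2, 3, 4, 5, 6]
glucose_consumed = [0.0, 1.1, 2.0, 3.1, 3.9, 5.0, 6.1]

rate, offset, r2 = fit_line(hours, glucose_consumed)
print(f"rate = {rate:.3f} units/h, R^2 = {r2:.4f}")
```

      A high R² over the whole window is consistent with a constant rate, but it does not by itself exclude a brief transient in the first minutes, which would need denser early sampling to detect.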

      Multiple published reports have shown that LDH inhibition in cancer cells causes a shift from fermentative ATP production to respiratory ATP production. This notion persists because it is often compared to the well-established Crabtree and Pasteur effects, where cells toggle between fermentation and respiration based on glucose and oxygen availability. However, in the Pasteur or Crabtree effects, the deprivation of oxygen—the terminal electron acceptor—drives the switch, which is fundamentally different from LDH inhibition.

      Reviewer #2 (Public Review):

      Summary:

      Zeng et al. investigated the role of LDH in determining the metabolic fate of pyruvate in HeLa and 4T1 cells. To do this, three broad perturbations were applied: knockout of two LDH isoforms (LDH-A and LDH-B), titration with a non-competitive LDH inhibitor (GNE-140), and exposure to either normoxic (21% O2) or hypoxic (1% O2) conditions. They show that knockout of either LDH isoform alone, though reducing both protein level and enzyme activity, has virtually no effect on either the incorporation of a stable 13C-label from a 13C6-glucose into any glycolytic or TCA cycle intermediate, nor on the measured intracellular concentrations of any glycolytic intermediate (Figure 2). The only apparent exception to this was the NADH/NAD+ ratio, measured as the ratio of F420/F480 emitted from a fluorescent tag (SoNar).

      The addition of a chemical inhibitor, on the other hand, did lead to changes in glycolytic flux, the concentrations of glycolytic intermediates, and in the NADH/NAD+ ratio (Figure 3). Notably, this was most evident in the LDH-B-knockout, in agreement with the increased sensitivity of LDH-A to GNE-140 (Figure 2). In the LDH-B-knockout, increasing concentrations of GNE-140 increased the NADH/NAD+ ratio, reduced glucose uptake, and lactate production, and led to an accumulation of glycolytic intermediates immediately upstream of GAPDH (GA3P, DHAP, and FBP) and a decrease in the product of GAPDH (3PG). They continue to show that this effect is even stronger in cells exposed to hypoxic conditions (Figure 4). They propose that a shift to thermodynamic unfavourability, initiated by an increased NADH/NAD+ ratio inhibiting GAPDH explains the cascade, calculating ΔG values that become progressively more endergonic at increasing inhibitor concentrations.

      Then - in two separate experiments - the authors track the incorporation of 13C into the intermediates of the TCA cycle from a 13C6-glucose and a 13C5-glutamine. They use the proportion of labelled intermediates as a proxy for how much pyruvate enters the TCA cycle (Figure 5). They conclude that the inhibition of LDH decreases fermentation, but also the TCA cycle and OXPHOS flux - and hence the flux of pyruvate to all of those pathways. Finally, they characterise the production of ATP from respiratory or fermentative routes, the concentration of a number of cofactors (ATP, ADP, AMP, NAD(P)H, NAD(P)+, and GSH/GSSG), the cell count, and cell viability under four conditions: with and without the highest inhibitor concentration, and at norm- and hypoxia. From this, they conclude that the inhibition of LDH inhibits the glycolysis, the TCA cycle, and OXPHOS simultaneously (Figure 7).

      Strengths:

      The authors present an impressively detailed set of measurements under a variety of conditions. It is clear that a huge effort was made to characterise the steady-state properties (metabolite concentrations, fluxes) as well as the partitioning of pyruvate between fermentation as opposed to the TCA cycle and OXPHOS.

      A couple of intermediary conclusions are well supported, with the hypothesis underlying the next measurement clearly following. For instance, the authors refer to literature reports that LDH activity is highly redundant in cancer cells (lines 108 - 144). They prove this point convincingly in Figure 1, showing that both the A- and B-isoforms of LDH can be knocked out without any noticeable changes in specific glucose consumption or lactate production flux, or, for that matter, in the rate at which any of the pathway intermediates are produced. Pyruvate incorporation into the TCA cycle and the oxygen consumption rate are also shown to be unaffected.

      They checked the specificity of the inhibitor and found good agreement between the inhibitory capacity of GNE-140 on the two isoforms of LDH and the glycolytic flux (lines 229 - 243). The authors also provide a logical interpretation of the first couple of consequences following LDH inhibition: an increased NADH/NAD+ ratio leading to the inhibition of GAPDH, causing upstream accumulations and downstream metabolite decreases (lines 348 - 355).

      Weaknesses:

      Despite the inarguable comprehensiveness of the data set, a number of conceptual shortcomings afflict the manuscript. First and foremost, reasoning is often not pursued to a logical conclusion. For instance, the accumulation of intermediates upstream of GAPDH is proffered as an explanation for the decreased flux through glycolysis. However, in Figure 3C it is clear that there is no accumulation of the intermediates upstream of PFK. It is unclear, therefore, how this traffic jam is propagated back to a decrease in glucose uptake. A possible explanation might lie with hexokinase and the decrease in ATP (and constant ADP) demonstrated in Figure 6B, but this link is not made.

      We appreciate the reviewer's critical comment. In Figure 3C, there is no accumulation of F6P or G6P, which are upstream of PFK1. This is because the PFK1-catalyzed reaction sets a significant thermodynamic barrier. Even with treatment using 30 μM GNE-140, the Gibbs free energy change of the PFK1-catalyzed reaction (ΔG of PFK1) remains -9.455 kJ/mol (Figure 3D), indicating that the reaction is still far from thermodynamic equilibrium, thereby preventing the accumulation of F6P and G6P.
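
      For readers who want to see how such a value arises, the following minimal sketch applies the standard relation ΔG = ΔG°' + RT ln Q to the PFK1 reaction (F6P + ATP → FBP + ADP). The ΔG°' of -14.2 kJ/mol is a commonly tabulated textbook value and is an assumption here, as are the concentrations and the temperature; none of these numbers are taken from the manuscript, and the result is not expected to reproduce the value in Figure 3D.

```python
import math

R_KJ = 8.314e-3  # gas constant in kJ/(mol*K)


def reaction_delta_g(dg0_prime, products, substrates, temp_k=310.15):
    """Return dG = dG0' + R*T*ln(Q), with Q = prod(products) / prod(substrates).

    Concentrations are given in mol/L (relative to a 1 M standard state).
    """
    q = math.prod(products) / math.prod(substrates)
    return dg0_prime + R_KJ * temp_k * math.log(q)


# Hypothetical intracellular concentrations in mol/L (illustration only).
f6p, atp = 0.05e-3, 3.0e-3
fbp, adp = 0.10e-3, 0.5e-3

dg_pfk1 = reaction_delta_g(-14.2, products=[fbp, adp], substrates=[f6p, atp])
print(f"dG(PFK1) = {dg_pfk1:.2f} kJ/mol")
```

      A clearly negative ΔG means the mass-action ratio Q is far below the equilibrium constant, which is the sense in which the reaction is "far from equilibrium" in the argument above.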

      We agree with the reviewer that hexokinase inhibition may play a role; this requires further investigation.

      The obvious link between the NADH/NAD+ ratio and pyruvate dehydrogenase (PDH) is also never addressed, a mechanism that might explain how the pyruvate incorporation into the TCA cycle is impaired by the inhibition of LDH (the observation with which they start their discussion, lines 511 - 514).

      We agree with the reviewer’s comment. In this study, we did not explore how the inhibition of LDH affects pyruvate incorporation into the TCA cycle. As this mechanism was not investigated, we have titled the study: "Elucidating the Kinetic and Thermodynamic Insights into the Regulation of Glycolysis by Lactate Dehydrogenase and Its Impact on the Tricarboxylic Acid Cycle and Oxidative Phosphorylation in Cancer Cells."

      It was furthermore puzzling how the ΔG, calculated with intracellular metabolite concentrations (Figures 3 and 4) could be endergonic (positive) for PGAM at all conditions (also normoxic and without inhibitor). This would mean that under the conditions assayed, glycolysis would never flow completely forward. How any lactate or pyruvate is produced from glucose, is then unexplained.

      This issue also concerned us during the study. However, given the high reproducibility of the data, we consider the result genuine, although it requires explanation.

      The PGAM-catalyzed reaction is tightly linked to both upstream and downstream reactions in the glycolytic pathway. In glycolysis, three key reactions catalyzed by HK2, PFK1, and PK are highly exergonic, providing the driving force for the conversion of glucose to pyruvate. The other reactions, including the one catalyzed by PGAM, operate near thermodynamic equilibrium and primarily serve to equilibrate glycolytic intermediates rather than control the overall direction of glycolysis, as previously described by us (J Biol Chem. 2024 Aug 8;300(9):107648).

      The endergonic nature of the PGAM-catalyzed reaction does not prevent it from proceeding in the forward direction. Instead, the directionality of the pathway is dictated by the exergonic reaction of PFK1 upstream, which pushes the flux forward, and by PK downstream, which pulls the flux through the pathway. The combined effects of PFK1 and PK may account for the observed endergonic state of the PGAM reaction.

      However, if the PGAM-catalyzed reaction were isolated from the glycolytic pathway, it would tend toward equilibrium and never surpass it, as there would be no driving force to move the reaction forward.

      Finally, the interpretation of the label incorporation data is rather unconvincing. The authors observe an increasing labelled fraction of TCA cycle intermediates as a function of increasing inhibitor concentration. Strangely, they conclude that less labelled pyruvate enters the TCA cycle while simultaneously less labelled intermediates exit the TCA cycle pool, leading to increased labelling of this pool. The reasoning that they present for this (decreased m2 fraction as a function of GNE-140 concentration) is by no means a consistent or striking feature of their titration data and comes across as rather unconvincing. Yet they treat this anomaly as resolved in the discussion that follows.

      GNE-140 treatment increased the labeling of TCA cycle intermediates by [13C6]glucose but decreased the OXPHOS rate. We consider these conflicting results an 'anomaly' that warrants further explanation. To address this, we analyzed the labeling pattern of TCA cycle intermediates using both [13C6]glucose and [13C5]glutamine. Tracing the incorporation of glucose- and glutamine-derived carbons into the TCA cycle suggests that LDH inhibition leads to a reduced flux of glucose-derived acetyl-CoA into the TCA cycle, coupled with a decreased flux of glutamine-derived α-KG and a reduction in the efflux of intermediates from the cycle. These results align with theoretical predictions. Under any condition, the reactions that distribute TCA cycle intermediates to other pathways must be balanced by those that replenish them. In the GNE-140 treatment group, the entry of glutamine-derived carbon into the TCA cycle was reduced, implying that glucose-derived carbon (as acetyl-CoA) entering the TCA cycle must also be reduced, or vice versa.

      This step-by-step investigation is detailed under the subheading "The Effect of LDHB KO and GNE-140 on the Contribution of Glucose Carbon to the TCA Cycle and OXPHOS" in the Results section in the manuscript.

      In the Discussion, we emphasize that caution should be exercised when interpreting isotope tracing data. In this study, treatment of cells with GNE-140 led to an increased labeling percentage of TCAC intermediates by [13C6]glucose (Figure 5A-E). However, this does not necessarily imply an increase in glucose carbon flux into TCAC; rather, it indicates a reduction in both the flux of glucose carbon into TCAC and the flux of intermediates leaving TCAC. When interpreting the data, multiple factors must be considered, including the carbon-13 labeling pattern of the intermediates (m1, m2, m3, ...) (Figure 5G-K), replenishment of intermediates by glutamine (Figure 5M-V), and mitochondrial oxygen consumption rate (Figure 5W). All these factors should be taken into account to derive a proper interpretation of the data.
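
      The point that a higher labeling percentage need not mean a higher glucose flux can be illustrated with a deliberately simplified sketch. It assumes a single well-mixed pool at isotopic steady state that is fed only by glucose-derived and glutamine-derived carbon; the function name and the flux values are hypothetical and are not measurements from the manuscript.

```python
def labelled_fraction(glucose_influx, glutamine_influx):
    """Fraction of a well-mixed pool derived from labelled glucose.

    Toy assumption: at steady state total efflux equals total influx, so the
    labelled fraction depends only on the ratio of the two influxes.
    """
    total = glucose_influx + glutamine_influx
    if total <= 0:
        raise ValueError("total influx must be positive")
    return glucose_influx / total


# Hypothetical fluxes (arbitrary units): both inputs fall under treatment,
# but the unlabelled glutamine input falls proportionally more.
control = labelled_fraction(glucose_influx=10.0, glutamine_influx=10.0)
treated = labelled_fraction(glucose_influx=8.0, glutamine_influx=4.0)

print(f"control labelled fraction: {control:.2f}")
print(f"treated labelled fraction: {treated:.2f}")
```

      In this toy case the labelled fraction rises even though the glucose-derived influx falls, which is why the labeling percentage has to be read together with the isotopologue pattern, glutamine tracing, and oxygen consumption.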

      Reviewer #3 (Public Review):

      Hu et al. in their manuscript attempt to interrogate the interplay between glycolysis, TCA activity, and OXPHOS using LDHA/B knockouts as well as LDH-specific inhibitors. Before I discuss the specifics, I have a few issues with the overall manuscript. First of all, based on numerous previous studies it is well established that glycolysis inhibition or forcing pyruvate into the TCA cycle (studies with PDK inhibitors) leads to upregulation of TCA cycle activity, and OXPHOS, activation of glutaminolysis, etc. (in this work the authors claim that lowered glycolysis leads to lower levels of TCA activity/OXPHOS). The authors in the current work completely ignore recent studies that suggest that lactate itself is an important signaling metabolite that can modulate metabolism (actual mechanistic insights were recently presented by at least two groups: the Thompson and Chouchani labs). In addition, extensive effort was dedicated to understanding the crosstalk between glycolysis/TCA cycle/OXPHOS using metabolic models (Titov, Rabinowitz labs). I have several comments on how experiments were performed. In the Methods section, it is stated that both HeLa and 4T1 cells were grown in RPMI-1640 medium with regular serum - but under these conditions, pyruvate is certainly present in the medium - this can easily complicate/invalidate some findings presented in this manuscript. In LDH enzymatic assays as described with cell homogenates, controls were not explained or presented (a lot of enzymes in the homogenate can react with NADH!). One of the major issues I have is that glycolytic intermediates were measured in multiple enzyme-coupled assays. Although one might think it is a good approach to have quantitative numbers for each metabolite, the way it was done is that cell homogenates (potentially with still traces of activity of multiple glycolytic enzymes) were incubated with various combinations of the SAME enzymes and substrates they were supposed to measure as a part of the enzyme-based cycling reaction. I would prefer to see a comparison between numbers obtained in enzyme-based assays with GC-MS/LC-MS experiments (using calibration curves for respective metabolites, of course). Correct measurements of these metabolites are crucial especially when thermodynamic parameters for respective reactions are calculated. Concentrations in multiple graphs (Figure 1g etc.) are in "mM"; I do not think that this is correct.

      While the roles of lactate as a signaling metabolite and metabolic models are important areas of research, our work focuses on different aspects.

      It is true that cell homogenates contain many enzymes that use NAD as a hydride acceptor or NADH as a hydride donor. However, in our assay system, the substrates are pyruvate and NADH, meaning only enzymes that catalyze the conversion of pyruvate + NADH to NAD + lactate can utilize NADH. Other enzymes do not interfere with this reaction. Although some enzymes may also catalyze this reaction, their catalytic efficiency is markedly lower than that of LDH, ensuring the validity of this assay.

      Similarly, the assays for glycolytic intermediates are validated by the substrate specificity.

      We have developed an LC-MS methodology for some glycolytic intermediates, but the accuracy of quantification remains unsatisfactory due to inherent limitations of this methodology.

    1. This leads us to ultimately conclude that while the concept of learning styles is appealing, at this point, it is still a myth.

      Article review: This article discusses the idea of "learning styles," disputes their standing as legitimate in educational circles, and offers alternative options. Overall, I think this article presents a solid argument for why, while we put lots of stock in the idea of them, learning styles may not be accurate or helpful in the long run.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors use microscopy experiments to track the gliding motion of filaments of the cyanobacteria Fluctiforma draycotensis. They find that filament motion consists of back-and-forth trajectories along a "track", interspersed with reversals of movement direction, with no clear dependence between filament speed and length. It is also observed that longer filaments can buckle and form plectonemes. A computational model is used to rationalize these findings.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      Much work in this field focuses on molecular mechanisms of motility; by tracking filament dynamics this work helps to connect molecular mechanisms to environmentally and industrially relevant ecological behavior such as aggregate formation.

      The observation that filaments move on tracks is interesting and potentially ecologically significant.

      The observation of rotating membrane-bound protein complexes and tubular arrangement of slime around the filament provides important clues to the mechanism of motion.

      The observation that long filaments buckle has the potential to shed light on the nature of mechanical forces in the filaments, e.g. through the study of the length dependence of buckling.

      We thank the reviewer for listing these positive aspects of the presented work.

      Weaknesses:

      The manuscript makes the interesting statement that the distribution of speed vs filament length is uniform, which would constrain the possibilities for mechanical coupling between the filaments. However, Figure 1C does not show a uniform distribution but rather an apparent lack of correlation between speed and filament length, while Figure S3 shows a dependence that is clearly increasing with filament length. Also, although it is claimed that the computational model reproduces the key features of the experiments, no data is shown for the dependence of speed on filament length in the computational model. The statement that is made about the model "all or most cells contribute to propulsive force generation, as seen from a uniform distribution of mean speed across different filament lengths", seems to be contradictory, since if each cell contributes to the force one might expect that speed would increase with filament length.

      We agree that the data shows in general a lack of correlation, rather than strictly being uniform. In the revised manuscript, we intend to collect more data from observations on glass to better understand the relation between filament length and speed. 

      In considering longer filaments, one also needs to consider the increased drag created by each additional cell - in other words, overall friction will either increase or be constant as filament length increases. Therefore, if only one cell (or few cells) are generating motility forces, then adding more cells in longer filaments would decrease speed.

      Since the current data does not show any decrease in speed with increasing filament length, we stand by the argument that the data supports that all (or most) cells in a filament are involved in force generation for motility. We will revise the manuscript to make this point, and our reasons for assuming that multiple or most cells in a filament contribute to motility, clear.
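
      The drag argument above can be illustrated with a minimal sketch. It assumes an overdamped filament whose steady-state speed is total thrust divided by total drag, and it assumes that drag grows in proportion to the number of cells. The function name `filament_speed` and all parameter values are hypothetical; this is not the model used in the manuscript.

```python
def filament_speed(n_cells, n_active, force_per_cell=1.0, drag_per_cell=1.0):
    """Steady-state speed of a toy overdamped filament.

    Assumptions (illustrative only): each active cell adds the same thrust,
    and each cell, active or not, adds the same drag.
    """
    if not 0 < n_active <= n_cells:
        raise ValueError("need 0 < n_active <= n_cells")
    total_thrust = n_active * force_per_cell
    total_drag = n_cells * drag_per_cell
    return total_thrust / total_drag


lengths = [10, 50, 200]

# Scenario A: every cell contributes thrust, so speed is independent of length.
all_active = [filament_speed(n, n_active=n) for n in lengths]

# Scenario B: a single cell contributes thrust, so speed falls as 1 / length.
one_active = [filament_speed(n, n_active=1) for n in lengths]

for n, a, b in zip(lengths, all_active, one_active):
    print(f"{n:4d} cells: all active {a:.3f}, one active {b:.3f}")
```

      Under these assumptions, a speed that does not decrease with filament length is what Scenario A predicts, whereas Scenario B predicts a clear decline.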

      The computational model misses perhaps the most interesting aspect of the experimental results which is the coupling between rotation, slime generation, and motion. While the dependence of synchronization and reversal efficiency on internal model parameters are explored (Figure 2D), these model parameters cannot be connected with biological reality. The model predictions seem somewhat simplistic: that less coupling leads to more erratic reversal and that the number of reversals matches the expected number (which appears to be simply consistent with a filament moving backwards and forwards on a track at constant speed).

      We agree that the coupling between rotation, slime generation and motion is interesting and important when studying the specific mechanism leading to filament motion. However, we believe it is even more fundamental to consider the intercellular coordination that is needed to realise this motion. Individual filaments are a collection of independent cells. This raises the question of how they can coordinate their thrust generation in such a way that the whole filament can both move and reverse direction of motion as a single unit. With the presented model, we want to start addressing precisely this point.

      The model allows us to qualitatively understand the relation between coupling strength and reversals (erratic vs. coordinated motion of the filament). It also provides a hint about the possibility of de-coordination, which we then look for and identify in longer filaments.

      While the model results seem obvious in hindsight, the analysis of the model allows phrasing the question of cell-to-cell coordination, which has not been brought up previously when considering the inherently multi-cell process of filament motility.

      Filament buckling is not analysed in quantitative detail, which seems to be a missed opportunity to connect with the computational model, e.g. by predicting the length dependence of buckling.

      Please note that Figure S10 provides an analysis of filament length and number of buckling instances observed. This suggests that buckling happens only in filaments above a certain length.

      We do agree that further analyses of buckling, both experimentally and through modelling, would be interesting. This study, however, focussed on cell-to-cell coupling / coordination during filament motility. We have identified the possibility of de-coordination through the use of a simple 1D model of motion, and found evidence of such de-coordination in experiments. Notice that the buckling we report does not depend on the filament hitting an external object. It is a direct result of filament activity which, in this context, serves as evidence of cellular de-coordination.

      Now that we have observed buckling and plectoneme formation, these processes need to be analysed with additional experiments and modelling. The appropriate model for this process needs to be 3D, and should ideally include torques arising from filament rotation. Experimentally, we need to identify means of influencing filament length and motion and see if we can measure buckling frequency and position across different filament lengths. These works are ongoing and will have to be summarised in a separate, future publication.

      Reviewer #2 (Public review):

      Summary:

      The authors combined time-lapse microscopy with biophysical modeling to study the mechanisms and timescales of gliding and reversals in filamentous cyanobacterium Fluctiforma draycotensis. They observed the highly coordinated behavior of protein complexes moving in a helical fashion on cells' surfaces and along individual filaments as well as their de-coordination, which induces buckling in long filaments.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The authors provided concrete experimental evidence of cellular coordination and de-coordination of motility between cells along individual filaments. The evidence is comprised of individual trajectories of filaments that glide and reverse on surfaces as well as the helical trajectories of membrane-bound protein complexes that move on individual filaments and are implicated in generating propulsive forces.

      We thank the reviewer for listing these positive aspects of the presented work.

      Limitations:

      The biophysical model is one-dimensional and thus does not capture the buckling observed in long filaments. I expect that the buckling contains useful information since it reflects the competition between bending rigidity, the speed at which cell synchronization occurs, and the strength of the propulsion forces.

      Cell-to-cell coordination is a more fundamental phenomenon than the buckling and twisting of longer filaments, in that the latter is a consequence of limits of the former. In this sense, we are focussing here on something that we think is the necessary first step to understand filament gliding. The 3D motion of filaments (bending, plectoneme formation) is fascinating and can have important consequences for collective behaviour and macroscopic structure formation. As a consequence of cellular coupling, however, it is beyond the scope of the present paper.

      Please also see our response above. We believe that the detailed analysis of buckling and plectoneme formation requires (and merits) dedicated experiments and modelling which go beyond the focus of the current study (on cellular coordination) and will constitute a separate analysis that stands on its own. We are currently working in that direction.

      Future directions:

      The study highlights the need to identify molecular and mechanical signaling pathways of cellular coordination. In analogy to the many works on the mechanisms and functions of multi-ciliary coordination, elucidating coordination in cyanobacteria may reveal a variety of dynamic strategies in different filamentous cyanobacteria.

      We thank the reviewer for highlighting this point again and seeing the value in combining molecular and dynamical approaches.

      Reviewer #3 (Public review):

      Summary:

      The authors present new observations related to the gliding motility of the multicellular filamentous cyanobacteria Fluctiforma draycotensis. The bacteria move forward by rotating about their long axis, which causes points on the cell surface to move along helical paths. As filaments glide forward they form visible tracks. Filaments preferentially move within the tracks. The authors devise a simple model in which each cell in a filament exerts a force that either pushes forward or backwards. Mechanical interactions between cells cause neighboring cells to align the forces they exert. The model qualitatively reproduces the tendency of filaments to move in a concerted direction and reverse at the end of tracks.

      We thank the reviewer for this accurate summary of the presented work.

      Strengths:

      The observations of the helical motion of the filament are compelling.

      The biophysical model used to describe cell-cell coordination of locomotion is clear and reasonable. The qualitative consistency between theory and observation suggests that this model captures some essential qualities of the true system.

      The authors suggest that molecular studies should be directly coupled to the analysis and modeling of motion. I agree.

      We thank the reviewer for listing these positive aspects of the presented work and highlighting the need for combining molecular and biophysical approaches.

      Weaknesses:

      There is very little quantitative comparison between theory and experiment. It seems plausible that mechanisms other than mechano-sensing could lead to equations similar to those in the proposed model. As there is no comparison of model parameters to measurements or similar experiments, it is not certain that the mechanisms proposed here are an accurate description of reality. Rather the model appears to be a promising hypothesis.

      We agree with the referee that the model we put forward is one of several possible. We note, however, that the assumption of mechanosensing by each cell - as done in this model - results in capturing both the alignment of cells within a filament (with some flexibility) and reversal dynamics. We have explored an even more minimal 1D model, where the cell’s direction of force generation is treated as an Ising-like spin and coupled between nearest neighbours (without assuming any specific physico-chemical basis). We found that this model was not fully able to capture both phenomena. In that model, we found that alignment required high levels of coupling (which is hard to justify except for mechanical coupling) and reversals were not readily explainable (and required additional assumptions). These points led us to the current, mechanically motivated model.

      The parameterisation of the current model would require measuring cellular forces. To this end, a recent study has attempted to measure some of the physical parameters in a different filamentous cyanobacterium [1], and in our revision we will re-evaluate model parameters and dynamics in light of that study. We will also attempt to directly verify the presence of mechano-sensing by obstructing the movement of filaments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors present valuable findings on how to determine the genetic architecture of extreme phenotype values by using data on sibling pairs. While the authors' derivations of the method are correct, the scenarios considered are incomplete, making it difficult to have confidence in the interpretation of the results as demonstrating the influence of de-novo or Mendelian (rare, penetrant-variant) architectures. The method nevertheless shows promise and will be of interest to researchers studying complex trait genetics. 

      A.1: We have now expanded our consideration of the scenarios and we have ensured that we do not over-interpret our results as being due to de novo or Mendelian architectures. Instead, we make clear that our statistical tests are powered to identify these architectures but that there are other potential causes of significant results (e.g. measurement error or uncontrolled environmental factors from heavy-tailed distributions), making follow-up validation studies necessary before underlying architectures can be confirmed. We consider this to be typical of observational research, in which significant results may indicate causal effects unless uncontrolled confounding factors explain the observed associations, requiring experimental/trial follow-up for validation. We believe that our tests are useful for providing initial inference, and that in some settings – e.g. prioritising samples for sequencing to identify rare variants – could be useful as an initial screening step to increase the efficacy of a planned analysis or study.

      Additionally, we have now developed “SibArc”, an openly available software tool that takes sibling trait data as input and estimates conditional sibling heritability across the trait distribution. Then, based on the theoretical framework developed and described in the paper, for each tail of the trait distribution it estimates effect sizes, generates P-values corresponding to our de novo and Mendelian tests, and performs a Kolmogorov-Smirnov test to identify general departures from our null model. SibArc also provides additional functionality in preliminary beta form, for example, an iterative optimisation routine to infer the approximate relative degrees of polygenic, de novo, and Mendelian architectures prevailing in each trait tail. We have made this software tool, a Quick Start tutorial, and sample data available online on GitHub and are hosting these on a dedicated website: www.sibarc.net.

      Reviewer #1 (Public Review):

      This is a clever and well-done paper that should be published. The authors sought to craft a method, applicable to biobank-scale data but without necessarily using genotyping or sequencing, to detect the presence of de novo mutations and rare variants that stand out from the polygenic background of a given trait. Their method depends essentially on sibling pairs where one sibling is in an extreme tail of the phenotypic distribution and whether the other sibling's regression to the mean shows a systematic deviation from what is expected under a simple polygenic architecture. 

      Their method is successful in that it builds on a compelling intuition, rests on a rigorous derivation, and seems to show reasonable statistical power in the UK Biobank. (More biobanks of this size will probably become available in the near future.)  It is somewhat unsuccessful in that rejection of the null hypothesis does not necessarily point to the favored hypothesis of de novo or rare variants. The authors discuss the alternative possibility of rare environmental events of large effect. Maybe attention should be drawn to this in the abstract or the introduction of the paper. Nevertheless, since either of these possibilities is interesting, the method remains valuable. 

      A.2: We agree with the reviewer that we should have made it clearer that - while our statistical tests are powered to identify de novo and Mendelian architectures – significant findings from our tests could also be explained by rare environmental events of large effect (specifically by uncontrolled environmental factors with heavy-tailed distributions). We have now made this clear throughout the manuscript (see A.1).

      Moreover, we agree with the reviewer that whether the cause of deviations from expectations are due to de novo or rare variants, or environmental factors, either possibility is interesting. For example, in either scenario, our results can highlight inaccuracy in PRS prediction of extreme trait values for certain traits, and also provides a relative measure across different traits of large effects impacting on the trait tails, irrespective of whether genetic or environmental. We now place more emphasis on this point throughout the manuscript.

      Reviewer #2 (Public Review):

      Souaiaia et al. attempt to use sibling phenotype data to infer aspects of genetic architecture affecting the extremes of the trait distribution. They do this by considering deviations from the expected joint distribution of siblings' phenotypes under the standard additive genetic model, which forms their null model. They ascribe excess similarity compared to the null as due to rare variants shared between siblings (which they term 'Mendelian') and excess dissimilarity as due to de-novo variants. While this is a nice idea, there can be many explanations for rejection of their null model, which clouds interpretation of Souaiaia et al.'s empirical results.

      A.3: We agree with the reviewer that we should have made clearer that there are other explanations for significant results from our tests and we have now fully addressed this point (see A.1, A.2, A.4, A.5 for more detail). In addition, we now elaborate on exactly what our null hypothesis is: not only that the expected joint distribution of siblings’ phenotypes is governed by the standard additive genetic model, but also that environmental effects are either controlled for or else their combined effect is approximately Gaussian. Furthermore, by selecting from the UK Biobank only those traits whose raw trait distribution most closely corresponds to a Gaussian distribution, we increase the probability that significant results from our tests are due to rare variants (shared or unshared among siblings).

      The authors present their method as detecting aspects of genetic architecture affecting the extremes of the trait distribution. However, I think it would be better to characterize the method as detecting whether siblings are more or less likely to be aggregated in the extremes of the phenotype distribution than would be predicted under a common variant, additive genetic model.

      A.4: As discussed above, we should have stated more clearly that significant results could be due to non-genetic factors; we have now addressed this.

      However, we do not think that it would be appropriate to characterise our tests as merely corresponding to over and under aggregation of siblings in the tails. Firstly, environmental factors should be controlled for as part of our testing, increasing the probability that significant results are due to genetic, and not environmental factors. Secondly, tests for identifying broad over and under aggregation of siblings in the tails should be designed differently and, accordingly, the tests that we have developed here would not be optimal to detect over/under aggregation of siblings in trait tails. Our test for inference of de novo variants, for example, exploits the fact that de novo alleles of large effect result in one sibling being extreme and all others being drawn from the background distribution, so that the mean of other siblings is relatively low – not merely that other siblings are less likely to be found in the tail. For more discussion on this issue in relation to one of reviewer 1’s points, see A.9.
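
      The intuition behind the de novo test can be sketched with a small simulation. It relies on a standard quantitative-genetics assumption rather than on the exact formulation in the paper: for a standardised trait with heritability h2, additive effects, and no shared environment, the sibling correlation is h2/2, so the expected value of a sibling given an index sibling with trait value s is (h2/2)·s. The function name, the de novo rate, and the effect size are hypothetical choices made for illustration.

```python
import random
from statistics import fmean


def simulate_sibling_tail(n_families, h2, de_novo_rate=0.0,
                          de_novo_effect=0.0, tail=0.01, seed=1):
    """Return (observed sibling mean, polygenic expectation) for the top tail.

    Toy model: each sibling's trait is a shared genetic part (variance h2/2),
    a sibling-specific genetic part (variance h2/2), and an independent
    environmental part (variance 1 - h2). A de novo allele, when present,
    shifts the index sibling only.
    """
    rng = random.Random(seed)
    sd_half = (h2 / 2) ** 0.5
    sd_env = (1 - h2) ** 0.5
    pairs = []
    for _ in range(n_families):
        shared = rng.gauss(0, sd_half)
        index = shared + rng.gauss(0, sd_half) + rng.gauss(0, sd_env)
        sibling = shared + rng.gauss(0, sd_half) + rng.gauss(0, sd_env)
        if rng.random() < de_novo_rate:
            index += de_novo_effect  # unshared large-effect allele
        pairs.append((index, sibling))

    # Keep families whose index sibling lies in the upper tail.
    pairs.sort(key=lambda p: p[0], reverse=True)
    top = pairs[: max(1, int(tail * n_families))]
    index_mean = fmean(p[0] for p in top)
    sibling_mean = fmean(p[1] for p in top)
    return sibling_mean, (h2 / 2) * index_mean


null_obs, null_exp = simulate_sibling_tail(100_000, h2=0.5)
dn_obs, dn_exp = simulate_sibling_tail(100_000, h2=0.5,
                                       de_novo_rate=0.01, de_novo_effect=3.0)

print(f"polygenic only: observed {null_obs:.2f}, expected {null_exp:.2f}")
print(f"with de novo:   observed {dn_obs:.2f}, expected {dn_exp:.2f}")
```

      In the polygenic-only run the siblings of tail individuals regress to roughly the expected value, whereas with unshared large-effect alleles their mean falls well below it, which is the signature the de novo test is described as exploiting.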

      Exactly how the rareness and penetrance of a genetic variant influence the conditional sibling phenotype distribution at the extremes is not made clear. The contrast between de-novo and 'Mendelian' architectures is somewhat odd since these are highly related phenomena: a 'Mendelian' architecture could be due to a de-novo variant of the previous generation. The fact that these two phenomena are surmised to give opposing signatures in the authors' statistical tests seems suboptimal to me: would it not be better to specify a parameter that characterizes the degree of sharing between siblings of rare factors of large effect? This could be related to the mixture components in the bimodal distribution displayed in Fig 1. In fact, won't the extremes of all phenotypes be influenced by all three types of variants (common, rare, de-novo) to greater or lesser degree? By framing the problem as a hypothesis testing problem, I think the authors are obscuring the fact that the extremes of real phenotypes likely reflect a mixture of causes: common, de-novo, and rare variants (and shared and non-shared environmental factors).

      A.5: We absolutely recognise that there will typically be a complex and continuous mix of genetic architectures underlying complex traits in their tails, dictated by the 2-dimensional relationship between allele frequency and effect size. We did consider developing a fully Bayesian statistical framework to model this, but soon realised that doing this properly would require a substantial amount of model development, accounting for multiple factors in ways that would require a great deal of further investigation; for example, performing a range of complex simulations to investigate the effects of different selective pressures over time, different patterns of assortative mating, and effect size generating distributions. We are in the process of applying for funding for a multi-year project that will perform exactly these investigations as a step towards developing more sophisticated models of inference. In the meantime, we do believe that the simpler hypothesis-testing framework that we have developed here does have important value. Assuming that environmental factors are accounted for, or that any that are not accounted for have combined Gaussian effects, then our tests will indeed infer enrichments of de novo and ‘Mendelian’ rare alleles of large effect in the tails of complex traits. Results from these tests can also be compared within and across traits to compare the relative degree of such enrichments among traits. For some traits we observe significant results from both tests, and for other traits we observe highly significant results from one of our tests but not the other. Thus, while our tests do not provide a complete picture about the genetic architecture in the tails of complex traits, they do offer some intriguing initial insights into tail architecture, important given the enrichment of disease in trait tails.

      To better enable interpretation of the results of this method, a more comprehensive set of simulations is needed. Factors that may influence the conditional distribution of siblings' phenotypes beyond those considered include: non-normal distribution, assortative mating, shared environment, interactions between genetic and shared environmental factors, and genetic interactions. 

      A.6: As described above (see A.5) we do agree that a more comprehensive set of simulations is exactly what is needed to further extend this work. However, we believe that the tests that we have developed so far, which make some simplifying assumptions that we think would often hold in practice, are a useful start to what is an entirely novel approach to inferring genetic architecture from family trait-only (non-genetic) data. Our work could already be useful for method developers who may wish to extend our approach in ways that we may not think of. It could also be useful for applied scientists focusing on specific traits who will be able to gain initial, inference-level, insights by applying our tests to their data, while the results of applying our tests may even guide study design of rare variant mapping studies.

      In summary, I think this is a promising method that is revealing something interesting about extreme values of phenotypes. Determining exactly what is being revealed is going to take a lot more work, however. 

      A.7: We thank the reviewer for highlighting the promise in our approach and agree that it is revealing something interesting about complex traits. We also agree that it is going to take a lot more work to reveal exactly what that is for different traits, which we plan to work on ourselves and hope that this paper will help other interested scientists to follow-up on and extend as well.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      R.1.1: Why these particular traits (body fat, mean corpuscular haemoglobin, neuroticism, heel bone mineral density, monocyte count, sitting height)? 

      A.8: Traits were initially selected to cover a variety of trait types (anthropometric, metabolic, personality, etc.) and to illustrate different examples of tail architecture. However, in response to a point from reviewer 2 (see A.17), we have now overhauled our quality control of traits to ensure that only traits closely matching Gaussian distributions are included. In total, 18 traits were selected, with detailed results presented in Appendix 4 and results corresponding to 6 of the traits presented in the main text (Figure 6) to show examples of different types of tail architecture.

      R.1.2: Why are there separate tests for de novo and Mendelian architectures? It seems that one could use either of the derived tests for both purposes, simply by switching to a two-sided test for each tail. My guess is that the score test of whether alpha is zero would be the more statistically powerful test. 

      A.9: The score test of whether alpha is zero has limited power to detect Mendelian architectures. This is because under Mendelian effects, half the siblings in a family have trait values reflecting the background distribution, such that the mean of sibling trait values is not so different from the polygenic expectation (i.e. alpha close to 0). The Mendelian score test that we developed is substantially more powerful because it evaluates co-occurrence of siblings in the tails, which is far higher under Mendelian architecture in the tail than under polygenic architecture.

      However, in order to test for general departures from our null model, including those of non-Gaussian environmental factors, we now include results from performing a Kolmogorov–Smirnov test of difference from the expected distribution, and also provide this test as an option in our ‘SibArc’ software tool.
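
      As a minimal sketch of the null model discussed in A.9 (not the authors' SibArc implementation), the Python code below simulates sibling pairs under an additive polygenic architecture with random mating and no shared environment, using a mid-parent construction. It then compares the standardized conditional residuals with a standard normal distribution through a one-sample Kolmogorov–Smirnov statistic. It assumes the standard result that, for standardized traits under these conditions, the sibling correlation is h²/2, so that s2 | s1 ~ N(h² s1 / 2, 1 − h⁴/4). The function names, the heritability value, the seed and the sample size are illustrative assumptions.

```python
import math
import random
from statistics import NormalDist, fmean


def simulate_sibling_pairs(n_families, h2, rng):
    """Simulate standardized trait values for sibling pairs under an
    additive polygenic null (random mating, Gaussian environment)."""
    pairs = []
    for _ in range(n_families):
        # Mid-parent genetic value: mean of two parental values, each N(0, h2).
        mid_parent = (rng.gauss(0, math.sqrt(h2)) + rng.gauss(0, math.sqrt(h2))) / 2
        sibs = []
        for _ in range(2):
            # Segregation variance h2/2 and environmental variance 1 - h2
            # give each sibling a total trait variance of 1.
            segregation = rng.gauss(0, math.sqrt(h2 / 2))
            environment = rng.gauss(0, math.sqrt(1 - h2))
            sibs.append(mid_parent + segregation + environment)
        pairs.append((sibs[0], sibs[1]))
    return pairs


def conditional_residuals(pairs, h2):
    """Standardize s2 by its expected conditional mean and s.d. given s1."""
    r = h2 / 2  # expected sibling correlation under the null
    sd = math.sqrt(1 - r * r)
    return [(s2 - r * s1) / sd for s1, s2 in pairs]


def ks_statistic(values):
    """One-sample Kolmogorov-Smirnov statistic against N(0, 1)."""
    std_normal = NormalDist()
    ordered = sorted(values)
    n = len(ordered)
    d = 0.0
    for i, v in enumerate(ordered):
        cdf = std_normal.cdf(v)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


rng = random.Random(1)
H2 = 0.5  # illustrative heritability, not an estimate from the paper
pairs = simulate_sibling_pairs(20000, H2, rng)
print("KS statistic:", ks_statistic(conditional_residuals(pairs, H2)))

# Siblings of index individuals in the upper tail regress towards the mean.
tail = [(s1, s2) for s1, s2 in pairs if s1 > 2.0]
print("mean index value:", fmean([s1 for s1, _ in tail]))
print("mean sibling value:", fmean([s2 for _, s2 in tail]))
```

      Under this null the KS statistic should be small, and the mean sibling value in the tail should sit near h²/2 times the mean index value. Departures in either quantity are the kind of signal the tests described above are designed to pick up.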

      R.1.3: This method assumes that assortative mating is absent. I worry that sitting height might not be a good trait to analyze, since there is some assortative mating (~0.3) for height (e.g., Yengo et al., 2018). Perhaps this trait should not be included among those that are analyzed in this paper. Then again, it is possible that there is less assortative mating for sitting height than total height (i.e., leg length) (Jensen & Sinha, 1993). 

      A.10: It is true that our method assumes random mating. We note that while assortative mating increases sibling similarity relative to expectation, if it is stable across the trait distribution it will also bias heritability estimation upward, which is likely to be its impact in our framework. However, if assortative mating is more prevalent in the tails of the distribution, it can result in excess kurtosis, an impact that can increase false positive Mendelian tests and false negative de novo tests. Given that the trait distribution for Sitting Height has only moderate excess kurtosis (~0.4, see Fig 9, Appendix 4) and we inferred de novo architecture only for this trait, we feel that including it in the paper is appropriate.

      R.1.4: I wonder if it's possible to discuss the impact of non-additive genetic variance on the method. How does this affect the estimation of heritability, which calibrates the expectation for regression to the mean? Can non-additive genetic deviations explain a rejection of the null hypothesis of simple polygenicity? 

      A.11: Yes, the heritability estimation, which calibrates expectation for regression to the mean, assumes additivity of effects, as do the most popular estimators of heritability from GWAS data in the field: GCTA-GREML, LD Score regression and LDAK. Accordingly, non-additive genetic effects could result in rejection of the null hypothesis. We have highlighted this point in the Discussion. However, we also point out that current evidence suggests that the contribution of non-additive genetic effects to complex trait variation is relatively small (Hivert 2021) and that non-additive genetic effects that have a similar impact across the trait distribution should not be a problem for our approach (only those that have an increasing effect towards the tails would be).

      R.1.5: p.5: Maybe a more realistic way to simulate a genetic architecture is to draw the MAF from the distribution [MAF(1 - MAF)]^{-1} and then an effect of the minor allele from some mound-shaped distribution (e.g., mixture of normals). The absolute or squared effect of the minor allele should increase as the MAF decreases, and there have been some papers trying to estimate this relationship (e.g., Zeng et al., 2021). Maybe make the number of causal SNPs 10,000. I don't rate this as an urgent suggestion because my sense is that the method should be robust, making adequate even a fairly minimal simulation confirming its accuracy.

      A.11: In separate work, we have performed a comprehensive simulation study using the forward-in-time population genetic simulator SLiM 3 (Haller and Messer, 2019), which generates genetic effects according to Gaussian and Gamma distributions and models different selective pressures on complex traits. We plan to publish this work shortly and also extend the simulations to family data, from which we will be able to test the performance of our methods here under a range of different scenarios of genetic variation generation, including a variety of relationships between allele frequency and effect sizes. We agree with the reviewer that at this point, however, our minimal simulation should be sufficient to confirm our tests’ general robustness and so we will perform further testing once we have extended our more sophisticated simulation study.

      R.1.6: p.6: Step D seems to leave out a normalization of G to have unit variance. Also, the last part should say "the square of the correlation between the genetic liability and the trait is equal to the heritability." 

      A.12: Corrected – we thank the reviewer for spotting this.

      R.1.7: Figure 5: The power being adequate if roughly 1 of a 1000 index siblings with an extreme trait value owes their values to de novo mutations makes me think that there should be a discussion of the prior probability. The average person carries about 80 de novo mutations. How many of these are likely to affect, e.g., height? Zeng et al. (2021) gave estimates of mutational targets. Given that a mutation affects height, will its likely effect size be large enough to be detected with the method? Kemper et al. (2012) discussed this point in a perhaps useful way. 

      A.13: We find the work investigating mutational target sizes and generating effect sizes of different mutations (de novo or rare) to be extremely interesting and critical for understanding the causes of observed genetic variation. However, we think that this work is insufficiently progressed at this point to build on directly here for making more nuanced interpretation of our results. We are, however, exploring the impact of mutational target sizes, effect size distributions and selection effects, on the genetic architecture of complex traits via population genetic simulations (see A.11), and so we hope to be able to provide more in-depth interpretation of our results in the future.

      R.1.8: Figure 6: The number in the tables for Mendelian architecture are presumably observed and expected counts. But what about the numbers for de novo architecture? Those don't look like counts. Maybe they are conditional expectations of standardized trait values. Whatever the case may be, the caption should provide an explanation. 

      A.14: The observed and expected values for the de novo statistical test represent the expected and observed mean standardized trait values for siblings of individuals in the bottom and top 1% of the distribution. We have now made this clear in our updated figure.

      R.1.9: p. 16: Element (2,1) in the precision matrix after Equation 15 is missing a negative sign. 

      A.15: Corrected – we thank the reviewer for spotting this.

      R.1.10: p. 20: Shouldn't Equation 20 place an exponent of n on the factor outside of the exponential? 

      A.16: Corrected – we thank the reviewer for spotting this.

      Reviewer #2 (Recommendations For The Authors):

      R.2.1: The first concern that I have is that their statistical tests rely heavily on an assumption of bivariate normal distribution for sibling pair's phenotypes. Real phenotypes do not have such a distribution in general. The authors rely upon an inverse-normal transform when applying their method to real data. While the inverse-normal transform will ensure that the siblings' phenotypes have a marginal normal distribution, such a transform does not ensure that the joint distribution is bivariate normal. The authors should examine their procedure for simulated phenotypes with a non-normal distribution to see if their statistical tests remain properly calibrated. Related to this, I am concerned about applying an inverse normal transform to the neuroticism phenotype that contains only 13 unique values in UKB. How does the transform deal with tied values? Can we sensibly talk about extreme trait values for such a set of observations? 

      A.17: The reviewer is correct that a bivariate normal distribution for sibling pairs’ trait values does not necessarily hold, and only does so if the assumptions of our null model are met (polygenic effects, Gaussian environmental effects, random mating, etc.). We have now more clearly described the assumptions of our null model, and to increase the matching of our selected traits to those assumptions we have expanded our analyses and now present results on traits that are close to Gaussian. As part of this stricter quality control, only traits with more than 50 unique values are included, meaning that neuroticism is excluded in our final analysis. We also now note that performing an inverse normal transformation on the traits only increases the robustness of the tests to some of our modelling assumptions. In future work we plan to investigate how best to model the conditional sibling distribution under a variety of non-Gaussian environmental effects and different non-random patterns of mating.
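
      The tie-handling question raised in R.2.1 can be illustrated with a short, hedged sketch. The responses do not state which variant of the inverse normal transform was used; the code below assumes a rank-based transform with the Blom offset (c = 3/8) and average ranks for ties, which are common choices but are assumptions here. The helper `passes_unique_value_qc` is a hypothetical name that mirrors the filter described in A.17 (more than 50 unique values).

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=3 / 8):
    """Rank-based inverse normal transform; tied values share an average rank."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    start = 0
    while start < n:
        # Extend the run while the next sorted value equals the current one.
        end = start
        while end + 1 < n and values[order[end + 1]] == values[order[start]]:
            end += 1
        average_rank = (start + end) / 2 + 1  # 1-based average rank of the run
        for k in range(start, end + 1):
            ranks[order[k]] = average_rank
        start = end + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2 * c + 1)) for r in ranks]


def passes_unique_value_qc(values, min_unique=50):
    """Keep a trait only if it has more than `min_unique` distinct values."""
    return len(set(values)) > min_unique


scores = [3, 1, 2, 2]
print(inverse_normal_transform(scores))
print(passes_unique_value_qc(list(range(13))))  # a 13-level trait fails the filter
```

      Because tied observations receive the same rank, they also receive the same transformed value. A trait with 13 distinct values therefore still has 13 distinct values after transformation, which is why a filter on the number of unique values is applied before, rather than instead of, the transform in this sketch.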

      R.2.2: The joint sibling phenotype distribution (Equation 4) can be derived by applying the formula for the conditional distribution of a multivariate Gaussian to the standard additive genetic model. The authors' derivation is unnecessarily complex. Furthermore, many of the formulae have been used in Shai Carmi's work on embryo screening, but this work is not cited. 

      A.18: We now state in the text that the conditional sibling distribution can also be derived from the joint trait distribution of related individuals, which we use in our extension to the 3-sibling scenario, and cite Shai Carmi’s work where this is used. The joint distribution is a more straightforward way to derive the conditional sibling distribution, but our derivation based on considering mid-parents is generalisable to cases where assumptions of random mating, Gaussian population trait distribution and no selection do not hold. We also think that our mid-parent based derivation will be more intuitive to many readers, leading to greater understanding and potential for extension. Therefore, overall we believe that its presentation is worthwhile and we have now elaborated on this in the Methods.

      R.2.3: Equation 8: this probability should be conditional on s1 

      A.19: Corrected – we thank the reviewer for spotting this.

      R.2.4: The empirical application to UKB data is lacking methodological details. Also, the number of siblings used is low compared to the number of available sibling pairs. Around 19k sibling pairs are available in the UKB white British subsample, but only 10k were used for height. Why? Also, why are extreme values excluded? Isn't this removing the signal the authors are looking to explain?

      A.20: We have now provided more methodological details throughout the Methods section, in particular in relation to the samples used and quality control performed. The removal of individuals with extreme values, in particular, is because unusually low/high trait values are more likely than typical values to be due to measurement error (e.g. due to an imperfect measuring device, or storage/assaying), and so while this may also result in some loss in power (albeit small, since few individuals have values beyond +/- 8 s.d. of trait means) we consider it worth it for the potential reduction in type I error. In performing our newly expanded analysis (described above), and accounting for the reviewer’s point here about sample size, we did find a bug in our pipeline that meant that we did not include as many sibling pairs as available. We thank the reviewer for spotting this, since this contributed to our new analysis being substantially more powerful than the original (including up to ~17k sibling pairs depending on completeness of trait data).

      Benjamin C Haller, Phillip W Messer. SLiM 3: Forward Genetic Simulations Beyond the Wright–Fisher Model. Molecular Biology and Evolution. 2019. 36(3): 632-637.

      SD Whiteman, SM McHale, A Soli. Theoretical Perspectives on Sibling Relationships. J Fam Theory Rev. 2011 Jun 1;3(2):124-139.

      Nicholas H Barton, Alison M Etheridge, and Amandine Véber. The infinitesimal model: Definition, derivation, and implications. Theoretical population biology, 118:50–73, 2017.

      Valentin Hivert et al. “Estimation of non-additive genetic variance in human complex traits from a large sample of unrelated individuals.” American journal of human genetics vol. 108,5 (2021)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors demonstrate that it is possible to carry out eQTL experiments for the model eukaryote S. cerevisiae, in "one pot" preparations, by using single-cell sequencing technologies to simultaneously genotype and measure expression. This is a very appealing approach for investigators studying genetic variation in single-celled and other microbial systems, and will likely inspire similar approaches in non-microbial systems where comparable cell mixtures of genetically heterogeneous individuals could be achieved.

      Strengths:

      While eQTL experiments have been done for nearly two decades (the corresponding author's lab are pioneers in this field), this single-cell approach creates the possibility for new insights about cell biology that would be extremely challenging to infer using bulk sequencing approaches. The major motivating application shown here is to discover cell occupancy QTL, i.e. loci where genetic variation contributes to differences in the relative occupancy of different cell cycle stages. The authors dissect and validate one such cell cycle occupancy QTL, involving the gene GPA1, a G-protein subunit that plays a role in regulating the mating response MAPK pathway. They show that variation at GPA1 is associated with proportional differences in the fraction of cells in the G1 stage of the cell cycle. Furthermore, they show that this bias is associated with differences in mating efficiency.

      Weaknesses:

      While the experimental validation of the role of GPA1 variation is well done, the novel cell cycle occupancy QTL aspect of the study is somewhat underexploited. The cell occupancy QTLs that are mentioned all involve loci that the authors have identified in prior studies that involved the same yeast crosses used here. It would be interesting to know what new insights, besides the "usual suspects", the analysis reveals. For example, in Cross B there is another large effect cell occupancy QTL on Chr XI that affects the G1/S stage. What candidate genes and alleles are at this locus? And since cell cycle stages are not biologically independent (a delay in G1, could have a knock-on effect on the frequency of cells with that genotype in G1/S), it would seem important to consider the set of QTLs in concert.

      We thank the reviewer for this suggested clarification. We have modified the text to make it clear that cell cycle occupancy is a compositional phenotype. Like the reviewer, we also noticed the distal trans eQTL hotspot on Chr XI in Cross B, but we were not able to identify compelling candidate gene(s) or variant(s) despite extensive effort.

      Reviewer #2 (Public Review):

      Boocock and colleagues present an approach whereby eQTL analysis can be carried out by scRNA-Seq alone, in a one-pot-shot experiment, due to genotypes being able to be inferred from SNPs identified in RNA-Seq reads. This approach obviates the need to isolate individual spores, genotype them separately by low-coverage sequencing, and then perform RNA-Seq on each spore separately. This is a substantial advance and opens up the possibility to straightforwardly identify eQTLs over many conditions in a cost-efficient manner. Overall, I found the paper to be well-written and well-motivated, and have no issues with either the methodological/analytical approach (though eQTL analysis is not my expertise), or with the manuscript's conclusions.

      I do have several questions/comments.

      393 segregant experiment:

      For the experiment with the 393 previously genotyped segregants, did the authors examine whether averaging the expression by genotype for single cells gave expression profiles similar to the bulk RNA-Seq data generated from those genotypes? Also, is it possible (and maybe not, due to the asynchronous nature of the cell culture) to use the expression data to aid in genotyping for those cells whose genotypes are ambiguous? I presume it might be if one has a sufficient number of cells for each genotype, though, for the subsequent one-pot experiments, this is a moot point.

      As mentioned in our preliminary response, while it is possible to expand the analysis along these lines, this is not relevant for the subsequent one-pot experiments. We have made all the data available so that anyone interested can try these analyses.

      Figure 1B:

      Is UMAP necessary to observe an ellipse/circle - I wouldn't be surprised if a simple PCA would have sufficed, and given the current discussion about whether UMAP is ever appropriate for interpreting scRNA-Seq (or ancestry) data, it seems the PCA would be a preferable approach. I would expect that the periodic elements are contained in 2 of the first 3 principal components. Also, it would be nice if there were a supplementary figure similar to Figure 4 of Macosko et al (PMID 26000488) to indeed show the cell cycle dependent expression.

      We have added two new figures (S2 and S3) that represent alternative visualizations of the cell-cycle that are not dependent on UMAP. Figure S2 shows plots of different pairs of principal components, with each cell colored by its assigned cell-cycle stage. We do not observe a periodic pattern in the first 3 principal components as the reviewer expected, but when we explore the first 6 principal components, we see combinations of components that clearly separate the cell cycle clusters. We emphasize that the clusters were generated using the Louvain algorithm and assigned to cell-cycle stages using marker genes, and that UMAP was used only for visualization.

      We could not create a figure similar to Macosko et al. because of differences between the cell cycle categories we used and those of Spellman et al (PMID 9843569). We instead created Figure S3 to address the reviewer's comment. This figure uses a heatmap in a style similar to that of Macosko et al. to display cell-cycle-dependent expression of the 22 genes we used as cell cycle markers across each of the five cell cycle stages (M/G1, G1, G1/S, S, G2/M).

      We have renumbered the supplementary figures after incorporating these two additional supplementary figures into the manuscript.

      Aging, growth rate, and bet-hedging:

      The mention of bet-hedging reminded me of Levy et al (PMID 22589700), where they saw that Tsl1 expression changed as cells aged and that this impacted a cell's ability to survive heat stress. This bet-hedging strategy meant that the older, slower-growing cells were more likely to survive, so I wondered a couple of things. Is it possible from single-cell data to identify either an aging or a growth rate signature? A number of papers from David Botstein's group culminated in a paper that showed that they could use a gene expression signature to predict instantaneous growth rate (PMID 19119411) and I wondered if a) this is possible from single-cell data, and b) whether in the slower growing cells, they see markers of aging, whether these two signatures might impact the ability to detect eQTLs, and if they are detected, whether they could in some way be accounted for to improve detection.

      As mentioned in our preliminary response, we are not sure how to look for gene expression signatures of aging in yeast scRNA-seq data. We believe that the proposed analyses are beyond the scope of the current paper. As noted above, we have made all the data available so that anyone interested can explore these hypotheses.

      AIL vs. F2 segregants:

      I'm curious if the authors have given thought to the trade-offs of developing advanced intercross lines for scRNA-Seq eQTL analysis. My impression is that AIL provides better mapping resolution, but at the expense of having to generate the lines. It might be useful to see some discussion on that.

      We thank the reviewer for the comments. We believe that a discussion of trade-offs between different approaches for constructing mapping populations, such as AIL and F2 segregants, is beyond the scope of this paper.

      10x vs SPLit-Seq

      10x is a well established, but fairly expensive approach for scRNA-Seq - I wondered how the cost of the 10x approach compares to the previously used approach of genotyping segregants and performing bulk RNA-Seq, and how those costs would change if one used SPLiT-Seq (see PMID 38282330).

      We thank the reviewer for the comments. We believe that a discussion of cost trade-offs between 10x and other approaches is beyond the scope of this paper, especially given the rapidly evolving costs of different technologies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Throughout the results section the authors point to File S1 for additional information. This file is a tarball with about 20 Excel documents in it, each with several sheets embedded. The authors should provide a detailed README describing how to understand the organizations of the files in File S1 and the many embedded sheets in each file. Statements made in the manuscript about File S1 should explicitly direct the reader to a specific spreadsheet and table to refer to.

      We have added an additional README file to the tarball that explains the organization of File S1 and describes the data contained in each sheet. Throughout the text, we now reference specific spreadsheets to assist the reader. In addition, these spreadsheets have been added to a github repository https://github.com/theboocock/finemapping_spreadsheets_single_cell

      Neither of the two GitHub repositories referenced under "Code availability" has adequate documentation that would allow a reader to try and reproduce the analyses presented here. The one entitled https://github.com/joshsbloom/single_cell_eQTL has no functional README, while https://github.com/theboocock/yeast_single_cell_post_analysis is somewhat better but still hard to navigate. Basic information on expected inputs, file formats, file organization, output types, and formats, etc. is required to get any of these pipelines to run and should be provided at a minimum.

      We thank the reviewer for the comment. In response, we have refactored both GitHub repositories and added extensive documentation to improve usability. We also updated the versions of software and packages, and this is reflected in the methods section.

      S. cerevisiae strains are preferentially diploid in nature and many genes involved in the mating pathway are differentially regulated in diploids vs haploids. Have the authors explored the fitness effects of the GPA1 82R allele in diploids? What is the dominance relationship between 82W and 82R?

      We thank the reviewer for the comment. In diploid yeast, the mating pathway is repressed, and thus we would not expect there to be any fitness consequences due to the presence of different alleles of GPA1.

      The diploid expression profiling (page 5 and Table S9) doesn't implicate GPA1; can the authors comment on this in light of their finding in haploids?

      The mating pathway, including GPA1, is repressed in diploids, and hence the expression of GPA1 cannot be studied in these strains (PMID: 3113739). In addition, allele-specific expression differences only identify cis-regulatory effects. We know that the GPA1 variant results in a protein-coding change, which may or may not influence the levels of mRNA in cis, so that even if GPA1 were expressed in diploids, there would be no expectation of an allele-specific difference in expression.

      With respect to the candidate CYR1 QTL -- note that strains with compromised Cyr1 function also generally show increased sporulation rates and/or sporulation in rich media conditions (cAMP-PKA signaling represses sporulation). Is this the case in diploids with the CBS2888 allele at CYR1? If the CBS2888 allele is a CYR1 defect one might expect reduced cAMP levels. It is possible to estimate adenylate cyclase levels using a fairly straightforward ELISA assay. This would provide more convincing evidence of the causal mechanism of the alleles identified.

      We thank the reviewer for the comment, and we agree that a functional study of the CYR1 alleles would provide more convincing evidence for the causal mechanism of the connection between cell cycle occupancy, cAMP levels, and growth. However, we believe that the proposed experiments are beyond the scope of our current study. The evidence we provide is sufficient to establish that CYR1 is a strong candidate gene for the eQTL hotspot.

      Re: CYR1 candidate QTL -- The authors should reference the work of [Patrick Van Dijck](https://pubmed.ncbi.nlm.nih.gov/?sort=date&term=Van+Dijck+P&cauthor_id=20924200) and [Johan M Thevelein](https://pubmed.ncbi.nlm.nih.gov/?sort=date&term=Thevelein+JM&cauthor_id=20924200) on CYR1 allelic variation, and other papers besides the Matsumoto/Ishikawa papers, as the effects of cAMP-PKA signaling on stress can be quite variable. cAMP pathway variants, including in CYR1, have popped up in quite a few other yeast QTL mapping and experimental evolution papers. These should be referenced as well.

      We thank the reviewer for these references; we have added a comment about the relationship between stress tolerance and CYR1 variation, and cited the relevant references accordingly.

      Figure S10 - the subfigure showing the frequency of the GPA 82R compared to 82W suggests a fairly large and deleterious fitness effect of this allele; on the order of 7-8% fewer cells per cell cycle stage than the 82W allele. Can the authors reconcile this with the more modest growth rate effect they report on page 8?

      Figure S12C displays the allele frequency of the 82R allele across the cell cycle in the single-cell data from allele-replacement strains. These strains were grown separately and processed using two individual 10x chromium runs. The resulting sequenced library had 11,695 cells with the 82R allele and 14,894 cells with the 82W allele. The 7-8% difference in the number of cells is due to slight differences in the number of captured cells per run, not due to growth differences, because we attempted to pool cells in equal numbers from separate mid-log cultures.

      The proportion of cells in G1 increases by ~3% in strains with the 82R allele relative to the baseline proportion of cells in the experiment, which, to the reviewer's point, is still larger than the ~1% growth difference we observed. Cell cycle occupancy is a compositional phenotype. As shown in Figure S12C, the 82R variant increases the fraction of cells in G1 and slightly decreases the fraction of cells in M/G1. There is no obvious expectation for quantitatively translating a change in cell cycle occupancy to a change in growth rate.

      The authors refer to the Lang et al. 2009 paper w/respect to GPA1 variant S469I but that paper seems to have explored a different GPA1 allele, GPA1-G1406T, with respect to growth rates.

      We thank the reviewer for their comment. The S469I variant is the same as the G1406T variant, one denoting the amino acid change at position 469 in the protein and the other denoting the corresponding nucleotide change at position 1406 in the DNA coding sequence. We have altered the text to make this clear to the reader.
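
      The correspondence between the two names follows from simple codon arithmetic, sketched below in Python. The function name is hypothetical, and positions are assumed to be 1-based and counted from the first nucleotide of the coding sequence.

```python
def codon_of_nucleotide(cds_position):
    """Map a 1-based coding-sequence nucleotide position to
    (codon number, position within the codon), both 1-based."""
    codon_number = (cds_position - 1) // 3 + 1
    position_in_codon = (cds_position - 1) % 3 + 1
    return codon_number, position_in_codon


print(codon_of_nucleotide(1406))  # (469, 2)
```

      Nucleotide 1406 is therefore the second base of codon 469. In the standard genetic code, a G-to-T change at the second position of a serine codon of the form AGT or AGC gives ATT or ATC, both of which encode isoleucine, which is consistent with the S469I designation. Which of these codons is present in GPA1 is not stated here.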

      Reviewer #2 (Recommendations For The Authors):

      I make no recommendations as to additional work for the authors. The manuscript is complete. I suggested some things I would like to see in my review, but it's up to them to decide whether they think any of those would further enhance the manuscript.

      However, I do have some pedantic formatting notes:

      - Microliters are variously presented as uL, ul, and µl - it should be µL

      - Similarly, milliliters are presented as ml and ML - it should be mL

      - Also, there should be a space between the number and the unit, e.g. 10 µL

      - Some gene names in the manuscript are not italicized in all instances, e.g., GPA1

      We thank the reviewer for these formatting suggestions; we have made these changes throughout the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Response to Public Reviews:

      We thank the reviewers for their kind comments and have implemented many of their suggestions. Our paper has greatly benefited from their advice. Like Reviewer 1, we acknowledge that while the exact involvement of Ih in allowing smooth transitions is likely not universal across all systems, our demonstration of the ways in which such currents can affect the dynamics of the response of complex rhythmic motor networks provides valuable insight. To address the concerns of Reviewer 2, we included a sentence in the discussion to highlight the fact that cesium neither increased the pyloric frequency nor caused consistent depolarization in intracellular recordings. We also highlighted that these observations both suggest that cesium is not indirectly raising [K+]outside and support the conclusion that the effects of cesium are primarily through blockade of Ih rather than other potassium channels.

      Reviewer 3 raised some important points about modeling. While the lab has models that explore the effects of temperature on artificial triphasic rhythms, these models do not account for all the biophysical nuances of the full biological system. We have limited data about the exact nature of temperature-induced parameter changes and the extent to which these changes are mediated by intrinsic effects of temperature on protein structure versus protein interactions/modification by processes such as phosphorylation. With respect to the A current, Tang et al., 2010 reported that the activation and inactivation rates are differentially temperature sensitive, but we do not have the data to suggest whether or not the time courses of such sensitivities are different. As such, we focus our discussion on the properties we know are modulated by temperature, i.e. activation rates. Within the discussion we now include the suggestion that future, more comprehensive modeling may be appropriate to further elucidate the ways in which reducing Ih may produce the experimentally observed effects reported here.

      Reviewer #1 (Recommendations For The Authors):

      Suggested revisions:

      A figure showing examples of the voltage-clamp traces for the critical measurements of the extent of Ih block by 5 mM CsCl in PD and LP neurons at the temperature extremes in these preparations is not shown, and the authors should consider including such a figure, perhaps as a supplemental figure.

      We have added Supplemental Figure 1 containing voltage-clamp traces demonstrating the extent of Ih block by 5 mM CsCl in PD and LP neurons at 11°C and 21°C. Due to technical concerns, different preparations were used in the measurements at 11°C and 21°C, but the point that the H-current is reduced is demonstrated in all cases.

      Reviewer #2 (Recommendations for The Authors):

      Specific (Minor) Comments:

      (1) Line 83: In Cs+ "at 11°C, the pyloric frequency was significantly decreased compared to control conditions (Saline: 1.2± 0.2 Hz; Cs+ 0.9± 0.2 Hz)".

      As above, the authors often report that cesium generally reduces pyloric frequency. Figure 5A demonstrates this action quite nicely. However, cesium's effect on pyloric frequency at 11°C seems less robust in Figure 1C. Why the discrepancy?

      There is variability in the effects of Cs+ on the pyloric frequency. As noted, the standard deviation in frequency in both conditions is 0.2 Hz. As such, there are some cases in which the initial frequency drop in Cs+ compared to control was relatively small. Figure 1C is one such case, but it was selected as an example because of its clear reduction in temperature sensitivity.

      (2) I don't understand what the arrows/dashed lines are trying to convey in Figure 3C.

      The arrows/dashed lines represent the criteria used to define a cycle as “decreasing in frequency” (Temperature Increasing) or “increasing in frequency” (Temperature Stable).  We have amended lines 130 and 137 in the text to hopefully clarify this point, as well as the figure legend.

      (3) Lines 118/168. The description of cesium's specific action on the depolarizing portion of PD activity is a bit confusing. In my mind, "depolarization phase" refers to the point at which PD is most depolarized. Perhaps restating the phrase to "elongation of the depolarizing trajectory" is less confusing. The authors may also want to consider labeling this trajectory in Figure 2C.

      We have changed “depolarization phase” to “depolarizing phase” to highlight that this is the period during which the cell is depolarizing, rather than at its most depolarized.  We consider the plateau of the slow wave and spiking (the point at which PD is most depolarized) to be the “bursting phase”.  We have labeled these phases in Figure 2C as suggested.

      (4) Figure 3C legend: a few words seem to be missing. I suggest "the change in mean frequency was more likely TO decrease IN Cs+ than in saline".

      Thank you for catching this typo, it has been corrected.

      (5) Line 165: Awkward phrasing. “In one experiment, the decrease in frequency while temperature increased and subsequent increase in frequency after temperature stabilized was particularly apparent in Cs+ PTX”.

      How about: “One Cs+ PTX experiment wherein elevating the temperature transiently decreased pyloric frequency is shown in Figure 4F.”

      We have amended this sentence to read, “One Cs++PTX experiment in which elevating the temperature produced a particularly pronounced transient decrease in frequency is shown in Figure 4F.”

      (6) Line 186: Awkward phrasing. "LP OFF was also significantly advanced in Cs+, although duty cycle (percent of the period a neuron is firing) was preserved".

      The use of the word "although" seems a bit strange. If both LP onset and LP offset phase advance by the same amount, then isn't an unchanged duty cycle expected?

      “Although” has been changed to “and subsequently”.

      Reviewer #3 (Recommendations For The Authors):

      Major comments:

      (1) I know the Marder lab has detailed models of the pyloric rhythm. I am not saying they have to add modeling to this already extensive and detailed paper, but it would be useful to know how much of these temperature effects have been modeled, for example in the following locations.

      (2) Line 259 - "Mathematically..." - Is there a computational model of H current that has shown this decrease in frequency in pyloric neurons? If you are working on one for the future, you could mention this.

      There is not currently a model in which the reduction of the H-current results in the non-minimum phase dynamics in the frequency response to temperature seen experimentally. It should be noted that our existing models of pyloric activity responses to temperature are not well suited to investigate such dynamics in their current iterations.  Further work is necessary to demonstrate the principles observed experimentally in computational modeling, and we have added a sentence to the paper to reflect this point (Line 268).

      (3) Line 318 - "therefore it remains unclear" - I thought they had models of the circuit rhythmicity. Do these models include temperature effects? Can they comment on whether their models of the circuit show an opposite effect to what they see in the experiment? I'm not saying they have to model these new effects as that is probably an entirely different paper, but it would be interesting to know whether current models show a different effect.

      We have some models of the pyloric response to temperature, but these models were specifically selected to maintain phase across the range of temperature.  When Ih was reduced in these models, a variety of effects on phase and duty cycle were seen.  These models were selected to have the same key features of behavior as the pyloric rhythm, but do not capture all the biophysical nuances of the complete system, and therefore should not necessarily be expected to reflect the experimental findings in their current iterations.  Furthermore, these models are meant to have temperature as a static, rather than dynamic, input, and thus are ill-suited to examine the conditions of our experiments.  The models in their current state are not sufficiently relevant to these experimental findings that they can illuminate the present paper.

      (4) "If deinactivation is more accelerated or altered by temperature than inactivation...While temperature continued to change, the difference in parameters would continue to grow" - This is described as a difference in temperature sensitivity, but it seems like it is also a function of the time course of the response to change in temperature (i.e. the different components could have the same final effect of temperature but show a different time course of the change).

      We know from Tang et al., 2010, that activation and inactivation rates of the A current are differentially temperature sensitive. We have no evidence to suggest that the time course of the response to temperature differs among the various parameters.  The physical actions of temperature on proteins are likely to be extremely rapid, making a time course difference on the order of tens of seconds unlikely, though not impossible. Modeling of the biophysics might illuminate the relative plausibility of these different mechanisms of action, but we feel that our current suggested explanation is reasonable based on existing information.

      (5) Is it known how temperature is altering these channel kinetics? Is it via an intrinsic rearrangement of the protein structure, or is it a process that involves phosphorylation (that could explain differences in time course?). Some mention of the mechanism of temperature changes would be useful to readers outside this field.

      It is not known exactly how temperature alters channel parameters.  Invariably some, if not all, of it is due to an intrinsic rearrangement of protein structure, and our current models treat all parameter changes as an instantaneous consequence.  However, it is possible that some effects of temperature are due to longer timescale processes such as phosphorylation or cAMP interactions.  Current work in the lab is actively exploring these questions, but there is no definitive answer. Given that this paper focuses on the phenomenon and plausible biomolecular explanations based on existing data, we have not altered the paper to include more exhaustive  coverage of all the possible avenues by which temperature may alter channel properties.

      Specific comments:

      Title: misspelling of "Cancer" ?

      We are unsure how that extra “w” got into the earliest version of the manuscript and have removed it.

      Line 66 "We used 5mM CsCl" - might mention right up front that this was a bath application of the substance.

      We have altered this line to read “used bath application of 5mM CsCl”.  

      Figure 4 - "The only feedback synapse to the pacemaker kernel neurons, LP to PD, and is blocked by picrotoxin" - I think the word "and" should be removed from this phrase in the figure legend.

      Fixed

      Figure 4 legend - "Reds denote temperature...yellows denote..." - I think it should be "Red dots denote temperature...yellow dots denote...".

      Done

      Figure 4B - Why does the change in frequency in cesium look so different in Figure 4B compared to Figure 1C or Figure 3B? In the earlier figures, the increase of frequency is smaller but still present in cesium, whereas, in Figure 4B, cesium seems to completely block the increase in frequency. I'm not sure why this is different, but I guess it's because 3B and 4B are just mean traces from single experiments. Presumably, 4B is showing an experiment in which the cesium was subsequently combined with picrotoxin?

      Figures 1C, 3B, and 4B are indeed all from different single experiments. As acknowledged in our concluding paragraph, there was substantial variability in the exact response of the pyloric rhythm to temperature while in cesium.  The most consistent effect was that the difference in frequency between cesium and saline at a particular temperature increased, as demonstrated across 21 preparations in Figure 1D. It may be noted in Figure 1E that the Q10 was not infrequently <1, meaning that there was a net decrease in frequency as temperature increased in some experiments such as seen in the example of Figure 4B.  The “fold over” (initial increase in steady-state frequency with temperature, then decrease at higher temperatures) has been observed at higher temperatures (typically around 23-30 degrees C) even under control conditions but has not been highlighted in previous publications.  The example in 4B was chosen because it demonstrated both the similarity in jags between Cs+ and Cs++PTX and an overall decrease in temperature sensitivity, even though in this instance the steady-state change in frequency with temperature was not monotonic. 
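
      For readers outside the field, the relationship between Q10 and the direction of the frequency change can be made concrete with a minimal sketch. It uses the standard definition of the temperature coefficient, Q10 = (f2/f1)^(10/(T2 - T1)); the function name and the example frequencies are illustrative assumptions and are not values or code from the study.

```python
def q10(freq_low_temp, freq_high_temp, temp_low, temp_high):
    """Standard temperature coefficient: fold change in rate per 10 degree C increase.

    Q10 = (f_high / f_low) ** (10 / (T_high - T_low))
    """
    if temp_high == temp_low:
        raise ValueError("temperatures must differ")
    if freq_low_temp <= 0 or freq_high_temp <= 0:
        raise ValueError("frequencies must be positive")
    return (freq_high_temp / freq_low_temp) ** (10.0 / (temp_high - temp_low))


# Hypothetical frequencies (Hz) at 11 and 21 degrees C, for illustration only.
increasing = q10(1.0, 2.0, 11.0, 21.0)   # frequency doubles over 10 degrees
decreasing = q10(1.0, 0.9, 11.0, 21.0)   # frequency falls over 10 degrees

print(increasing > 1)  # True: net increase in frequency with temperature
print(decreasing < 1)  # True: Q10 < 1 means a net decrease in frequency
```

      Under this definition, any preparation whose frequency at the higher temperature is below its frequency at the lower temperature yields Q10 < 1, regardless of the size of the temperature step.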

      Figure 6A - "Phase 0 to 1.0" - The y-axis should provide units of phase. Presumably, these are units of radians so 1.0=2*pi radians (or 360 degrees, but probably best to avoid using degrees of phase due to confusion with degrees of temperature).

      Phase, with respect to pyloric rhythm cycles, does not traditionally have units as it is a proportion rather than an angle. As such, we have not changed the figure.
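
      A minimal sketch of why phase is unitless: it is computed as a latency divided by the cycle period, so the time units cancel. The function names, the choice of cycle reference, and the example times below are assumptions made for illustration and are not taken from the analysis code of the study.

```python
def phase(event_time, cycle_start, period):
    """Fraction of the cycle elapsed at event_time (0 = cycle start, 1 = next cycle start)."""
    if period <= 0:
        raise ValueError("period must be positive")
    return (event_time - cycle_start) / period


def duty_cycle(on_time, off_time, cycle_start, period):
    """Fraction of the period during which the neuron is firing (off phase minus on phase)."""
    return phase(off_time, cycle_start, period) - phase(on_time, cycle_start, period)


# Hypothetical burst times in seconds within a 1 s cycle, for illustration only.
baseline = duty_cycle(on_time=0.40, off_time=0.70, cycle_start=0.0, period=1.0)
# Advance both burst onset and offset by the same amount (0.05 of the cycle).
advanced = duty_cycle(on_time=0.35, off_time=0.65, cycle_start=0.0, period=1.0)

print(abs(baseline - advanced) < 1e-9)  # True: equal advances leave duty cycle unchanged
```

      The same sketch illustrates the earlier point about LP: if the on and off phases advance by the same amount, the duty cycle is preserved.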

      Line 275 - "the pacemaker neuron can increase" - Does this indicate that the main effects of H current are in the follower neurons (i.e. LP and PY versus the driver neuron PD)?

      Not necessarily.  We posit in the next paragraph that the effect of the H current on the temperature sensitivity could be due to its phase advance of LP, but that phase advance of LP is not particularly expected to increase frequency.  We favor the possibility that temperature increases Ih in the pacemaker, which in turn advances the PRC of the rhythm, allowing the frequency increase seen under normal conditions.  In Cs+, this advance does not occur, resulting in the lower temperature sensitivity.  In Cs++PTX, the lack of inhibition from LP means compensatory advance of the pacemaker PRC by Ih is unnecessary to allow increased frequency.

      Line 285 - "either increase frequency have no effect" - Is there a missing "or" in this phrase?

      Thank you, we have added the “or”.

    1. Author response:

      The following is the authors’ response to the current reviews.

      Reviewer 2:

      In addition, it is still unacceptable for me that the number of ovulated oocytes in mice at 6 months of age is only one third of young mice (10 vs 30; Fig. S1E). The most of published literature show that mice at 12 months of age still have ~10 ovulated oocytes.

      We disagree with the reviewer’s comment, and the concerns raised were not shared by the other reviewers.  We have reported our data with full transparency (each data point is plotted). In the current study, we observed an intermediate phenotype in gamete number (assessed by both ovarian follicle counts and ovulated eggs) when comparing 6 month old mice to 6 week or 10 month old mice; this is as expected. It is well accepted that follicle counts are highly mouse strain dependent.  Although the reviewer mentions that mice at 12 months have ~10 ovulated oocytes, no actual references are provided nor are the mouse strain or other relevant experimental details mentioned.  Therefore, we do not know how these quoted metrics relate to the female FVB mice used in our current study.   As clearly explained and justified in our manuscript, we used mice at 6 months and 10 months to represent a physiologic aging continuum. 

      Moreover, based on the follicle counting method used in the present study (Fig. S1D), there are no antral follicles observed in mice at 6 months and 10 months of age, which is not reasonable.

      This statement is incorrect. Antral follicles were present at 6 and 10 months of age, but due to the scale of the y-axis and the normalization of follicle number/area in Fig. S1D, the values are small.  The absolute number of antral follicles per ovary (counted in every 5th section) was 31.3 ± 3.8 follicles for 6-week old mice, 9.3 ± 2.3 follicles for 6-month old mice, and 5.3 ± 1.8 follicles for 10-month old mice.  Moreover, it is important to note that these ovaries were not collected in a specific stage of the estrous cycle, so the number of antral follicles may not be maximal.  In addition, as described in the Materials and Methods, antral follicles were only counted when the oocyte nucleus was present in a section to avoid double counting.  Therefore, this approach (which was applied consistently across samples) could potentially underestimate the total number.


      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript by Bomba-Warczak describes a comprehensive evaluation of long-lived proteins in the ovary using transgenerational radioactive labelled 15N pulse-chase in mice. The transgenerational labeling of proteins (and nucleic acids) with 15N allowed the authors to identify regions enriched in long-lived macromolecules at the 6 and 10-month chase time points. The authors also identify the retained proteins in the ovary and oocyte using MS. Key findings include the relative enrichment in long-lived macromolecules in oocytes, pregranulosa cells, CL, stroma, and surprisingly OSE. Gene ontology analysis of these proteins revealed enrichment for nucleosome, myosin complex, mitochondria, and other matrix-type protein functions. Interestingly, compared to other post-mitotic tissues where such analyses have been previously performed such as the brain and heart, they find a higher fractional abundance of labeled proteins related to the mitochondria and myosin respectively.

      Response: We thank the reviewer for this thoughtful summary of our work.  We want to clarify that our pulse-chase strategy relied on a two-generation stable isotope-based metabolic labelling of mice using 15N from spirulina algae (for reference, please see (Fornasiero & Savas, 2023; Hark & Savas, 2021; Savas et al., 2012; Toyama et al., 2013)).  We did not utilize any radioactive isotopes.

      Strengths:

      A major strength of the study is the combined spatial analyses of LLPs using histological sections with MS analysis to identify retained proteins.

      Another major strength is the use of two chase time points allowing assessment of temporal changes in LLPs associated with aging.

      The major claims such as an enrichment of LLPs in pregranulosa cells, GCs of primary follicles, CL, stroma, and OSE are soundly supported by the analyses, and the caveat that nucleic acids might differentially contribute to this signal is well presented.

      The claims that nucleosomes, myosin complex, and mitochondrial proteins are enriched for LLPs are well supported by GO enrichment analysis and well described within the known body of evidence that these proteins are generally long-lived in other tissues.

      Weaknesses:

      Comment 1: One small potential weakness is the lack of a mechanistic explanation of if/why turnover may be accelerating at the 6-10 month interval compared to 1-6.

      Response 1: At the 6-month time point, we detected more long lived proteins than the 10 month time point in both the ovary and the oocyte.  We anticipated this because proteins are degraded over time, and substantially more time has elapsed at the later time point.  Moreover, at the 6–10-month time point, age-related tissue dysfunction is already evident in the ovary.  For example, in 6-9 month old mice, there is already a deterioration of chromosome cohesion in the egg which results in increased interkinetochore distances (Chiang et al., 2010), and by 10 months, there are multinucleated giant cells present in the ovarian stroma which is consistent with chronic inflammation (Briley et al., 2016).  Thus, the observed changes in protein dynamics may be another early feature of aging progression in the ovary.  

      Comment 2: A mild weakness is the open-ended explanation of OSE label retention. This is a very interesting finding, and the claims in the paper are nuanced and perfectly reflect the current understanding of OSE repair. However, if the sections are available and one could look at the spatial distribution of OSE signal across the ovarian surface it would be interesting to note if label retention varied by regions such as the CLs or hilum where more/less OSE division may be expected.

      Response 2: We agree that the enrichment of long-lived molecules in the OSE is interesting. To make interpretable conclusions about the dynamics of long-lived molecules in the OSE, we would need to generate a series of samples at precise stages of the estrous cycle or ideally across a timecourse of ovulation to capture follicular rupture and repair.  These samples do not currently exist and are beyond the scope of this study. However, this idea is an important future direction and it has been added to the discussion (lines 221-223). Furthermore, from a practical standpoint, MIMS imaging is resource and time intensive. Thus, we are not able to readily image entire ovarian sections.  Instead, we focused on structures within the ovary and took select images of follicles, stroma, and OSE.  We, therefore, do not have a comprehensive series of images of the OSE from the entire ovarian section for each mouse analyzed.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript by Bomba-Warczak et al. applied multi-isotope imaging mass spectrometry (MIMS) analysis to identify the long-lived proteins in mouse ovaries during reproductive aging, and found some proteins related to cytoskeletal and mitochondrial dynamics persisting for 10 months.

      Response: We thank the reviewer for their summary and feedback.

      Strengths:

      The manuscript provides a useful dataset about protein turnover during ovarian aging in mice.

      Weaknesses:

      Comment 1: The study is pretty descriptive and short of further new findings based on the dataset. In addition, some results such as the numbers of follicles and ovulated oocytes in aged mice are not consistent with the published literature, and the method for follicle counting is not accurate. The conclusions are not fully supported by the presented evidence.

      Response 1: We agree with the reviewer that this study is descriptive. Our goal, as stated, was to use a discovery-based approach to define the long-lived proteome of the ovary and oocyte across a reproductive aging continuum.  As the prominent aging researcher, Dr. James Kirkland, stated: “although ‘descriptive’ is sometimes used as a pejorative term…descriptive or discovery research leading to hypothesis generation has become highly sophisticated and of great relevance to the aging field (Kirkland, 2013).”  We respectfully disagree with the reviewer that our study is short of new findings. In fact, this is the first time that a two-generation stable isotope-based metabolic labelling of mice in combination with two different state-of-the-art mass spectrometry methods has been used to identify and localize long lived molecules in the ovary and oocyte along this particular reproductive aging continuum in an unbiased manner.  We have identified protein groups that were previously not known to be long lived in the ovary and oocyte.  Our hope is that this long-lived proteome will become an important hypothesis-generating resource for the field of reproductive aging.

      The age-dependent decline in number of follicles and eggs ovulated in mice has been well established by our group as well as others (Duncan et al., 2017; Mara et al., 2020).  Thus, we are unclear about the reviewer’s comments that our results are not consistent with the published literature.  The absolute numbers of follicles and eggs ovulated as well as the rate of decline with age are highly strain dependent.  Moreover, mice can have a very small ovarian reserve and still maintain fertility (Kerr et al., 2012).  In our study, we saw a consistent age-dependent decrease in the ovarian reserve (Figure 1 – figure supplement 1 D), the number of oocytes collected from large antral follicles following hyperstimulation with PMSG (used for LC-MS/MS), and the number of eggs collected from the oviduct following hyperstimulation and superovulation with PMSG and hCG (Figure 1 – figure supplement 1 E and F).  In all cases, the decline was greater in 10 month old compared to 6 month old mice demonstrating a relative reproductive aging continuum even at these time points.

      Our research team has significant expertise in follicle classification and counting as evidenced by our publication record (Duncan et al., 2017; Kimler et al., 2018; Perrone et al., 2023; Quan et al., 2020).  We used our established methods which we have further clarified in the manuscript text (lines 395-397).  Follicle counts were performed on every 5th tissue section of serial sectioned ovaries, and one ovary from each of three mice per timepoint was counted. Therefore, follicle counts were performed on an average of 48-62 total sections per ovary. The number of follicles was then normalized per total area (mm2) of the tissue section, and the counts were averaged. Figure 1 – figure supplement 1 C and D represent data averaged from all ovarian sections counted per mouse.   It is important to note that the same criteria were applied consistently to all ovaries across the study, and thus regardless of the technique used, the relative number of follicles or oocytes across ages can be compared.
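
      The normalization described above can be sketched as follows. This is only an illustration of the arithmetic as we read it from the description (count divided by section area for each counted section, then averaged across sections for a given ovary); the function name and the example counts and areas are hypothetical and are not data from the study.

```python
def mean_follicle_density(sections):
    """Average follicle density (follicles per mm^2) across counted sections.

    sections: list of (follicle_count, section_area_mm2) tuples, one per counted section.
    """
    if not sections:
        raise ValueError("at least one section is required")
    densities = []
    for count, area_mm2 in sections:
        if area_mm2 <= 0:
            raise ValueError("section area must be positive")
        densities.append(count / area_mm2)  # normalize each section by its own area
    return sum(densities) / len(densities)  # average the normalized values


# Hypothetical counts and areas for three sections of one ovary, for illustration only.
example_sections = [(4, 2.0), (6, 3.0), (5, 2.5)]
print(mean_follicle_density(example_sections))  # 2.0 follicles per mm^2
```

      Because every section is normalized by its own area, small absolute densities can still correspond to a non-zero number of follicles per ovary, which is why relative comparisons across ages remain valid when the same procedure is applied to all samples.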

      Reviewer #3 (Public Review):

      Summary:

      In this study, Bomba-Warczak et al focused on reproductive aging, and they presented a map for long-lived proteins that were stable during reproductive lifespan. The authors used MIMS to examine and show distinct molecules in different cell types in the ovary and tissue regions in a 6 month mice group, and they also used proteomic analysis to present different LLPs in ovaries between these two timepoints in 6-month and 10-month mice. The authors also examined the LLPs in oocytes in the 6-months mice group and indicated that these were nuclear, cytoskeleton, and mitochondria proteins.

      Response: We thank the reviewer for their summary and feedback.

      Strengths:

      Overall, this study provided basic information or a 'map' of the pattern of long-lived proteins during aging, which will contribute to the understanding of the defects caused by reproductive aging.

      Weaknesses:

      Comment 1: The 6-month mice were used as an aged model; no validation experiments were performed with proteomics analysis only.  

      Response 1:  We did not select the 6-month time point to be representative of the “aged model” but rather one of two timepoints on the reproductive aging continuum – 6 and 10 months.  In the manuscript (Figure 1 – figure supplement 1) we have demonstrated the relevance of the two timepoints by illustrating a decrease in follicle counts, number of fully grown oocytes collected, and number of eggs ovulated as well as a tendency towards increased stromal fibrosis (highlighted in the main text lines 78-85).  Inclusion of the 6-month timepoint ultimately turned out to be informative and essential as many long-lived proteins were absent by the 10 month timepoint. These results suggest that important shifts in the proteome occur during mid to advanced reproductive age.  The relevance of these timepoints is mentioned in the discussion (lines 247-270).

      Two independent mass spectrometry approaches (MIMS and LC-MS/MS) were used to validate the presence of long-lived macromolecules in the ovary and oocyte. Studies focused on the role of specific long-lived proteins in oocyte and ovarian biology as well as how they change with age in terms of function, turnover, and modification are beyond the scope of the current study but are ongoing.  We have acknowledged these important next steps in the manuscript text (lines 286-288, 311-312).

      It is important to note that oocytes are biomass-limited cells, and their numbers decrease with age.  Thus, we had to select ages at which we could still collect enough oocytes from the mice available to perform LC-MS/MS.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Comment 1: The writing and figures are beautiful - it would be hard to improve this manuscript.

      Response 1: We greatly appreciate this enthusiastic evaluation of our work.

      Comment 2: In Fig S1E/F it would help to list the N number here. Why are there 2 groups at 6-12 wk?

      Response 2:  We did not have 6 month and 10-month-old mice available at the same time to be able to run the hyperstimulation and superovulation experiment in parallel.  Therefore, we performed independent experiments comparing the number of eggs collected from either 6-month-old or 10 month old mice relative to 6-12 week old controls.  In each trial, eggs were collected from pooled oviducts from between 3-4 mice per age group, and the average total number of eggs per mouse was reported.  Each point on the graph corresponds to the data from an individual trial, and two trials were performed.  This has been clarified in the figure legend (lines 395-397).  Of note, while addressing this reviewer’s comments, we noticed that we were missing Materials and Methods regarding the collection of eggs from the oviduct following hyperstimulation and superovulation with PMSG and hCG.  This information has now been added in Methods Section, lines 477-481.

      Comment 3: The manuscript would benefit from an explanation of why the pups were kept on a 1-month N15 diet after birth, since the oocytes are already labeled before birth, and granulosa at most by day 3-4. Would ZP3 have not been identified otherwise?

      Response 3:   The pups used in this study were obtained from fully labeled female dams that were maintained on a 15N diet.  These pups had to be kept with their mothers through weaning.  To limit the pulse period only through birth, the pups would have had to be transferred to unlabeled foster mothers.  However, this would have risked pup loss which would have significantly impacted our ability to conduct the studies given that we only had 19 labeled female pups from three breeding pairs.  We have clarified this in the manuscript text in lines 78-80.  It is hard to know, without doing the experiment, whether we would have detected ZP3 if we only labeled through birth.  The expression of ZP3 in primordial follicles, albeit in human, would suggest that this protein is expressed quite early in development.

      Comment 4: What is happening to the mitochondria at 6-10 months? Does their number change in the oocyte? Is there a change in the rate of fission? Any chance to take a stab at it with these or other age-matched slides?

      Response 4:  The reviewer raises an excellent point.  As mentioned previously in the Discussion (lines 290-301), there are well documented changes in mitochondrial structure and function in the oocyte in mice of advanced reproductive age.  However, there is a paucity of data on the changes that may happen at earlier mid-reproductive age time points.  From the oocyte mitochondrial proteome perspective, our data demonstrate a prominent decline in the persistence of long-lived proteins between 6 and 10 months, and this occurs in the absence of a change in the total pool of mitochondrial proteins (both long and short lived populations) as assessed by spectral counts or protein IDs (figure below).  These data, which we have added into Figure 3 – figure supplement 1 and in the manuscript text (lines 164-170) are suggestive of similar numbers of mitochondria at these two timepoints. It would be informative to do a detailed characterization of oocyte mitochondrial structure and function within this window to see if there is a correlation with this shift in long lived mitochondrial proteins.  Although this analysis is beyond the scope of the current manuscript, it is an important next line of inquiry which we have highlighted in the manuscript text (lines 255-257 and 311-312).

      Reviewer #2 (Recommendations For The Authors):

      Several concerns are raised as shown below.

      Comment 1: In Fig. 2F, it is surprising that ZP3 disappeared in the ovary from mice at the age of 10 months by MIMS analysis, because quite a few oocytes with intact zona pellucida can still be obtained from mice at this age. Notably, ZP would not be renewed once formed.

      Response 1: To clarify, Figure 2F shows LC-MS/MS data and not MIMS data.  As mentioned in the Discussion, the detection of long-lived pools of ZP3 at 6 months cannot be derived from newly synthesized zona pellucidae in growing follicles because they would not have been present during the pulse period.  The only way we could detect ZP3 at 6 months is if it forms a primitive zona scaffold in the primordial follicle or if ZPs from atretic follicles of the first couple of waves of folliculogenesis incorporate into the extracellular matrix of the ovary.  The lack of persistence of ZP3 at 10 months could be due to protein degradation. Should ZP3 indeed form a primitive zona, its loss at 10 months would be predicted to result in poor formation of a bona fide zona pellucida upon follicle growth.  Interestingly, aging has been associated with alterations in zona pellucida structure and function.   These data open novel hypotheses regarding the zona pellucida (e.g. a primitive zona scaffold and part of the extracellular matrix) and will require significant further investigation to test. These points are highlighted in the Discussion lines 227-245.

      Comment 2: To determine whether those proteins that can not be identified by MIMS at the time point of 10 months are degraded or renewed, the authors should randomly select some of them to examine their protein expression levels in the ovary by immunoblotting analysis.

      Response 2: To clarify, proteins were identified by LC-MS/MS and not MIMS, which was used to visualize long lived macromolecules.   Each protein will be comprised of old pools (15N containing) and newly synthesized pools (14N containing).  Degradation of the old pool of protein does not mean that there will be a loss of total protein.  Moreover, immunoblotting cannot distinguish old and newly synthesized pools of protein.  As peptides derive from proteins, the table provided with the manuscript, in which overall peptide counts are listed for each protein identified at both time points, reflects what immunoblotting would, but on a larger and more precise scale.

      Comment 3: I think those proteins that can be identified by MIMS at the time point of 6 months but not 10 months deserve more analyses as they might be the key molecules that drive ovarian aging.

      Response 3:  This comment conflicts with comment 2 from Reviewer #3 (Recommendations For The Authors).  This underscores that different researchers will prioritize the value and follow up of such rich datasets differently.  We agree that the LLP identified at 6 months are of particular interest to reproductive aging, and we are planning to follow up on these in future studies.

      Comment 4:  Figure 1 – figure supplement 1 C-F, compared with the published literature, the numbers of follicles at different developmental stages and ovulated oocytes at both ages of 6 months and 10 months were dramatically low in this study. For 6-month-old female mice, the reproductive aging just begins, thus these numbers should not be expected to decrease too much. In addition, follicle counting was carried out only in an area of a single section, which is an inaccurate way, because the numbers and types of follicles in various sections differ greatly. Also, the data from a single section could not represent the changes in total follicle counts.

      Response 4: We have addressed these points in response to Comment 1 in the Reviewer #2 Public Review, and corresponding changes in the text have been noted.    

      Comment 5:  The study lacks follow-up verification experiments to validate their MIMS data.

      Response 5: Two independent mass spectrometry approaches (MIMS and LC-MS/MS) were used to validate the presence of long-lived macromolecules in the ovary and oocyte. Studies focused on the role of specific long-lived proteins in oocyte and ovarian biology as well as how they change with age in terms of function, turnover, and modification are beyond the scope of the current study but ongoing.  We have acknowledged these important next steps in the manuscript text (lines 286-288 and 311-312).

      Reviewer #3 (Recommendations For The Authors):

      Comment 1: The authors used the 6-month mice group to represent the aged model, and examined the LLPs from 1 month to 6 months. Indeed, 6-month-old mice start to show age-related changes; however, for the reproductive aging model, the most widely accepted model is that 10-month-old age mice start to show reproductive-related changes and 12-month-old mice (corresponding to 35-40 year-old women) exhibit the representative reproductive aging phenotypes. Therefore, the data may not present the typical situation of LLPs during reproductive aging.

      Response 1: As described in the response to Comment 1 in the Reviewer #3 Public Review, there were clear logistical and technical feasibility reasons why the 6 month and 10-month timepoints were selected for this study.  Importantly, however, these timepoints do represent a reproductive aging continuum as evidenced by age-related changes in multiple parameters.  Furthermore, there were ultimately very few LLPs that remained at 10 months in both the oocyte and ovary, so inclusion of the 6-month time point was an important intermediate.  Whether the LLPs at the 6-month timepoint serve as a protective mechanism in maintaining gamete quality or whether they contribute to decreased quality associated with reproductive aging is an intriguing dichotomy which will require further investigation.  This has been added to the discussion (lines 247-257).

      Comment 2: Following the point above, the authors examined the ovaries of 6-month and 10-month mice by proteomics and found that the 6-month LLPs were not identical to those at 10 months, while Tubb5, Tubb4a/b, Tubb2a/b, and Hist2h2 were expressed at both time points (Fig 2B). Why did the authors not explore these proteins, which are more interesting since they persisted from 1 month to 10 months?

      Response 2:  The objective of this study was to profile the long-lived proteome in the ovary and oocyte as a resource for the field rather than delving into specific LLPs at a mechanistic level.  That being said, we wholeheartedly agree with the reviewer that the proteins that were identified at both 6 month and 10 months are the most robust and long lived and worthy of prioritizing for further study.  Interestingly, Tubb5 and Tubb4a have high homology to primate-specific Tubb8, and Tubb8 mutations in women are associated with meiosis I arrest in oocytes and infertility (Dong et al., 2023; Feng et al., 2016).  Thus, perturbation of these specific proteins by virtue of their long-lived nature may be associated with impaired function and poor reproductive outcomes.  We have highlighted the importance of these LLPs which are present at both timepoints and persist to at least 10 months in the manuscript text (lines 259-270).

      Comment 3: The authors also need to provide a hypothesis or explanation as to why the LLPs at 6 months were not identical to those at 10 months.

      Response 3: We agree that LLPs identified at 10 months should also be identified as long-lived at 6 months. This is a common limitation of mass spectrometry-based proteomics, where each sample is prepared and run individually, which introduces variability between biological replicates, especially for low-abundance proteins. It is key to note that failing to identify a protein does not mean the protein is not there; it merely means that we were not able to detect it in this particular experiment, and low levels of the protein may still be present. To compensate for this known and inherent variability, we applied stringent filtering criteria in which we required long-lived peptides to be identified in an independent MS scan (the alternative is to identify a peptide in either the heavy or light scan and use modeling to infer the FA value based on the m/z shift), which gave us peptides of the highest confidence. Ideally, these experiments would be done using a TMT (tandem mass tag) approach. However, TMT-based experiments typically require a substantial amount of input (80-100 µg per sample), which unfortunately is not feasible with oocytes obtained from a limited number of pulse-chased animals. We have added this explanation to the discussion (lines 265-270).

      Comment 4:  The reviewer thinks that LLPs from 6 months to 10 months may more closely represent the long-lived proteins during reproductive aging.

      Response 4:  We fully agree that understanding the identity of LLPs between the 6 month and 10 month period will be quite informative given that this is a dynamic period when many of LLPs get degraded and thus might be key to the observed decline in reproductive aging. This is a very important point that we hope to explore in future follow-up studies.

      Comment 5: The authors used proteomics for the detection of ovaries and oocytes, however, there are no validation experiments at all. Since proteomics is mainly for screening and prediction, the authors should examine at least some typical proteins to confirm the validity of proteomics. For example, the authors specifically emphasized the finding of ZP3, a protein that is critical for fertilization.

      Response 5:  Thank you, we agree that closer examination of proteins relevant and critical for fertilization is of importance.  However, a detailed analysis of specific proteins fell outside of the scope of this study which aimed at unbiased identification of long-lived macromolecules in ovaries and oocytes. We hope to continue this important work in near future.

      Comment 6: For the oocytes, the authors indicated that cytoskeleton, mitochondria-related proteins were the main LLPs, however, previous studies reported the changes of the expression of many cytoskeleton and mitochondria-related proteins during oocyte aging. How do the authors explain this contrary finding?   

      Response 6: Our findings are not contrary to the studies reporting changes in protein expression levels during oocyte aging; the two concepts are not mutually exclusive. The average FA value at the 6-month chase for oocyte proteins is 41.3%, which means that while 41.3% of the long-lived protein pool persisted for 6 months, the other 58.7% has in fact been renewed. With the exception of a few mitochondrial proteins (Cmkt2 and Apt5l) and myosins (Myl2 and Myh7), which had FA values close to 100% (no turnover), most of the LLPs had a portion of their protein pools that was indeed turned over. Moreover, we included new data analysis illustrating that we identify a comparable number of mitochondrial proteins at the two time points, indicating that while the long-lived pools are changing over time, the total content remains stable (Figure 3 – figure supplement 1E-G).
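
      The arithmetic in this response can be sketched as follows. This is a minimal illustration, not the authors' analysis pipeline: it assumes that the FA (fractional abundance) value is the 15N-labelled (old-pool) share of the summed 15N and 14N signal for a protein, and the function names and intensity values are hypothetical.

```python
def fractional_abundance(heavy_intensity: float, light_intensity: float) -> float:
    """Return the old-pool (15N) share of the total signal, in percent.

    Assumption: FA = 15N / (15N + 14N). The inputs are hypothetical
    summed intensities for one protein, not values from the study.
    """
    total = heavy_intensity + light_intensity
    if total <= 0:
        raise ValueError("total intensity must be positive")
    return 100.0 * heavy_intensity / total


def renewed_percent(fa_percent: float) -> float:
    """Return the share of the pool replaced by newly synthesized (14N) protein."""
    if not 0.0 <= fa_percent <= 100.0:
        raise ValueError("FA must lie between 0 and 100 percent")
    return 100.0 - fa_percent


# Hypothetical intensities chosen to reproduce the average quoted in the response.
fa = fractional_abundance(heavy_intensity=41.3, light_intensity=58.7)
print(f"persisting: {fa:.1f}%  renewed: {renewed_percent(fa):.1f}%")
```

      Under this assumption, an FA value close to 100% corresponds to essentially no turnover, while any lower value implies that part of the pool has been replaced even though the protein is still classed as long-lived.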

      Comment 7:  The authors also should provide in-depth discussion about the findings of the current study for long-lived proteins. In this study, the authors reported the relationship between these "long-lived" proteins with aging, a process with multiple "changes". Do long-lived proteins (which are related to the cytoskeleton and mitochondria) contribute to the aging defects of reproduction? or protect against aging?

      Response 7: This is a very important comment and one that needs further exploration. The fact is – we do not know at this moment whether these proteins are protective or deleterious, and such a statement would be speculative at this stage of research into LLPs in ovaries and oocytes. Future work is needed to address this question in detail.

      Briley, S. M., Jasti, S., McCracken, J. M., Hornick, J. E., Fegley, B., Pritchard, M. T., & Duncan, F. E. (2016). Reproductive age-associated fibrosis in the stroma of the mammalian ovary. Reproduction, 152(3), 245-260. https://doi.org/10.1530/REP-16-0129

      Chiang, T., Duncan, F. E., Schindler, K., Schultz, R. M., & Lampson, M. A. (2010). Evidence that Weakened Centromere Cohesion Is a Leading Cause of Age-Related Aneuploidy in Oocytes. Current Biology, 20(17), 1522-1528. https://doi.org/10.1016/j.cub.2010.06.069

      Dong, J., Jin, L., Bao, S., Chen, B., Zeng, Y., Luo, Y., Du, X., Sang, Q., Wu, T., & Wang, L. (2023). Ectopic expression of human TUBB8 leads to increased aneuploidy in mouse oocytes. Cell Discov, 9(1), 105. https://doi.org/10.1038/s41421-023-00599-z

      Duncan, F. E., Jasti, S., Paulson, A., Kelsh, J. M., Fegley, B., & Gerton, J. L. (2017). Age-associated dysregulation of protein metabolism in the mammalian oocyte. Aging Cell, 16(6), 1381-1393. https://doi.org/10.1111/acel.12676

      Feng, R., Sang, Q., Kuang, Y., Sun, X., Yan, Z., Zhang, S., Shi, J., Tian, G., Luchniak, A., Fukuda, Y., Li, B., Yu, M., Chen, J., Xu, Y., Guo, L., Qu, R., Wang, X., Sun, Z., Liu, M., . . . Wang, L. (2016). Mutations in TUBB8 and Human Oocyte Meiotic Arrest. N Engl J Med, 374(3), 223-232. https://doi.org/10.1056/NEJMoa1510791

      Fornasiero, E. F., & Savas, J. N. (2023). Determining and interpreting protein lifetimes in mammalian tissues. Trends Biochem Sci, 48(2), 106-118. https://doi.org/10.1016/j.tibs.2022.08.011

      Hark, T. J., & Savas, J. N. (2021). Using stable isotope labeling to advance our understanding of Alzheimer's disease etiology and pathology. J Neurochem, 159(2), 318-329. https://doi.org/10.1111/jnc.15298

      Kerr, J. B., Hutt, K. J., Michalak, E. M., Cook, M., Vandenberg, C. J., Liew, S. H., Bouillet, P., Mills, A., Scott, C. L., Findlay, J. K., & Strasser, A. (2012). DNA damage-induced primordial follicle oocyte apoptosis and loss of fertility require TAp63-mediated induction of Puma and Noxa. Mol Cell, 48(3), 343-352. https://doi.org/10.1016/j.molcel.2012.08.017

      Kimler, B. F., Briley, S. M., Johnson, B. W., Armstrong, A. G., Jasti, S., & Duncan, F. E. (2018). Radiation-induced ovarian follicle loss occurs without overt stromal changes. Reproduction, 155(6), 553-562. https://doi.org/10.1530/REP-18-0089

      Kirkland, J. L. (2013). Translating advances from the basic biology of aging into clinical application. Exp Gerontol, 48(1), 1-5. https://doi.org/10.1016/j.exger.2012.11.014

      Mara, J. N., Zhou, L. T., Larmore, M., Johnson, B., Ayiku, R., Amargant, F., Pritchard, M. T., & Duncan, F. E. (2020). Ovulation and ovarian wound healing are impaired with advanced reproductive age. Aging (Albany NY), 12(10), 9686-9713. https://doi.org/10.18632/aging.103237

      Perrone, R., Ashok Kumaar, P. V., Haky, L., Hahn, C., Riley, R., Balough, J., Zaza, G., Soygur, B., Hung, K., Prado, L., Kasler, H. G., Tiwari, R., Matsui, H., Hormazabal, G. V., Heckenbach, I., Scheibye-Knudsen, M., Duncan, F. E., & Verdin, E. (2023). CD38 regulates ovarian function and fecundity via NAD(+) metabolism. iScience, 26(10), 107949. https://doi.org/10.1016/j.isci.2023.107949

      Quan, N., Harris, L. R., Halder, R., Trinidad, C. V., Johnson, B. W., Horton, S., Kimler, B. F., Pritchard, M. T., & Duncan, F. E. (2020). Differential sensitivity of inbred mouse strains to ovarian damage in response to low-dose total body irradiation. Biol Reprod, 102(1), 133-144. https://doi.org/10.1093/biolre/ioz164

      Savas, J. N., Toyama, B. H., Xu, T., Yates, J. R., 3rd, & Hetzer, M. W. (2012). Extremely long-lived nuclear pore proteins in the rat brain. Science, 335(6071), 942. https://doi.org/10.1126/science.1217421

      Toyama, B. H., Savas, J. N., Park, S. K., Harris, M. S., Ingolia, N. T., Yates, J. R., 3rd, & Hetzer, M. W. (2013). Identification of long-lived proteins reveals exceptional stability of essential cellular structures. Cell, 154(5), 971-982. https://doi.org/10.1016/j.cell.2013.07.037

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      This is a fine paper that serves the purpose to show that the use of light sheet imaging may be used to provide whole brain imaging of axonal projections. The data provided suggest that at this point the technique provides lower resolution than with other techniques. Nonetheless, the technique does provide useful, if not novel, information about particular brain systems. 

      Strengths: 

      The manuscript is well written. In the introduction, a clear description of the functional organization of the barrel cortex provides the context for applying specific Cre-driver lines to map the projections of the main cortical projection types using whole brain neuroanatomical tracing techniques. The results provided are also well written, with sufficient detail describing the specifics of how techniques were used to obtain relevant data. Appropriate controls were done, including the identification of whisker fields for viral injections and determination of the laminar pattern of Cre expression. The mapping of the data provides a good way to visualize low resolution patterns of projections. 

      Weaknesses: 

      (1) The results provided are, as stated in the discussion, "largely in agreement with previously reported studies of the major projection targets". However, it must be stated that the study does not "extend current knowledge through the high sensitivity for detecting sparse axons, the high specificity of labeling of genetically defined classes of neurons and the brain wide analysis for assigning axons to detailed brain regions", all of which have been published in numerous other studies (the Allen connectivity project and related papers, along with others). If anything, the labeling of axons obtained with light sheet imaging in this study does not provide as detailed a mapping as that obtained with other techniques. Some detail is provided of how the raw images are processed to resolve labeled axons, but the images shown in the figures do not demonstrate how well individual axons may be resolved; of particular interest would be to see labeling in terminal areas such as other cortical areas, striatum and thalamus. As presented, the light sheet imaging appears to be rather low resolution compared to the many studies that have used viral tracing to look at cortical projections from genetically identified cortical neurons. 

      We agree with the reviewer that the resolution of imaging should be further improved in future studies, as also mentioned in the original manuscript. On P. 17 of the revised manuscript we write “Probably most important for future studies is the need to increase the light-sheet imaging resolution perhaps combined with the use of expansion microscopy to provide brain-wide micron-resolution data (Glaser et al., 2023; Wassie et al., 2019).” However, even at somewhat lower resolution, through bright sparse labelling, individual axonal segments can nonetheless be traced through machine learning to define axonal skeletons, whose length can be quantified as we do in this study. This methodology highlights sparse wS1 and wS2 innervation of a large number of brain areas, some of which are not typically considered, and our anatomical results might therefore help the neuronal circuit analysis underlying various aspects of whisker sensorimotor processing. Despite impressive large-scale projection mapping projects such as the Allen connectivity atlas, there remains relatively sparse cell type-specific projection map data for the representations of the large posterior whiskers in wS1 and wS2, and our data in this study thus add to a growing body of cell type-specific projection mapping with the specific focus on the output connectivity of these whisker-related neocortical regions of sensory cortex.

      In the revised manuscript, we now provide an additional supplementary figure (Figure 1 – figure supplement 2) showing examples of the axonal segmentation from further additional image planes including branching axons in the key innervation regions mentioned by the reviewer, namely “other cortical areas, striatum and thalamus”.

      (2) Amongst the limitations of this study is the inability to resolve axons of passage and terminal fields. This has been done in other studies with viral constructs labeling synaptophysin. This should be mentioned. 

      The reviewer brings up another important point for future methodological improvements to enhance connectivity mapping. Indeed, we already mentioned this in our original submission near the end of the first paragraph under the Limitations and future perspectives section. In the revised manuscript on P. 17, we write “Future studies should also aim to identify neurotransmitter release sites along the axon, which could be achieved by fluorescent labeling of prominent synaptic components, such as synaptophysin-GFP (Li et al., 2010).”

      (3) There is no quantitative analysis of differences between the genetically defined neurons projecting to the striatum, what is the relative area innervated by, density of terminals, other measures. 

      The reviewer raises an interesting question, and in the revised manuscript, we now present a more detailed analysis of cell class-specific axonal projections focusing specifically on the striatum. Following the reviewer’s suggestion, in a new supplementary figure (Figure 7 – figure supplement 1), we now report spatial axonal density maps in the striatum from SSp-bfd and SSs, finding potentially interesting differences comparing the projections of Rasgrf2-L2/3, Scnn1a-L4 and Tlx3-L5IT neurons. On P. 12 of the revised manuscript, we now write “We also investigated the spatial innervation pattern of Rasgrf2-L2/3, Scnn1a-L4 and Tlx3-L5IT neurons in the striatum (Figure 7 – figure supplement 1), where we found that axonal density from Rasgrf2-L2/3 neurons in both SSp-bfd and SSs was concentrated in a posterior dorsolateral part of the ipsilateral striatum, whereas Tlx3-L5IT neurons had extensive axonal density across a much larger region of the striatum, including bilateral innervation by SSp-bfd neurons. Striatal innervation by Scnn1a-L4 neurons was intermediate between Rasgrf2-L2/3 and Tlx3-L5IT neurons.” We think the reviewer’s comment has helped reveal further interesting aspects of our data set, and we thank the reviewer.

      (4) Figure 5 is an example of the type of large sets of data that can be generated with whole brain mapping and registration to the Allen CCF that provides information of questionable value. Ordering the 50 plus structures by the density of labeling does not provide much in terms of relative input to different types of areas. The multiple subregions of different functional types (i.e., different visual areas and different motor subregions) are scattered rather than grouped together, which makes it difficult to understand any organizing principles.

      We agree with the reviewer, and fully support the importance of considering subregions within the relatively coarse compartmentalization of the current Allen CCF. In order to provide some further information about connectivity that may help give the reader further insights into the data, we have now added further quantification of cortex-specific axonal density ranked according to functional subregions in a new supplementary figure (Figure 5 – figure supplement 2). 

      (5) The GENSAT Cre driver lines used must have the specific line name used, not just the gene name, as the GENSAT BAC-Cre lines had multiple lines for each gene, often with very different expression patterns: Rbp4_KL100, Tlx3_PL56, Sim1_KJ18, Ntsr1_GN220. 

      In the revised manuscript, we now write out a fuller description of the mouse lines the first time they are mentioned in the Results section on P. 7. The full mouse line names, accession numbers and references were of course already described in the methods section, which remains the case in the revised manuscript.

      Reviewer #2 (Public Review): 

      Summary: 

      This study takes advantage of multiple methodological advances to perform layer-specific staining of cortical neurons and tracking of their axons to identify the pattern of their projections. This publication offers a mesoscale view of the projection patterns of neurons in the whisker primary and secondary somatosensory cortex. The authors report that, consistent with the literature, the pattern of projection is highly different across cortical layers and subtype, with targets being located around the whole brain. This was tested across 6 different mouse types that expressed a marker in layer 2/3, layer 4, layer 5 (3 sub-types) and layer 6.  Looking more closely at the projections from primary somatosensory cortex into the primary motor cortex, they found that there was a significant spatial clustering of projections from topographically separated neurons across the primary somatosensory cortex. This was true for neurons with cell bodies located across all tested layers/types. 

      Strengths: 

      This study successfully looks at the relevant scale to study projection patterns, which is the whole brain. This is achieved thanks to an ambitious combination of mouse lines, immunohistochemistry, imaging and image processing, which results in a standardized histological pipeline that processes the whole-brain projection patterns of layer-selected neurons of the primary and secondary somatosensory cortex. 

      This standardization means that comparisons between cell-types projection patterns are possible and that both the large-scale structure of the pattern and the minute details of the intra-areas pattern are available. 

      This reference dataset and the corresponding analysis code are made available to the research community. 

      Weaknesses: 

      One major question raised by this dataset is the risk of missing axons during the postprocessing step. Indeed, it appears that the control and training efforts have focused on the risk of false positives (see Figure 1 supplementary panels). And indeed, the risk of overlooking existing axons in the raw fluorescence data is discussed in the article. 

      Based on the data reported in the article, this is more than a risk. In particular, Figure 2 shows an example Rbp4-L5 mouse where axonal spread seems massive in Hippocampus, while there is no mention of this area in the processed projection data for this mouse line. 

      In Figure 2, we show the expression of tdTomato in double-transgenic mice in which the Cre-driver lines were crossed with a Cre-dependent reporter mouse expressing cytosolic tdTomato. In addition to the specific labelling of L5PT neurons in the somatosensory cortex, Rbp4-Cre mice also express Cre-recombinase in other brain regions including the hippocampus. In the reporter mice crossed with Rbp4-Cre mice, tdTomato is expressed in neurons with cell bodies in the hippocampus which is clearly visualized in Figure 2. Because our axonal labelling is based on localized viral vector expression of tdTomato in SSp-bfd and SSs, the expression of Cre in hippocampus does not affect our analysis. In order to clarify to the reader, in the legend to Figure 2D, we now specifically write “As for panel A, but for Rbp4-L5 neurons. Note strong expression of Cre in neurons with cell bodies located in the hippocampus, which does not affect our analysis of axonal density based on virus injected locally into the neocortex.” Consistent with this observation, the Allen Institute’s ISH data support expression of Rbp4 in neurons of the hippocampus e.g. https://mouse.brain-map.org/gene/show/19425 and https://mouse.brain-map.org/experiment/show/68632655.

      Similarly, the Ntsr1-L6CT example shows a striking level of fluorescence in Striatum that is not reflected in the amount of axons detected by the algorithms in the next figures. These apparent discrepancies may be due to non-axon-specific fluorescence in the samples. In any case, further analysis of such anatomical areas would be useful to consolidate the valuable dataset provided by the article. 

      As pointed out above, Figure 2 shows cytosolic tdTomato fluorescence in transgenic crosses of the Cre-driver mice with Cre-dependent tdTomato reporter mice. For the Ntsr1-Cre x LSL-tdTomato mice, all corticothalamic L6CT neurons from across the entire cortex drive tdTomato expression. The axon of each neuron must traverse the striatum giving rise to fluorescence in the striatum. As discussed above, labelling of synaptic specialisations will be important in future studies to separate travelling axon from innervating axon. However, the overall impact of the axons traversing the striatum is again mitigated in our study by considering the axonal projections from local sparse infections in SSp-bfd and SSs rather than from cortex-wide tdTomato expression.

      Reviewer #3 (Public Review): 

      Summary: 

      The paper offers a systematic and rigorous description of the layer- and sublayer-specific outputs of the somatosensory cortex using a modern toolbox for the analysis of brain connectivity which combines: 1) Layer-specific genetic drivers for conditional viral tracing; 2) whole brain analyses of axon tracts using tissue clearing and imaging; 3) Segmentation and quantification of axons with normalization to the number of transduced neurons; 4) registration of connectivity to a widely used anatomical reference atlas; 5) functional validation of the connectivity using optogenetic approaches in vivo. 

      Strengths: 

      Although the connectivity of the somatosensory cortex is already known, precise data are dispersed in different accounts (papers, online resources) using different methods. So the present account has the merit of condensing this information in one very precisely documented report. It also brings new insights on the connectivity, such as the precise comparison of layer-specific outputs, and of the primary and secondary somatosensory areas. It also shows a topographic organization of the circuits linking the somatosensory and motor cortices. The paper also offers a clear description of the methodology and of a rigorous approach to quantitative anatomy. 

      Weaknesses: 

      The weakness relates to the intrinsic limitations of the in toto approaches, that currently lack the precision and resolution allowing to identify single axons, axon branching or synaptic connectivity. These limitations are identified and discussed by the authors. 

      We agree with the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      No additional comment 

      OK

      Reviewer #2 (Recommendations For The Authors): 

      In Figure 8, we don't get to see much raw data, while the diversity of functional responses pattern to the primary and supplementary S1 activations is highly intriguing (and this diversity exists as suggested by the results in Figure 8E, LRPT). 

      Can Figure 8C be less blurred? Maybe give more space to individual examples, such as an overlay of the delineations of the activated area across the tested mice? 

      Also, can we have a view on the time dynamics of the functional activation and integration window? 

      Raw data - We have now added a new supplementary figure (Figure 8 – figure supplement 1) to show data from individual mice, as well as plotting the time-course of the evoked jRGECO fluorescence signals in the frontal cortex hotspot. 

      Image blur - Each pixel represents 62.5 × 62.5 µm on the cortical surface. The images in Figure 8B&C were averaged across mice, which causes some additional spatial blurring. However, the most likely explanation for the ‘blurred’ impression is the overall large horizontal extent of the axonal innervation as well as the likely rapid lateral spread of excitation both at the stimulation area and in the target region, as for example also indicated in rapid voltage-sensitive imaging experiments (Ferezou et al., 2007).

      Reviewer #3 (Recommendations For The Authors): 

      At the time being, the abstract is really centred on the methodology, which is no longer very novel as it has already been described previously by other groups. In my view the paper would gain visibility, and be a useful tool for the community, if amended to better point out the significant results of the study, for instance, i) the layer and sub-layer specificity of the outputs, using the listed genetic drivers; ii) the comparison of primary and secondary somatosensory areas; iii) the functional validation. The layer specificity of each Cre-line should be indicated in the abstract. 

      We have tried to improve the writing of the abstract along the lines suggested by the reviewer. Specifically, we have now added layer and projection class of the various Cre-lines, and we now also highlight the most obvious differences in the innervation patterns.

      There is some degree of redundancy in the description in the result section. One suggestion, for an easier flow of reading, would be to join the paragraphs " Laminar characterization of the Cre-lines.." and: "Axonal projections...". Start for each Cre-line with a description of the laminar localisation of recombination in the somatosensory cortices, followed therefrom by the description of outputs from SSp-bfd and SSs; Then the general description/overview of the outputs can be summarized as a legend to Figure 5-supplementary 2, which could appear as a main figure. 

      Although we agree with the reviewer that there is some level of redundancy in the text, the results of the characterization of the Cre-line (Figure 2) is quite a different experiment compared to the viral injections described in other figures, and we therefore prefer to keep these sections separate.

      Other minor points: 

      In the text; Indicate the genetic background of the transgenic mouse lines. 

      On P. 18, we now indicate that all mice were “back-crossed with C57BL/6 mice”.

      Keep consistency in the designation of the areas, S1 appears sometimes as SSp-bfd or as SSp 

      We thank the reviewer for pointing out the inconsistent nomenclature, which we have now corrected in the revised manuscript. ‘SSp’ remains used on P. 9 and P. 16 of the revised manuscript to indicate a region including SSp-bfd but also extending beyond.

      Figure 1 supplement 2 is not really necessary to show (as the viral tools have previously been validated) can just be stated in the text. Conversely one would like to see a higher resolution image of the injection sites that allowed to do the cell counts used for normalization, as this can be pretty tricky. 

      In response to the reviewer’s suggestion, we have now added a new supplemental figure to show an example of how cells in the injection site were counted (Figure 1 – figure supplement 3).

      Figure 2: the most important here is the higher magnification to show the precise laminar localisation of the recombination, rather than the atlas landmarks that are already shown in Figure 1. This would allow more space for clearer higher magnification panels comparing SSs and SSp. The present image hints at some real differences, but they are difficult to appreciate with the current resolution. The legend should also comment on the labelling seen in layer 1, in the Tlx3 and Rbp4 lines. Could be dendritic labelling, but this needs a word of clarification.

      We think both the overview images and the high-resolution images are of value to the reader. Following the reviewer’s comment, in the legends to Figure 2C&D, we have now added text suggesting that the layer 1 fluorescence is likely axonal or dendritic in origin: “Labelling in layer 1 is likely of axonal or dendritic origin, and no cell bodies were labelled in this layer.” In addition, we have added a new supplemental figure which shows the cortical labelling in SSp and SSs in a more magnified view (Figure 2 – figure supplement 1).

      Figure 3: the comparison of the 3 transgenic lines labelling layer 5 and showing sublaminar identities is really interesting in showing the heterogeneity of this layer and possible regional differences. However, the cases shown for illustration for Rbp4 and Tlx3 seem pretty massive in comparison with the other drivers. Maybe cases with smaller injections could be chosen for illustration. 

      Figure 3 shows grand average axonal density maps across different mice normalized to the number of neurons in the injection site. The large amount of axon per neuron observed in Rbp4 and Tlx3 mice therefore shows their long, wide-ranging axons compared to other neuronal classes.
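
      The normalization described in this reply can be sketched in a few lines. This is a minimal illustration and not the authors' analysis pipeline: the function names, the data layout (one flat list of axon-density values per mouse, all of the same length), and the example numbers are assumptions made only for this sketch.

```python
from statistics import mean


def normalize_by_injection_size(axon_density, n_labelled_neurons):
    """Divide each axon-density value by the number of labelled neurons
    counted at the injection site (hypothetical data layout)."""
    if n_labelled_neurons <= 0:
        raise ValueError("need at least one labelled neuron")
    return [d / n_labelled_neurons for d in axon_density]


def grand_average(per_mouse_maps):
    """Average normalized maps position by position across mice."""
    lengths = {len(m) for m in per_mouse_maps}
    if len(lengths) != 1:
        raise ValueError("all maps must be non-empty and share one length")
    return [mean(values) for values in zip(*per_mouse_maps)]


# Hypothetical example: a large injection (100 neurons) and a small one (20).
mouse_a = normalize_by_injection_size([200.0, 50.0, 0.0], 100)
mouse_b = normalize_by_injection_size([80.0, 40.0, 20.0], 20)
average_map = grand_average([mouse_a, mouse_b])
```

      In this toy case the smaller injection contributes more axon per neuron than the larger one, which is the sense in which a per-neuron map reflects projection extent rather than injection size.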

      Figure 6A could be a supplementary figure in my view; 6B is clearer. 

      We think both representations are useful, and we think different readers might better appreciate either of the two analyses.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The manuscript by Jang et al. describes the application of new methods to measure the localization of GTP-binding signaling proteins (G proteins) on different membrane structures in a model mammalian cell line (HEK293). G proteins mediate signaling by receptors found at the cell surface (GPCRs), with evidence from the last 15 years suggesting that GPCRs can induce G-protein mediated signaling from different membrane structures within the cell, with variation in signal localization leading to different cellular outcomes. While it has been clearly shown that different GPCRs efficiently traffic to various intracellular compartments, it is less clear whether G proteins traffic in the same manner, and whether GPCR trafficking facilitates "passenger" G protein trafficking. This question was a blind spot in the burgeoning field of GPCR localized signaling in need of careful study, and the results obtained will serve as an important guidepost for further work in this field. The extent to which G proteins localize to different membranes within the cell is the main experimental question tested in this manuscript. This question is pursued through two distinct methods, both relying on genetic modification of the G-beta subunit with a tag. In one method, G-beta is modified with a small fragment of the fluorescent protein mNG, which combines with the larger mNG fragment to form a fully functional fluorescent protein to facilitate protein trafficking by fluorescent microscopy. This approach was combined with the expression of fluorescent proteins directed to various intracellular compartments (different types of endosomes, lysosome, endoplasmic reticulum, Golgi, mitochondria) to look for colocalization of G-beta with these markers. These experiments showed compelling evidence that G-beta co-localizes with markers at the plasma membrane and the lysosome, with weak or absent co-localization for other markers. 
      A second method for measuring localization relied on fusing G-beta with a small fragment from a miniature luciferase (HiBit) that combines with a larger luciferase fragment (LgBit) to form an active luciferase enzyme. Localization of G-beta (and luciferase signal) was measured using a method known as bystander BRET, which relies on the expression of a fluorescent protein BRET acceptor in different cellular compartments. Results using bystander BRET supported findings from fluorescence microscopy experiments. These methods for tracking G protein localization were also used to probe other questions. The activation of GPCRs from different classes had virtually no impact on the localization of G-beta, suggesting that GPCR activation does not result in the shuttling of G proteins through the endosomal pathway with activated receptors.

      Strengths:

      The question probed in this study is quite important and, in my opinion, understudied by the pharmacology community. The results presented here are an important call to be cognizant of the localization of GPCR coupling partners in different cellular compartments. Abundant reports of endosomal GPCR signaling need to consider how the impact of lower G protein abundance on endosomal membranes will affect the signaling responses under study.

      The work presented is carefully executed, with seemingly high levels of technical rigor. These studies benefit from probing the experimental questions at hand using two different methods of measurement (fluorescent microscopy and bystander BRET). The observation that both methods arrive at the same (or a very similar) answer inspires confidence about the validity of these findings.

      Weaknesses:

      The rationale for fusing G-beta with either mNG2(11) or SmBit could benefit from some expansion. I understand the speculation that using the smallest tag possible may have the smallest impact on protein performance and localization, but plenty of researchers have fused proteins with whole fluorescent proteins to provide conclusions that have been confirmed by other methods. Many studies even use G proteins fused with fluorescent proteins or luciferases. Is there an important advantage to tagging G-beta with small tags? Is there evidence that G proteins with full-size protein tags behave aberrantly? If the studies presented here would not have been possible without these CRISPR-based tagging approaches, it would be helpful to provide more context to make this clearer. Perhaps one factor would be interference from newly synthesized G proteins-fluorescent protein fusions en route to the plasma membrane (in the ER and Golgi).

      There are several advantages to using small peptide tags that we did not fully explain. From a practical standpoint the most important advantage of using the HiBit tag instead of full-length Nanoluc is that it allows us to restrict luminescence output to cells transiently transfected with LgBit. In this way untransfected cells contribute no background signal. Although we did not take advantage of it here, this also applies to fluorescent protein complementation, and will be useful for visualizing proteins in individual cells within tissues. The HiBit tag also allows PAGE analysis by probing membranes with LgBit (as in Fig. 1). We are not aware of evidence that tagging Gβ or Gγ subunits on the N terminus results in aberrant behavior, while there is some evidence that Gα subunits tagged with full-size protein tags (in some positions) have altered functional properties (PMID: 16371464). We do think that editing endogenous genes is critical, as studies using transient overexpression (usually driven by strong promoters) have sometimes reported accumulation of tagged G proteins in the biosynthetic pathway (e.g., PMID: 17576765), as the reviewer suggests. Gα and Gβγ appear to be mutually dependent on each other for appropriate trafficking to the plasma membrane (reviewed in PMID: 23161140); therefore the native (presumably matched) stoichiometry is likely to be critical.

      To clarify this context the revised manuscript includes the following:

      “For bioluminescence experiments we added the HiBit tag (Schwinn et al., 2018) and isolated clonal “HiBit-β1” cell lines. An advantage of this approach over adding a full-length Nanoluc luciferase is that it requires coexpression of LgBit to produce a complemented luciferase. This limits luminescence to cotransfected cells and thus eliminates background from untransfected cells.”

      “Some studies using overexpressed G protein subunits have suggested that a large pool of G proteins is located on intracellular membranes, including the Golgi apparatus (Chisari et al., 2007; Saini et al., 2007; Tsutsumi et al., 2009), whereas others have indicated a distribution that is dominated by the plasma membrane (Crouthamel et al., 2008; Evanko, Thiyagarajan, & Wedegaertner, 2000; Marrari et al., 2007; Takida & Wedegaertner, 2003). A likely factor contributing to these discrepant results is the stoichiometry of overexpressed subunits, as neither Gα nor Gβγ traffic appropriately to the plasma membrane as free subunits (Wedegaertner, 2012). Our gene-editing approach presumably maintains the native subunit stoichiometry, providing a more accurate representation of native G protein distribution.”

      As noted by the authors, they do not demonstrate that the tagged G-beta is predominantly found within heterotrimeric G protein complexes. If there is substantial free G-beta, then many of the conclusions need to be reconsidered. Perhaps a comparison of immunoprecipitated tagged G beta vs immunoprecipitated supernatant, with blotting for other G protein subunits would be informative.

      We do think that HiBit-β1 exists predominantly within heterotrimeric complexes, for several reasons. First, overexpression studies have shown that Gβγ requires association with Gα to traffic to the plasma membrane, and that by itself Gβγ is retained on the endoplasmic reticulum (PMID: 12609996; PMID: 12221133). We find almost no endogenous Gβ1 on the endoplasmic reticulum, and a high density on the plasma membrane. Second, we are able to detect large increases in free HiBit-Gβγ after G protein activation using free Gβγ sensors (e.g. Fig. 1). Third, many proteins that bind to free Gβγ are found entirely in the cytosol of HEK 293 cells (e.g. PMID: 10066824), suggesting there is not a large population of free Gβγ. We have added discussion of these points to the revised manuscript as follows:

      “Endogenous Gα and Gβ subunits are expressed at approximately a 1:1 ratio, and Gβ subunits are tightly associated with Gγ and inactive Gα subunits (Cho et al., 2022; Gilman, 1987; Krumins & Gilman, 2006). Moreover, proteins that bind to free Gβγ dimers are found in the cytosol of unstimulated HEK 293 cells, suggesting at most only a small population of free Gβγ in these cells. Therefore, we assume that the large majority of mNG-β1 and HiBit-β1 subunits in unstimulated cells are part of heterotrimers.”

      “Notably, when Gβγ dimers are expressed alone they accumulate on the endoplasmic reticulum (Michaelson et al., 2002; Takida & Wedegaertner, 2003). That we detect almost no endogenous Gβγ on the endoplasmic reticulum supports our conclusion that the large majority of Gβγ in unstimulated HEK 293 cells is associated with Gα, although we cannot rule out a small population of free Gβγ.”

      We do not entirely understand the suggested experiment, as free Gβγ will still be largely associated with the membrane fraction. Notably, we find almost no HiBit-β1 in the supernatant after lysis in hypotonic buffer and preparation of membrane fractions, and the small amount that we do find does not change if Gα is overexpressed.

      Additional context and questions:

      (1) There exists some evidence that certain GPCRs can form enduring complexes with G-beta-gamma (PubMed: 23297229, 27499021). That would seem to offer a mechanism that would enable receptor-mediated transport of G protein subunits. It would be helpful for the authors to place the findings of this manuscript in the context of these previous findings since they seem somewhat contradictory.

      We agree. In our original submission we noted “It is possible that other receptors will influence G protein distribution using mechanisms not shared by the receptors we studied.” In the revised manuscript we have added:

      “For example, a few receptors are thought to form relatively stable complexes with Gβγ, which could provide a mechanism of trafficking to endosomes (Thomsen et al., 2016; Wehbi et al., 2013).”

      (2) There is some evidence that GaS undergoes measurable dissociation from the plasma membrane upon activation (see the mechanism of the assay in PubMed: 35302493). It seems possible that G-alpha (and in particular GaS) might behave differently than the G-beta subunit studied here. This is not entirely clear from the discussion as it now stands.

      Indeed, there is abundant evidence that some Gαs translocates away from the plasma membrane upon activation. We referred to translocation of “some Gα subunits” in the introduction, although we did not specify that Gαs is by far the most studied example. In a previous study (PMID: 27528603) we found that overexpressed Gαs samples many intracellular membranes upon activation and returns to the plasma membrane when activation ceases. This is similar to activation-dependent translocation of free Gβγ dimers. Because these translocation mechanisms depend on activation and are reversible, they are unlikely to be a major source of inactive heterotrimers for intracellular membranes.

      We did a poor job of making it clear that we intentionally avoided translocation mechanisms that operate only during receptor and G protein stimulation. In the revised manuscript we have added new data showing reversible activation-dependent translocation of endogenous HiBit-Gβ1.

      (3) The authors say "The presence of mNG-b1 on late endosomes suggested that some G proteins may be degraded by lysosomes". The mechanism of lysosomal degradation by proteins on the outside of the lysosome is not clear. It would be helpful for the authors to clarify.

      We agree we didn’t connect the dots here. Our initial idea was that G proteins on the surface of late endosomes might reach the interior of late endosomes and then lysosomes by involution into multivesicular bodies. However, the reviewer correctly points out that much of the G protein associated with lysosomes still appears to be on the cytosolic surface, where it would not be subject to degradation. In fact, since lysosomes can fuse with the plasma membrane under certain circumstances, this could even represent a pathway for recycling G proteins to the plasma membrane.

      We have revised the text to avoid giving the impression that lysosomes degrade G proteins, since we have scant evidence that this occurs. In the revised discussion we point out that we do not know the fate of G proteins located on the surface of lysosomes and speculate that these could be returned to the plasma membrane:

      “We do not know the fate of G proteins located on the surface of lysosomes. Since lysosomes may fuse with the plasma membrane under certain circumstances (Xu & Ren, 2015), it is possible that this represents a route of G protein recycling to the plasma membrane.”

      (4) Although the authors do a good job of assessing G protein dilution in endosomal membranes, it is unclear how this behavior compares to the measurement of other lipid-anchored proteins using the same approach. Is the dilution of G proteins what we would expect for any lipid-anchored protein at the inner leaflet of the plasma membrane?

      This is a great question. To begin to address it we have studied a model lipid-anchored protein consisting of mNeongreen2 anchored to the plasma membrane by the C terminus of HRas, which is palmitoylated and prenylated. We find that this protein is also diluted on endocytic vesicles, although to a lesser degree than heterotrimeric G proteins. We have added a section to the results and a new figure supplement describing these results:

      “To test if other peripheral membrane proteins are similarly depleted from endocytic vesicles, we performed analogous experiments by overexpressing mNG bearing the C-terminal membrane anchor of HRas (mNG-HRas ct). We found that mNG-HRas ct was also less abundant on FM4-64-positive endocytic vesicles than expected based on plasma membrane abundance, although not to the same extent as mNG-β1 (Figure 4 - figure supplement 2); mNG-HRas ct density on FM4-64-positive vesicles was 64 ± 17% (mean ± 95% CI; n=78) of the nearby plasma membrane.”

      Reviewer #2 (Public Review):

      This is an interesting method that addresses the important problem of assessing G protein localization at endogenous levels. The data are generally convincing.

      Specific comments

      Methods:

      The description of the gene editing method is unclear. There are two different CRISPR cell lines made in two different cell backgrounds. The methods should clearly state which CRISPR guides were used on which cell line. It is also not clear why HiBit is included in the mNG-β1 construct. Presumably, this is not critical but it would be helpful to explicitly note. In general, the Methods could be more complete.

      We have added the following to the methods to clarify that the same gRNA was used to produce both cell lines:

      “The human GNB1 gene was targeted at a site corresponding to the N-terminus of the Gβ1 protein; the sequence 5’-TGAGTGAGCTTGACCAGTTA-3’ was incorporated into the crRNA, and the same gRNA was used to produce both HiBit-β1 and mNG-β1 cell lines.”

      We have added the following to the methods to clarify why HiBit is included in the mNG-β1 construct:

      “HiBit was included in the repair template for producing mNG-β1 cells to enable screening for edited clones using luminescence.”

      Results:

      The explanation of validation experiments in Figures 1 C and D is incomplete and difficult to follow. The rationale and explanation of the experiments could be expanded. In addition, because this is an interesting method, it would be helpful to know if the endogenous editing affects normal GPCR signaling. For example, the authors could include data showing an Iso-induced cAMP response. This is not critical to the present interpretation but is relevant as a general point regarding the method. Also, it may be relevant to the interpretation of receptor effects on G protein localization.

      We have expanded the rationale and explanation of experiments in Figures 1C and D by adding:

      “For example, we observed agonist-induced BRET between the D2 dopamine receptor and mNG-β1, an interaction that requires association with endogenous Gα subunits (Figure 1C). Similarly, we observed BRET between HiBit-β1 and the free Gβγ sensor memGRKct-Venus after activation of receptors that couple Gi/o, Gs, and Gq heterotrimers, indicating that HiBit-β1 associated with endogenous Gα subunits from these three families (Figure 1D).”

      We have done the suggested cAMP experiment and provide the data in a new figure supplement:

      “We also found that cyclic AMP accumulation in response to stimulation of endogenous β-adrenergic receptors was similar in edited cell lines and their unedited parent lines (Figure 1 - figure supplement 1).”

      Discussion:

      The conclusion that beta-gamma subunits do not redistribute after GPCR activation seems new and different from previous reports. Is this correct? Can the authors elaborate on how the results compare to previous literature?

      Many previous studies have indeed shown that free Gβγ dimers can redistribute after GPCR activation and sample intracellular membranes. Our initial focus was on possible changes in heterotrimer distribution after GPCR activation, but in retrospect we should have directly addressed free Gβγ translocation and made the distinction clear.

      In the revised manuscript we show that during stimulation we observe changes consistent with modest translocation of endogenous Gβγ from the plasma membrane and sampling of intracellular compartments. To our knowledge this is the first demonstration of endogenous Gβγ translocation.

      We have added:

      “With overexpressed G proteins free Gβγ dimers translocate from the plasma membrane and sample intracellular membrane compartments after activation-induced dissociation from Gα subunits. Consistent with this, we observed small decreases in bystander BRET at the plasma membrane and small increases in bystander BRET at intracellular compartments during activation of GPCRs, suggesting that endogenous Gβγ subunits undergo similar translocation (Figure 5- figure supplement 1). Notably, these changes occurred at room temperature, suggesting that endocytosis was not involved, and developed over the course of minutes. The latter observation and the small magnitude of agonist-induced changes are both consistent with expression of primarily slowly-translocating endogenous Gγ subtypes in HEK 293 cells. Moreover, as shown previously for overexpressed Gβγ, the changes we observed with endogenous Gβγ were readily reversible (Figure 5- figure supplement 1), suggesting that most heterotrimers reassemble at the plasma membrane after activation ceases.”

      Can the authors note that OpenCell has endogenously tagged Gβ1 and reports more obvious internal localization? Can the authors comment on this point?

      OpenCell has tagged GNB1 and the Leonetti group kindly provided a parent cell line we used to add a slightly different tag. Although their study did not identify any specific intracellular compartments, our impression is that most of the internal structures visible in their images are likely to be lysosomes, as they are large, round and often have a clear lumen. Overall their images and ours are comfortingly similar. We have added:

      “Unsurprisingly, our images are quite similar to those made as part of a previous study that labeled Gβ1 subunits with mNG2 (Cho et al., 2022).”

      Notably, the Leonetti group has recently reported the subcellular distribution of many untagged proteins using a proteomic approach. They find that Gβ1 is enriched on the plasma membrane and lysosomes but is not enriched on endosomes, the Golgi apparatus, endoplasmic reticulum or mitochondria (https://www.biorxiv.org/content/10.1101/2023.12.18.572249v1). We have cited this work in the revised manuscript.

      Is this the first use of CRISPR / HiBit for BRET assay? It would be helpful to know this or cite previous work if not. Also, as this is submitted as a tools piece, the authors might say a little more about the potential application to other questions.

      The only previous study we are aware of utilizing a similar combination of methods is a 2020 report from the group of Dr. Stephen Hill, in which the authors studied binding of fluorescent ligands to HiBit-tagged GPCRs. This work is now cited.

      We have also added the following to our previous brief statement about potential applications:

      “In addition, it may also be possible to use these cells in combination with targeted sensors to study endogenous G protein activation in different subcellular compartments. More broadly, our results show that subcellular localization of endogenous membrane proteins can be studied in living cells by adding a HiBit tag and performing bystander BRET mapping. Applied at large scale this approach would have some advantages over fluorescent protein complementation, most notably the ability to localize endogenous membrane proteins that are expressed at levels that are too low to permit fluorescence microscopy.”

      Reviewer #3 (Public Review):

      Summary:

      This article addresses an important and interesting question concerning intracellular localization and dynamics of endogenous G proteins. The fate and trafficking of G protein-coupled receptors (GPCRs) have been extensively studied but so far little is known about the trafficking routes of their partner G proteins that are known to dissociate from their respective receptors upon activation of the signaling pathway. The authors utilize modern cell biology tools including genome editing and bystander bioluminescence resonance energy transfer (BRET) to probe intracellular localization of G proteins in various membrane compartments in steady state and also upon receptor activation. Data presented in this manuscript shows that while G proteins are mostly present on the plasma membrane, they can be also detected in endosomal compartments, especially in late endosomes and lysosomes. This distribution, according to data presented in this study, seems not to be affected by receptor activation. These findings will have implications in further studies addressing GPCR signaling mechanisms from intracellular compartments.

      Strengths:

      The methods used in this study are adequate for the question asked. In particular, the use of genome-edited cells (for the addition of the tag on one of the G proteins) is a great choice to prevent the effects of overexpression. Moreover, the use of bystander BRET allowed the authors to probe the intracellular localization of G proteins in a very high-throughput fashion. By combining imaging and BRET, the authors convincingly show that G proteins are of very low abundance on early endosomes (and also on the ER, mitochondria, and medial Golgi), but seem to accumulate on membranes of late endosomal compartments.

      Weaknesses:

      While the authors provide a novel dataset, many questions regarding G protein trafficking remain open. For example, it is not entirely clear which pathway is utilized to traffic G proteins from the plasma membrane to intracellular compartments. Additionally, future studies should also address the dynamics of G protein trafficking, for example by tracking them over multiple time points.

      We agree, there is much more to do.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      On page 7 the text says "the difference did reach significance (Figure 5D)". It looks like the difference did not reach significance. Please check on this.

      Thank you, this was an unfortunately significant typo.

      Reviewer #3 (Recommendations For The Authors):

      This article addresses an important and interesting question concerning intracellular localization and dynamics of endogenous G proteins. While the posed question is indeed a grand one and the methods used by the authors are novel, I believe that the data presented in this manuscript are still insufficient to support all claims posed by the authors. Below I list my major concerns:

      (1) The authors claim that they provide a "detailed subcellular map of endogenous G protein distribution", however, the map is in my opinion not sufficiently detailed (e.g. trans-Golgi network is not included) and not quantitative enough (e.g. % of proteins present on one compartment vs. the other as authors claim that BRET signals "cannot be directly compared between different compartments"). To strengthen this statement, except for providing more extensive and quantitative data, it would be beneficial to provide such a "map" as an illustration based on the findings presented in this article.

      “Detailed” is certainly a subjective term. While we maintain that our description of endogenous G protein distribution is far more detailed than any previous study, we now simply claim to provide a “subcellular map”. We have added images of TGNP (TGN46; TGOLN2), showing that endogenous G proteins are readily detectable on the structures labeled by this marker. These data are now provided in Figure 3 – figure supplement 7.

      We did not claim that our study was quantitative; we did not try to count G proteins. However, if we use published estimates of total G proteins and surface area for HEK 293 cells we estimate that there are roughly 2,500 G proteins µm⁻² on the plasma membrane and 500 G proteins µm⁻² on endocytic vesicles. For other intracellular compartments relative density can be approximated by inspecting images, but a truly quantitative estimate would require a surface area standard analogous to FM4-64 for each compartment. The percentage of the total G protein pool on a given compartment is, in our opinion, less important than the density of G proteins on that compartment, as the latter is more likely to affect the efficiency of local signal transduction. Since we do not claim to have accurate G protein density estimates for many intracellular compartments, we prefer to provide several raw images for each compartment rather than a schematized map.

      Bystander BRET values cannot be compared directly across compartments due to differences in expression and energy transfer efficiency of different markers and compartment surface area. This method is well suited for following changes in distribution as a function of time or after perturbations and for sensitive detection of weak colocalization but can only provide approximate “maps” of absolute distribution.
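
      The point about within-compartment comparisons can be made concrete with a small sketch. It assumes the common convention that a BRET ratio is acceptor emission divided by donor emission, and that net BRET subtracts the ratio measured without acceptor; the reply does not state how the authors computed their values, so the function names, the donor-only correction, and all numbers below are hypothetical.

```python
def bret_ratio(acceptor_emission, donor_emission):
    """Raw BRET ratio: acceptor-channel signal over donor-channel signal."""
    if donor_emission <= 0:
        raise ValueError("donor emission must be positive")
    return acceptor_emission / donor_emission


def net_bret(acceptor_emission, donor_emission, donor_only_ratio):
    """Raw ratio minus the ratio seen with donor alone (assumed correction)."""
    return bret_ratio(acceptor_emission, donor_emission) - donor_only_ratio


def fold_change(net_after, net_before):
    """Change for ONE compartment marker, before vs. after a perturbation."""
    if net_before == 0:
        raise ValueError("baseline net BRET must be non-zero")
    return net_after / net_before


# Hypothetical readings for a single compartment marker.
DONOR_ONLY = 0.10
before = net_bret(300.0, 1000.0, DONOR_ONLY)  # baseline
after = net_bret(320.0, 1000.0, DONOR_ONLY)   # after agonist
change = fold_change(after, before)
```

      Because marker expression, energy transfer efficiency, and compartment surface area all scale the raw ratio, a fold change of this kind is only meaningful when both readings come from the same marker.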

      (2) Probing of the intracellular distribution of these proteins, especially after GPCR activation, includes a single chosen timepoint. I believe that the manuscript would greatly benefit from including some dynamic data on internalization and intracellular trafficking kinetics. What is the turnover of tested G proteins? What is the fraction that is going to recycling compartments and/or lysosomes? Authors could perhaps turn to other methods to be able to dynamically track proteins over time e.g. via photoconversion techniques.

      Because G protein trafficking appears to be largely constitutive there is no easy way for us to assess how long it takes G proteins to transit various intracellular compartments, although we agree this would be interesting. As the reviewer suggests, dynamic data on constitutive trafficking would require methods (such as photoconversion) not currently available to us for endogenous G proteins. Accordingly, we have made no claims regarding the kinetics of G protein trafficking. As for possible redistribution after GPCR activation, in the revised manuscript we have added 5- and 15-minute timepoints after agonist stimulation for our bystander BRET mapping (Figure 5- figure supplement 2). These timepoints were chosen to correspond to persistent signaling mediated by internalized receptors. 

      (3) Exemplary images with cells showing significant colocalization with lysosomal compartments seem to contain more intracellular vesicles visible in the mNG channel than in the case of the other compartment. Is it an effect of the treatment to stain lysosomes? It would be beneficial to compare it with some endogenous marker e.g. LAMP1 without additional treatments.

      The visibility of intracellular vesicles in our lysosome images likely reflects our selection of cells and regions with visible and abundant lysosomes, specifically peripheral regions directly adhered to the coverslip, rather than treatment with lysosomal stains (LV 633 and dextran). As suggested, we now include images of cells expressing LAMP1 as an alternative lysosome marker (Figure 3 - figure supplement 6).

      (4) The authors probe an abundance of G proteins along the constitutive endocytic pathway. However, to prove that G proteins are not de-palmitoylated rather than endocytosed authors should perform control experiments where endocytosis is blocked e.g. pharmacologically or via a knockdown approach. Additionally, various endocytic pathways can be probed.

      We did not claim that depalmitoylation plays no role in delivery of G proteins to internal compartments. In fact, we pointed out that we cannot at present rule out other pathways and delivery mechanisms. Importantly, if some of the G proteins that we detect along the endocytic pathway do arrive there by trafficking through the cytosol this would only strengthen our major conclusion that endocytosis is inefficient.

      Having said this, we have now conducted extensive experiments investigating the role of palmitate cycling in the trafficking of heterotrimeric G proteins and the small G protein H-Ras. Our results suggest that a depalmitoylation-repalmitoylation cycle is not important for the distribution of heterotrimers, but these findings will be the subject of a separate publication focused on this specific question for both large and small G proteins.

      We agree that it will be interesting to probe different endocytic pathways, as suggested using a genetic approach. Our main interest here was in endocytic membranes that were defined functionally (with FM4-64 or internalized receptors) rather than biochemically.

      Minor comments:

      (5) "Imaging" paragraph in the Methods section refers to a non-existent figure called "SI Appendix S9".

      Thank you.

      (6) It is not clear what was used as a "control" in Figure 5E.

      “Control” refers to DPBS vehicle alone. This information is now added to the legend for Figure 5E.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Line 127. Provide a few more words describing the voltage protocol. To the uninitiated, panels A and B will be difficult to understand. "The large negative step is used to first close all channels, then probe the activation function with a series of depolarizing steps to re-open them and obtain the max conductance from the peak tail current at -36 mV. "

      We have revised the text as suggested (revision lines 127-131): “From a holding potential within the gK,L activation range (here –74 mV), the cell is hyperpolarized to –124 mV, negative to EK and the activation range, producing a large inward current through open gK,L channels that rapidly decays as the channels deactivate. We use the large transient inward current as a hallmark of gK,L. The hyperpolarization closes all channels, and then the activation function is probed with a series of depolarizing steps, obtaining the max conductance from the peak tail current at –44 mV (Fig. 1A).”

      Incidentally, why does the peak tail current decay? 

      We added this text to the figure legend to explain this: “For steps positive to the midpoint voltage, tail currents are very large. As a result, K+ accumulation in the calyceal cleft reduces driving force on K+, causing currents to decay rapidly, as seen in A (Lim et al., 2011).”

      The decay of the peak tail current is a feature of gK,L (large K+ conductance) and the large enclosed synaptic cleft (which concentrates K+ that effluxes from the HC). See Govindaraju et al. (2023) and Lim et al. (2011) for modeling and experiments around this phenomenon.

      Line 217-218. For some reason, I stumbled over this wording. Perhaps rearrange as "In type II HCs absence of Kv1.8 significantly increased Rin and tauRC. There was no effect on Vrest because the conductances to which Kv1.8 contributes, gA and gDR activate positive to the resting potential." (so which K conductances establish Vrest???).

      We kept our original wording because we wanted to discuss the baseline (Vrest) before describing responses to current injection.

      Vrest is presumably maintained by ATP-dependent Na/K exchangers (ATP1a1), HCN, Kir, and mechanotransduction currents. Repolarization is achieved by delayed rectifier and A-type K+ conductances in type II HCs.

      Figure 4, panel C - provides absolute membrane potential for voltage responses. Presumably, these were the most 'ringy' responses. Were they obtained at similar Vm in all cells (i.e., comparisons of Q values in lines 229-230). 

      We added the absolute membrane potential scale. Type II HC protocols all started with 0 pA current injection at baseline, so they were at their natural Vrest, which did not differ by genotype or zone. Consistent with Q depending on expression of conductances that activate positive to Vrest, Q did not co-vary with Vrest (Pearson’s correlation coefficient = 0.08, p = 0.47, n = 85).

      Lines 254. Staining is non-specific? Rather than non-selective? 

      Yes, thanks - Corrected (Line 264).

      Figure 6. Do you have a negative control image for Kv1.4 immuno? Is it surprising that this label is all over the cell, but Kv1.8 is restricted to the synaptic pole? 

      We don’t have a null-animal control because this immunostaining was done in rat. While the cuticular plate staining was most likely nonspecific, because we see it with many different antibodies, it is harder to judge the background staining in the hair cell body layer. After feedback from the reviewers, we decided to pull the KV1.4 immunostaining from the paper because of the lack of a null control, the high background, and our inability to reproduce these results in mouse tissue. In our hands, in mouse tissue, both mouse and rabbit anti-KV1.4 antibodies failed to localize to the hair cell membrane. Further optimization or another method could improve that, but for now the single-cell expression data (McInturff et al., 2018) remain the strongest evidence for KV1.4 expression in murine type II hair cells.

      Lines 400-404. Whew, this is pretty cryptic. Expand a bit? 

      We simplified this paragraph (revision lines 411-413): “We speculate that gA and gDR(KV1.8) have different subunit composition: gA may include heteromers of KV1.8 with other subunits that confer rapid inactivation, while gDR(KV1.8) may comprise homomeric KV1.8 channels, given that they do not have N-type inactivation.”

      Line 428. 'importantly different ion channels'. I think I understand what is meant but perhaps say a bit more. 

      Revised (Line 438): “biophysically distinct and functionally different ion channels”.

      Random thought. In addition to impacting Rin and TauRC, do you think the more negative Vrest might also provide a selective advantage by increasing the driving force on K entry from endolymph? 

      When the calyx is perfectly intact, gK,L is predicted to make Vrest less negative than the values we report in our paper, where we have disturbed the calyx to access the hair cell (–80, Govindaraju et al., 2023, vs. –87 mV, here). By enhancing K+ accumulation in the calyceal cleft, the intact calyx shifts EK—and Vrest—positively (Lim et al., 2011), so the effect on driving force may not be as drastic as what you are thinking.
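
      The direction of this shift follows from the Nernst relation for K+. The following is a minimal sketch rather than a calculation from the manuscript: the intracellular and cleft K+ concentrations and the temperature are hypothetical values chosen only to illustrate that raising cleft K+ moves EK in the positive direction.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol

def nernst_mv(conc_out_mm, conc_in_mm, temp_c=37.0, valence=1):
    """Equilibrium potential (mV) for an ion, given outside and inside concentrations."""
    temp_k = temp_c + 273.15
    return 1000.0 * (R * temp_k) / (valence * F) * math.log(conc_out_mm / conc_in_mm)

# Hypothetical concentrations, for illustration only (not from the manuscript).
K_IN = 140.0                      # assumed intracellular K+, mM
ek_low = nernst_mv(5.0, K_IN)     # assumed cleft K+ of 5 mM
ek_high = nernst_mv(10.0, K_IN)   # assumed cleft K+ after accumulation to 10 mM

print(f"EK with  5 mM cleft K+: {ek_low:.1f} mV")
print(f"EK with 10 mM cleft K+: {ek_high:.1f} mV")
```

      Under these assumptions, doubling cleft K+ makes EK less negative, which reduces the driving force for K+ efflux into the cleft.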

      Reviewer #2 (Recommendations For The Authors): 

      (1) Introduction: wouldn't the small initial paragraph stating the main conclusion of the study fit better at the end of the background section, instead of at the beginning? 

      Thank you for this idea, we have tried that and settled on this direct approach to let people know in advance what the goals of the paper are.

      (2) Pg.4: The following sentence is rather confusing "Between P5 and P10, we detected no evidence of a non-gK,L KV1.8-dependent.....". Also, Suppl. Fig 1A seems to show that between P5 and P10 hair cells can display a potassium current having either a hyperpolarised or depolarised Vhalf. Thus, I am not sure I understand the above statement. 

      Thank you for pointing out unclear wording. We used the more common “delayed rectifier” term in our revision (Lines 144-147): “Between P5 and P10, some type I HCs have not yet acquired the physiologically defined conductance, gK,L. No effects of KV1.8 deletion were seen in the delayed rectifier currents of immature type I HCs (Suppl. Fig. 1B), showing that they are not immature forms of the Kv1.8-dependent gK,L channels.”

      (3) For the reduced Cm of hair cells from Kv1.8 knockout mice, could another reason be simply the immature state of the hair cells (i.e. lack of normal growth), rather than less channels in the membrane? 

      There were no other signs to suggest immaturity or abnormal growth in KV1.8–/– hair cells or mice. Importantly, type II HCs did not show the same Cm effect.

      We further discussed the capacitance effect in lines 160-167: “Cm scales with surface area, but soma sizes were unchanged by deletion of KV1.8 (Suppl. Table 2). Instead, Cm may be higher in KV1.8+/+ cells because of gK,L for two reasons. First, highly expressed trans-membrane proteins (see discussion of gK,L channel density in Chen and Eatock, 2000) can affect membrane thickness (Mitra et al., 2004), which is inversely proportional to specific Cm. Second, gK,L could contaminate estimations of capacitive current, which is calculated from the decay time constant of transient current evoked by small voltage steps outside the operating range of any ion channels. gK,L has such a negative operating range that, even for Vm negative to –90 mV, some gK,L channels are voltage-sensitive and could add to capacitive current.”
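
      The thickness argument can be made explicit with the parallel-plate approximation of a membrane. This is a standard idealization, not a model taken from the manuscript; here A is membrane area, d is membrane thickness, and ε_r is an assumed effective relative permittivity of the membrane:

```latex
C_m = \frac{\varepsilon_0 \varepsilon_r A}{d},
\qquad
c_m = \frac{C_m}{A} = \frac{\varepsilon_0 \varepsilon_r}{d}
```

      In this approximation, total Cm scales with A while specific capacitance c_m varies inversely with d, so a change in membrane thickness can alter Cm even when soma size is unchanged.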

      (4) Methods: The electrophysiological part states that "For most recordings, we used .....". However, it is not clear what has been used for the other recordings.

      Thanks for catching this error, a holdover from an earlier ms. version.  We have deleted “For most recordings” (revision line 466).

      Also, please provide the sign for the calculated 4 mV liquid junction potential. 

      Done (revision line 476).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Some of the data in panels in Fig. 1 are hard to match up. The voltage protocols shown in A and B show steps from hyperpolarized values to -71mV (A) and -32 mV (B). However, the value from A doesn't seem to correspond with the activation curve in C.

      Thank you for catching this.  We accidentally showed the control I-X curve from a different cell than that in A. We now show the G-V relation for the cell in A.

      Also the Vhalf in D for -/- animals is ~-38 mV, which is similar to the most positive step shown in the protocol.

      The most positive step in Figure 1B is actually –25 mV. The uneven tick labels might have been confusing, so we re-labeled them to be more conventional.

      Were type I cells stepped to more positive potentials to test for the presence of voltage-activated currents at greater depolarizations? This is needed to support the statement on lines 147-148. 

      We added “no additional K+ conductance activated up to +40 mV” (revision lines 149-150). Our standard voltage-clamp protocol iterates up to ~+40 mV in KV1.8–/– hair cells, but in Figure 1 we only showed steps up to –25 mV because K+ accumulation in the synaptic cleft with the calyx distorts the current waveform even for the small residual conductances of the knockouts. KV1.8–/– hair cells have a main KV conductance with a Vhalf of ~–38 mV, as shown in Figure 1, and we did not see an additional KV conductance that activated with a more positive Vhalf up to +40 mV.
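
      For reference, a midpoint such as Vhalf is conventionally read from a Boltzmann fit to the tail G-V relation. The following is a minimal sketch of a single first-order Boltzmann function, assuming this standard form (the fitting function actually used is the one defined in the manuscript's Methods). The slope factor and the normalized maximum conductance are hypothetical.

```python
import math

def boltzmann_g(v_mv, g_max, v_half_mv, slope_mv):
    """First-order Boltzmann activation: conductance as a function of voltage."""
    return g_max / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))

V_HALF = -38.0   # mV, approximate knockout midpoint quoted in the response
SLOPE = 8.0      # mV, hypothetical slope factor
G_MAX = 1.0      # normalized maximum conductance (assumed)

# Evaluate at a few voltages spanning the protocol range mentioned in the response.
for v in (-124.0, -38.0, -25.0, 40.0):
    g = boltzmann_g(v, G_MAX, V_HALF, SLOPE)
    print(f"{v:7.1f} mV -> G/Gmax = {g:.3f}")
```

      With these assumed parameters the conductance is half-maximal at Vhalf and essentially saturated by +40 mV, so steps up to +40 mV would be expected to reveal any additional conductance with a more positive midpoint.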

      (2) Line 151 states "While the cells of Kv1.8-/- appeared healthy..." how were epithelia assessed for health? Hair cells arise from support cells and it would be interesting to know if Kv1.8 absence influences supporting cells or neurons. 

      We added our criteria for cell health to lines 477-479: “KV1.8–/– hair cells appeared healthy in that cells had resting potentials negative to –50 mV, cells lasted a long time (20-30 minutes) in ruptured patch recordings, membranes were not fragile, and extensive blebbing was not seen.”

      Supporting cells were not routinely investigated. We characterized calyx electrical activity (passive membrane properties, voltage-gated currents, firing pattern) and didn’t detect differences between +/+, +/–, and –/– recordings (data not shown). KV1.8 was not detected in neural tissue (Lee et al., 2013). 

      (3) Several different K+ channel subtypes were found to contribute to inner hair cell K+ conductances (Dierich et al. 2020) but few additional K+ channel subtypes are considered here in vestibular hair cells. Further comments on calcium-activated conductances (lines 310-317) would be helpful since apamin-sensitive SK conductances are reported in type II hair cells (Poppi et al. 2018) and large iberiotoxin-sensitive BK conductances in type I hair cells (Contini et al. 2020). Were iberiotoxin effects studied at a range of voltages and might calcium-dependent conductances contribute to the enhanced resonance responses shown in Fig. 4? 

      We refer you to lines 310-317 in the original ms (lines 322-329 in the revised ms), where we explain possible reasons for not observing IK(Ca) in this study.

      (4) Similar to GK,L erg (Kv11) channels show significant Cs+-permeability. Were experiments using Cs+ and/or Kv11 antagonists performed to test for Kv11? 

      No. Hurley et al. (2006) used Kv11 antagonists to reveal Kv11 currents in rat utricular type I hair cells with perforated patch, which were also detected in rats with single-cell RT-PCR (Hurley et al. 2006) and in mice with single-cell RNAseq (McInturff et al., 2018).  They likely contribute to hair cell currents, alongside Kv7, Kv1.8, HCN1, and Kir. 

      (5) Mechanosensitive ("MET") channels in hair cells are mentioned on lines 234 and 472 (towards the end of the Discussion), but a sentence or two describing the sensory function of hair cells in terms of MET channels and K+ fluxes would help in the Introduction too. 

      Following this suggestion we have expanded the introduction with the following lines  (78-87): “Hair cells are known for their large outwardly rectifying K+ conductances, which repolarize membrane voltage following a mechanically evoked perturbation and in some cases contribute to sharp electrical tuning of the hair cell membrane.  Because gK,L is unusually large and unusually negatively activated, it strongly attenuates and speeds up the receptor potentials of type I HCs (Correia et al., 1996; Rüsch and Eatock, 1996b). In addition, gK,L augments a novel non-quantal transmission from type I hair cell to afferent calyx by providing open channels for K+ flow into the synaptic cleft (Contini et al., 2012, 2017, 2020; Govindaraju et al., 2023), increasing the speed and linearity of the transmitted signal (Songer and Eatock, 2013).”

      (6) Lines 258-260 state that GKL does not inactivate, but previous literature has documented a slow type of inactivation in mouse crista and utricle type I hair cells (Lim et al. 2011, Rusch and Eatock 1996) which should be considered. 

      Lim et al. (2011) concluded that K+ accumulation in the synaptic cleft can explain much of the apparent inactivation of gK,L. In our paper, we were referring to fast, N-type inactivation. We changed that line to be more specific; new revision lines 269-271: “KV1.8, like most KV1 subunits, does not show fast inactivation as a heterologously expressed homomer (Lang et al., 2000; Ranjan et al., 2019; Dierich et al., 2020), nor do the KV1.8-dependent channels in type I HCs, as we show, and in cochlear inner hair cells (Dierich et al., 2020).”

      (7) Lines 320-321 Zonal differences in inward rectifier conductances were reported previously in bird hair cells (Masetto and Correia 1997) and should be referenced here.

      Zonal differences were reported by Masetto and Correia for type II but not type I avian hair cells, which is why we emphasize that we found a zonal difference in IH in type I hair cells. We added two citations to direct readers to type II hair cell results (lines 333-334): “The gK,L knockout allowed identification of zonal differences in IH and IKir in type I HCs, previously examined in type II HCs (Masetto and Correia, 1997; Levin and Holt, 2012).”

      Also, Horwitz et al. (2011) showed HCN channels in utricles are needed for normal balance function, so please include this reference (see line 171). 

      Done (line 184).

      (8) Fig 6A. Shows Kv1.4 staining in rat utricle but procedures for rat experiments are not described. These should be added. Also, indicate striola or extrastriola regions (if known). 

      We removed KV1.4 immunostaining from the paper, see above.

      (9) Table 6, ZD7288 is listed -was this reagent used in experiments to block Gh? If not please omit. 

      ZD7288 was used to block gH to produce a clean h-infinity curve in Figure 6, which is described in the legend.

      (10) In supplementary Fig. 5A make clear if the currents are from XE991 subtraction. Also, is the G-V data for single cell or multiple cells in B? It appears to be from 1 cell but ages P11-505 are given in legend. 

      The G-V curve in B is from XE991 subtraction, and average parameters in the figure caption are for all the KV1.8–/– striolar type I hair cells where we observed this double Boltzmann tail G-V curve. We added detail to the figure caption to explain this better.

      (11) Supplementary Fig. 6A claims a fast activation of inward rectifier K+ channels in type II but not type I cells-not clear what exactly is measured here.

      We use “fast inward rectifier” to indicate the inward current that increases within the first 20 ms after hyperpolarization from rest (IKir, characterized in Levin & Holt, 2012) in contrast to HCN channels, which open over ~100 ms. We added panel C to show that the activation of IKir is visible in type II hair cells but not in the knockout type I hair cells that lack gK,L. IKir was a reliable cue to distinguish type I and type II hair cells in the knockout.

      For our actual measurements in Fig 6B, we quantified the current flowing after 250 ms at –124 mV because we did not pharmacologically separate IKir and IH.

      Could the XE991-sensitive current be activated and contributing?

      The XE991-sensitive current could decay (rapidly) at the onset of the hyperpolarizing step, but was not contributing to our measurement of IKir and IH, made after 250 ms at –124 mV, at which point any low-voltage-activated (LVA) outward rectifiers have deactivated. Additionally, the LVA XE991-sensitive currents were rare (only detected in some striolar type I hair cells) and when present did not compete with fast IKir, which is only found in type II hair cells.

      Also, did the inward rectifier conductances sustain any outward conductance at more depolarized voltage steps? 

      For the KV1.8-null mice specifically, we cannot answer the question because we did not use specific blocking agents for inward rectifiers.  However, we expect that there would only be sustained outward IR currents at voltages between EK and ~-60 mV: the foot of IKir’s I-V relation according to published data from mouse utricular hair cells – e.g., Holt and Eatock 1995, Rusch and Eatock 1996, Rusch et al. 1998, Horwitz et al., 2011, etc.  Thus, any such current would be unlikely to contaminate the residual outward rectifiers in Kv1.8-null animals, which activate positive to ~-60 mV. 

      (I-HCN is also not a problem, because it could only be outward positive to its reversal potential at ~-40 mV, which is significantly positive to its voltage activation range.)

    1. Author response:

      The following is the authors’ response to the original reviews.

      We edited the manuscript for clarity, added information described in new figure panels (below) and corrected typos.

      In figure 1 we corrected a typo.

      In figure 2, panel 2H, and Figure S2E, we included a new statistical analysis (mixed effect linear regression) to compare mutational burden in controls and AD patients.

      In figure 3 and Figure S4B, we revised the western blot panels (3E, F) to improve the presentation of controls and quantification.

      We corrected typos.

      In figure 5 we removed a panel (former 5D) which did not add useful information.

      In Figure S1A we included information about the sex and age of the controls and patients analyzed. In Figure S2B, we added an analysis of the mutational burden in controls, distinguishing controls with and without cancer.

      We modified Table S1 for completeness of information for all samples analyzed.

      Reviewer #1:

      Weaknesses: 

      Even though the study is overall very convincing, several points could help to connect the seen somatic variants in microglia more with a potential role in disease progression. The connection of P-SNVs in the genes chosen from neurological disorders was not further highlighted by the authors. 

      All P-SNVs are reported in Table S3.

      We observed only two P-SNVs within genes associated with neurological disorders (brain panel in Table S2):

      - SQSTM1 (p.P392L) was identified in blood but not in brain from patient AD48A.

      - OPTN (p.Q467P) was identified in PU.1 from control 25.

      To highlight this point, we modified the first paragraph of the discussion as follows:

      “We report here that microglia from a cohort of 45 AD patients with intermediate-onset sporadic AD (mean age 65 y.o) is enriched for clones carrying pathogenic/oncogenic variants in genes associated with clonal proliferative disorders (Supplementary Table 2) in comparison to 44 controls. Of note we did not observe microglia P-SNVs within genes reported to be associated with neurological disorders (Supplementary Table 2) in patients, and one such variant was identified in a control (Supplementary Table 3).”

      The authors show in snRNA-seq data that a disease-associated microglia state seems to be enriched in patients with somatic variants in the CBL ring domain, however, this analysis could be deepened. For example, how this knowledge may translate to patient benefits when the relevant cell populations appear concentrated in a single patient sample (Figure 5; AD52) is unclear; increasing the analyzed patient pool for Figure 5 and showcasing the presence of this microglia state of interest in a few more patients with driving mutations for CBL or other MAPK pathway associated mutations would lend their hypotheses further credibility. 

      We acknowledge this limitation, but we respectfully submit that the analysis was performed in 2 patients. AD53 also shows a MAPK-associated inflammatory signature in the microglia clusters associated with mutations.

      We performed the analysis on all FACS-purified PU.1+ nuclei samples that passed QC for single nuclei RNAseq. It should be noted that this analysis is extremely difficult with current technologies because microglia nuclei need to be fixed for PU.1 staining and FACS purification and the clones are small (~1% of microglia).

      A potential connection between P-SNVs in microglia and disease pathology and symptoms was not further explored by the authors. 

      At the population level, Braak/CERAD scores and the presence of Lewy bodies, amyloid angiopathy, tauopathy, or alpha synucleinopathy were not different between AD patients with or without pathogenic microglial clones (Figure S3 and Table S1). Of note, we studied here a homogeneous population of AD patients.

      At the tissue level, the role of mutant microglia, for example in plaques, is being investigated, but we do not have results to present at this time.

      A recent preprint (Huang et al., 2024) connected the occurrence of somatic variants in genes associated with clonal hematopoiesis in microglia in a large cohort of AD patients, this study is not further discussed or compared to the data in this manuscript. 

      This preprint supports the high frequency of detection of oncogenic variants associated with clonal proliferative disorders. Its authors hypothesize that the mutations may be associated with microglia, but they only check a few mutations in purified microglia; most of the study is performed in whole brain tissue. It does not really bring new information compared with the other studies we cite in the introduction (and with our manuscript).

      Reviewer #2 (Recommendations For The Authors): 

      Suggestions for improved or additional experiments, data, or analyses: 

      The authors can demonstrate that identified pathological SNVs from their AD cohort also lead to the activation of human microglia-like cells in vitro, but do not provide any data from histological examination of the patient cohort (e.g. accumulation at the plaque site, microglia distribution, and cell number). The study could be further supported by providing a histological examination of patients with and without P-SNVs to identify if microglia response to pathology, microglia accumulation, or phagocytic capacity are altered in these patients. 

      We performed IBA1 staining in brain samples from controls and from AD patients with or without microglial clones, and microglia density was not different between patients with and without mutations. In addition, histological reports from the brain bank (Braak/CERAD scores, Lewy bodies, amyloid angiopathy, tauopathy, or alpha synucleinopathy) did not suggest differences between patients with and without mutations (Figure S3). These results are preliminary and further investigations are ongoing.

      It would have been interesting to see if for example, transgenic AD mice with an introduced somatic mutation in microglia show an altered disease progression with alterations in amyloid pathology or cognition. 

      We agree with the reviewer. We performed an in vivo study with mice expressing a 5xFAD transgene, an inducible microglia Cx3cr1CreERt2 BrafLSL-V600E transgene, or both, and performed survival, behavioral (Y-Maze and Novel Object Recognition), and histological analyses (β-Amyloid, p-Tau and Iba1 staining).

      Microgliosis was increased in the group with the 2 transgenes; however, the phenotype associated with the expression of a BrafV600E allele in microglia (Mass et al., Nature 2017) was strongly dominant over the phenotype of 5xFAD mice, which did not allow us to draw conclusions from the survival and behavioral analyses.

      Other studies with different transgenes are in progress but we have no results yet to include in this revised manuscript.

      To connect the somatic mutations in microglia better to a potential contribution in neurodegeneration or neurotoxicity, the authors could provide further details on how to demonstrate if human microglia-like cells respond differentially to amyloid or induce neurotoxicity in a co-culture or slice culture model. 

      These studies are under way in the laboratory, but unfortunately we have no results as yet to include in this revised manuscript.

      The number of samples analyzed for hippocampi, especially in the age-matched controls might be underpowered. 

      Unfortunately, despite our best efforts, we were not able to analyze more hippocampi from control individuals. To control for sampling bias and for other potential biases in our analysis, we examined whether the statistical analysis of the cohorts should include age as a criterion (age-matched controls), a random-effect structure, and possible confounding factors such as sex, brain bank site, and the samples’ anatomical location (see revised Methods and revised Fig. 2C, F, and H, and S2B).

      We first tested whether the inclusion of age is appropriate in a fixed-effects linear regression using a generalized linear model (GLM) with a Gaussian distribution. Compared to the baseline model, the model with age had a significantly lower AIC (from -66.6 to -71.9, P = 0.0067 by chi-square test). Therefore, the inclusion of age as a fixed effect is appropriate. We next tested multiple structures of mixed-effects linear modeling. We used donors as random effects, while utilizing age, disease status (neurotypical control vs. AD), or both as fixed effects. Fitting was performed using the lme function implemented in the nlme package with the maximum likelihood (ML) method. The incorporation of age and disease status significantly improved overall model fitting. Both age and AD are associated with a significant increase in SNV burden in this model (P<1x10^-4 and P=1x10^-4, respectively, by likelihood ratio test). The model's total explanatory power is substantial (conditional R^2=0.48). We also asked whether the addition of potential confounding factors to the model is justified. Three factors were tested via the two above-mentioned methods: sex, brain bank site, and the anatomical location of the samples. In all cases, the AIC increased, and the P values by likelihood ratio tests were higher than 0.99. Therefore, from a statistical standpoint, the inclusion of these potential confounding factors does not seem to improve overall model fitting.
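
      The AIC comparison and the chi-square P value quoted above are linked when two models are nested. The following is a minimal sketch, assuming that the model with age adds exactly one parameter to the baseline model and that the test is a likelihood ratio test with one degree of freedom. The function names are illustrative and are not part of the nlme workflow described above.

```python
import math

def lr_statistic_from_aic(aic_base, aic_full, extra_params=1):
    """Likelihood ratio statistic implied by the AIC values of two nested models.

    AIC = 2k - 2 ln L, so AIC_base - AIC_full = LR - 2 * extra_params.
    """
    return (aic_base - aic_full) + 2 * extra_params

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

# AIC values as quoted in the response (rounded to one decimal place).
lr = lr_statistic_from_aic(-66.6, -71.9)
p_value = chi2_sf_df1(lr)
print(f"LR statistic = {lr:.1f}, P = {p_value:.4f}")
```

      Under these assumptions, the rounded AIC values imply a P value of roughly 0.007, in line with the reported P = 0.0067; a small difference is expected from the rounding of the AIC values.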

      Minor corrections to the text and figures: 

      The authors made a great effort to analyze various samples from one individual donor. One can get a bit confused by the sentence that "an average of 2.5 brains samples were analyzed for each donor". Maybe the authors could highlight more in the first paragraph of the results section and in Figure 1A, that there are multiple samples ("technical replicates") from one individual patient across different brain regions used. 

      We removed the ‘2.5’ sentence and rewrote the paragraph for clarity. Sample information is now displayed in Table S1.

      In the method section is a part included "Expression of target genes in microglia", it was very hard to allocate where these data from public data sets were actually used and for which analysis. Maybe the authors could clarify this again. 

      We apologize for the confusion and have corrected the paragraph in the Methods (page 6) as follows: “Expression of target genes in microglia. To evaluate the expression levels of the genes identified in this study as targets of somatic variants, we consulted a publicly available database (https://www.proteinatlas.org/), and also plotted their expression as determined by RNAseq in 2 studies (Galatro et al., GSE99074 [33], and Gosselin et al. [34]) (Table S3 and Figure S2). For data from Galatro et al. (GSE99074) [33], normalized gene expression data and associated clinical information of isolated human microglia (N = 39) and whole brain (N = 16) from healthy controls were downloaded from GEO. For data from Gosselin et al. [34], raw gene expression data and associated clinical information of isolated microglia (N = 3) and whole brain (N = 1) from healthy controls were extracted from the original dataset. Raw counts were normalized using the DESeq2 package in R [35].”

      Table S3 is very informative, but also very complex. The reader could maybe benefit a lot from this table if it can be structured a bit easier especially when it comes to identifying P-SNVs and in which tissue sample they were found and if this was the same patient. The sorting function on top of the columns helps, but the color coding is a bit unclear. 

      We agree that, despite our best efforts, the table, which contains all sequencing data for all samples, remains complex. The color coding (red) only highlights the presence of a pathogenic mutation.

      Reviewer #3 (Recommendations For The Authors): 

      This is a well-done study of an important problem. I present the following minor critiques: 

      At the bottom of Page 4 and into the top of Page 5, the authors state that 66 of the 826 variants identified in their panel sequencing experiment were found in multiple donors. Then the authors proceed to analyze the remaining 760 variants. It seems that the authors concluded that these multi-donor mosaics were artifacts, which is why they were excluded from further analysis. I think this is a reasonable assumption, but it should be stated explicitly so it is clear to the reader. Complicating this assumption, however, the authors later state that one of their CBL variants was found in two donors, and it is treated as a true mosaic. The authors should make it clear whether recurrent variants were filtered out of any given analysis. It remains possible that all recurrent variants are true mosaics that occurred in multiple donors. The authors should do a bit more to characterize these recurrent variants. Are they observed in the human population using a database like gnomAD, which, together with their recurrence, would strongly suggest they are germline variants? Are they in MAPK genes, or otherwise relevant to the study?

      We apologize for the confusion. Our original intent for the ddPCR validation of variants (Figure 1E) was to count only 1 ‘unique’ variant for variants found for example in 1 brain sample and in the blood from the same patient, or in 2 brain regions from one patient, in order to avoid the criticism of overinflating our validation rate. This was notably the case for TET2 and DNMT3 variants. For example, validation of a TET2 variant found in 2 different brain areas and blood of the same donor is counted as 1 and not 3. We did not eliminate these variants from the analysis as they passed the criteria for somatic variants as presented in Methods.

      In contrast, when a specific variant was found and validated in two different donors, we counted it as 2.

      The characterization of variants included multiple parameters and databases, including for example AF and gnomAD, as indicated in Methods and reported in Table S3.

      All ddPCR results can be found at the end of Table S3.

      Figure 2B labels age-matched controls as "C", but Figure 2C labels age-matched controls as AM-C. Labels should be consistent throughout the manuscript. 

      We corrected this in the revised version.

      It is not clear if the "p:0.02" label in Figure 2F is referring to AM-C Cx vs. AD-Cx or AM-C vs. AD. Please clarify. 

      We apologize for the confusion, and we corrected the legend. The calculated p value is for the comparison between Cortex from Controls (age-matched) and the Cortex from AD.

      On Page 7, the authors state, "The allelic frequencies at which MAPK activating variants are detected in brain samples from AD patients range from ~1-6% of microglia (Fig. 3G), which correspond to clones representing 2 to 12% of mutant microglia in these samples, assuming heterozygosity." I understand what the authors mean here but I think it's a bit confusingly stated. I suggest something like "The allelic frequencies at which MAPK activating variants are detected in brain samples from AD patients range from ~1-6% in microglia (Figure 3G), which correspond to mutant clones representing 2 to 12% of all microglia in these samples, assuming heterozygosity." 

      We thank the reviewer for this suggestion and re-wrote that sentence.
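
      The arithmetic behind the 2 to 12% figure can be written out as a minimal sketch. Under the stated heterozygosity assumption, each mutant cell carries one mutant allele out of two, so the fraction of mutant cells is twice the allelic frequency. The function name and its parameters are hypothetical, and copy-number changes are ignored.

```python
def clone_fraction(allelic_frequency, mutant_copies=1, ploidy=2):
    """Estimate the fraction of cells carrying a variant.

    Assumes every mutant cell carries `mutant_copies` mutant alleles out of
    `ploidy` total copies (heterozygous diploid by default). This is an
    illustrative helper, not the authors' pipeline.
    """
    fraction = allelic_frequency * ploidy / mutant_copies
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("allelic frequency is inconsistent with the assumed zygosity")
    return fraction


# Allelic frequencies of ~1% and ~6% map to the two ends of the quoted range.
low = clone_fraction(0.01)
high = clone_fraction(0.06)
```

      With these inputs, `low` and `high` correspond to the 2% and 12% bounds quoted in the sentence.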

      Is there any evidence that the transcriptional regulators mutated in AD microglia (MED12, SETD2, MLL3, DNMT3A, ASXL1, etc.) are involved in regulating MAPK genes? This would tie these mutations into the broader conclusions of the paper. 

      This is a very interesting question, and indeed published studies indicate that some of the transcriptional/epigenetic regulators regulate expression of MAPK genes. However, in the absence of experimental evidence in microglia and patients, the argument may be too speculative to be included.

      Do the authors have any thoughts as to whether germline variants in CBL are linked to AD? If not, why do they think germline mutations in CBL are not relevant to AD? 

      This is also a very interesting question. As indicated in our manuscript, germline mutations in CBL (and other members of the classical MAPK genes, see Figure 3C) cause early-onset (pediatric) and severe developmental diseases known as RASopathies, characterized by multiple developmental defects, and associated with frequent neurological and cognitive deficits.

      It is possible that some other (and more frequent?) germline variants may be associated with a late-onset, brain-restricted phenotype, but we did not find germline pSNV in our patients. GWAS studies may be more appropriate to test this hypothesis.

      Do any donors show multiple variants? I don't think this is addressed in the text. 

      We do find donors with multiple variants (see Figure 3D and Figure S3); however, at this stage, we did not perform single-nucleus genotyping to investigate whether they are part of the same clone.

      Figure S3 appears to be upside down. 

      This was corrected.

      Figure 5C should have some kind of label telling the reader what gene set is being depicted. 

      We added this information above the panel (it was in the corresponding legend).

      At the top of Page 12, Lewy bodies are written as Lewis bodies. 

      This was corrected.

      Many control donors died of cancer (Table S1). Is there any information on which, if any, chemotherapeutics or radiation these patients received? Might this impact the somatic mutation burden? The authors should compare controls with and without cancer or with and without cancer treatments to rule this out. 

      As suggested by the reviewer, we analyzed the mutational load of age-matched controls with and without cancer (revised Figure S2B). As expected, we saw an increase in the mutational load in controls with cancer, particularly in their blood. This information was added to the Results section.

      This is most likely associated with the treatments received as well as possible cancer clones.

      The formatting for Table S3 is odd. Multiple different fonts are used (this is also seen in Table S5). Column Q has no column ID. The word "panel" is spelled "pannel." The word "expressed" is spelled "expressd" in one of the worksheet labels. Columns BG-BN in the ALL-SNV worksheet are blank but seemingly part of the table. 

      We fixed these errors in Table S3.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their constructive reviews.  Taken together, the comments and suggestions from reviewers made it clear that we needed to focus on improving the clarity of the methods and results.  We have revised the manuscript with that in mind.  In particular, we have restructured the results to make the logic of the manuscript clearer and we have added details to the methods section.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      The work of Muller and colleagues concerns the question of where we place our feet when passing uneven terrain, in particular how we trade-off path length against the steepness of each single step. The authors find that paths are chosen that are consistently less steep and deviate from the straight line more than an average random path, suggesting that participants indeed trade-off steepness for path length. They show that this might be related to biomechanical properties, specifically the leg length of the walkers. In addition, they show using a neural network model that participants could choose the footholds based on their sensory (visual) information about depth. 

      Strengths: 

      The work is a natural continuation of some of the researchers' earlier work that related the immediately following steps to gaze [17]. Methodologically, the work is very impressive and presents a further step forward towards understanding real-world locomotion and its interaction with sampling visual information. While some of the results may seem somewhat trivial in hindsight (as always in this kind of study), I still think this is a very important approach to understanding locomotion in the wild better. 

      Weaknesses: 

      The manuscript as it stands has several issues with the reporting of the results and the statistics. In particular, it is hard to assess the inter-individual variability, as some of the data are aggregated across individuals, while in other cases only central tendencies (means or medians) are reported without providing measures of variability; this is critical, in particular as N=9 is a rather small sample size. It would also be helpful to see the actual data for some of the information merely described in the text (e.g., the dependence of \Delta H on path length). When reporting statistical analyses, test statistics and degrees of freedom should be given (or other variants that unambiguously describe the analysis).

      There is only one figure (Figure 6) that shows data pooled over subjects, and this is simply to illustrate how the random paths were calculated. The actual paths generated used individual subject data. We don’t draw our conclusions from these histograms – they are instead used to generate bounds for the simulated paths. We have made clear both in the text and in the figure legends when we have plotted an example subject. Other plots show the individual subject data. We have given the range of subject medians as well as the standard deviation for data illustrated in Figure (random vs chosen); we have also given the details of the statistical test comparing the flatness of the chosen paths versus the randomly generated paths. We have added two supplemental figures to show individual walker data more directly: (Fig. 14) the per-subject histograms of step parameters, and (Fig. 18) the individual subject distributions for straight path slopes and tortuosity.

      The CNN analysis chosen to link the step data to visual sampling (gaze and depth features) should be motivated more clearly, and it should describe how training and test sets were generated and separated for this analysis.

      We have motivated the CNN analysis and moved it earlier in the manuscript to help clarify the logic of the manuscript. Details of the training and test sets are now provided, and the data have been replotted. The values are a little different from the original plot after making a correction in the code, but the conclusions drawn from this analysis are unchanged. This analysis simply shows that there is information in the depth images from the subject’s perspective that a network can use to learn likely footholds. This motivates the subsequent analysis of path flatness.

      There are also some parts of figures, where it is unclear what is shown or where units are missing. The details are listed in the private review section, as I believe that all of these issues can be fixed in principle without additional experiments. 

      Several of the Figures have been replotted to fix these issues.

      Reviewer #2 (Public Review): 

      Summary: 

      This manuscript examines how humans walk over uneven terrain using vision to decide where to step. There is a huge lack of evidence about this because the vast majority of locomotion studies have focused on steady, well-controlled conditions, and not on decisions made in the real world. The author team has already made great advances in this topic, but there has been no practical way to map 3D terrain features in naturalistic environments. They have now developed a way to integrate such measurements along with gaze and step tracking, which allows quantitative evaluation of the proposed trade-offs between stepping vertically onto vs. stepping around obstacles, along with how far people look to decide where to step. 

      Strengths: 

      (1) I am impressed by the overarching outlook of the researchers. They seek to understand human decision-making in real-world locomotion tasks, a topic of obvious relevance to the human condition but not often examined in research. The field has been biased toward well-controlled studies, which have scientific advantages but also serious limitations. A well-controlled study may eliminate human decisions and favor steady or periodic motions in laboratory conditions that facilitate reliable and repeatable data collection. The present study discards all of these usually-favorable factors for rather uncontrolled conditions, yet still finds a way to explore real-world behaviors in a quantitative manner. It is an ambitious and forward-thinking approach, used to tackle an ecologically relevant question. 

      (2) There are serious technical challenges to a study of this kind. It is true that there are existing solutions for motion tracking, eye tracking, and most recently, 3D terrain mapping. However most of the solutions do not have turn-key simplicity and require significant technical expertise. To integrate multiple such solutions together is even more challenging. The authors are to be commended on the technical integration here.

      (3) In the absence of prior studies on this issue, it was necessary to invent new analysis methods to go with the new experimental measures. This is non-trivial and places an added burden on the authors to communicate the new methods. It's harder to be at the forefront in the choice of topic, technical experimental techniques, and analysis methods all at once. 

      Weaknesses: 

      (1) I am predisposed to agree with all of the major conclusions, which seem reasonable and likely to be correct. Ignoring that bias, I was confused by much of the analysis. There is an argument that the chosen paths were not random, based on a comparison of probability distributions that I could not understand. There are plots described as "turn probability vs. X" where the axes are unlabeled and the data range above 1. I hope the authors can provide a clearer description to support the findings. This manuscript stands to be cited well as THE evidence for looking ahead to plan steps, but that is only meaningful if others can understand (and ultimately replicate) the evidence. 

      We have rewritten the manuscript with the goal of clarifying the analyses, and we have re-labelled the offending figure.

      (2) I wish a bit more and simpler data could be provided. It is great that step parameter distributions are shown, but I am left wondering how this compares to level walking.  The distributions also seem to use absolute values for slope and direction, for understandable reasons, but that also probably skews the actual distribution. Presumably, there should be (and is) a peak at zero slope and zero direction, but absolute values mean that non-zero steps may appear approximately doubled in frequency, compared to separate positive and negative. I would hope to see actual distributions, which moreover are likely not independent and probably have a covariance structure. The covariance might help with the argument that steps are not random, and might even be an easy way to suggest the trade-off between turning and stepping vertically. This is not to disregard the present use of absolute values but to suggest some basic summary of the data before taking that step. 

      We have replotted the step parameter distributions without absolute values. Unfortunately, the covariation of step parameters (step direction and step slope) is unlikely to help establish this tradeoff. Note that the primary conclusion of the manuscript is that walkers make turns to keep step slope low (when possible). Thus, any correlation that might exist between goal direction and step slope would be difficult to interpret without a direct comparison to possible alternative paths (as we have done in this paper). As such, we do not draw our conclusions from these distributions. We use them primarily to generate plausible random paths for comparison with the chosen paths. We have added two supplementary figures including distributions (Fig 15) and covariation of all the step parameters discussed in the methods (Fig 16).

      (3) Along these same lines, the manuscript could do more to enable others to digest and go further with the approach, and to facilitate interpretability of results. I like the use of a neural network to demonstrate the predictiveness of stepping, but aside from above-chance probability, what else can inform us about what visual data drives that?

      The CNN analysis simply shows that the information is there in the image from the subject’s viewpoint and is used to motivate the subsequent analysis.  As noted above, we have generally tried to improve the clarity of the methods.

      Similarly, the step distributions and height-turn trade-off curves are somewhat opaque and do not make it easy to envision further efforts by others, for example, people who want to model locomotion. For that, clearer (and perhaps) simpler measures would be helpful. 

      We have clarified the description of these plots in the main text and in the methods.  We have also tried to clarify why we made the choices that we did in measuring the height-turn trade-off and why it is necessary in order to make a fair comparison.

      I am absolutely in support of this manuscript and expect it to have a high impact. I do feel that it could benefit from clarification of the analysis and how it supports the conclusions. 

      Reviewer #3 (Public Review): 

      Summary: 

      The systematic way in which path selection is parametrically investigated is the main contribution. 

      Strengths: 

      The authors have developed an impressive workflow to study gait and gaze in natural terrain. 

      Weaknesses: 

      (1) The training and validation data of the CNN are not explained fully making it unclear if the data tells us anything about the visual features used to guide steering. It is not clear how or on what data the network was trained (training vs. validation vs. un-peeked test data), and justification of the choices made. There is no discussion of possible overfitting. The network could be learning just e.g. specific rock arrangements. If the network is overfitting the "features" it uses could be very artefactual, pixel-level patterns and not the kinds of "features" the human reader immediately has in mind. 

      The CNN analysis has now been moved earlier in the manuscript to help clarify its significance and we have expanded the description of the methods. Briefly, it simply indicates that there is information in the depth structure of the terrain that can be learned by a network. This helps justify the subsequent analyses.  Importantly, the network training and testing sets were separated by terrain to ensure that the model was being tested on “unseen” terrain and avoid the model learning specific arrangements.  This is now clarified in the text.
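
      The idea of separating training and testing sets by terrain can be illustrated with a minimal sketch. The response does not specify the split ratio, the tooling, or the data layout, so the function name, the `(terrain_id, item)` pairing, and the 20% test fraction below are all hypothetical.

```python
import random


def split_by_terrain(samples, test_fraction=0.2, seed=0):
    """Split samples so that no terrain contributes to both sets.

    `samples` is a list of (terrain_id, item) pairs. Whole terrains are
    assigned to either the training or the test set, so the test set only
    contains terrain the model has never seen. Illustrative only.
    """
    terrains = sorted({terrain for terrain, _ in samples})
    rng = random.Random(seed)
    rng.shuffle(terrains)

    # Hold out at least one whole terrain for testing.
    n_test = max(1, round(len(terrains) * test_fraction))
    test_terrains = set(terrains[:n_test])

    train = [s for s in samples if s[0] not in test_terrains]
    test = [s for s in samples if s[0] in test_terrains]
    return train, test


# Five hypothetical terrains with three depth images each.
samples = [(terrain, image) for terrain in "ABCDE" for image in range(3)]
train, test = split_by_terrain(samples)
```

      Splitting at the level of terrains rather than individual images is what prevents near-duplicate frames of the same rock arrangement from appearing on both sides of the split.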

      (2) The use of descriptive terminology should be made systematic. 

      Specifically, the following terms are used without giving a single, clear definition for them: path, step, step location, foot plant, foothold, future foothold, foot location, future foot location, foot position. I think some terms are being used interchangeably. I would really highly recommend a diagrammatic cartoon sketch, showing the definitions of all these terms in a single figure, and then sticking to them in the main text. 

      We have made the language more systematic and clarified the definition of each term (see Methods). Path refers to the sequence of 5 steps. Foothold is where the foot was placed in the environment. A step is the transition from one foothold to the next.

      (3) More coverage of different interpretations / less interpretation in the abstract/introduction would be prudent.  The authors discuss the path selection very much on the basis of energetic costs and gait stability. At least mention should be given to other plausible parameters the participants might be optimizing (or that indeed they may be just satisficing). That is, it is taken as "given" that energetic cost is the major driver of path selection in your task, and that the relevant perception relies on internal models. Neither of these is a priori obvious nor is it as far as I can tell shown by the data (optimizing other variables, satisficing behavior, or online "direct perception" cannot be ruled out). 

      The abstract has been substantially rewritten.  We have adjusted our language in the introduction/discussion to try to address this concern.

      Recommendations for the authors:

      Reviewing Editor comments 

      You will find a full summary of all 3 reviews below. In addition to these reviews, I'd like to highlight a few points from the discussion among reviewers. 

      All reviewers are in agreement that this study has the potential to be a fundamental study with far-reaching empirical and practical implications. The reviewers also appreciate the technical achievements of this study. 

      At the same time, all reviewers are concerned with the overall lack of clarity in how the results are presented. There are a considerable number of figures that need better labeling, text parts that require clearer definitions, and the description of data collection and analysis (esp. with regard to the CNN) requires more care. Please pay close attention to all comments related to this, as this was the main concern that all reviewers shared. 

      At a more specific level, the reviewers discussed the finding around leg length, and admittedly, found it hard to believe, in short: "extraordinary claims need strong evidence". It would be important to strengthen this analysis by considering possible confounds, and by including a discussion of the degree of conviction. 

      We have weakened the discussion of this finding and provided some additional analyses in a supplemental figure (Figure 17) to help clarify the finding.

      Reviewer #1 (Recommendations For The Authors): 

      First, let me apologize for the long delay with this review. Despite my generally positive evaluation (see public review), I have some concerns about the way the data are presented and questions about methodological details. 

      (1) Representation of results: I find it hard to decipher how much variability arises within an individual and how much across individuals. For example, Figure 7b seems to aggregate across all individuals, while the analysis is (correctly) based on the subject medians.

      Figure 7b: That figure showed just one subject. This is now clarified.

      It would be good to see the distribution of all individuals (maybe use violin plots for each observer with the true data on one side and the baseline data on the other, or simple histograms for each). To get a feeling for inter-individual and intra-individual variability is crucial, as obviously (see the leg-length analysis) there are larger inter-individual differences and representations like these would be important to appreciate whether there is just a scaling of more or less the same effect or whether there are qualitative differences (especially in the light of N=9 being not a terribly huge sample size). 

      The medians for the individual subjects are now provided with the standard deviations between subjects to indicate the extent of individual differences. Note that the random paths were chosen from the distribution of actual step slopes for that subject as one of the constraints. This makes the random paths statistically similar to the chosen paths, with the differences only being generated by the particular visual context. Thus, the test for a difference between chosen and random is quite conservative.

      Similarly, seeing \DeltaH plotted as a function of steps in the path as a figure rather than just having the verbal description would also help. 

      To simplify the discussion of our methods/results we have removed the analyses that examine mean slope as a function of steps. Because of the central limit theorem, the slopes of the chosen paths remain largely unchanged regardless of the choice of path length. The slopes of the simulated paths are always larger irrespective of the choice of path length.

      (2) Reporting the statistical analyses: This is related to my previous issue: I would appreciate it if the test statistics and degrees-of-freedom of the statistical tests were given along with the p-values, instead of only the p-values. This at some points would also clarify how the statistics were computed exactly (e.g., "All subjects showed comparable difference and the difference in medians evaluated across subjects was highly significant (p<<0.0001).", p.10, is ambiguous to me). 

      Details have been added as requested.

      (3) Why is the lower half ("tortuosity less than the median tortuosity") of paths used as "straight" rather than simply the minimum of all viable paths)?

      The benchmark for a straight path is somewhat arbitrary. Using the lower half rather than the minimum length path is more conservative.

      (4) For the CNN analysis, I failed to understand what was training and what was test set. I understand that the goal is to predict for all pixels whether they are a potential foothold or not, and the AUC is a measure of how well they can be discriminated based on depth information and then this is done for each image and the median over all images taken. But on which data is the CNN trained, and on which is it tested? Is this leave-n-out within the same participant? If so, how do you deal with dependencies between subsequent images? Or is it leave-1-out across participants? If so, this would be more convincing, but again, the same image might appear in training and test. If the authors just want to ask how well depth features can discriminate footholds from non-footholds, I do not see the benefit of a supervised method, which leaves the details of the feature combinations inside a black box. Rather than defining the "negative set" (i.e., the non-foothold pixels) randomly, the simulated paths could also be used, instead. If performance (AUC) gets lower than for random pixels, this would confirm that the choice of parameters to define a "viable path" is well-chosen. 

      This has been clarified as described above.

      Minor issues: 

      (5) A higher tortuosity would also lead a participant to require more steps in total than a lower tortuosity. Could this partly explain the correlation between the leg length and the slope/tortuosity correlation? (Longer legs need fewer steps in total, thus there might be less tradeoff between \Delta H and keeping the path straight (i.e., saving steps)). To assess this, you could give the total number of steps per (straight) distance covered for leg length and compare this to a flat surface.

      The calculations are done on an individual subject basis and the first and last step locations are chosen from the actual foot placements, then the random paths are generated between those endpoints. The consequence of this is that the number of steps is held constant for the analysis.  We have clarified the methods for this analysis to try to make this more clear.

      (6) As far as I understand, steps happen alternatingly with the two feet. That is, even on a flat surface, one would not reach 0 tortuosity. In other words, does the lateral displacement of the feet play a role (in particular, if paths with even and paths with odd number of steps were to be compared), and if so, is it negligible for the leg-length correlation? 

      All the comparisons here are done for 5-step sequences, so this potential issue should not affect the slope of the regression lines or the leg length correlation.

      (7) Is there any way to quantify the quality of the depth estimates? Maybe by taking an actual depth image (e.g., by LIDAR or similar) for a small portion of the terrain and comparing the results to the estimate? If this has been done for similar terrain, can a quantification be given? If errors would be similar to human errors, this would also be interesting for the interpretation of the visual sampling data.

      Unfortunately, we do not have the ground truth depth image from LIDAR.  When these data were originally collected, we had not imagined being able to reconstruct the terrain.  However, we agree with the reviewers that this would be a good analysis to do. We plan to collect LIDAR in future experiments. 

      To provide an assessment of quality for these data in the absence of a ground truth depth image, we have performed an evaluation of the reliability of the terrain reconstruction across repeats of the same terrain both between and within participants.  We have expanded the discussion of these reliability analyses in the results section entitled “Evaluating Terrain Reconstruction”, as well as in the corresponding methods section (see Figure 10).

      (8) The figures are sometimes confusing and a bit sloppy. For example, in Figure 7a, the red, cyan, and green paths are not mentioned in the caption, in Figure 8 units on the axes would be helpful, in Figure 9 it should probably be "tortuosity" where it now states "curviness". 

      These details have been fixed.

      (9) I think the statement "The maximum median AUC of 0.79 indicates that the 0.79 is the median proportion of pixels in the circular..." is not an appropriate characterization of the AUC, as the number of correctly classified pixels will not only depend on the ROC (and thus the AUC), but also on the operating point chosen on the ROC (which is not specified by the AUC alone). I would avoid any complications at this point and just characterize the AUC as a measure of discriminability between footholds and non-footholds based on depth features. 

      This has been fixed.
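
      The reviewer's characterization of the AUC as a measure of discriminability can be made concrete with a minimal sketch: the AUC equals the probability that a randomly drawn foothold pixel receives a higher score than a randomly drawn non-foothold pixel, independent of any operating point. The function name and the toy scores are hypothetical and are not taken from the manuscript.

```python
def auc(positive_scores, negative_scores):
    """Area under the ROC curve computed as a pairwise ranking probability.

    Counts the fraction of (positive, negative) pairs in which the positive
    item scores higher, with ties counted as half. No classification
    threshold is involved.
    """
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Toy network outputs for foothold and non-foothold pixels.
foothold_scores = [0.9, 0.4]
other_scores = [0.6, 0.1]
value = auc(foothold_scores, other_scores)  # 3 of 4 pairs are ranked correctly
```

      Because the statistic is defined over pairs, it does not by itself state what proportion of pixels would be correctly classified; that depends on the threshold chosen on the ROC curve.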

      (10) Ref. [16] is probably the wrong Hart paper (I assume their 2012 Exp. Brain Res. [https://doi.org/10.1007/s00221-012-3254-x] paper is meant at this point). 

      Fixed

      Typos (not checked systematically, just incidental discoveries): 

      (11) "While there substantial overlap" (p.10) 

      (12) "field.." (p.25) 

      (13) "Introduction", "General Discussion" and "Methods" as well as some subheadings are numbered, while the other headings (e.g., Results) are not. 

      Fixed

      Reviewer #2 (Recommendations For The Authors): 

      The major suggestions have been made in the Public Review. The following are either minor comments or go into more detail about the major suggestions. All of these comments are meant to be constructive, not obstructive. 

      Abstract. This is well written, but the main conclusions "Walkers avoid...This trade off is related...5 steps ahead" sound quite qualitative. They could be strengthened by more specificity (NOT p-values), e.g. "positive correlation between the unevenness of the path straight ahead and the probability that people turned off that path." 

      The abstract has been substantially rewritten.

      P. 5 "pinning the head position estimated from the IMU to the Meshroom estimates" sounds like there are two estimates. But it does not sound like both were used. Clarify, e.g. the Meshroom estimate of head position was used in place of IMU? 

      Yes, that’s correct.  We have clarified this in the text.

      Figure 5. I was confused by this. First, is a person walking left to right? When the gaze position is shown, where was the eye at the time of that gaze? There are straight lines attached to the blue dots, what do they represent? The caption says gaze is directed further along the path, which made me guess the person is walking right to left, and the line originates at the eye. Except the origins do not lie on or close to the head locations. There's also no scale shown, so maybe I am completely misinterpreting. If the eye locations were connected to gaze locations, it would help to support the finding that people look five steps ahead of where they step. 

      We have updated the figure and clarified the caption to remove these confusions.  There was a mistake in the original figure (where the yellow indicated head locations, we had plotted the center of mass and the choice of projection gave the incorrect impression that the fixations off the path, in blue, were separated from the head).

      The view of the data is now presented so the person is walking left to right and with a projection of the head location (orange), gaze locations (blue or green) and feet (pink).

      Figure 6. As stated in the major comments, the step distributions would be expected to have a covariance structure (in terms of raw data before taking absolute values). It would be helpful to report the covariances (6 numbers). As an example of a simple statistical analysis, a PCA (also based on a data covariance) would show how certain combinations of slope/distance/direction are favored over others. Such information would be a simple way to argue that the data are not completely random, and may even show a height-turn trade-off immediately. (By the way, I am assuming absolute values are used because the slopes and directions are only positive, but it wasn't clear if this was the definition.) A reason why covariances and PCA are helpful is that such data would be helpful to compute a better random walk, generated from dynamics. I believe the argument that steps are not random is not served by showing the different histograms in Figure 7, because I feel the random paths are not fairly produced. A better argument might draw randomly from the same distribution as the data (or drive a dynamical random walk), and compare with actual data. There may be correlations present in the actual data that differ from random. I could be mistaken, because it is difficult or impossible to draw conclusions from distributions of absolute values, or maybe I am only confused. In any case, I suspect other readers will also have difficulty with this section. 

      This has been addressed above in the major comments.
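
      The "6 numbers" requested in the comment above are the unique entries of a 3 x 3 covariance matrix over signed step slope, step length, and direction change. The sketch below shows only that bookkeeping. The step values are invented for illustration, and the function and variable names are hypothetical; none of this comes from the manuscript.

```python
from itertools import combinations_with_replacement


def covariance_matrix(rows):
    """Sample covariance (n - 1 denominator) of the columns of `rows`."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(dims)]
    cov = [[0.0] * dims for _ in range(dims)]
    for i, j in combinations_with_replacement(range(dims), 2):
        s = sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows)
        cov[i][j] = cov[j][i] = s / (n - 1)
    return cov


# Hypothetical signed per-step values:
# (slope in degrees, step length in metres, direction change in degrees)
steps = [
    (-4.0, 0.70, 10.0),
    (2.0, 0.65, -5.0),
    (6.0, 0.55, -20.0),
    (-1.0, 0.72, 3.0),
    (3.0, 0.60, -8.0),
]
names = ("slope", "length", "turn")
cov = covariance_matrix(steps)

# The six unique entries: three variances and three covariances.
unique = {
    (names[i], names[j]): cov[i][j]
    for i, j in combinations_with_replacement(range(3), 2)
}
```

      A principal component analysis would then be the eigendecomposition of this matrix. Whether it reveals a height-turn trade-off depends on the real signed data, not on this toy input.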

      p. 9, "average step slope" I think I understand the definition, but I suggest a diagram might be helpful to illustrate this.

      There is a diagram of a single step slope in Figure 6 and a diagram of the average step slope for a path segment in Figure 12.

      Incidentally, the "straight path slope" is not clearly defined. I suspect "straight" is the view from above, i.e. ignoring height changes. 

      Clarified

      p. 11 The tortuosity metric could use a clearer definition. Should I interpret "length of the chosen path relative to a straight path" as the numerator and denominator? Here does "length" also refer to the view from above? Why is tortuosity defined differently from step slope? Couldn't there be an analogue to step slope, except summing absolute values of direction changes? Or an analogue to tortuosity, meaning the length as viewed from the side, divided by the length of the straight path? 

      We followed the literature in defining tortuosity and have clarified the definition in the Methods. Yes, "length of the chosen path relative to a straight path" can be read as numerator and denominator, and length refers to 3D length. We agree that there are many interesting ways to look at the data, but for clarity we have limited the discussion to a single definition of tortuosity in this paper.
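
      As a minimal sketch of the ratio described in this response, tortuosity can be computed as the summed 3D length of the chosen path divided by the 3D straight-line distance between its first and last footholds. Taking the "straight path" to be that start-to-end line is an assumption here, and the foothold coordinates and function names are hypothetical.

```python
import math


def path_length_3d(points):
    """Sum of 3D Euclidean distances between consecutive footholds."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def tortuosity(points):
    """Chosen-path length divided by straight-line length, both in 3D."""
    straight = math.dist(points[0], points[-1])
    if straight == 0:
        raise ValueError("start and end footholds coincide")
    return path_length_3d(points) / straight


# Hypothetical footholds as (x, y, z) in metres.
straight_path = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
detour_path = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 0.0, 0.0)]
```

      Under this definition a perfectly straight path has tortuosity 1, and any detour, lateral or vertical, raises the value above 1.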

      Figure 8 could use better labeling. On the left, there is a straight path and a more tortuous path, why not report the metrics for these? On the right, there are nine unlabeled plots. The caption says "turn probability vs. straight path slope" but the vertical axis is clearly not a probability. Perhaps the axis is tortuosity? I presume the horizontal axis is a straight path slope in degrees, but this is not explained. Why are there nine plots, is each one a subject? I would prefer to be informed directly instead of guessing. (As a side note, I like the correlations as a function of leg length, it is interesting, even if slightly unbelievable. I go hiking with people quite a bit shorter and quite a lot taller than me, and anecdotally I don't think they differ so much from each other.) 

      We have fixed Figure 8, which shows the average “mean slope” as a function of tortuosity. We have added a supplemental figure that shows a scatter plot of the raw data (mean slope vs. tortuosity for each path segment).

      Note that when walking with friends other factors (e.g. social) will contribute to the cost function. As a very short person, my experience is that it is a problem. In any case, the data are the data, whatever the underlying reasons. It does not seem so surprising that people of different heights make different tradeoffs. We know that the preferred gait depends on an individual’s passive dynamics, as described in the paper, and the terrain will change what is energetically optimal, as described in the Darici and Kuo paper.

      Figure 9 presumably shows one data point per subject, but this isn't clear. 

      The correlations are reported per subject, and this has been clarified. 

      p. 13 CNN. I like this analysis, but only sort of. It is convincing that there is SOME sort of systematic decision-making about footholds, better than chance. What it lacks is insight. I wonder what drives people's decisions. As an idle suggestion, the AlexNet (arXiv: Krizhevsky et al.; see also A. Karpathy's ConvNetJS demo with CIFAR-10) showed some convolutional kernels to give an idea of what the layers learned.

      Further exploration of CNNs would definitely be interesting, but it is outside the scope of the paper. We use it simply to make a modest point, as described above.

      p. 15 What is the definition of stability cost? I understand energy cost, but it is unclear how circuitous paths have a higher stability cost. One possible definition is an energetic cost having to do with going around and turning. But if not an energy cost, what is it? 

      We meant to say that the longer and flatter paths are presumably more stable because of the smaller height changes. You are correct that we can’t say what the stability cost is and we have clarified this in the discussion.

      p. 16 "in other data" is not explained or referenced.

      Deleted 

      p. 10 5 step paths and p. 17 "over the next 5 steps". I feel there is very little information to really support the 5 steps. A p-value only states the significance, not the amount of difference. This could be strengthened by plotting some measures vs. the number of steps ahead. For example, does a CNN looking 1-5 steps ahead predict better than one looking N<5 steps ahead? I am of course inclined to believe the 5 steps, but I do not see/understand strong quantitative evidence here. 

      We have weakened the statements about evidence for planning 5 steps ahead.

      p. 25 CNN. I did not understand the CNN. The list of layers seems incomplete, it only shows four layers. The convolutional-deconvolutional architecture is mentioned as if that is a common term, which I am unfamiliar with but choose to interpret as akin to encoder-decoder. However, the architecture does not seem to have much of a bottleneck (25x25x8 is not greatly smaller than 100x100x4), so what is the driving principle? It's also unclear how the decoder culminates, does it produce some m x m array of probabilities of stepping, where m is some lower dimension than the images? It might be helpful also to illustrate the predictions, for example, show a photo of the terrain view, along with a probability map for that view. I would expect that the reader can immediately say yes, I would likely step THERE but not there. 

      We have clarified the description of the CNN. An illustration is shown in Figure 11.
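
      For reference, the two tensor shapes quoted in the comment above can be compared by counting values. This is plain arithmetic on the quoted shapes and says nothing about the actual architecture; the variable and function names are hypothetical.

```python
import math


def n_values(shape):
    """Number of scalar values in a tensor of the given shape."""
    return math.prod(shape)


input_shape = (100, 100, 4)     # shape quoted in the reviewer comment
bottleneck_shape = (25, 25, 8)  # shape quoted in the reviewer comment

# 40,000 values in, 5,000 values at the narrowest quoted layer.
compression = n_values(input_shape) / n_values(bottleneck_shape)
```

      The quoted layer therefore holds one eighth as many values as the input. Whether that counts as a meaningful bottleneck is a separate judgement.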

      Reviewer #3 (Recommendations For The Authors): 

      (This section expands on the points already contained in the Public Review). 

      Major issues 

      (1) The training and validation data of the CNN are not explained fully making it unclear if the data tells us anything about the visual features used to guide steering. A CNN was used on the depth scenes to identify foothold locations in the images. This is the bit of the methods and the results that remains ambiguous, and the authors may need to revisit the methods/results. It is not clear how or on what data the network was trained (training vs. validation vs. un-peeked test data), and justification of the choices made. There is no discussion of possible overfitting. The network could be learning just for example specific rock arrangements in the particular place you experimented. Training the network on data from one location and then making it generalize to another location would of course be ideal. Your network probably cannot do this (as far as I can tell this was not tried), and so the meaning of the CNN results cannot really be interpreted. 

      I really like the idea, of getting actual retinotopic depth field approximations. But then the question would be: what features in this information are relevant and useful for visual guidance (of foot placement)? But this question is not answered by your method. 

      "If a CNN can predict these locations above chance using depth information, this would indicate that depth features can be used to explain some variation in foothold selection." But there is no analysis of what features they are. If the network is overfitting they could be very artefactual, pixel-level patterns and not the kinds of "features" the human reader immediately has in mind. As you say "CNN analysis shows that subject perspective depth features are predictive of foothold locations", well, yes, with 50,000 odd parameters the foothold coordinates can be associated with the 3D pixel maps, but what does this tell us? 

      See previous discussion of these issues.

      It is true that we do not know the precise depth features used. We established that information about height changes was being used, but further work is needed to specify how the visual system does this. This is mentioned in the Discussion.

      You open the introduction with a motivation to understand the visual features guiding path selection, but what features the CNN finds/uses or indeed what features are there is not much discussed. You would need to bolster this, or down-emphasize this aspect in the Introduction if you cannot address it. 

      "These depth image features may or may not overlap with the step slope features shown to be predictive in the previous analysis, although this analysis better approximates how subjects might use such information." I do not think you can say this. It may be better to approximate the kind of (egocentric) environment the subjects have available, but as it is I do not see how you can say anything about how the subject uses it. (The results on the path selection with respect to the terrain features, the viewpoint-independent allocentric properties of the previous analyses, are enough in themselves!) 

      We have rewritten the section on the CNN to make clearer what it can and cannot do and its role in the manuscript. See previous discussion.

      (2) The use of descriptive terminology should be made systematic. Overall the rest of the methodology is well explained, and the workflow is impressive. However, to interpret the results the introduction and discussion seem to use terminology somewhat inconsistently. You need to dig into the methods to figure out the exact operationalizations, and even then you cannot be quite sure what a particular term refers to. Specifically, you use the following terms without giving a single, clear definition for them (my interpretation in parentheses): 

      foothold (a possible foot plant location where there is an "affordance"? or a foot plant location you actually observe for this individual? or in the sample?) 

      step (foot trajectory between successive step locations) 

      step location (the location where the feet are placed) 

      path (are they lines projected on the ground, or are they sequences of foot plants? The figure suggests lines but you define a path in terms of five steps.) 

      foot plant (occurs when the foot comes in contact with step location?) 

      future foothold (?) 

      foot location (?) 

      future foot location (?) 

      foot position (?) 

      I think some terms are being used interchangeably here? I would really highly recommend a diagrammatic cartoon sketch, showing the definitions of all these terms in a single figure, and then sticking to them in the main text. Also, are "gaze location" and "fixation" the same? I.e. is every gaze-ground intersection a "gaze location" (I take it it is not a "fixation", which you define by event identification by speed and acceleration thresholds in the methods)? 

      We have cleaned up the language. A foothold is the location in the terrain representation (mesh) where the foot was placed. A step is the transition from one foothold to the next. A path is a sequence of 5 steps. The lines simply illustrate the path in the Figures. A gaze location is the location in the terrain representation where the walker is holding gaze still (the act of fixating). See Muller et al (2023) for further explanation.

      (3) More coverage of different interpretations / less interpretation in the abstract/introduction would be prudent. You discuss the path selection very much on the basis of energetic costs and gait stability. At least mention should be given to other plausible parameters the participants might be optimizing (or that indeed they may be just satisficing). Temporal cost (more circuitous route takes longer) and uncertainty (the more step locations you sample the more chance that some of them will not be stable) seem equally reasonable, given the task ecology / the type of environment you are considering. I do not know if there is literature on these in the gait-scene, but even if not then saying you are focusing on just one explanation because that's where there is literature to fall back on would be the thing to do. 

      Also in the abstract and introduction you seem to take some of this "for granted". E.g. you end the abstract saying "are planning routes as well as particular footplants. Such planning ahead allows the minimization of energetic costs. Thus locomotor behavior in natural environments is controlled by decision mechanisms that optimize for multiple factors in the context of well-calibrated sensory and motor internal models". This is too speculative to be in the abstract, in my opinion. That is, you take as "given" that energetic cost is the major driver of path selection in your task, and that the relevant perception relies on internal models. Neither of these is a priori obvious nor is it as far as I can tell shown by your data (optimizing other variables, satisficing behavior, or online "direct perception" cannot be ruled out). 

      We have rewritten the abstract and Discussion with these concerns in mind.

      You should probably also reference: 

      Warren, W. H. (1984). Perceiving affordances: Visual guidance of stair climbing. Journal of Experimental Psychology: Human Perception and Performance, 10(5), 683-703. https://doi.org/10.1037/0096-1523.10.5.683 

      Warren WH Jr, Young DS, Lee DN. Visual control of step length during running over irregular terrain. J Exp Psychol Hum Percept Perform. 1986 Aug;12(3):259-66. doi: 10.1037//0096-1523.12.3.259. PMID: 2943854. 

      We have added these references to the introduction.

      Minor point 

      Related to (2) above, the path selection results are sometimes expressed a bit convolutedly, and the gist can get lost in the technical vocabulary. The generation of alternative "paths" and comparison of their slope and tortuousness parameters show that the participants preferred smaller slope/shorter paths. So, as far as I can tell, what this says is that in rugged terrain people like paths that are as "flat" as possible. This is common sense so hardly surprising. Do not be afraid to say so, and to express the result in plain non-technical terms. That an apple falls from a tree is common sense and hardly surprising. Yet quantifying the phenomenon, and carefully assessing the parameters of the path that the apple takes, turned out to be scientifically valuable - even if the observation itself lacked "novelty". 

      Thanks.  We have tried to clarify the methods/results with this in mind.

    2. Reviewer #1 (Public review):

      Summary:

      The work of Muller and colleagues concerns the question of where we place our feet when crossing uneven terrain, in particular how we trade off path length against the steepness of each single step. The authors find that the chosen paths are consistently less steep and deviate more from the straight line than an average random path, suggesting that participants indeed trade off steepness for path length. They show that this might be related to biomechanical properties, specifically the leg length of the walkers. In addition, they show using a neural network model that participants could choose the footholds based on their sensory (visual) information about depth.

      Strengths:

      The work is a natural continuation of some of the researchers' earlier work that related the immediately following steps to gaze. Methodologically, the work is very impressive and presents a further step forward towards understanding real-world locomotion and its interaction with sampling visual information. While some of the results may seem somewhat trivial in hindsight (as always in this kind of study), I still think this is a very important approach to better understand locomotion in the wild.

      Weaknesses:

      The concerns I had regarding the initial version of the manuscript have all been fixed in the current one.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We are grateful for the many positive comments. Moreover, we appreciate the recommendations to improve the manuscript; particularly, the important discussion points raised by reviewer 1 and the comments made by reviewer 2 concerning an extended quantification of how near-spike input conductances vary across individual spikes. We have performed several new detailed analyses to address reviewer 2’s comments. In particular, we now provide for all relevant postsynaptic cells the complete distributions of the excitatory and inhibitory input conductance changes that occur right before and after postsynaptic spiking, and we provide corresponding distributions of non-spiking regions as a reference. We performed these analyses separately for different baseline activity levels. Our new results largely support our previous conclusions but provide a much more nuanced picture of the synaptic basis of spiking. To the best of our knowledge, this is the first time that parallel information on input excitation, inhibition and postsynaptic spiking is provided for individual neurons in a biological network. We would argue that our new results further support the fundamental notion that even a reductionist neuronal culture model can give rise to sophisticated network dynamics with spiking – at least partially – triggered by rapid input fluctuations, as predicted by theory. Moreover, it appears that changes in input inhibition are a key mechanism to regulate spiking during spontaneous recurrent network activity. It will be exciting to test whether this holds true for neural circuits in vivo.

      In the following section, we address the reviewers’ comments individually.

      Reviewer 1:

      In this study the authors develop methods to interrogate cultured neuronal networks to learn about the contributions of multiple simultaneously active input neurons to postsynaptic activity. They then use these methods to ask how excitatory and inhibitory inputs combine to result in postsynaptic neuronal firing in a network context.

      The study uses a compelling combination of high-density multi-electrode array recordings with patch recordings. They make ingenious use of physiology tricks such as shifting the reversal potential of inhibitory inputs, and identifying inhibitory vs. excitatory neurons through their influence on other neurons, to tease apart the key parameters of synaptic connections.

      We thank the reviewer for acknowledging our efforts to develop an approach to investigate the synaptic basis of spiking in biological neurons and for appreciating the technical challenges that needed to be overcome.

      The method doesn't have complete coverage of all neurons in the culture, and it appears to work on rather low-density cultures so the size of the networks in the current study is in the low tens.

      (1) It would be valuable to see the caveats associated with the small size of the networks examined here.

      (2) It would be also helpful if there were a section to discuss how this approach might scale up, and how better network coverage might be achieved.

      These are indeed very important points that we should have discussed in more detail. Maximizing the coverage of neurons is critical to our approach, as it determines the number of potential synaptic connections that can be tested. The number of cells that we seeded onto our HD-MEA chip was chosen to achieve monolayer neuronal cultures. As detailed in ‘Materials and Methods -> Electrode selection and long-term extracellular recording of network spiking’, the entire HD-MEA chip (all 26,400 electrodes) was scanned for activity at the beginning of each experiment, and electrodes that recorded spiking activity were subsequently selected. While it is possible that some individual neurons escape detection, since they were not directly adjacent to an electrode, we estimate that a large majority of the active neurons in the culture was covered by our electrode selection method. New generations of CMOS HD-MEAs developed in our laboratory and other groups feature higher electrode densities, larger recording areas, and larger sets of electrodes that can be simultaneously recorded from (e.g., DOI: 10.1109/JSSC.2017.2686580 & 10.1038/s41467-020-18620-4). These features will substantially improve the coverage of the network and also allow for using larger neuronal networks. As suggested by reviewer 1, we added these points to the Discussion section of the revised manuscript.

      The authors obtain a number of findings on the conditions in which the dynamics of excitatory and inhibitory inputs permit spiking, and the statistics of connectivity that result in this. This is of considerable interest, and clearly one would like to see how these findings map to larger networks, to non-cortical networks, and ideally to networks in-vivo. The suite of approaches discussed here could potentially serve as a basis for such further development.

      (3) It would be useful for the authors to suggest such approaches.

      We are confident that our suite of approaches will open important avenues to study the E & I input basis of postsynaptic spiking in other circuits beyond the in vitro cortical networks studied here. In fact, CMOS HD-MEA probes have been successfully combined with patch clamping in vivo (DOI: 10.1101/370080) and, in principle, the strategies and software tools introduced in our study would be equally applicable in an in vivo context. However, currently available in vitro CMOS HD-MEAs still surpass their in vivo counterparts (e.g., Neuropixels probes) in terms of electrode count. Moreover, using in vitro neural networks enables easy access and better network coverage compared to in vivo conditions. These are the main reasons why we chose an in vitro network for our investigation. We added these points to the Discussion section of the revised manuscript.

      (4) The authors report a range of synaptic conductance waveforms in time. Not surprisingly, E and I look broadly different. Could the authors comment on the implications of differences in time-course of conductance profiles even within E (or I) synapses? Is this functional or is it an outcome of analysis uncertainty?

      We are grateful to the reviewer for raising this interesting point. On the one hand, the onsets of the synaptic conductance waveform estimates were strikingly different between E and I synapses (see Fig. 8D). Furthermore, the rise and decay phases of synaptic currents were distinct for E vs. I inputs (Fig. 4C). We think that these differences are not just due to analysis uncertainty because both these observations are consistent with previously described properties of E and I inputs: synaptic GABAergic I currents are typically slower than glutamatergic E currents with respect to both the rise and decay phases (DOI: 10.1126/science.abj586). Moreover, the relatively small onset latencies for I inputs that we observed are consistent with the well-known local action of inhibition. This finding was also consistent with smaller PRE-POST distances and general differences in neurite characteristics of E compared to I cells (Fig. S2).

      One of the challenges in doing such studies in a dish is that the network is simply ticking away without any neural or sensory context to work on, nor any clear idea of what its outputs might mean. Nevertheless, at a single-neuron level one expects that this system might provide a reasonable subset of the kinds of activity an individual cell might have to work on.

      (5) Could the authors comment on what subsets of network activity are, and are not, likely to be seen in the culture?

      (6) Could they indicate what this would mean for the conclusions about E-I summation, if the in-vivo activity follows different dynamics?

      We agree that there are natural limitations to a reductionist model, such as a dissociated cell culture. One may argue that neuronal cultures bear some similarities with neural networks formed during early brain development, where network formation is primarily driven by intrinsic, self-organizational capabilities. While such a self-organization is likely constrained in a 2D culture, it has been shown that several important circuit mechanisms that are observed in vivo are preserved in 2D dissociated cultures. For example, dissociated neuronal cultures can maintain E-I balance and achieve active decorrelation (DOI: 10.1038/nn.4415). In addition, in terms of activity levels, the sequences of heightened and more quiescent network spiking bear similarities with cortical Up-Down state oscillations observed during slow-wave sleep. To what extent individual circuit connectivity motifs and more nuanced network dynamics, found in vivo, can be recapitulated in vitro, is still not clear. However, combining our and previous work (especially DOI: 10.1038/nn.4415), we believe that there is sufficient evidence to justify work such as ours. On the one hand, identifying in simple cell culture models features of network dynamics and microcircuits known (or predicted) to exist in vivo is a testimony of neuronal self-organizing capabilities. On the other hand, our in vitro results will allow for more directed testing of equivalent mechanisms in vivo.

      Reviewer 2:

      The authors had two aims in this study. First, to develop a tool that lets them quantify the synaptic strength and sign of upstream neurons in a large network of cultured neurons. Second, they aimed at disentangling the contributions of excitatory and inhibitory inputs to spike generation.

      For the quantification of synaptic currents, their method allows them to quantify excitatory and inhibitory currents simultaneously, as the sign of the current is determined by the neuron identity in the high-density extracellular recording. They further made sure that their method works for nonstationary firing rates, and they did a simulation to characterize what kind of connections their analysis does not capture. They did not include the possibility of (dendritic) nonlinearities or gap junctions or any kind of homeostatic processes.

      Thank you for the concise summary of our aims and of the features of our method. Indeed, we did not model nonlinear synaptic interactions, short-term plasticity etc., as there is likely a spectrum of possible interaction rules. Importantly, non-linear synaptic interactions were reduced by performing synaptic measurements in voltage-clamp mode.

      We do not anticipate that this would impact our connectivity inference per se. However, the presence of a significant number of nonlinear events would imply that some deviations between reconstructed and measured patch current traces were to be expected even if all incoming monosynaptic connections were identified. In the future, it will be exciting to add to our current experimental protocol a simultaneous HD-MEA & patch-clamp recording, in which the membrane potential is measured in current-clamp mode. Following application of our synaptic input-mapping procedure, one could, in this way, directly assess input-sequence dependent non-linear synaptic integration during spontaneous network activity.

      I see a clear weakness in the way that they quantify their goodness of fit, as they only report the explained variance, while their data are quite nonstationary. It could help to partition the explained variance into frequency bands, to at least separate the effects of a bias in baseline, the (around 100 Hz) band of synaptic frequencies and whatever high-frequency observation noise there may be. Another weak point is their explanation of unexplained variance by potential activation of extrasynaptic receptors without providing evidence. Given that these cultures are not a tissue and diffusion should be really high, this idea could easily be tested by adding a tiny amount of glutamate to the culture media.

      As suggested by the reviewer, we have now partitioned the current traces into frequency bands and separately assessed the goodness-of-fit. We have updated Fig. 3C accordingly.

      The following sentence was added to the main text:

      “We separately compared slow baseline changes (< 3 Hz), fast synaptic activity (3 - 200 Hz) and putative high-frequency noise (> 200 Hz), yielding a median variance explained of approximately 60% in the 3 - 200 Hz range (Fig. 3C).”

      Importantly, the variance explained in the frequency range of synaptic activity remains high. We would also like to point out that, even if all synaptic input connections were identified, one would expect some deviations between measured and reconstructed current trace. This is because the reconstructed trace is based on average input current waveforms and in the measured trace there may be synaptic transmission failures.
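
      One way to picture a band-wise goodness-of-fit of this kind is to compare, within each frequency band, the spectral power of the residual (measured minus reconstructed trace) with that of the measured trace. The sketch below is an illustration under that assumption and is not the authors' implementation: the band edges follow the quoted sentence, while the naive DFT, the toy signals, and all names are hypothetical.

```python
import cmath
import math


def dft_power(signal):
    """Power |X_k|^2 at each non-negative frequency bin (naive DFT)."""
    n = len(signal)
    power = []
    for k in range(n // 2 + 1):
        x_k = sum(
            v * cmath.exp(-2j * math.pi * k * i / n)
            for i, v in enumerate(signal)
        )
        power.append(abs(x_k) ** 2)
    return power


def band_variance_explained(measured, reconstructed, fs, bands):
    """Return {(lo, hi): 1 - residual power / measured power} per band in Hz."""
    n = len(measured)
    residual = [m - r for m, r in zip(measured, reconstructed)]
    p_meas, p_res = dft_power(measured), dft_power(residual)
    out = {}
    for lo, hi in bands:
        # Bin 0 (the mean) is skipped so band power corresponds to variance.
        # A bin exactly at a band edge is assigned to the higher band.
        idx = [k for k in range(1, n // 2 + 1) if lo <= k * fs / n < hi]
        out[(lo, hi)] = 1.0 - sum(p_res[k] for k in idx) / sum(
            p_meas[k] for k in idx
        )
    return out


# Toy data: 0.5 s at 1 kHz with a 2 Hz drift, a 50 Hz "synaptic"
# component, and a 300 Hz "noise" component (all invented).
fs, n = 1000, 500
times = [i / fs for i in range(n)]
slow = [math.sin(2 * math.pi * 2 * x) for x in times]
syn = [math.sin(2 * math.pi * 50 * x) for x in times]
noise = [0.3 * math.sin(2 * math.pi * 300 * x) for x in times]
measured = [a + b + c for a, b, c in zip(slow, syn, noise)]
# The toy reconstruction captures 80% of the synaptic amplitude only.
reconstructed = [0.8 * b for b in syn]

bands = [(0, 3), (3, 200), (200, math.inf)]
ve = band_variance_explained(measured, reconstructed, fs, bands)
```

      In this toy case the reconstruction explains 96% of the variance in the 3 - 200 Hz band and none in the other two, which is the kind of separation between baseline drift, synaptic activity, and noise that the reviewer asked for. Real traces would normally be handled with an FFT or digital filters instead of a naive DFT.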

      We agree that the offered explanation for unexplained variance by activation of extrasynaptic receptors is fairly speculative. As it was not a crucial discussion point, we have therefore removed the statement.

      For the contributions of excitation and inhibition to neuronal spiking, the authors found a clear reduction of inhibitory inputs and increase of excitation associated with spiking when averaging across many spikes. And interestingly, the inhibition shows a reversal right after a spike and the timescale is faster during higher network activity. While these findings are great and provide further support that their method is working, they stop at this exciting point where I would really have liked to see more detail.

      Thank you for acknowledging our main results concerning the synaptic basis of spiking. We attempted to integrate in one manuscript a suite of new approaches, in addition to the respective applications. We, therefore, tried to strike the appropriate level of detail in presenting our findings. With regard to our analyses of which synaptic input events regulate postsynaptic spiking, we agree with reviewer 2’s assessment that more detail concerning the variability across individual spikes would be helpful. In the following parts, we detail multiple new analyses that we have included in the revised manuscript to address reviewer 2’s comments.

      A concern, of course, is that the network bursts in cultures are quite stereotypical, and that might cause averages across many bursts to show strange behaviour. So what I am missing here is a reference or baseline or null hypothesis. How does it look when using inputs from neurons that are not connected? And then, it looks like the E/(E+I) curve has lots of peaks of similar amplitude (that could be quantified...), so why does the neuron spike where it does? If I would compare to the peak (of similar amplitude) right before or right after (as a reference) are there some systematic changes? Is maybe the inhibition merely defining some general scaffold where spikes can happen and the excitation causes the spike as spiking is more irregular?

      The averaged trace reveals a different timescale for high and low activity states. But does that reflect a superposition of EPSCs in a single trial or rather a different jittering of a single EPSC across trials? For answering this question, it would be good to know the variance (and whether/how much it changes over time). Maybe not all spikes are preceded by a decrease in inhibition. Could you quantify (correlate, scatterplot?) how exactly excitation and inhibition contributions relate for single postsynaptic spikes (or single postsynaptic non-spikes)? After all, this would be the kind of detail that requires the large amount of data that this study provides.

      First of all, we are very grateful for the reviewer’s thorough assessment of our work and for the many valuable suggestions to improve it. We are convinced that we have addressed with our new analyses and the updated manuscript all issues raised here. One of the main findings from our original manuscript was that a rapid and brief change in input conductance (and particularly a reduction in inhibition) is an important spike trigger/regulator. We followed the reviewer’s suggestion and now provide scatter plots and distributions of the pre- (and post-spike) changes in input excitation and inhibition for individual postsynaptic spikes. A quantification of the peaks in the noisy E/(E+I) traces was not always trivial, which is why we reasoned that an assessment of the respective E and I changes is better suited. Moreover, as an unbiased reference, we generated separately for each postsynaptic cell a corresponding distribution of changes in input conductance in non-spiking periods (using random time points). We included our new results and updated figures in our responses to the specific reviewer comments below.
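
      The reference described here, conductance changes at random time points in non-spiking periods, can be sketched as follows. This is a toy illustration under stated assumptions and not the authors' analysis code: the trace, spike times, window length, exclusion margin, and all function names are hypothetical.

```python
import random


def pre_event_change(trace, t, window):
    """Mean of the `window` samples just before index t, minus the mean
    of the `window` samples preceding those."""
    late = trace[t - window:t]
    early = trace[t - 2 * window:t - window]
    return sum(late) / window - sum(early) / window


def reference_times(n_samples, spike_times, window, n_draws, exclusion, rng):
    """Draw random indices more than `exclusion` samples from any spike."""
    candidates = [
        t
        for t in range(2 * window, n_samples)
        if all(abs(t - s) > exclusion for s in spike_times)
    ]
    return rng.sample(candidates, n_draws)


# Toy inhibitory conductance: flat baseline with a brief dip before each spike.
window = 5
spikes = [200, 500, 800]
trace = [1.0] * 1000
for s in spikes:
    for i in range(s - window, s):
        trace[i] = 0.4

rng = random.Random(0)  # fixed seed so the draw is reproducible
ref = reference_times(len(trace), spikes, window, 50, 20, rng)

spike_changes = [pre_event_change(trace, s, window) for s in spikes]
ref_changes = [pre_event_change(trace, t, window) for t in ref]
```

      Comparing the two lists, as scatter plots or distributions per postsynaptic cell, is what turns an averaged pre-spike dip into a statement about how often individual spikes differ from non-spiking periods.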

      For the first part, the authors achieved their goal in developing a tool to study synaptic inputs driving subthreshold activity at the soma, and characterizing such connections. For the second part, they found an effect of EPSCs on firing, but they barely did any quantification of its relevance due to the lack of a reference.

      With the availability of Neuropixels probes, there is certainly use for their tool in in vivo applications, and their statistical analysis provides a reference for future studies.

      The relevance of excitatory and inhibitory currents on spiking remains to be seen in an updated version of the manuscript.

      Thank you. Please see our new analyses below. Our new findings are in agreement with the main conclusions of the original manuscript. We provide evidence that rapid pre-spike changes in input conductance are observed across most individual spikes and that these rapid changes occur significantly more often before measured spikes than in non-spiking periods.

      I feel that specifically Figures 6 and 7 lack relevant detail and a consistent representation that would allow the reader to establish links between the different panels. The analysis shows very detailed examples, but then jumps into analyses that show population averages over averaged responses, losing or ignoring the variability across trials. In addition, while their results themselves pass a statistical test, it is crucial to establish some measure of how relevant these results are. For that, I would really want to know how much spiking would actually be restricted by the constraints that would be posed by these results, i.e. would this be reflected in tiny changes in spiking probabilities, or are there times when spiking probabilities are necessarily high, or do we see times when we would almost certainly get a spike, but neurons can fire during other times as well.

      I would agree that a detailed, quantitative analysis of this question is beyond the scope of this paper, but a qualitative analysis is feasible and should be done.

      Please see our revised Figure 6. We have rearranged some of the original panels and removed one example of mean conductance profiles. Moreover, we removed a panel with analysis results based on mean conductances that is now obsolete, as more detailed analyses are provided (which are in agreement with the original findings). Analyses from panels (A-F) are mostly unchanged. Panels (G-J) show the new results.

      The following paragraphs, which were added to the main text of the revised manuscript, describe our new findings:

      “For a more nuanced picture of which synaptic events are associated with postsynaptic spiking, we next quantified the changes in input excitation and inhibition that preceded individual postsynaptic spikes. In our analysis, we first focused on periods with high synaptic input activity. As previously discussed, cortical neurons in vivo typically receive and integrate barrages of input activation, similar to the high-activity events that we observed here (e.g., the event depicted in Fig. 6A, right). In Fig. 6G/H, individual pre-spike changes in input conductance are shown for two example postsynaptic neurons (plots labeled ‘spiking’, right). To assess how specific these conductance changes were to spiking periods, we also quantified the changes in input conductance that occurred during non-spiking periods as a reference (we used random time points from high-activity events excluding time points adjacent to measured spike times; we upscaled the number of measured spikes by 10x; the respective plots were labeled ‘non-spiking’). Spikes of both example neurons exhibited – compared to non-spiking regions – significantly more often a pre-spike decrease in inhibition, consistent with the mean conductance profiles. Precisely how an increase (top-right quadrants in Fig. 6G/H) or decrease (bottom-left quadrants) in both I and E conductance influenced the neuronal membrane potential is difficult to predict. However, if rapid changes in input conductance had a significant role in triggering spikes, one would expect that fewer spikes would exhibit a hyperpolarizing pre-spike increase in I and decrease in E (top-left quadrants) compared to the non-spiking period. Conversely, a decrease in I and an increase in E (bottom-right quadrants) would likely result in a membrane potential depolarization so that more spikes should feature the corresponding pre-spike conductance changes compared to non-spiking periods.
These relative shifts are precisely what can be observed in the plots of the two example neurons (Fig. 6G/H) and, in fact, across recordings (Fig. 6I). Finally, we compared the distributions of pre-spike changes in input inhibition and excitation of each postsynaptic neuron (Fig. 6J). Further indicating a pivotal role of inhibition in triggering spikes, 6 out of 7 neurons exhibited a clear decrease in the mean values (and medians) of pre-spike changes in inhibition compared to non-spiking periods. Interestingly, the 3 out of 7 neurons with an increase in excitation showed the smallest decrease in inhibition (or even an increase in inhibition in case of neuron #7). This latter observation suggests a matching of E and I inputs and cell-specific relative contributions of E and I conductance changes in triggering spikes.

      Theoretically, neuronal spiking could be driven by a prolonged suprathreshold depolarization (Petersen and Berg 2016; Renart et al. 2007) or, in more favorable subthreshold regimes, by fast synaptic input fluctuations (Ahmadian and Miller 2021; Amit and Brunel 1997; Brunel 2000; Van Vreeswijk and Sompolinsky 1996). In this section, we demonstrated that the majority of investigated neurons featured – during high-activity periods – a significant number of spikes that were associated with rapid pre-spike changes in input conductances. These findings suggest that even simple neuronal cultures can self-organize to form circuits exhibiting sophisticated spiking dynamics.”

      Our new analyses detailed in Fig. 6 show that there are also presumably depolarizing events (e.g., decrease in I and increase in E) in non-spiking regions. In future studies, it will be interesting to examine what distinguishes these events from spike-inducing events of similar magnitude – one possibility is a dependency on specific input-activation sequences.
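      The quadrant comparison described in the quoted text can be sketched as follows. The code assumes that the change in E is plotted on the horizontal axis and the change in I on the vertical axis (so that "top-left" means I up, E down, as in the description); the function names and dictionary keys are illustrative and not taken from the manuscript.

```python
import numpy as np

def quadrant_fractions(delta_e, delta_i):
    """Fraction of events per quadrant of the (delta E, delta I) plane.
    Events lying exactly on an axis are not assigned to any quadrant."""
    delta_e = np.asarray(delta_e, dtype=float)
    delta_i = np.asarray(delta_i, dtype=float)
    return {
        "top_left_I_up_E_down": float(np.mean((delta_i > 0) & (delta_e < 0))),
        "top_right_I_up_E_up": float(np.mean((delta_i > 0) & (delta_e > 0))),
        "bottom_left_I_down_E_down": float(np.mean((delta_i < 0) & (delta_e < 0))),
        "bottom_right_I_down_E_up": float(np.mean((delta_i < 0) & (delta_e > 0))),
    }

def quadrant_shift(spiking, non_spiking):
    """Difference in quadrant occupancy (spiking minus non-spiking reference).
    Each argument is a (delta_e, delta_i) pair of arrays."""
    spk = quadrant_fractions(*spiking)
    ref = quadrant_fractions(*non_spiking)
    return {name: spk[name] - ref[name] for name in spk}
```

      Under the hypothesis stated in the text, the shift would be negative for the top-left quadrant and positive for the bottom-right quadrant.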

      During the first days and weeks of developing neuronal cultures, spiking activity rapidly shifts from synapse-independent activity patterns to spiking dynamics that do depend on synaptic inputs and are progressively organized in network-wide high-activity events (DOI: 10.1016/j.brainres.2008.06.022). In our study, cultures at days-in-vitro 15-18 were used, and approximately 15% of the spikes occurred during high-activity events with relatively strong E and I input activity. In addition, spikes that occurred during low-activity events were at least partially regulated by synaptic input (see answers below related to Fig. 7).

      In the following, I am detailing what I would consider necessary to be done about these two Figures:

      Figure 6C is indeed great, though I don't see why the authors would characterize synchrony as low. When comparing with Figure 4B, I'd think that some of these values are quite high. And it wouldn't help me to imagine error bars in panel 6D.

      We have removed our characterization as ‘low’ from the text. One important difference between our synchrony measure (STTC) and the quantification of spike-transmission probability (STP) is the ‘lag’ of a few milliseconds for the STP quantification window to account for synaptic delay.

      Figure 6B is useful, but could be done better: The autocovariance of a shotnoise process is a convolution of the autocovariance of underlying point process and the autocovariance of the EPSC kernel. So one would want to separate those to obtain a better temporal resolution. But a shotnoise process has well defined peaks, and the time of these local maxima can be estimated quite precisely. Now if I would do a peak triggered average instead of the full convolution, I would do half of the deconvolution and obtain a temporally asymmetric curve of what is expected to happen around an EPSC. Importantly, one could directly see expected excitation after inhibition or expected inhibition after excitation, and this visualization could be much better and more intuitively compared to panel 6E.
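      For readers unfamiliar with the peak-triggered average proposed in this comment, a minimal sketch is given below. It is an illustration of the general technique only: the simple three-point local-maximum detector, the `min_height` threshold, and the function names are assumptions, and a real conductance trace would likely need smoothing before peak detection.

```python
import numpy as np

def local_maxima(x, min_height):
    """Indices of samples that exceed their left neighbour, are not below
    their right neighbour, and reach at least `min_height`."""
    x = np.asarray(x, dtype=float)
    is_peak = (x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]) & (x[1:-1] >= min_height)
    return np.flatnonzero(is_peak) + 1

def peak_triggered_average(trigger, target, half_window, min_height):
    """Average `target` in a window around each peak of `trigger`.
    Returns (lags, mean trace); lags are in samples relative to the peak."""
    target = np.asarray(target, dtype=float)
    lags = np.arange(-half_window, half_window + 1)
    peaks = local_maxima(trigger, min_height)
    # Keep only peaks whose full window fits inside the recording.
    peaks = peaks[(peaks >= half_window) & (peaks < len(target) - half_window)]
    if len(peaks) == 0:
        return lags, np.full(len(lags), np.nan)
    segments = np.stack([target[p - half_window:p + half_window + 1] for p in peaks])
    return lags, segments.mean(axis=0)
```

      Using the excitatory trace as `trigger` and the inhibitory trace as `target` would give the expected inhibition around an excitatory event; because the result is not forced to be symmetric in the lag, lead and lag relationships become directly visible.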

      We appreciate the reviewer’s suggestion to present these results in a more sophisticated way. We propose keeping the original analysis so that it remains comparable with related analyses in the literature (e.g., DOI: 10.1038/nn.2105). We therefore hope the reviewer finds it acceptable that we leave the presentation of the data in its original form and potentially follow up in future work with the analysis strategy proposed by the reviewer.

      Panel D needs some variability estimate (i.e. standard deviation or interquartile range or even a probability density) for those traces.

      Figure 6E: Please use more visible colors. A sensitivity analysis to see traces for 2E/(2E+I) and E/(E+2I) would be great.

      Figure 6F: with an updated panel B, we should be able to have a slope for average inhibition after excitation for each of these cells. A second panel / third column showing those slopes would be of interest. It would serve as a reference for what could be expected from E-I interactions alone.

      With regard to the variability estimate in D, we now provide multiple panels characterizing the variability. For one, Fig. 6H contains a scatter plot of the pre-spike changes in input conductance across all individual postsynaptic spikes from the example cell shown in D. Moreover, in Fig. 7A, we show from the same example cell the standard deviations associated with the mean conductance traces separately for spikes that occurred during low- and high-activity states. For better visibility and because the separation according to activity states is more informative, we kept the original presentation of panel D (however, removing one example cell). In addition, we show the same mean traces from panel D with the respective standard deviations (across all spikes) in Supplementary Figure S3.

      Colors in Fig. 6E are adjusted, as requested.

      We have removed panel Fig. 6F as we now provide more detailed analyses at single-spike level (see Fig. 6G-J).

      Figure 6G: Could the authors provide an interquartile range here?

      With regard to the aligned input-output data from original panel Fig. 6G, now in panel Fig. 6F in the updated figure version, we show all individual traces that were averaged: the E/I traces from panel Fig. 6E and the three action potential waveforms from Supplementary Figure S5. Therefore, we chose to present the means only for better visibility.

      Figure 7A: it may be hard to squeeze in variability estimates here, but the information on whether and how much variance might be explained is essential. Maybe add another panel to provide a variability estimate? The variability estimate in panel 7B and 7D only reflect variability across connections, and it would be useful to add panels for the time courses of the variability of g (or E/(E+I) respectively).

      We now include the standard deviations across the input conductance traces in the updated Fig. 7A, as requested. We have also simplified Fig. 7 and performed the analysis using the 6 out of 7 neurons that, based on our new analysis (Fig. 6J), displayed a clear reduction in pre-spike inhibition relative to the reference distribution. For a complete overview of the state-dependent changes in input conductance that are associated with individual postsynaptic spikes, we have included a new supplementary figure (Fig. S6). Fig. S6 also includes a characterization of the changes in input inhibition that occur right after postsynaptic spiking. In addition, Fig. S6D shows the standard deviations corresponding to the mean input conductance traces of all cells – separately for high- and low-activity periods.

      We added the following paragraph to the main text of the revised manuscript:

      “How can these deviations in the mean conductance profiles be explained? To answer this question, we further quantified – separately for low and high g states – the changes in input inhibition that occurred right before and after individual postsynaptic spikes (Fig. S6). This single-spike analysis suggested that, during high g states, most spikes experienced a post-spike increase and pre-spike decrease in inhibition (see also Fig. 6J). On the other hand, low g states were characterized by sparse synaptic input (e.g., see reconstruction in Fig. 6A). Therefore, many of the spikes that occurred during low g states were associated with little change in input conductance (note medians of approximately zero in Fig. S6A/C). Nevertheless, a considerable fraction of spikes (often > 25%) from low g states were also associated with a post-spike increase and pre-spike drop in inhibition. It, therefore, appears that even the sparse inhibitory inputs of low g states could influence spike timing. Moreover, the post-spike increases in input inhibition during low g states suggest that there were strong regulatory inhibitory circuits in place. However, limited activity levels during low g states presumably introduced an increased jitter of these spike-associated changes in input inhibition.

      In summary, the input inhibition of high-conductance states provides reliable and narrow windows-of-spiking opportunity. In addition, even during periods of sparse activity, there are rudimentary synaptic mechanisms in place to regulate spike timing.”

      As a suggestion for further analysis, though I am well aware that this is likely beyond the scope of this manuscript, I'd suggest the following analysis:

      I would split the data into the high and low activity states. Then I would compute the average of E/(E+I) values for spikes. Assuming that spikes tend to happen for local maxima of E/(E+I) I would find local maxima for periods without spike such that their average is equal to the value for actual spikes. Finally, I would test for a systematic difference in either excitation or inhibition.

      If there is no difference, you can make the claim that synaptic input does not guarantee a spike, and compare to a global average of E/(E+I).

      We are grateful for the fantastic suggestions for future analysis. We look forward to conducting these analyses in a more detailed follow-up characterization.

      In addition to the major alterations detailed above, we performed smaller corrections (e.g., spelling mistakes, inaccuracies) in some parts of the manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, the authors use a large dataset of neuroscience publications to elucidate the nature of self-citation within the neuroscience literature. The authors initially present descriptive measures of self-citation across time and author characteristics; they then produce an inclusive model to tease apart the potential role of various article and author features in shaping self-citation behavior. This is a valuable area of study, and the authors approach it with an appropriate and well-structured dataset.

      The study's descriptive analyses and figures are useful and will be of interest to the neuroscience community. However, with regard to the statistical comparisons and regression models, I believe that there are methodological flaws that may limit the validity of the presented results. These issues primarily affect the uncertainty of estimates and the statistical inference made on comparisons and model estimates - the fundamental direction and magnitude of the results are unlikely to change in most cases. I have included detailed statistical comments below for reference.

      Conceptually, I think this study will be very effective at providing context and empirical evidence for a broader conversation around self-citation. And while I believe that there is room for a deeper quantitative dive into some finer-grained questions, this paper will be a valuable catalyst for new areas of inquiry around citation behavior - e.g., do authors change self-citation behavior when they move to more or less prestigious institutions? do self-citations in neuroscience benefit downstream citation accumulation? do journals' reference list policies increase or decrease self-citation? - that I hope that the authors (or others) consider exploring in future work.

      Thank you for your suggestions and your generally positive view of our work. As described below, we have made the statistical improvements that you suggested.

      Statistical comments:

      (1) Throughout the paper, the nested nature of the data does not seem to be appropriately handled in the bootstrapping, permutation inference, and regression models. This is likely to lead to inappropriately narrow confidence bands and overly generous statistical inference.

      We apologize for this error. We have now included nested bootstrapping and permutation tests. We defined an “exchangeability block” as a group of co-authors. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests within the constraints of the exchangeability blocks.
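      One way to form such exchangeability blocks is to treat each First Author / Last Author pairing as an edge and take the connected components of the resulting graph. The sketch below uses a union-find structure for this; it is an assumption about how the grouping could be implemented, and the function name and input format (one `(first_author, last_author)` tuple per article) are hypothetical.

```python
def exchangeability_blocks(author_pairs):
    """Assign each article to a block such that any two authors linked
    through a chain of First/Last Author pairings share the same block.
    Returns one integer block label per article, in input order."""
    parent = {}

    def find(author):
        parent.setdefault(author, author)
        while parent[author] != author:
            parent[author] = parent[parent[author]]  # path halving
            author = parent[author]
        return author

    for first, last in author_pairs:
        root_first, root_last = find(first), find(last)
        if root_first != root_last:
            parent[root_last] = root_first  # merge the two groups

    labels = {}
    return [labels.setdefault(find(first), len(labels)) for first, _ in author_pairs]
```

      For example, articles by (A, B), (C, D), and (B, C) all end up in one block because B links the first and third articles and C links the second and third.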

      We first describe this in the results (page 3, line 110):

      “Importantly, we accounted for the nested structure of the data in bootstrapping and permutation tests by forming co-authorship exchangeability blocks.”

      We also describe this in 4.8 Confidence Intervals (page 21, line 725):

      “Confidence intervals were computed with 1000 iterations of bootstrap resampling at the article level. For example, of the 100,347 articles in the dataset, we resampled articles with replacement and recomputed all results. The 95% confidence interval was reported as the 2.5 and 97.5 percentiles of the bootstrapped values.

      We grouped data into exchangeability blocks to avoid overly narrow confidence intervals or overly optimistic statistical inference. Each exchangeability block comprised any authors who published together as a First Author / Last Author pairing in our dataset. We only considered shared First/Last Author publications because we believe that these authors primarily control self-citations, and otherwise exchangeability blocks would grow too large due to the highly collaborative nature of the field. Furthermore, the exchangeability blocks do not account for co-authorship in other journals or prior to 2000. A distribution of the sizes of exchangeability blocks is presented in Figure S15.”
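      The block-level resampling described in the quoted Methods text can be sketched as a cluster bootstrap: whole exchangeability blocks, rather than single articles, are drawn with replacement. The sketch below is an illustration under stated assumptions: the self-citation rate is taken to be the ratio of summed self-citations to summed citations, and the function and argument names are hypothetical.

```python
import numpy as np

def block_bootstrap_ci(self_cites, total_cites, blocks, n_boot=1000, seed=0):
    """95% percentile interval for the pooled self-citation rate,
    resampling exchangeability blocks with replacement."""
    rng = np.random.default_rng(seed)
    _, inverse = np.unique(np.asarray(blocks), return_inverse=True)
    n_blocks = int(inverse.max()) + 1
    # Pre-aggregate per block so that each bootstrap draw is a cheap index lookup.
    self_per_block = np.bincount(inverse, weights=self_cites, minlength=n_blocks)
    total_per_block = np.bincount(inverse, weights=total_cites, minlength=n_blocks)
    rates = np.empty(n_boot)
    for b in range(n_boot):
        draw = rng.integers(0, n_blocks, size=n_blocks)
        rates[b] = self_per_block[draw].sum() / total_per_block[draw].sum()
    return np.percentile(rates, [2.5, 97.5])
```

      Because articles within a block always travel together, correlated self-citation habits within a co-authorship group widen the interval instead of being counted as independent evidence.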

      In describing permutation tests, we also write (page 21, line 739):

      “4.9 P values

      P values were computed with permutation testing using 10,000 permutations, with the exception of regression P values and P values from model coefficients. For comparing different fields (e.g., Neuroscience and Psychiatry) and comparing self-citation rates of men and women, the labels were randomly permuted by exchangeability block to obtain null distributions. For comparing self-citation rates between First and Last Authors, the first and last authorship was swapped in 50% of exchangeability blocks.”
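      A block-wise permutation test of the kind described in the quoted text might look like the sketch below. It assumes a two-group comparison with a binary label that is constant within each exchangeability block, uses the difference in group means as the test statistic, and reports a two-sided P value; these choices and all names are illustrative assumptions.

```python
import numpy as np

def block_permutation_pvalue(values, labels, blocks, n_perm=10000, seed=0):
    """Two-sided permutation P value for a difference in group means,
    shuffling group labels across exchangeability blocks (not across articles)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels)
    _, first_index, inverse = np.unique(
        np.asarray(blocks), return_index=True, return_inverse=True)
    # Assumption: every article in a block carries the same label.
    block_labels = labels[first_index]

    def statistic(per_block_labels):
        article_labels = per_block_labels[inverse]
        return values[article_labels == 1].mean() - values[article_labels == 0].mean()

    observed = statistic(block_labels)
    null = np.array([statistic(rng.permutation(block_labels)) for _ in range(n_perm)])
    # Add-one correction keeps the P value strictly positive.
    return (1 + np.sum(np.abs(null) >= abs(observed))) / (n_perm + 1)
```

      Permuting at the block level keeps each co-authorship group intact under the null, which is what prevents the overly generous inference raised in the reviewer's comment.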

      For modeling, we considered doing a mixed effects model but found difficulties due to computational power. For example, with our previous model, there were hundreds of thousands of levels for the paper random effect, and tens of thousands of levels for the author random effect. Even when subsampling or using packages designed for large datasets (e.g., mgcv’s bam function: https://www.rdocumentation.org/packages/mgcv/versions/1.9-1/topics/bam), we found computational difficulties.

      As a result, we switched to modeling results at the paper level (e.g., self-citation count or rate). We found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We updated our description of our models in the Methods section (page 21, line 754):

      “4.10 Exploring effects of covariates with generalized additive models

      For these analyses, we used the full dataset size separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

      We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

      For our models, we used generalized additive models from mgcv’s “gam” function in R 49. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 50 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean to the p power 49. The p parameter ranges from 1-2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 51. DHARMa scales the residuals between 0 and 1 with a simulation-based approach 51. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but small in effect size in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

      In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 49. Across all smooth predictors and models, we ultimately selected between 10-20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”
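      For reference, the model structure described in the quoted Methods text can be written compactly as below. The notation is an assumption introduced here for illustration: \mu_i is the expected outcome for article i, x_{ik} are indicator variables for the categorical (linear) terms, f_j are the smooth functions of the continuous covariates z_{ij}, and \phi is a dispersion parameter.

```latex
\begin{align}
\log \mu_i &= \beta_0 + \sum_k \beta_k x_{ik} + \sum_j f_j(z_{ij}), \\
\operatorname{Var}(Y_i) &= \phi \, \mu_i^{\,p}, \qquad p = 1.2 .
\end{align}
```

      With p = 1 the variance grows linearly with the mean, as in a (quasi-)Poisson model, and with p = 2 it grows quadratically, as in a gamma model; p = 1.2 therefore sits close to the Poisson end of that range.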

      The direction of our results primarily stayed the same, with the exception of gender results. Men tended to self-cite slightly less (or equal self-citation rates) after accounting for numerous covariates. As such, we also modeled the number of previous papers to explain the discrepancy between our raw data and the modeled gender results. Please find the updated results text below (page 11, line 316):

      “2.9 Exploring effects of covariates with generalized additive models

      Investigating the raw trends and group differences in self-citation rates is important, but several confounding factors may explain some of the differences reported in previous sections. For instance, gender differences in self-citation were previously attributed to men having a greater number of prior papers available to self-cite 7,20,21. As such, covarying for various author- and article-level characteristics can improve the interpretability of self-citation rate trends. To allow for inclusion of author-level characteristics, we only consider First Author and Last Author self-citation in these models.

      We used generalized additive models (GAMs) to model the number and rate of self-citations for First Authors and Last Authors separately. The data were randomly subsampled so that each author only appeared in one paper. The terms of the model included several article characteristics (article year, average time lag between article and all cited articles, document type, number of references, field, journal impact factor, and number of authors), as well as author characteristics (academic age, number of previous papers, gender, and whether their affiliated institution is in a low- and middle-income country). Model performance (adjusted R2) and coefficients for parametric predictors are shown in Table 2. Plots of smooth predictors are presented in Figure 6.

      First, we considered several career and temporal variables. Consistent with prior works 20,21, self-citation rates and counts were higher for authors with a greater number of previous papers. Self-citation counts and rates increased rapidly among the first 25 published papers but then more gradually increased. Early in the career, increasing academic age was related to greater self-citation. There was a small peak at about five years, followed by a small decrease and a plateau. We found an inverted U-shaped trend for average time lag and self-citations, with self-citations peaking approximately three years after initial publication. In addition, self-citations have generally been decreasing since 2000. The smooth predictors showed larger decreases in the First Author model relative to the Last Author model (Figure 6).

      Then, we considered whether authors were affiliated with an institution in a low- and middle-income country (LMIC). LMIC status was determined by the Organisation for Economic Co-operation and Development. We opted to use LMIC instead of affiliation country or continent to reduce the number of model terms. We found that papers from LMIC institutions had significantly lower self-citation counts (-0.138 for First Authors, -0.184 for Last Authors) and rates (-12.7% for First Authors, -23.7% for Last Authors) compared to non-LMIC institutions. Additional results with affiliation continent are presented in Table S5. Relative to the reference level of Asia, higher self-citations were associated with Africa (only three of four models), the Americas, Europe, and Oceania.

      Among paper characteristics, a greater number of references was associated with higher self-citation counts and lower self-citation rates (Figure 6). Interestingly, self-citations were greater for a small number of authors, though the effect diminished after about five authors. Review articles were associated with lower self-citation counts and rates. No clear trend emerged between self-citations and journal impact factor. In an analysis by field, despite the raw results suggesting that self-citation rates were lower in Neuroscience, GAM-derived self-citations were greater in Neuroscience than in Psychiatry or Neurology.

      Finally, our results aligned with previous findings of nearly equivalent self-citation rates for men and women after including covariates, even showing slightly higher self-citation rates in women. Since raw data showed evidence of a gender difference in self-citation that emerges early in the career but dissipates with seniority, we incorporated two interaction terms: one between gender and academic age and a second between gender and the number of previous papers. Results remained largely unchanged with the interaction terms (Table S6).

      2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      (2) The discussion of the data structure used in the regression models is somewhat opaque, both in the main text and the supplement. From what I gather, these models likely have each citation included in the model at least once (perhaps twice, once for first-author status and once for last-author status), with citations nested within citing papers, cited papers, and authors. Without inclusion of random effects, the interpretation and inference of the estimates may be misleading.

      Please see our response to point (1) to address random effects. We have also switched to GAMs (see point #3 below) and provided more detail in the methods. Notably, we decided against using author-level effects due to poor model stability, as there can be as few as one author per group. Instead, we subsampled the dataset such that only one paper appeared from each author.
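      The subsampling step (one randomly chosen paper per author) can be sketched as follows. This is an illustrative implementation, not the authors' code; the function name and the representation of the dataset as a flat array of author identifiers, one per paper, are assumptions.

```python
import numpy as np

def one_paper_per_author(author_ids, seed=0):
    """Return sorted row indices such that each author contributes exactly
    one paper, chosen uniformly at random among that author's papers."""
    rng = np.random.default_rng(seed)
    author_ids = np.asarray(author_ids)
    order = rng.permutation(len(author_ids))
    # After shuffling, the first occurrence of each author is a uniform
    # random draw from that author's papers.
    _, first_pos = np.unique(author_ids[order], return_index=True)
    return np.sort(order[first_pos])
```

      Repeating the call with different seeds and refitting the model each time gives the kind of sensitivity analysis over resamples mentioned in the Methods.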

      (3) I am concerned that the use of the inverse hyperbolic sine transform is a bit too prescriptive, and may be producing poor fits to the true predictor-outcome relationships. For example, in a figure like Fig S8, it is hard to know to what extent the sharp drop and sign reversal are true reflections of the data, and to what extent they are artifacts of the transformed fit.

      Thank you for raising this point. We have now switched to using generalized additive models (GAMs). GAMs provide a flexible approach to modeling that does not require transformations. We described this in detail in point (1) above and in Methods 4.10 Exploring effects of covariates with generalized additive models (page 21, line 754).

      “4.10 Exploring effects of covariates with generalized additive models

      For these analyses, we used the full dataset size separately for First and Last Authors (Table S2). This included 115,205 articles and 5,794,926 citations for First Authors, and 114,622 articles and 5,801,367 citations for Last Authors. We modeled self-citation counts, self-citation rates, and number of previous papers for First Authors and Last Authors separately, resulting in six total models.

      We found that models could be computationally intensive and unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. The random resampling was repeated 100 times as a sensitivity analysis (Figure S12).

      For our models, we used generalized additive models from mgcv’s “gam” function in R 48. The smooth terms included all the continuous variables: number of previous papers, academic age, year, time lag, number of authors, number of references, and journal impact factor. The linear terms included all the categorical variables: field, gender, affiliation country LMIC status, and document type. We empirically selected a Tweedie distribution 49 with a log link function and p=1.2. The p parameter indicates that the variance is proportional to the mean to the p power 48. The p parameter ranges from 1-2, with p=1 equivalent to the Poisson distribution and p=2 equivalent to the gamma distribution. For all fitted models, we simulated the residuals with the DHARMa package, as standard residual plots may not be appropriate for GAMs 50. DHARMa scales the residuals between 0 and 1 with a simulation-based approach 50. We also tested for deviation from uniformity, dispersion, outliers, and zero inflation with DHARMa. Non-uniformity, dispersion, outliers, and zero inflation were significant due to the large sample size, but small in effect size in most cases. The simulated quantile-quantile plots from DHARMa suggested that the observed and simulated distributions were generally aligned, with the exception of slight misalignment in the models for the number of previous papers. These analyses are presented in Figure S11 and Table S7.

      In addition, we tested for inadequate basis functions using mgcv’s “gam.check()” function 48. Across all smooth predictors and models, we ultimately selected between 10-20 basis functions depending on the variable and outcome measure (counts, rates, papers). We further checked the concurvity of the models and ensured that the worst-case concurvity for all smooth predictors was about 0.8 or less.”
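As a small illustration of the mean-variance relationship described in the quoted Methods text, the sketch below evaluates Var(Y) = phi * mu^p for a few values of p. The function name and the dispersion parameter `phi` are assumptions introduced here for illustration; the fitted models themselves were built with mgcv in R, not with this code.

```python
def tweedie_variance(mu: float, p: float = 1.2, phi: float = 1.0) -> float:
    """Variance implied by a Tweedie family: Var(Y) = phi * mu**p.

    phi (dispersion) is a hypothetical parameter name used for illustration.
    """
    if mu <= 0:
        raise ValueError("mu must be positive")
    return phi * mu ** p


if __name__ == "__main__":
    # p=1 gives variance proportional to the mean (Poisson-like);
    # p=2 gives variance proportional to the squared mean (gamma-like);
    # p=1.2, the value quoted above, sits between the two.
    for p in (1.0, 1.2, 2.0):
        print(p, [round(tweedie_variance(mu, p), 2) for mu in (1.0, 5.0, 20.0)])
```

With p=1.2 the variance grows only slightly faster than the mean, which is one reason such a value can suit overdispersed count-like outcomes.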

      (4) It seems there are several points in the analysis where papers may have been dropped for missing data (e.g., missing author IDs and/or initials, missing affiliations, low-confidence gender assessment). It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for comparisons across countries it would be important for the authors to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

      Thank you for raising this important point. In the methods section, we describe how the data are missing (page 18, line 623):

      “4.3 Data exclusions and missingness

      Data were excluded across several criteria: missing covariates, missing citation data, out-of-range values at the citation pair level, and out-of-range values at the article level (Table 3). After downloading the data, our dataset included 157,287 articles and 8,438,733 citations. We excluded any articles with missing covariates (document type, field, year, number of authors, number of references, academic age, number of previous papers, affiliation country, gender, and journal). Of the remaining articles, we dropped any for missing citation data (e.g., cannot identify whether a self-citation is present due to lack of data). Then, we removed citations with unrealistic or extreme values. These included an academic age of less than zero or above 38/44 for First/Last Authors (99th percentile); greater than 266/522 papers for First/Last Authors (99th percentile); and a cited year before 1500 or after 2023. Subsequently, we dropped articles with extreme values that could contribute to poor model stability. These included greater than 30 authors; fewer than 10 references or greater than 250 references; and a time lag of greater than 17 years. These values were selected to ensure that GAMs were stable and not influenced by a small number of extreme values.

      In addition, we evaluated whether the data were not missing at random (Table S8). Data were more likely to be missing for reviews relative to articles, for Neurology relative to Neuroscience or Psychiatry, in works from Africa relative to the other continents, and for men relative to women. Scopus ID coverage contributed in part to differential missingness. However, our exclusion criteria also contribute. For example, Last Authors with more than 522 papers were excluded to help stabilize our GAMs. More men than women met this exclusion criterion.”
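The range-based exclusions quoted in Section 4.3 can be sketched as two filters, one at the citation level and one at the article level. The thresholds are taken from the quoted text; the function names, argument names, and the "first"/"last" role labels are hypothetical and are not taken from the authors' code.

```python
# Thresholds quoted in Section 4.3; values differ for First and Last Authors.
MAX_ACADEMIC_AGE = {"first": 38, "last": 44}
MAX_PREVIOUS_PAPERS = {"first": 266, "last": 522}


def keep_citation(academic_age: int, previous_papers: int,
                  cited_year: int, role: str) -> bool:
    """Citation-level filter: drop unrealistic or extreme values."""
    return (
        0 <= academic_age <= MAX_ACADEMIC_AGE[role]
        and previous_papers <= MAX_PREVIOUS_PAPERS[role]
        and 1500 <= cited_year <= 2023
    )


def keep_article(n_authors: int, n_references: int, time_lag: int) -> bool:
    """Article-level filter: drop extreme values that could destabilize the models."""
    return n_authors <= 30 and 10 <= n_references <= 250 and time_lag <= 17


if __name__ == "__main__":
    print(keep_citation(40, 100, 2005, "first"))  # above the First Author age cap
    print(keep_citation(40, 100, 2005, "last"))   # within the Last Author age cap
    print(keep_article(12, 45, 3))
```

Counting how many records each filter removes per group (for example, by gender or continent) would be one way to tabulate the differential missingness that the reviewer asked about.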

      Due to differential missingness, we wrote in the limitations (page 16, line 529):

      “Ninth, data were differentially missing (Table S8) due to Scopus coverage and gender estimation. Differential missingness could bias certain results in the paper, but we hope that the dataset is large enough to reduce any potential biases.”

      Reviewer #2 (Public Review):

      The authors provide a comprehensive investigation of self-citation rates in the field of Neuroscience, filling a significant gap in existing research. They analyze a large dataset of over 150,000 articles and eight million citations from 63 journals published between 2000 and 2020. The study reveals several findings. First, they state that there is an increasing trend of self-citation rates among first authors compared to last authors, indicating potential strategic manipulation of citation metrics. Second, they find that the Americas show higher odds of self-citation rates compared to other continents, suggesting regional variations in citation practices. Third, they show that there are gender differences in early-career self-citation rates, with men exhibiting higher rates than women. Lastly, they find that self-citation rates vary across different subfields of Neuroscience, highlighting the influence of research specialization. They believe that these findings have implications for the perception of author influence, research focus, and career trajectories in Neuroscience.

      Overall, this paper is well written, and the breadth of analysis conducted by the authors, with various interactions between variables (e.g., gender vs. seniority), shows that the authors have spent a lot of time thinking about different angles. The discussion section is also quite thorough. The authors should also be commended for their efforts in the provision of code for the public to evaluate their own self-citations. That said, here are some concerns and comments that, if addressed, could potentially enhance the paper:

      Thank you for your review and your generally positive view of our work.

      (1) There are concerns regarding the data used in this study, specifically its bias towards top journals in Neuroscience, which limits the generalizability of the findings to the broader field. More specifically, the top 63 journals in neuroscience are based on impact factor (IF), which raises a potential issue of selection bias. While the paper acknowledges this as a limitation, it lacks a clear justification for why the authors made this choice. It is also unclear how the "top" journals were identified: was the cutoff the top 5% in terms of impact factor, the top 10%, or some other metric? The authors also do not provide the (computed) impact factors of the journals in the supplementary material.

      We apologize for the lack of clarity about our selection of journals. We agree that there are limitations to selecting higher impact journals. However, we needed to apply some form of selection in order to make the analysis manageable. For instance, even these 63 journals include over five million citations. We better describe our rationale behind the approach as follows (page 17, line 578):

      “We collected data from the 25 journals with the highest impact factors, based on Web of Science impact factors, in each of Neurology, Neuroscience, and Psychiatry. Some journals appeared in the top 25 list of multiple fields (e.g., both Neurology and Neuroscience), so 63 journals were ultimately included in our analysis. We recognize that limiting the journals to the top 25 in each field also limits the generalizability of the results. However, there are tradeoffs between breadth of journals and depth of information. For example, by limiting the journals to these 63, we were able to look at 21 years of data (2000-2020). In addition, the definition of fields is somewhat arbitrary. By restricting the journals to a set of 63 well-known journals, we ensured that the journals belonged to Neurology, Neuroscience, or Psychiatry research. It is also important to note that the impact factor of these journals has not necessarily always been high. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. To further recognize the effects of impact factor, we decided to include an impact factor term in our models.”

      In addition, we have now provided the 2020 impact factors in Table S1.

      By exclusively focusing on high impact journals, your analysis may not be representative of the broader landscape of self-citation patterns across the neuroscience literature, which is what the title of the article claims to do.

      We agree that this article is not indicative of all neuroscience literature, but rather the top journals. Thus, we have changed the title to: “Trends in Self-citation Rates in High-impact Neurology, Neuroscience, and Psychiatry Journals”. We would also like to note that compared to previous bibliometrics works in neuroscience (Bertolero et al. 2020; Dworkin et al. 2020; Fulvio et al. 2021), this article includes a wider range of data.

      (2) One other concern pertains to the possibility that a significant number of authors involved in the paper may not be neuroscientists. It is plausible that the paper is a product of interdisciplinary collaboration involving scientists from diverse disciplines. Neuroscientists amongst the authors should be identified.

      In our opinion, neuroscience is a broad, interdisciplinary field. Individuals performing neuroscience research may have a neuroscience background. Yet, they may come from many backgrounds, such as physics, mathematics, biology, chemistry, or engineering. As such, we do not believe that it is feasible to characterize whether each author considers themselves a neuroscientist or not. We have added the following to the limitations section (page 16, line 528):

      “Eighth, authors included in this work may not be neurologists, neuroscientists, or psychiatrists. However, they still publish in journals from these fields.”

      (3) When calculating self-citation rate, it is important to consider the number of papers the authors have published to date. One plausible explanation for the lower self-citation rates among first authors could be attributed to their relatively junior status and short publication record. As such, it would also be beneficial to assess self-citation rate as a percentage relative to the author's publication history. This number would be more accurate if we look at it as a percentage of their publication history. My suspicion is that first authors (who are more junior) might be more likely to self-cite than their senior counterparts. My suspicion was further raised by looking at Figures 2a and 3. Considering the nature of the self-citation metric employed in the study, it is expected that authors with a higher level of seniority would have a greater number of publications. Consequently, these senior authors' papers are more likely to be included in the pool of references cited within the paper, hence the higher rate.

      While the authors acknowledge the importance of the number of past publications in their gender analysis, it is just as important to include the interplay of seniority in (1) their first and last author self-citation rates and (2) their geographic analysis.

      Thank you for this thoughtful comment. We agree that seniority and prior publication history play an important role in self-citation rates.

      For comparing First/Last Author self-citation rates, we have now included a plot similar to Figure 2a, where self-citation as a percentage of prior publication history is plotted.

      (page 4, line 161): “Analyzing self-citations as a fraction of publication history exhibited a similar trend (Figure S3). Notably, First Authors were more likely than Last Authors to self-cite when normalized by prior publication history.”
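To make the reviewer's seniority argument concrete, the sketch below compares a raw self-citation rate with the same rate divided by the number of previous papers. This is only one plausible reading of "normalized by prior publication history"; the exact normalization behind Figure S3 is not stated here, and the two example authors and their numbers are invented for illustration.

```python
def self_citation_rate(self_citations: int, n_references: int) -> float:
    """Raw rate: share of a paper's references that are self-citations."""
    if n_references <= 0:
        raise ValueError("n_references must be positive")
    return self_citations / n_references


def rate_per_previous_paper(self_citations: int, n_references: int,
                            n_previous_papers: int) -> float:
    """Assumed normalization: raw rate divided by the author's prior paper count."""
    if n_previous_papers <= 0:
        raise ValueError("n_previous_papers must be positive")
    return self_citation_rate(self_citations, n_references) / n_previous_papers


if __name__ == "__main__":
    # Invented example: a junior author with 5 prior papers and a senior
    # author with 100 prior papers, each writing a paper with 40 references.
    junior = (2, 40, 5)
    senior = (4, 40, 100)
    for label, (cites, refs, prev) in (("junior", junior), ("senior", senior)):
        print(label, self_citation_rate(cites, refs),
              rate_per_previous_paper(cites, refs, prev))
```

In this invented case the senior author has the higher raw rate, while the junior author has the higher rate per previous paper, which is the pattern the reviewer suspected.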

      For the geographic analysis, we made two new maps: 1) that of the number of previous papers, and 2) that of the journal impact factor (see response to point #4 below).

      (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not Last Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation is necessary with these covariates in a comprehensive model.”
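For readers unfamiliar with the statistic, a Spearman correlation between two country-level maps is the Pearson correlation of the ranked values. The stdlib-only sketch below shows the computation with average ranks for ties; the function names and the toy country values are assumptions for illustration and do not reproduce the reported coefficients or P values.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Toy per-country values (invented): self-citation rate and mean prior papers.
    rates = [0.04, 0.06, 0.05, 0.09, 0.07]
    prior_papers = [20, 35, 30, 80, 50]
    print(spearman(rates, prior_papers))
```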

      Finally, we included a model term for the number of previous papers (Table 2). We analyzed this both for self-citation counts and self-citation rates and found a strong relationship between publication history and self-citations. We also included the following section where we modeled the number of previous papers for each author (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      (4) Because your analysis is limited to high impact journals, it would be beneficial to see the distribution of the impact factors across the different countries. Otherwise, your analysis on geographic differences in self-citation rates is hard to interpret. Are these differences really differences in self-citation rates, or differences in journal impact factor? It would be useful to look at the representation of authors from different countries for different impact factors.

      We made a map of this in Figure S4 (see our response to point #3 above).

      (page 5, line 185): “We also investigated the distribution of the number of previous papers and journal impact factor across countries (Figure S4). Self-citation maps by country were highly correlated with maps of the number of previous papers (Spearman’s r=0.576, P=4.1e-4; 0.654, P=1.8e-5 for First and Last Authors). They were significantly correlated with maps of average impact factor for Last Authors (0.428, P=0.014) but not Last Authors (Spearman’s r=0.157, P=0.424). Thus, further investigation is necessary with these covariates in a comprehensive model.”

      We also included impact factor as a term in our model. The results suggest that there are still geographic differences (Table 2, Table S5).

      (5) The presence of self-citations is not inherently problematic, and I appreciate the fact that authors omit any explicit judgment on this matter. That said, without appropriate context, self-citations are also not the best scholarly practice. In the analysis on gender differences in self-citations, it appears that authors imply an expectation of women's self-citation rates to align with those of men. While this is not explicitly stated, use of the word "disparity", and also presentation of self-citation as an example of self-promotion in discussion suggest such a perspective. Without knowing the context in which the self-citation was made, it is hard to ascertain whether women are less inclined to self-promote or that men are more inclined to engage in strategic self-citation practices.

      We agree that on the level of an individual self-citation, our study is not useful for determining how related the papers are. Yet, understanding overall trends in self-citation may help to identify differences. Context is important, but large datasets allow us to investigate broad trends. We added the following text to the limitations section (page 16, line 524):

      “In addition, these models do not account for whether a specific citation is appropriate, as some situations may necessitate higher self-citation rates.”

      Reviewer #3 (Public Review):

      This paper analyses self-citation rates in the field of Neuroscience, comprising in this case, Neurology, Neuroscience and Psychiatry. Based on data from Scopus, the authors identify self-citations, that is, whether references from a paper by some authors cite work that is written by one of the same authors. They separately analyse this in terms of first-author self-citations and last-author self-citations. The analysis is well-executed and the analysis and results are written down clearly. There are some minor methodological clarifications needed, but more importantly, the interpretation of some of the results might prove more challenging. That is, it is not always clear what is being estimated, and more importantly, the extent to which self-citations are "problematic" remains unclear.

      Thank you for your review. We attempted to improve the interpretation of results, as described in the following responses.

      When are self-citations problematic? As the authors themselves also clarify, "self-citations may often be appropriate". Researchers cite their own previous work for perfectly good reasons, similar to reasons of why they would cite work by others. The "problem", in a sense, is that researchers cite their own work, just to increase the citation count, or to promote their own work and make it more visible. This self-promotional behaviour might be incentivised by certain research evaluation procedures (e.g. hiring, promoting) that overly emphasise citation performance. However, the true problem then might not be (self-)citation practices, but instead, the flawed research evaluation procedures that emphasise citation performance too much. So instead of problematising self-citation behaviour, and trying to address it, we might do better to address flawed research evaluation procedures. Of course, we should expect references to be relevant, and we should avoid self-promotional references, but addressing self-citations may just have minimal effects, and would not solve the more fundamental issue.

      We agree that this dataset is not designed to investigate the downstream effects of self-citations. However, self-citation practices are more likely to be problematic when they differ across specific groups. This work can potentially spark more interest in future longitudinal designs to investigate whether differences in self-citation practices lead to differences in career outcomes, for example. We added the following text to clarify (page 17, line 565):

      “Yet, self-citation practices become problematic when they are different across groups or are used to “game the system.” Future work should investigate the downstream effects of self-citation differences to see whether they impact the career trajectories of certain groups. We hope that this work will help to raise awareness about factors influencing self-citation practices to better inform authors, editors, funding agencies, and institutions in Neurology, Neuroscience, and Psychiatry.”

      Some other challenges arise when taking a statistical perspective. For any given paper, we could browse through the references, and determine whether a particular reference would be warranted or not. For instance, we could note that there might be a reference included that is not at all relevant to the paper. Taking a broader perspective, the irrelevant reference might point to work by others, included just for reasons of prestige, so-called perfunctory citations. But it could of course also include self-citations. When we simply start counting all self-citations, we do not see what fraction of those self-citations would be warranted as references. The question then emerges, what level of self-citations should be counted as "high"? How should we determine that? If we observe differences in self-citation rates, what does it tell us?

      Our focus is when the self-citation practices differ across groups. We agree that, on a case-by-case basis, there is no exact number for a self-citation rate that is “high.” With a dataset of the current size, evaluating whether each individual self-citation is appropriate is not feasible. If we observe differences in self-citation rate, this may tell us about broad (not individual-level) trends and differences in self-citing practice. If one group is self-citing much more highly compared to another group–even after covarying relevant variables such as prior publication history–then the self-citation differences can likely be attributed to differences in self-citation practices/behaviors.

      For example, the authors find that the (any author) self-citation rate in Neuroscience is 10.7% versus 15.9% in Psychiatry. What does this difference mean? Are psychiatrists citing themselves more often than neuroscientists? First author men showed a self-citation rate of 5.12% versus a self-citation rate of 3.34% of women first authors. Do men engage in more problematic citation behaviour? Junior researchers (10-year career) show a self-citation rate of about 5% compared to a self-citation rate of about 10% for senior researchers (30-year career). Are senior researchers therefore engaging in more problematic citation behaviour? The answer is (most likely) "no", because senior authors have simply published more, and will therefore have more opportunities to refer to their own work. To be clear: the authors are aware of this, and also take this into account. In fact, these "raw" various self-citation rates may, as the authors themselves say, "give the illusion" of self-citation rates, but these are somehow "hidden" by, for instance, career seniority.

      We included numerous covariates in our model. In addition, to address the difference between “raw” and “modeled” self-citation rates, we added the following section (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but the highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      Again, the authors do consider this, and "control" for career length and number of publications, et cetera, in their regression model. Some of the previous observations then change in the regression model. Neuroscience doesn't seem to be self-citing more; there just seem to be more junior researchers in that field compared to Psychiatry. Similarly, men and women don't seem to show an overall different self-citation behaviour (although the authors find an early-career difference); the men included in the study simply have longer careers and more publications.

      But here's the key issue: what does it then mean to "control" for some variables? This doesn't make any sense, except in the light of causality. That is, we should control for some variable, such as seniority, because we are interested in some causal effect. The field may not "cause" the observed differences in self-citation behaviour; this is mediated by seniority. Or is it confounded by seniority? Are the overall gender differences also mediated by seniority? How would the selection of high-impact journals "bias" estimates of causal effects on self-citation? Can we interpret the coefficients as causal effects of that variable on self-citations? If so, would we try to interpret this as total causal effects, or direct causal effects? If they do not represent causal effects, how should they be interpreted then? In particular, how should it "inform authors, editors, funding agencies and institutions", as the authors say? What should they be informed about?

      We apologize for our misuse of language. We will be more clear, as in most previous self-citation papers, that our analysis is NOT causal. Causal datasets do have some benefits in citation research, but a limitation is that they may not cover as wide of a range of authors. Furthermore, non-causal correlational studies can still be useful in informing authors, editors, funding agencies, and institutions. Association studies are widely used across various fields to draw non-causal conclusions. We made numerous changes to reduce our causal language.

      Before: “We then developed a probability model of self-citation that controls for numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

      After (page 3, line 113): “We then developed a probability model of self-citation that includes numerous covariates, which allowed us to obtain significance estimates for each variable of interest.”

      Before: “As such, controlling for various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

      After (page 11, line 321): “As such, covarying various author- and article-level characteristics can improve the interpretability of self-citation rate trends.”

      Before: “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after controlling for various confounds, the self-citation rates are higher in Neuroscience.”

      After (page 15, line 468): “Initially, it appeared that self-citation rates in Neuroscience are lower than Neurology and Psychiatry, but after considering several covariates, the self-citation rates are higher in Neuroscience.”

      We also added the following text to the limitations section (page 16, line 526):

      “Seventh, the analysis presented in this work is not causal. Association studies are advantageous for increasing sample size, but future work could investigate causality in curated datasets.”

      The authors also "encourage authors to explore their trends in self-citation rates". It is laudable to be self-critical and review one's own practices. But how should authors interpret their self-citation rate? How useful is it to know whether it is 5%, 10% or 15%? What would be the "reasonable" self-citation rate? How should we go about constructing such a benchmark rate? Again, this would necessitate some causal answer. Instead of looking at the self-citation rate, it would presumably be much more informative to simply ask authors to check whether references are appropriate and relevant to the topic at hand.

      We believe that our tool is valuable for authors to contextualize their own self-citation rates. For instance, if an author has published hundreds of articles, it is not practical to count the number of self-citations in each. We have added two portions of text to the limitations section:

      (page 16, line 524): “In addition, these models do not account for whether a specific citation is appropriate, though some situations may necessitate higher self-citation rates.”

      (page 16, line 535): “Despite these limitations, we found significant differences in self-citation rates for various groups, and thus we encourage authors to explore their trends in self-citation rates. Self-citation rates that are higher than average are not necessarily wrong, but suggest that authors should further reflect on their current self-citation practices.”

      In conclusion, the study shows some interesting and relevant differences in self-citation rates. As such, it is a welcome contribution to ongoing discussions of (self) citations. However, without a clear causal framework, it is challenging to interpret the observed differences.

      We agree that causal studies provide many benefits. Yet, association studies also provide many benefits. For example, an association study allowed us to analyze a wider range of articles than a causal study would have.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Statistical suggestions:

      (1) To improve statistical inference, nesting should be accounted for in all of the analyses. For example, the logistic regression model using citing/cited pairs should include random effects for article, author, and perhaps subfield, in order for independence of observations to be plausible. Similarly, bootstrapping and permutation would ideally occur at the author level rather than (or in addition to) the paper level.

      Detailed updates addressing these points are in the public review. In short, we found computational challenges with many levels of the random effects (>100,000) and millions of observations at the citation pairs level. As such, we decided to model citation rates and counts by paper. In this case, we found that results could be unstable when including author-level random effects because in many cases there was only one author per group. Instead, to avoid inappropriately narrow confidence bands, we resampled the dataset such that each author was only represented once. For example, if Author A had five papers in this dataset, then one of their five papers was randomly selected. We repeated the random resampling 100 times (Figure S12). We updated our description of our models in the Methods section (page 21, line 754).
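The one-paper-per-author resampling described above can be sketched as follows. This is a minimal illustration, assuming each record is a (paper ID, author ID) pair; the function name, record layout, and seed are hypothetical, and the model fitting that would follow each resample is omitted.

```python
import random
from collections import defaultdict


def one_paper_per_author(papers, rng):
    """Pick one paper at random for each author.

    papers: iterable of (paper_id, author_id) tuples (hypothetical layout).
    Returns a dict mapping author_id to the selected paper_id.
    """
    by_author = defaultdict(list)
    for paper_id, author_id in papers:
        by_author[author_id].append(paper_id)
    return {author: rng.choice(ids) for author, ids in by_author.items()}


if __name__ == "__main__":
    papers = [("p1", "A"), ("p2", "A"), ("p3", "A"),
              ("p4", "B"), ("p5", "C"), ("p6", "C")]
    rng = random.Random(0)  # fixed seed so the sketch is reproducible
    # Repeat the resampling 100 times, as in the sensitivity analysis.
    resamples = [one_paper_per_author(papers, rng) for _ in range(100)]
    print(len(resamples), resamples[0])
```

Fitting the same model on each of the 100 resamples and comparing the estimates would then show how sensitive the results are to which paper represents each author.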

      For permutation tests and bootstrapping, we now define an “exchangeability block” as a co-authorship group of authors. In this dataset, that meant any authors who published together (among the articles in this dataset) as a First Author / Last Author pairing were assigned to the same exchangeability block. It is not realistic to check for overlapping middle authors in all papers because of the collaborative nature of the field. In addition, we believe that self-citations are primarily controlled by first and last authors, so we can assume that middle authors do not control self-citation habits. We then performed bootstrapping and permutation tests in the constraints of the exchangeability blocks.
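One way to build such exchangeability blocks is to treat each First Author / Last Author pairing as an edge and take the connected components with a union-find structure. The sketch below is an assumption about how this could be done, not the authors' implementation; the function names and the toy author labels are hypothetical, and the subsequent bootstrap or permutation step is not shown.

```python
def find(parent, x):
    """Return the representative of x's block, compressing paths along the way."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def exchangeability_blocks(pairs):
    """Group authors into blocks linked by First Author / Last Author pairings.

    pairs: iterable of (first_author_id, last_author_id) tuples, one per article.
    Returns a list of sets, each set being one block of connected authors.
    """
    parent = {}
    for first, last in pairs:
        root_a, root_b = find(parent, first), find(parent, last)
        if root_a != root_b:
            parent[root_a] = root_b
    blocks = {}
    for author in list(parent):
        blocks.setdefault(find(parent, author), set()).add(author)
    return list(blocks.values())


if __name__ == "__main__":
    # Toy pairings: A and B both published with X; C links Y and Z; D published with Z.
    pairs = [("A", "X"), ("B", "X"), ("C", "Y"), ("D", "Z"), ("C", "Z")]
    for block in exchangeability_blocks(pairs):
        print(sorted(block))
```

Resampling or permuting at the level of these blocks, rather than individual papers, keeps articles from connected authors together.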

      (2) In general, I am having trouble understanding the structure of the regression models. My current belief is that rows are composed of individual citations from papers' reference lists, with the outcome representing their status as a self-citation or not, and with various citing article and citing author characteristics as predictors. However, the fact that author type is included in the model as a predictor (rather than having a model for FA self-citations and another for LA self-citations) suggests to me that each citation is entered as two separate rows - once noting whether it was a FA self-citation and once noting whether it was an LA self-citation - and then it is run as a single model.

      (2a) If I am correct, the model is unlikely to be producing valid inference. I would recommend breaking this analysis up into two separate models, and including article-, author-, and subfield-level random effects. You could theoretically include a citation-level random effect and keep it as one model, but each 'group' would only have two observations and the model would be fairly unstable as a result.

      (2b) If I am misunderstanding (and even if not), I would encourage you to provide a more detailed description of the dataset structure and the model, perhaps with a table or diagram.

      We split the data into two models and decided to model on the level of a paper (self-citation rate and self-citation count). In addition, we subsampled the dataset such that each author only appears once to avoid misestimation of confidence intervals (see point (1) above). As described in the public review, we included much more detail in our methods section now to improve the clarity of our models.

      (3) I would suggest removing the inverse hyperbolic sine transform and replacing it with a more flexible approach to estimating the relationships' shape, like generalized additive models or other spline-based methods to ensure that the chosen method is appropriate - or at the very least checking that it is producing a realistic fit that reflects the underlying shape of the relationships.

      More details are available in the public review, but we now use GAMs throughout the manuscript.

      (4) For the "highly self-citing" analysis, it is unclear why papers in the 15-25% range were dropped rather than including them as their own category in an ordinal model. I might suggest doing the latter, or explaining the decision more fully.

      We previously included this analysis as a paper-level model because our main model was at the level of citation pairs. Now, we removed this analysis because we model self-citation rates and counts by paper.

      (5) It would be beneficial for the reader to know what % of the data was dropped for each analysis, and for your team to make sure that there is not differential missing data that could affect the interpretation of the results (e.g., differences in self-citation being due to differences in Scopus ID coverage).

      Thank you for this suggestion. We added more detailed missingness data to 4.3 Data exclusions and missingness. We did find differential missingness and added it to the limitations section. However, certain aspects of this cannot be corrected because the data are just not available (e.g., Scopus coverage issues). Further details are available in the public review.

      Conceptual thoughts:

      (1) I agree with your decision to focus on the second definition of self-citation (self-cites relative to my citations to others' work) rather than the first (self-cites relative to others' citations to my work). But it does seem that the first definition is relevant in the context of gaming citation metrics. For example, someone who writes one paper per year with a reference list of 30% self-citations will have much less of an impact on their H-index than someone who writes 10 papers per year with 10% self-citations. It could be interesting to see how these definitions interact, and whether people who are high on one measure tend to be high on the other.

      We agree this would be interesting to investigate in the future. Unfortunately, our dataset is organized at the level of the paper and thus does not contain information regarding how many times the authors cite a particular work. We hope that we can explore this interaction in the future.

      (2) This is entirely speculative, but I wonder whether the increasing rate of LA self-citation relative to FA self-citation is partly due to PIs over-citing their own lab to build up their trainees' citation records and help them succeed in an increasingly competitive job market. This sounds more innocuous than doing it to benefit their own reputation, but it would provide another mechanism through which students from large and well-funded labs get a leg-up in the job market. Might be interesting to explore, though I'm not exactly sure how :)

      This is a very interesting point. We do not have any means to investigate this with the current dataset, but we added it to the discussion (page 14, line 421):

      “A third, more optimistic explanation is that principal investigators (typically Last Authors) are increasingly self-citing their lab’s papers to build up their trainee’s citation records for an increasingly competitive job market.”

      Reviewer #2 (Recommendations For The Authors):

      (1) In regards to point 1 in the public review: In the spirit of transparency, the authors would benefit from providing a rationale for their choice of top journals, and the methodology used to identify them. It would also be valuable to include the impact factor of each journal in the S1 table alongside their names.

      Given the availability and executability of code, it would be useful to see how and if the self-citation trends vary amongst the "low impact" journals (as measured by the IF). This could go in any of the three directions:

      a. If it is found that self-citations are not as prevalent in low impact journals, this could be a great starting point for a conversation around the evaluation of journals based on impact factor, and the role of self-citations in it.

      b. If it is found that self-citations are as prevalent in low impact journals as high impact journals, that just strengthens your results further.

      c. If it is found that self-citations are more prevalent in low impact journals, this would mean your current statistics are a lower bound to the actual problem. This is also intuitive in the sense that high impact journals get more external citations (and more exposure) than low impact journals, as such authors (and journals) may be less likely to self-cite.

      Expanding the dataset to include many more journals was not feasible. Instead, we included an impact factor term in our models, as detailed in the public review. We found no strong trends in the association between impact factor and self-citation rate/count. Another important note is that these journals were considered “high impact” in 2020, but many had lower impact factors in earlier years. Thus, our modeling allows us to estimate how impact factor is related to self-citations across a wide range of impact factors.

      It is crucial to consider utilizing such a comprehensive database as Scopus, which provides a more thorough list of all journals in Neuroscience, to obtain a more representative sample. Alternatively, other datasets like Microsoft Academic Graph, and OpenAlex offer information on the field of science associated with each paper, enabling a more comprehensive analysis.

      We agree that certain datasets may offer a wider view of the entire field. However, we included a large number of papers and journals relative to previous studies. In addition, Scopus provides a lot of detailed and valuable author-level information. We had to limit our calls to the Scopus API, so we restricted journals by 2020 impact factor.

      (2) In regards to point 2 in the public review: To enhance the accuracy and specificity of the analysis, it would be beneficial to distinguish neuroscientists among the co-authors. This could be accomplished by examining their publication history leading up to the time of publication of the paper, and identify each author's level of engagement and specialization within the field of neuroscience.

      Since the field of neuroscience is largely based on collaborations, we find that it might be impossible to determine who is a neuroscientist. For example, a researcher with a publication history in physics may now be focusing on computational neuroscience research. As such, we feel that our current work, which ensures that the papers belong to neuroscience, is representative of what one may expect in terms of neuroscience research and collaboration.

      (3) In regards to point 3 in the public review: I highly recommend plotting self-citation rate as the number of papers in the reference list over the number of total publications to date of paper publication.

      As described in the public review, we have now done this (Figure S3).

      (4) In regards to point 5 in the public review: It would be useful to consider the "quality" of citations to further the discussion on self-citations. For instance, differentiating between self-citations that are perfunctory and superficial from those that are essential for showing developmental work, would be a valuable contribution.

      Other databases may have access to this information, but ours unfortunately does not. We agree that this is an interesting area of work.

      (5) The authors are to be commended for their logistic regression models, as they control for many confounders that were lacking in their earlier descriptive statistics. However, it would be beneficial to rerun the same analysis but on a linear model whereby the outcome variable would be the number of self-citations per author. This would possibly resolve many of the comments mentioned above.

      Thank you for your suggestion. As detailed in the public review, we now model the number of self-citations. This is modeled on the paper level, not the author level, because our dataset was downloaded by paper, not by author.

      Minor suggestions:

      (1) Abstract says one of your findings is: "increasing self-citation rates of First Authors relative to Last Authors". Your results actually show the opposite (see Figure 1b).

      Thank you for catching this error. We corrected it to match the results and discussion in the paper:

      “…increasing self-citation rates of Last Authors relative to First Authors.”

      (2) It might be interesting to compute an average academic age for each paper, and look at self-citation vs average academic age plot.

      We agree that this would be an interesting analysis. However, to limit calls to the API, we collected academic age data only on First and Last Authors.

      (3) It may be interesting to look at the distribution of women in different subfields within neuroscience, and the interaction of those in the context of self-citations.

      Thank you for this interesting suggestion. We added the following analysis (page 9, line 305):

      “Furthermore, we explored topic-by-gender interactions (Figure S10). In short, men and women were relatively equally represented as First Authors, but more men were Last Authors across all topics. Self-citation rates were higher for men across all topics.”

      Reviewer #3 (Recommendations For The Authors):

      - In the abstract, "flaws in citation practices" seems worded rather strongly.

      We respectfully disagree, as previous works have shown significant bias in citation practices. For example, Dworkin et al. (Dworkin et al. 2020) found that neuroscience reference lists tended to under-cite women, even after including various covariates.

      - The links in the references point to (non-accessible) Paperpile references; you would probably want to update this.

      We apologize for the inconvenience and have now removed these links.

      - p 2, l 24: The explanation of ref. (5) seems to be a bit strangely formulated. The point of that article is that citations to work that reinforce a particular belief are more likely to be cited, which *creates* unfounded authority. The unfounded authority itself is hence no part of the citation practices

      Thank you for catching our misinterpretation. We have now removed this part of the sentence.

      - p 3, l 16: "h indices" or "citations" instead of "h-index".

      We now say “h-indices”.

      - p 5, l 5: how was the manual scoring done?

      We added the following to the caption of Figure S1.

      “Figure S1. Comparison between manual scoring of self-citation rates and self-citation rates estimated from Python scripts in 5 Psychiatry journals: American Journal of Psychiatry, Biological Psychiatry, JAMA Psychiatry, Lancet Psychiatry, and Molecular Psychiatry. 906 articles in total were manually evaluated (10 articles per journal per year from 2000-2020, four articles excluded for very large author list lengths and thus high difficulty of manual scoring). For manual scoring, we downloaded information about all references for a given article and searched for matching author names.”

      - p 5, l 23: Why this specific p-value upper bound of 4e-3? From later in the article, I understand that this stems from the 10000 bootstrap sample, with then taking a Bonferroni correction? Perhaps good to clarify this briefly somewhere.

      Thank you for this suggestion. We now perform Benjamini/Hochberg false discovery rate (FDR) correction, but we added a description of the minimum P value from permutations (page 21, line 748):

      “All P values described in the main text were corrected with the Benjamini/Hochberg 16 false discovery rate (FDR) correction. With 10,000 permutations, the lowest P value after applying FDR correction is P=2.9e-4, which indicates that the true point would be the most extreme in the simulated null distribution.”
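
      For readers unfamiliar with the correction, a minimal Python sketch of the standard Benjamini/Hochberg adjustment is given below. It is a generic textbook implementation with a hypothetical function name, not the code used in the manuscript, and it makes no attempt to reproduce the specific minimum P value quoted above.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini/Hochberg FDR-adjusted P values, in the input order.

    For the P value with ascending rank k out of m tests, the adjusted
    value is min over ranks j >= k of p_(j) * m / j, capped at 1.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so the running minimum
    # enforces monotonicity of the adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


toy_p = [0.01, 0.04, 0.03, 0.005]
toy_adjusted = benjamini_hochberg(toy_p)
```

      With permutation-based P values, the smallest raw P value is bounded by the number of permutations, which in turn bounds the smallest achievable adjusted value.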

      - Fig. 1, caption: The (a) and (b) labelling here is a bit confusing, because the first sentence suggests both figures portray the same, but do so for different time periods. Perhaps rewrite, so that (a) and (b) are both described in a single sentence, instead of having two different references to (a) and (b).

      Thank you for pointing this out. We fixed the labeling of this caption:

      “Figure 1. Visualizing recent self-citation rates and temporal trends. a) Kernel density estimate of the distribution of First Author, Last Author, and Any Author self-citation rates in the last five years. b) Average self-citation rates over every year since 2000, with 95% confidence intervals calculated by bootstrap resampling.”

      - p7, l 9: Regarding "academic age", note that there might be a difference between "age" effects and "cohort" effects. That is, there might be difference between people with a certain career age who started in 1990 and people with the same career age, but who started in 2000, which would be a "cohort" effect.

      We agree that this is a possible effect and have added it to the limitations (page 16, line 532):

      “Tenth, while we considered academic age, we did not consider cohort effects. Cohort effects would depend on the year in which the individual started their career.”

      - p 7, l 15: "jumps" suggests some sort of sudden or discontinuous transition, I would just say "increases".

      We now say “increases.”

      - Fig. 2: Perhaps it should be made more explicit that this includes only academics with at least 50 papers. Could the authors please clarify whether the same limitation of at least 50 papers also features in other parts of the analysis where academic age is used? This selection could affect the outcomes of the analysis, so its consequences should be carefully considered. One possibility for instance is that it selects people with a short career length who have been exceptionally productive, namely those that have had 50 papers, but only started publishing in 2015 or so. Such exceptionally productive people will feature more highly in the early career part, because they need to be so productive in order to make the cut. For people with a longer career, the 50 papers would be less of a hurdle, and so would select more and less productive people more equally.

      We apologize for the lack of clarity. We did not use this requirement where academic age was used. We mainly applied this requirement when aggregating by country, as we did not want to calculate self-citation rate in a country based on only several papers. We have clarified various data exclusions in our new section 4.3 Data exclusions and missingness.

      - p 8, l 11: The affiliated institution of an author is not static, but rather changes throughout time. Did the authors consider this? If not, please clarify that this refers to only the most recent affiliation (presumably). Authors also often have multiple affiliations. How did the authors deal with this?

      The institution information is at the time of publication for each paper. We added more detail to our description of this on page 19, line 656:

      “For both First and Last Authors, we found the country of their institutional affiliation listed on the publication. In the case of multiple affiliations, the first one listed in Scopus was used.”

      - p 10, l 6: How were these self-citation rates calculated? This is averaged per author (i.e. only considering papers assigned to a particular topic) and then averaged across authors? (Note that in this way, the average of an author with many papers will weigh equally with the average of an author with few papers, which might skew some of the results).

      We calculate it across the entire topic (i.e., do NOT calculate by author first). We updated the description as follows (page 7, line 211):

      “We then computed self-citation rates for each of these topics (Figure 4) as the total number of self-citations in each topic divided by the total number of references in each topic…”
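
      The distinction between this pooled rate and an author-averaged rate can be made concrete with a small Python sketch. The function name and the (topic, n_self_citations, n_references) layout are hypothetical and serve only to illustrate the calculation described in the quoted text.

```python
def pooled_self_citation_rates(papers):
    """Compute topic-level self-citation rates by pooling over papers.

    papers: iterable of (topic, n_self_citations, n_references) tuples
    (hypothetical layout). The rate for a topic is the total number of
    self-citations divided by the total number of references, so papers
    with longer reference lists carry more weight.
    """
    totals = {}
    for topic, n_self, n_refs in papers:
        self_sum, ref_sum = totals.get(topic, (0, 0))
        totals[topic] = (self_sum + n_self, ref_sum + n_refs)
    return {
        topic: self_sum / ref_sum
        for topic, (self_sum, ref_sum) in totals.items()
        if ref_sum > 0
    }


toy_topic_papers = [("A", 2, 10), ("A", 8, 90), ("B", 1, 4)]
toy_rates = pooled_self_citation_rates(toy_topic_papers)
```

      In the toy data, topic A pools to 10 self-citations out of 100 references, whereas averaging the two per-paper rates (0.2 and about 0.089) would give a different value.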

      - p 13, l 18: Is the academic age analysis here again limited to authors having at least 50 papers?

      This is not limited to at least 50 papers. To clarify, the previous analysis was not limited to authors with 50 papers. It was instead limited to ages in our dataset that had at least 50 data points. For example, if an academic age of 70 only had 20 data points in our dataset, it would have been excluded.

      - Fig. 5: Here, comparing Fig. 5(d) and 5(f) suggests that partly, the self-citation rate differences between men and women, might be the result of the differences in number of papers. That is, the somewhat higher self-citation rate at a given academic age, might be the result of the higher number of papers at that academic age. It seems that this is not directly described in this part of the analysis (although this seems to be the case from the later regression analysis).

      We agree with this idea and have added a new section as follows (page 13, line 384):

      “2.10 Reconciling differences between raw data and models

      The raw and GAM-derived data exhibited some conflicting results, such as for gender and field of research. To further study covariates associated with this discrepancy, we modeled the publication history for each author (at the time of publication) in our dataset (Table 2). The model terms included academic age, article year, journal impact factor, field, LMIC status, gender, and document type. Notably, Neuroscience was associated with the fewest number of papers per author. This explains how authors in Neuroscience could have the lowest raw self-citation rates but highest self-citation rates after including covariates in a model. In addition, being a man was associated with about 0.25 more papers. Thus, gender differences in self-citation likely emerged from differences in the number of papers, not in any self-citation practices.”

      - Section 2.10. Perhaps the authors could clarify that this analysis takes individual articles as the unit of analysis, not citations.

      We updated all our models to take individual articles and have clarified this with more detailed tables.

      - p 18, l 10: "Articles with between 15-25% self-citation rates were discarded" Why?

      We agree that these should not be discarded. However, we previously included this analysis as a paper-level model because our main model was at the level of citation pairs. Now, we removed this analysis because we model self-citation rates and counts by paper.

      - p 20, l 5: "Thus, early-career researchers may be less incentivized to self-promote (e.g., self-cite) for academic gains compared to 20 years ago." How about the possibility that there was less collaboration, so that first authors would be more likely to cite their own paper, whereas with more collaboration, they will more often not feature as first author?

      This is an interesting point. We feel that more collaboration would generally lead to even more self-citations, if anything. If an author collaborates more, they are more likely to be on some of the references as a middle author (which by our definition counts toward self-citation rates).

      - p 20, l 15: Here the authors call authors to avoid excessive self-citations. Of course, there's nothing wrong with calling for that, but earlier the authors were more careful to not label something directly as excessive self-citations. Here, by stating it like this, the authors suggest that they have looked at excessive self-citations.

      We rephrased this as follows:

      Before: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid excessive self-citations.”

      After: “For example, an author with 30 years of experience cites themselves approximately twice as much as one with 10 years of experience on average. Both authors have plenty of works that they can cite, and likely only a few are necessary. As such, we encourage authors to be cognizant of their citations and to avoid unnecessary self-citations.”

      - p 22, l 11: Here again, the same critique as p 20, l15 applies.

      We switched “excessively” to “unnecessarily.”

      - p 23, l 12: The authors here critique ref. (21) of ascertainment bias, namely that they are "including only highly-achieving researchers in the life sciences". But do the authors not do exactly the same thing? That is, they also only focus on the top high-impact journals.

      We included 63 high-impact journals with tens of thousands of authors. In addition, some of these journals were not high-impact at the time of publication. For example, Acta Neuropathologica had an impact factor of 17.09 in 2020 but 2.45 in 2000. This still is a limitation of our work, but we do cover a much broader range of works than the listed reference (though their analysis also has many benefits since it included more detailed information).

      - p 26, l 22-26: It seems that the matching is done quite broadly (matching last names + initials at worst) for self-citations, while later (in section 4.9, p 31, l 9), the authors switch to only matching exact Scopus Author IDs. Why not use the same approach throughout? Or compare the two definitions (narrow / broad).

      Thank you for catching this mistake. We now use the approach of matching Scopus Author IDs throughout.

      - S8: it might be nice to explore open alternatives, such as OpenAlex or OpenAIRE, instead of the closed Scopus database, which requires paid access (which not all institutions have, perhaps that could also be corrected in the description in GitHub).

      Thank you for this suggestion. Unfortunately, switching databases would require starting our analysis from the beginning. On our GitHub page, we state: “Please email matthew.rosenblatt@yale.edu if you have trouble running this or do not have institutional access. We can help you run the code and/or run it for you and share your self-citation trends.” We feel that this will allow us to help researchers who may not have institutional access. In addition, we released our aggregated, de-identified (title and paper information removed) data on GitHub for other researchers to use.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study presents valuable findings on the potential of short-movie viewing fMRI protocol to explore the functional and topographical organization of the visual system in awake infants and toddlers. Although the data are compelling given the difficulty of studying this population, the evidence presented is incomplete and would be strengthened by additional analyses to support the authors' claims. This study will be of interest to cognitive neuroscientists and developmental psychologists, especially those interested in using fMRI to investigate brain organisation in pediatric and clinical populations with limited fMRI tolerance.

      We are grateful for the thorough and thoughtful reviews. We have provided point-by-point responses to the reviewers’ comments, but first, we summarize the major revisions here. We believe these revisions have substantially improved the clarity of the writing and impact of the results.

      Regarding the framing of the paper, we have made the following major changes in response to the reviews:

      (1) We have clarified that our goal in this paper was to show that movie data contains topographic, fine-grained details of the infant visual cortex. In the revision, we now state clearly that our results should not be taken as evidence that movies could replace retinotopy and have reworded parts of the manuscript that could mislead the reader in this regard.

      (2) We have added extensive details to the (admittedly) complex methods to make them more approachable. An example of this change is that we have reorganized the figure explaining the Shared Response Modelling methods to divide the analytic steps more clearly.

      (3) We have clarified the intermediate products contributing to the results by adding 6 supplementary figures that show the gradients for each IC or SRM movie and each infant participant.

      In response to the reviews, we have conducted several major analyses to support our findings further:

      (1) To verify that our analyses can identify fine-grained organization, we have manually traced and labeled adult data, and then performed the same analyses on them. The results from this additional dataset validate that these analyses can recover fine-grained organization of the visual cortex from movie data.

      (2) To further explore how visual maps derived from movies compare to alternative methods, we performed an anatomical alignment control analysis. We show that high-quality maps can be predicted from other participants using anatomical alignment.

      (3) To test the contribution of motion to the homotopy analyses, we regressed out the motion effects in these analyses. We found qualitatively similar results to our main analyses, suggesting motion did not play a substantial role.

      (4) To test the contribution of data quantity to the homotopy analyses, we correlated the amount of movie data collected from each participant with the homotopy results. We did not find a relationship between data quantity and the homotopy results. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      Ellis et al. investigated the functional and topographical organization of the visual cortex in infants and toddlers, as evidenced by movie-viewing data. They build directly on prior research that revealed topographic maps in infants who completed a retinotopy task, claiming that even a limited amount of rich, naturalistic movie-viewing data is sufficient to reveal this organization, within and across participants. Generating this evidence required methodological innovations to acquire high-quality fMRI data from awake infants (which have been described by this group, and elsewhere) and analytical creativity. The authors provide evidence for structured functional responses in infant visual cortex at multiple levels of analyses; homotopic brain regions (defined based on a retinotopy task) responded more similarly to one another than to other brain regions in visual cortex during movie-viewing; ICA applied to movie-viewing data revealed components that were identifiable as spatial frequency, and to a lesser degree, meridian maps, and shared response modeling analyses suggested that visual cortex responses were similar across infants/toddlers, as well as across infants/toddlers and adults. These results are suggestive of fairly mature functional response profiles in the visual cortex in infants/toddlers and highlight the potential of movie-viewing data for studying finer-grained aspects of functional brain responses, but further evidence is necessary to support their claims and the study motivation needs refining, in light of prior research.

      Strengths:

      - This study links the authors' prior evidence for retinotopic organization of visual cortex in human infants (Ellis et al., 2021) and research by others using movie-viewing fMRI experiments with adults to reveal retinotopic organization (Knapen, 2021).

      - Awake infant fMRI data are rare, time-consuming, and expensive to collect; they are therefore of high value to the community. The raw and preprocessed fMRI and anatomical data analyzed will be made publicly available.

      We are grateful to the reviewer for their clear and thoughtful description of the strengths of the paper, as well as their helpful outlining of areas we could improve.

      Weaknesses:

      - The Methods are at times difficult to understand and in some cases seem inappropriate for the conclusions drawn. For example, I believe that the movie-defined ICA components were validated using independent data from the retinotopy task, but this was a point of confusion among reviewers. 

      We acknowledge the complexity of the methods and wish to clarify them as best as possible for the reviewers and the readers. We have extensively revised the methods and results sections to help avoid potential misunderstandings. For instance, we have revamped the figure and caption describing the SRM pipeline (Figure 5).

      To answer the stated confusion directly, the ICA components were derived from the movie data and validated on the (completely independent) retinotopy data. There were no additional tasks. The following text in the paper explains this point:

      “To assess the selected component maps, we correlated the gradients (described above) of the task-evoked and component maps. This test uses independent data: the components were defined based on movie data and validated against task-evoked retinotopic maps.” Pg. 11

      In either case: more analyses should be done to support the conclusion that the components identified from the movie reproduce retinotopic maps (for example, by comparing the performance of movie-viewing maps to available alternatives (anatomical ROIs, group-defined ROIs)).

      Before addressing this suggestion, we want to restate our conclusions: features of the retinotopic organization of infant visual cortex could be predicted from movie data. We did not conclude that movie data could ‘reproduce’ retinotopic maps in the sense that they would be a replacement. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously23 found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses27, here we find that functional alignment is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      As per the reviewer’s suggestion and alluded to in the paragraph above, we have created anatomically aligned visual maps, providing an analogous test to the between-participant analyses like SRM. We find that these maps are highly similar to the ground truth. We describe this result in a new section of the results:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17
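
      The leave-one-out benchmark described in this passage can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes every participant's map has already been resampled to a common surface template, it correlates whole maps rather than the gradients along traced lines that the manuscript uses, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def fisher_z(r):
    """Fisher z-transform of a correlation coefficient."""
    return np.arctanh(r)

def leave_one_out_prediction(maps, test_idx):
    """Average the maps of every participant except the held-out one.

    maps: array of shape (n_participants, n_vertices), assumed to be
    already aligned to a common surface template.
    """
    reference = np.delete(maps, test_idx, axis=0)
    return reference.mean(axis=0)

def prediction_scores(maps):
    """Fisher-z correlation between each participant's own map and the
    average of all other participants' maps."""
    scores = []
    for i in range(maps.shape[0]):
        predicted = leave_one_out_prediction(maps, i)
        r = np.corrcoef(predicted, maps[i])[0, 1]
        scores.append(fisher_z(r))
    return np.array(scores)

# Synthetic stand-in data (hypothetical): a common pattern plus participant noise.
rng = np.random.default_rng(0)
shared_pattern = rng.standard_normal(200)
maps = shared_pattern + 0.5 * rng.standard_normal((6, 200))
z_anat = prediction_scores(maps)
```

      Scores computed the same way for a functional alignment prediction could then be subtracted participant by participant to obtain a ∆Fisher Z of the kind reported above.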

      Also, the ROIs used for the homotopy analyses were defined based on the retinotopic task rather than based on movie-viewing data alone - leaving it unclear whether movie-viewing data alone can be used to recover functionally distinct regions within the visual cortex.

      We agree with the reviewer that our approach does not test whether movie-viewing data alone can be used to recover functionally distinct regions. The goal of the homotopy analyses was to identify whether there was functional differentiation of visual areas in the infant brain while they watch movies. This was a novel question that provides positive evidence that these regions are functionally distinct. In subsequent analyses, we show that when these areas are defined anatomically, rather than functionally, they also show differentiated function (e.g., Figure 2). Nonetheless, our intention was not to use the homotopy analyses to define the regions. We have added text to clarify the goal and novelty of this analysis.

      “Although these analyses cannot define visual maps, they test whether visual areas have different functional signatures.” Pg. 6

      Additionally, even if the goal were to define areas based on homotopy, we believe the power of that analysis would be questionable. We would need to use a large amount of the movie data to define the areas, leaving a low-powered dataset to test whether their function is differentiated by these movie-based areas.

      - The authors previously reported on retinotopic organization of the visual cortex in human infants (Ellis et al., 2021) and suggest that the feasibility of using movie-viewing experiments to recover these topographic maps is still in question. They point out that movies may not fully sample the stimulus parameters necessary for revealing topographic maps/areas in the visual cortex, or the time-resolution constraints of fMRI might limit the use of movie stimuli, or the rich, uncontrolled nature of movies might make them inferior to stimuli that are designed for retinotopic mapping, or might lead to variable attention between participants that makes measuring the structure of visual responses across individuals challenging. This motivation doesn't sufficiently highlight the importance or value of testing this question in infants. Further, it's unclear if/how this motivation takes into account prior research using movie-viewing fMRI experiments to reveal retinotopic organization in adults (e.g., Knapen, 2021). Given the evidence for retinotopic organization in infants and evidence for the use of movie-viewing experiments in adults, an alternative framing of the novel contribution of this study is that it tests whether retinotopic organization is measurable using a limited amount of movie-viewing data (i.e., a methodological stress test). The study motivation and discussion could be strengthened by more attention to relevant work with adults and/or more explanation of the importance of testing this question in infants (is the reason to test this question in infants purely methodological - i.e., as a way to negate the need for retinotopic tasks in subsequent research, given the time constraints of scanning human infants?).

      We are grateful to the reviewer for giving us the opportunity to clarify the innovations of this research. We believe that this research contributes to our understanding of how infants process dynamic stimuli, demonstrates the viability and utility of movie experiments in infants, and highlights the potential for new movie-based analyses (e.g., SRM). We have now consolidated these motivations in the introduction to more clearly motivate this work:

      “The primary goal of the current study is to investigate whether movie-watching data recapitulates the organization of visual cortex. Movies drive strong and naturalistic responses in sensory regions while minimizing task demands12, 13, 24 and thus are a proxy for typical experience. In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion25–27. Movies have been useful in awake infant fMRI for studying event segmentation28, functional alignment29, and brain networks30. However, this past work did not address the granularity and specificity of cortical organization that movies evoke. For example, movies evoke similar activity in infants in anatomically aligned visual areas28, but it remains unclear whether responses to movie content differ between visual areas (e.g., is there more similarity of function within visual areas than between31). Moreover, it is unknown whether structure within visual areas, namely visual maps, contributes substantially to visual evoked activity. Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity – rather than anatomy – and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses27, 32–34.” Pg. 3-4

      Furthermore, the introduction culminates in the following statement on what the analyses will tell us about the nature of movie-driven activity in infants:

      “These three analyses assess key indicators of the mature visual system: functional specialization between areas, organization within areas, and consistency between individuals.” Pg. 5

      Furthermore, in the discussion we revisit these motivations and elaborate on them further:

      [Regarding homotopy:] “This suggests that visual areas are functionally differentiated in infancy and that this function is shared across hemispheres31.” Pg. 19

      [Regarding ICA:] “This means that the retinotopic organization of the infant brain accounts for a detectable amount of variance in visual activity, otherwise components resembling these maps would not be discoverable.” Pg. 19–20

      [Regarding SRM:] “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45.” Pg. 21

      Additionally, we have expanded our discussion of relevant work that uses similar methods such as the excellent research from Knapen (2021) and others:

      “In adults, movies and resting-state data have been used to characterize the visual cortex in a data-driven fashion25-27.” Pg. 4

      “We next explored whether movies can reveal fine-grained organization within visual areas by using independent components analysis (ICA) to propose visual maps in individual infant brains25,26,35,42,43.” Pg. 9

      Reviewer #2 (Public Review):

      Summary:

      This manuscript shows evidence from a dataset with awake movie-watching in infants, that the infant brain contains areas with distinct functions, consistent with previous studies using resting state and awake task-based infant fMRI. However, substantial new analyses would be required to support the novel claim that movie-watching data in infants can be used to identify retinotopic areas or to capture within-area functional organization.

      Strengths:

      The authors have collected a unique dataset: the same individual infants both watched naturalistic animations and a specific retinotopy task. These data position the authors to test their novel claim, that movie-watching data in infants can be used to identify retinotopic areas.

      Weaknesses:

      To claim that movie-watching data can identify retinotopic regions, the authors should provide evidence for two claims:

      - Retinotopic areas defined based only on movie-watching data, predict retinotopic responses in independent retinotopy-task-driven data.

      - Defining retinotopic areas based on the infant's own movie-watching response is more accurate than alternative approaches that don't require any movie-watching data, like anatomical parcellations or shared response activation from independent groups of participants.

      We thank the reviewer for their comments. Before addressing their suggestions, we wish to clarify that we do not claim that movie data can be used to identify retinotopic areas, but instead that movie data captures components of the within and between visual area organization as defined by retinotopic mapping. We recognize that this was not clear in our original manuscript and have clarified this point throughout, including in this section of the discussion:

      “To be clear, we are not suggesting that movies work well enough to replace a retinotopy task when accurate maps are needed. For instance, even though ICA found components that were highly correlated with the spatial frequency map, we also selected some components that turned out to have lower correlations. Without knowing the ground truth from a retinotopy task, there would be no way to weed these out. Additionally, anatomical alignment (i.e., averaging the maps from other participants and anatomically aligning them to a held-out participant) resulted in maps that were highly similar to the ground truth. Indeed, we previously23 found that adult-defined visual areas were moderately similar to infants. While functional alignment with adults can outperform anatomical alignment methods in similar analyses27, here we find that functional alignment with infants is inferior to anatomical alignment. Thus, if the goal is to define visual areas in an infant that lacks task-based retinotopy, anatomical alignment of other participants’ retinotopic maps is superior to using movie-based analyses, at least as we tested it.” Pg. 21

      In response to the reviewer’s suggestion, we compare the maps identified by SRM to the averaged, anatomically aligned maps from infants. We find that these maps are highly similar to the task-based ground truth and we describe this result in a new section:

      “We performed an anatomical alignment analog of the functional alignment (SRM) approach. This analysis serves as a benchmark for predicting visual maps using task-based data, rather than movie data, from other participants. For each infant participant, we aggregated all other infant or adult participants as a reference. The retinotopic maps from these reference participants were anatomically aligned to the standard surface template, and then averaged. These averages served as predictions of the maps in the test participant, akin to SRM, and were analyzed equivalently (i.e., correlating the gradients in the predicted map with the gradients in the task-based map). These correlations (Table S4) are significantly higher than for functional alignment (using infants to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.44, CI=[0.32–0.58], p<.001; using infants to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.61, CI=[0.47–0.74], p<.001; using adults to predict spatial frequency, anatomical alignment > functional alignment: ∆Fisher Z M=0.31, CI=[0.21–0.42], p<.001; using adults to predict meridians, anatomical alignment > functional alignment: ∆Fisher Z M=0.49, CI=[0.39–0.60], p<.001). This suggests that even if SRM shows that movies can be used to produce retinotopic maps that are significantly similar to a participant, these maps are not as good as those that can be produced by anatomical alignment of the maps from other participants without any movie data.” Pg. 16–17

      Note that we do not compare the anatomically aligned maps with the ICA maps statistically. This is because these analyses are not comparable: ICA is run within-participant whereas anatomical alignment is necessarily between-participant — either infant or adults. Nonetheless, an interested reader can refer to the Table where we report the results of anatomical alignment and see that anatomical alignment outperforms ICA in terms of the correlation between the predicted and task-based maps.

      Both of these analyses are possible, using the (valuable!) data that these authors have collected, but these are not the analyses that the authors have done so far. Instead, the authors report the inverse of (1): regions identified by the retinotopy task can be used to predict responses in the movies. The authors report one part of (2), shared responses from other participants can be used to predict individual infants' responses in the movies, but they do not test whether movie data from the same individual infant can be used to make better predictions of the retinotopy task data, than the shared response maps.

      So to be clear, to support the claims of this paper, I recommend that the authors use the retinotopic task responses in each individual infant as the independent "Test" data, and compare the accuracy in predicting those responses, based on:

      -  The same infant's movie-watching data, analysed with MELODIC, when blind experimenters select components for the SF and meridian boundaries with no access to the ground-truth retinotopy data.

      -  Anatomical parcellations in the same infant.

      -  Shared response maps from groups of other infants or adults.

      -  (If possible, ICA of resting state data, in the same infant, or from independent groups of infants).

      Or, possibly, combinations of these techniques.

      If the infant's own movie-watching data leads to improved predictions of the infant's retinotopic task-driven response, relative to these existing alternatives that don't require movie-watching data from the same infant, then the authors' main claim will be supported.

      These are excellent suggestions for additional analyses to test the suitability of movie-based maps to replace task-based maps. We hope it is now clear that it was never our intention to claim that movie-based data could replace task-based methods. We want to emphasize that the discoveries made in this paper, namely that movies evoke fine-grained organization in infant visual cortex, do not rely on movie-based maps being better than alternative methods for producing maps, such as the newly added anatomical alignment.

      The proposed analysis above solves a critical problem with the analyses presented in the current manuscript: the data used to generate maps is identical to the data used to validate those maps. For the task-evoked maps, the same data are used to draw the lines along gradients and then test for gradient organization. For the component maps, the maps are manually selected to show the clearest gradients among many noisy options, and then the same data are tested for gradient organization. This is a double-dipping error. To fix this problem, the data must be split into independent train and test subsets.

      We appreciate the reviewer’s concern; however, we believe it is a result of a miscommunication in our analytic strategy. We have now provided more details on the analyses to clarify how double-dipping was avoided. 

      To summarize, a retinotopy task produced visual maps that were used to trace both area boundaries and gradients across the areas. These data were then fixed and unchanged, and we make no claims about the nature of these maps in this paper, other than to treat them as the ground truth to be used as a benchmark in our analyses. The movie data, which were collected independently from the same infant in the same session, used the boundaries from the retinotopy task (in the case of homotopy) or were compared with the maps from the retinotopy task (in the case of ICA and SRM). In other words, the statement that “the data used to generate maps is identical to the data used to validate those maps” is incorrect because we generated the maps with a retinotopy task and validated the maps with the movie data. This means no double-dipping occurred.

      Perhaps a cause of the reviewer’s interpretation is that the gradients used in the analysis are not clearly described. We now provide this additional description:  “Using the same manually traced lines from the retinotopy task, we measured the intensity gradients in each component from the movie-watching data. We can then use the gradients of intensity in the retinotopy task-defined maps as a benchmark for comparison with the ICA-derived maps.” Pg. 10
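
      The gradient comparison described here can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each traced line is stored as an ordered list of surface vertex indices, and it averages raw Pearson correlations across lines, which may differ from the manuscript's exact statistic. All names are hypothetical.

```python
import numpy as np

def profile_along_line(surface_map, line_vertices):
    """Sample a surface map at the ordered vertices of one traced line."""
    return surface_map[np.asarray(line_vertices)]

def gradient_correlation(task_map, component_map, lines):
    """Mean Pearson correlation between the task-based and movie-based
    intensity profiles, computed separately for each traced line."""
    correlations = []
    for line in lines:
        task_profile = profile_along_line(task_map, line)
        component_profile = profile_along_line(component_map, line)
        correlations.append(np.corrcoef(task_profile, component_profile)[0, 1])
    return float(np.mean(correlations))

# Toy example (hypothetical data): the component is a rescaled copy of the
# task map, so its profile along every line tracks the task profile exactly.
task_map = np.linspace(0.0, 1.0, 50)
component_map = 2.0 * task_map + 3.0
lines = [list(range(0, 10)), list(range(20, 35))]
similarity = gradient_correlation(task_map, component_map, lines)
```

      The lines are traced once on the task-evoked map and then reused unchanged, so the movie-derived component never influences where its gradient is measured.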

      Regarding the SRM analyses, we take great pains to avoid the possibility of data contamination. To emphasize how independent the SRM analysis is, the prediction of the retinotopic map from the test participant does not use their retinotopy data at all; in fact, the predicted maps could be made before that participant’s retinotopy data were ever collected. To make this prediction for a test participant, we need to learn the inversion of the SRM, but this only uses the movie data of the test participant. Hence, there is no double-dipping in the SRM analyses. We have elaborated on this point in the revision, and we remade the figure and its caption to clarify this point.

      We also have updated the description of these results to emphasize how double-dipping was avoided:

      “We then mapped the held-out participant's movie data into the learned shared space without changing the shared space (Figure 5c). In other words, the shared response model was learned and frozen before the held-out participant’s data was considered. This approach has been used and validated in prior SRM studies45.” Pg. 14
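
      One way to map a held-out participant into a frozen shared space is sketched below. It assumes the deterministic SRM formulation, in which each participant's data X (voxels × time) is approximated by W S with orthonormal columns in W, so that fitting W for fixed S is an orthogonal Procrustes problem. Whether the authors' code takes exactly this route is not stated in the response, and the function names and synthetic data are hypothetical.

```python
import numpy as np

def fit_heldout_weights(X, S):
    """Solve min_W ||X - W S||_F subject to W.T @ W = I.

    X: (n_voxels, n_timepoints) movie data of the held-out participant.
    S: (n_features, n_timepoints) shared response, learned from the other
       participants and left unchanged here.
    """
    U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
    return U @ Vt

def predict_map(W_test, reference_weights, reference_maps):
    """Project each reference participant's retinotopic map into the shared
    space, average there, then project into the held-out participant's space."""
    shared_map = np.mean(
        [W.T @ m for W, m in zip(reference_weights, reference_maps)], axis=0
    )
    return W_test @ shared_map

# Synthetic check (hypothetical data): noiseless data generated from a known
# orthonormal weight matrix, which the fit should recover.
rng = np.random.default_rng(1)
n_voxels, n_features, n_timepoints = 40, 5, 100
S = rng.standard_normal((n_features, n_timepoints))
W_true, _ = np.linalg.qr(rng.standard_normal((n_voxels, n_features)))
X = W_true @ S
W_hat = fit_heldout_weights(X, S)
```

      Note that `fit_heldout_weights` takes only the held-out participant's movie data and the frozen shared response, so that participant's retinotopy data plays no role in the prediction.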

      The reviewer suggests that manually choosing components from ICA is double-dipping. Although the reviewer is correct that the manual selection of components in ICA means that the components chosen ought to be good candidates, we are testing whether those choices were good by evaluating those components against the task-based maps that were not used for the ICA. Our statistical analyses evaluate whether the components chosen were better than the components that would have been chosen by random chance. Critically: all decisions about selecting the components happen before the components are compared to the retinotopic maps. Hence there is no double-dipping in the selection of components, as the choice of candidate ICA maps is not informed by the ground-truth retinotopic maps. We now clarify what the goal of this process is in the results:

      “Success in this process requires that 1) retinotopic organization accounts for sufficient variance in visual activity to be identified by ICA and 2) experimenters can accurately identify these components.” Pg. 10

      The reviewer also alludes to a concern that the researcher selecting the maps was not blind to the ground-truth retinotopic maps from participants and this could have influenced the results. In such a scenario, the researcher could have selected components that have the gradients of activity in the places that the infant has as ground truth. The researcher who made the selection of components (CTE) is one of the researchers who originally traced the areas in the participants approximately a year prior to the identification of ICs. The researcher selecting the components didn’t use the ground-truth retinotopic maps as reference, nor did they pay attention to the participant IDs when sorting the IC components. Indeed, they weren’t trying to find participant-specific maps per se, but rather aimed to find good candidate retinotopic maps in general. In the case of the newly added adult analyses, the ICs were selected before the retinotopic mapping was reviewed or traced; hence, no knowledge about the participant-specific ground truth could have influenced the selection of ICs. Even with this process in adults, we find results of comparable strength to those we found in infants, as shown in Figure S3. Nonetheless, there is a possibility that this researcher’s previous experience of tracing the infant maps could have influenced their choice of components at the participant-specific level. If so, it was a small effect since the components the researcher selected were far from the best possible options (i.e., rankings of the selected components averaged in the 64th percentile for spatial frequency maps and the 68th percentile for meridian maps). We believe all reasonable steps were taken to mitigate bias in the selection of ICs.
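
      The percentile ranking mentioned above can be illustrated with a short sketch. It assumes that each candidate component has a single similarity score against the task-based map and that a selected component's rank is the percentage of candidates scoring at or below it; the manuscript may define the ranking differently, and the names and numbers here are hypothetical.

```python
import numpy as np

def percentile_rank(selected_score, all_scores):
    """Percentage of candidate components whose score is at or below the
    score of the selected component."""
    all_scores = np.asarray(all_scores, dtype=float)
    return 100.0 * np.mean(all_scores <= selected_score)

# Hypothetical scores for five candidate components of one participant.
candidate_scores = [0.1, 0.2, 0.3, 0.4, 0.5]
rank_of_selected = percentile_rank(0.3, candidate_scores)
```

      A selection guided by the ground truth would be expected to rank near the 100th percentile, which is why averages in the 60s are offered as evidence that any bias was small.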

      Reviewer #3 (Public Review):

      The manuscript reports data collected in awake toddlers recording BOLD while watching videos. The authors analyse the BOLD time series using two different statistical approaches, both very complex but do not require any a priori determination of the movie features or contents to be associated with regressors. The two main messages are that 1) toddlers have occipital visual areas very similar to adults, given that an SRM model derived from adult BOLD is consistent with the infant brains as well; 2) the retinotopic organization and the spatial frequency selectivity of the occipital maps derived by applying correlation analysis are consistent with the maps obtained by standard and conventional mapping.

      Clearly, the data are important, and the authors have achieved important and original results. However, the manuscript is totally unclear and very difficult to follow; the figures are not informative; the reader needs to trust the authors because no data to verify the output of the statistical analysis are presented (localization maps with proper statistics), nor is any validation of the statistical analysis provided. Indeed, what I think the manuscript means, or better what I understood, may be very far from what the authors want to present, given how obscure the methods and the result presentation are.

      In the present form, this reviewer considers that the manuscript needs to be totally rewritten, the results presented each technique with appropriate validation or comparison that the reader can evaluate.

      We are grateful to the reviewer for the chance to improve the paper. We have broken their review into three parts: clarification of the methods, validation of the analyses, and enhancing the visualization.

      Clarification of the methods

      We acknowledge that the methods we employed are complex and uncommon in many fields of neuroimaging. That said, numerous papers have conducted these analyses on adults (Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Lu et al., 2017) and non-human primates (Arcaro & Livingstone, 2017; Moeller et al., 2009). We have redoubled our efforts in the revision to make the methods as clear as possible, expanding on the original text and providing intuitions where possible. These changes have been added throughout and are too vast in number to repeat here, especially without context, but we hope that readers will have an easier time following the analyses now. 

      Additionally, we updated Figures 3 and 5 in which the main ICA and SRM analyses are described. For instance, in Figure 3’s caption we now add details about how the gradient analyses were performed on the components: 

      “We used the same lines that were manually traced on the task-evoked map to assess the change in the component’s response. We found a monotonic trend within area from medial to lateral, just like we see in the ground truth.” Pg. 11

      Regarding Figure 5, we reconsidered the best way to explain the SRM analyses and decided it would be helpful to partition the diagram into steps, reflecting the analytic process. These updates have been added to Figure 5, and the caption has been updated accordingly.

      We hope that these changes have improved the clarity of the methods. For readers interested in learning more, we encourage them to either read the methods-focused papers that debut the analyses (e.g., Chen et al., 2015), read the papers applying the methods (e.g., Guntupalli et al., 2016), or read the annotated code we publicly release which implements these pipelines and can be used to replicate the findings.

      Validation of the analyses

      One of the requests the reviewer makes is to validate our analyses. Our initial approach was to lean on papers that have used these methods in adults or primates (e.g., Arcaro & Livingstone, 2017; Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Moeller et al., 2009) where the underlying organization and neurophysiology is established. However, we have made changes to these methods that differ from their original usage (e.g., we used SRM rather than hyperalignment, we use meridian mapping rather than traveling wave retinotopy, we use movie-watching data rather than rest). Hence, the specifics of our design and pipeline warrant validation.

      To add further validation, we have rerun the main analyses on an adult sample. We collected 8 adult participants who completed the same retinotopy task and a large subset of the movies that infants saw. These participants were run under maximally similar conditions to infants (i.e., scanned using the same parameters and without the top of the head-coil) and were preprocessed using the same pipeline. Given that the relationship between adult visual maps and movie-driven (or resting-state) analyses has been shown in many studies (Beckman et al., 2005; Butt et al., 2015; Guntupalli et al., 2016; Haak & Beckman, 2018; Knapen, 2021; Lu et al., 2017), these adult data serve as a validation of our analysis pipeline. These adult participants were included in the original manuscript; however, they were previously only used to support the SRM analyses (i.e., can adults be used to predict infant visual maps). The adult results are described before any results with infants, as a way to engender confidence. Moreover, we have provided new supplementary figures of the adult results that we hope will be integrated with the article when viewing it online, such that it will be easy to compare infant and adult results, as per the reviewer’s request. 

      As per the figures and captions below, the analyses were all successful with the adult participants: 1) Homotopic correlations are higher than correlations between comparable areas in other streams or areas that are more distant within stream. 2) A multidimensional scaling depiction of the data shows that areas in the dorsal and ventral stream are dissimilar. 3) Using independent components analysis on the movie data, we identified components that are highly correlated with the retinotopy task-based spatial frequency and meridian maps. 4) Using shared response modeling on the movie data, we predicted maps that are highly correlated with the retinotopy task-based spatial frequency and meridian maps.

      These supplementary analyses are underpowered for between-group comparisons, so we do not statistically compare the results between infants and adults. Nonetheless, the pattern of adult results is comparable overall to the infant results. 

      We believe these adult results provide a useful validation that the infant analyses we performed can recover fine-grained organization.

      The reviewer raises an additional concern about the lack of visualization of the results. We recognize that the plots of the summary statistics do not provide information about the intermediate analyses. Indeed, we think the summary statistics can understate the degree of similarity between the components or predicted visual maps and the ground truth. Hence, we have added 6 new supplementary figures showing the intensity gradients for the following analyses: 1. spatial frequency prediction using ICA, 2. meridian prediction using ICA, 3. spatial frequency prediction using infant SRM, 4. meridian prediction using infant SRM, 5. spatial frequency prediction using adult SRM, and 6. meridian prediction using adult SRM.

      We hope that these visualizations are helpful. It is possible that the reviewer wishes us to also visually present the raw maps from the ICA and SRM, akin to what we show in Figure 3A and 3B. We believe this is out of scope of this paper: of the 1140 components that were identified by ICA, we selected 36 for spatial frequency and 17 for meridian maps. We also created 20 predicted maps for spatial frequency and 20 predicted meridian maps using SRM. This would result in the depiction of 93 subfigures, requiring at least 15 new full-page supplementary figures to display with adequate resolution. Instead, we encourage the reader to access this content themselves: we have made the code to recreate the analyses publicly available, as well as both the raw and preprocessed data for these analyses, including the data for each of these selected maps.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) As mentioned in the public review, the authors should consider incorporating relevant adult fMRI research into the Introduction and explain the importance of testing this question in infants.

      As described in our public response, we have added several citations to relevant adult research and provided further motivation for the project.

      (2) The authors should conduct additional analyses to support their conclusion that movie data alone can generate accurate retinotopic maps (i.e., by comparing this approach to other available alternatives).

      We have clarified in our public response that we did not wish to conclude that movie data alone can generate accurate retinotopic maps, and have made substantial edits to the text to emphasize this. Thus, because this claim is already not supported by our analyses, we do not think it is necessary to test it further.

      (3) The authors should re-do the homotopy analyses using movie-defined ROIs (i.e., by splitting the movie-viewing data into independent folds for functional ROI definition and analyses).

      As stated above, defining ROIs based on the movie content is not the intended goal of this project. Even if that were the general goal, we do not believe that it would be appropriate to run this specific analysis with the data we collected. Firstly, halving the data for ROI definition (e.g., using half the movie data to identify and trace areas, and then use those areas in the homotopy analysis to run on the other half of data) would qualitatively change the power of the analyses described here. Secondly, we would be unable to define areas beyond hV4/V3AB with confidence, since our retinotopic mapping only affords specification of early visual cortex. Thus we could not conduct the MDS analyses shown in Figure 2.

      (4) If the authors agree that a primary contribution of this study and paper is to showcase what is possible to do with a limited amount of movie-viewing data, then they should make it clearer, sooner, how much usable movie data they have from infants. They could also consider conducting additional analyses to determine the minimum amount of fMRI data necessary to reveal the same detailed characteristics of functional responses in the visual cortex.

      We agree it would be good to highlight the amount of movie data used. When the infant data is first introduced in the results section, we now state the durations:

      “All available movies from each session were included (Table S2), with an average duration of 540.7s (range: 186–1116s).” Pg. 5

      Additionally, we have added a homotopy analysis that describes the contribution of data quantity to the results observed. We compare the amount of data collected with the magnitude of same vs. different stream effect (Figure 1B) and within stream distance effect (Figure 1C). We find no effect of movie duration in the sample we tested, as reported below:

      “We found no evidence that the variability in movie duration per participant correlated with this difference [of same stream vs. different stream] (r=0.08, p=.700).” Pg. 6-7

      “There was no correlation between movie duration and the effect (Same > Adjacent: r=-0.01, p=.965, Adjacent > Distal: r=-0.09, p=.740).” Pg. 7
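      As an illustration of the kind of check reported in the two quotes above, the following minimal sketch computes a Pearson correlation between per-participant movie duration and an effect size. It is not the analysis code released with the paper: the function name `pearson_r` and all numeric values are hypothetical, and no p-value is computed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / denom


# Hypothetical values for illustration only (not data from the study):
# one movie duration (seconds) and one same-vs-different stream effect
# per participant.
movie_duration_s = [200.0, 350.0, 540.0, 760.0, 1100.0]
stream_effect = [0.12, 0.05, 0.15, 0.07, 0.11]

r = pearson_r(movie_duration_s, stream_effect)
print(f"r = {r:.2f}")
```

      A value of r near zero in such a comparison would be consistent with the effect not depending on how much movie data a participant contributed, although with a small sample the absence of a correlation is weak evidence.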

      (5) If any of the methodological approaches are novel, the authors should make this clear. In particular, has the approach of visually inspecting and categorizing components generated from ICA and movie data been done before, in adults/other contexts?

      The methods we employed are similar to others, as described in the public review.

      However, changes were necessary to apply them to infant samples. For instance, Guntupalli et al. (2016) used hyperalignment to predict the visual maps of adult participants, whereas we use SRM. SRM and hyperalignment have the same goal — find a maximally aligned representation between participants based on brain function — but their implementation is different. The application of functional alignment to infants is novel, as is their use in movie data that is relatively short by comparison to standard adult data. Indeed, this is the most thorough demonstration that SRM — or any functional alignment procedure — can be usefully applied to infant data, awake or sleeping. We have clarified this point in the discussion.

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45, which may prove especially useful for infant fMRI52.” Pg. 21

      (6) The authors found that meridian maps were less identifiable from ICA and movie data and suggest that this may be because these maps are more susceptible to noise or gaze variability. If this is the case, you might predict that these maps are more identifiable in adult data. The authors could consider running additional analyses with their adult participants to better understand this result.

      As described in the manuscript, we hypothesize that meridian maps are more difficult to identify than spatial frequency maps because meridian maps are a less smooth, more fine-grained map than spatial frequency. Indeed, it has previously been reported (Moeller et al., 2009) that similar procedures can result in meridian maps that are constituted by multiple independent components (e.g., a component sensitive to horizontal orientations, and a separate component sensitive to vertical orientations). Nonetheless, we have now conducted the ICA procedure on adult participants and again find it is easier to identify spatial frequency components compared to meridian maps, as reported in the public review.

      Minor corrections:

      (1) Typo: Figure 3 title: "Example retintopic task vs. ICA-based spatial frequency maps.".

      Fixed

      (2) Given the age range of the participants, consider using "infants and toddlers"? (Not to diminish the results at all; on the contrary, I think it is perhaps even more impressive to obtain awake fMRI data from ~1-2-year-olds). Example: Figure 3 legend: "A) Spatial frequency map of a 17.1-month-old infant.".

      We agree with the reviewer that there is disagreement about the age range at which a child starts being considered a toddler. We have changed the terms in places where we refer to a toddler in particular (e.g., the figure caption the reviewer highlights) and added the phrase “infants and toddlers” in places where appropriate. Nonetheless, we have kept “infants” in some places, particularly those where we are comparing the sample to adults. Adding “and toddlers” could imply three samples being compared which would confuse the reader.

      (3) Figure 6 legend: The following text should be omitted as there is no bar plot in this figure: "The bar plot is the average across participants. The error bar is the standard error across participants.".

      Fixed

      (4) Table S1 legend: Missing first single quote: Runs'.

      Fixed

      Reviewer #2 (Recommendations For The Authors):

      I request that this paper cite more of the existing literature on the fMRI of human infants and toddlers using task-driven and resting-state data. For example, early studies by (first authors) Biagi, Dehaene-Lambertz, Cusack, and Fransson, and more recent studies by Chen, Cabral, Truzzi, Deen, and Kosakowski.

      We have added several new citations of recent task-based and resting state studies to the second sentence of the main text:

      “Despite the recent growth in infant fMRI1-6, one of the most important obstacles facing this research is that infants are unable to maintain focus for long periods of time and struggle to complete traditional cognitive tasks7.”

      Reviewer #3 (Recommendations For The Authors):

      In the following, I report some of my main perplexities, but many more may arise when the material is presented more clearly.

      The age of the children varies from 5 months to about 2 years. While the developmental literature suggests that between 1 and 2 years children have a visual system nearly adult-like, below that age some areas may be very immature. I would split the sample and perhaps attempt to validate the adult SRM model with the youngest children (and those can be called infants).

      We recognize the substantial age variability in our sample, which is why we report participant-specific data in our figures. While splitting up the data into age bins might reveal age effects, we do not think we can perform adequately powered null hypothesis testing of the age trend. In order to investigate the contribution of age, larger samples will be needed. That said, we can see from the data that we have reported that any effect of age is likely small. To elaborate: Figures 4 and 6 report the participant-specific data points and order the participants by age. There are no clear linear trends in these plots, thus there are no strong age effects.

      More broadly, we do not think there is a principled way to divide the participants by age. The reviewer suggests that the visual system is immature before the first year of life and mature afterward; however, such claims are the exact motivation for the type of work we are doing here, and the verdict is still out. Indeed, the conclusion of our earlier work reporting retinotopy in infants (Ellis et al., 2021) suggests that the organization of the early visual cortex in infants as young as 5 months — the youngest infant in our sample — is surprisingly adult-like.

      The title cannot refer to infants given the age span.

      There is disagreement in the field about the age at which it is appropriate to refer to children as infants. In this paper, and in our prior work, we followed the practice of the most attended infant cognition conference and society, the International Congress of Infant Studies (ICIS), which considers infants as those aged between 0-3 years old, for the purposes of their conference. Indeed, we have never received this concern across dozens of prior reviews for previous papers covering a similar age range. That said, we understand the spirit of the reviewer’s comment and now refer to the sample as “infants and toddlers” and to older individuals in our sample as “toddlers” wherever it is appropriate (the younger individuals would fairly be considered “infants” under any definition).

      Figure 1 is clear and an interesting approach. Please also show the average correlation maps on the cortical surface.

      While we would like to create a figure as requested, we are unsure how to depict an area-by-area correlation map on the cortical surface. One option would be to generate a seed-based map in which we take an area and depict the correlation of that seed (e.g., vV1) with all other voxels. This approach would result in 8 maps for just the task-defined areas, and 17 maps for anatomically-defined areas. Hence, we believe this is out of scope of this paper, but an interested reader could easily generate these maps from the data we have released.

      Figure 2 results are not easily interpretable. Ventral and dorsal V1-V3 areas represent upper or lower VF respectively. Higher dorsal and ventral areas represent both upper and lower VF, so we should predict an equal distance between the two streams. Again, how can we verify that it is not a result of some artifacts?

      In adults, visual areas differ in their functional response properties along multiple dimensions, including spatial coding. The dorsal/ventral stream hypothesis is derived from the idea that areas in each stream support different functions, independent of spatial coding. The MDS analysis did not attempt to isolate the specific contribution of spatial representations of each area but instead tested the similarity of function that is evoked in naturalistic viewing. Other covariance-based analyses specifically isolate the contribution of spatial representations (Haak et al., 2013); however, they use a much more constrained analysis than what was implemented here. The fact that we find broad differentiation of dorsal and ventral visual areas in infants is consistent with adults (Haak & Beckmann, 2018) and neonate non-human primates (Arcaro & Livingstone, 2017).

      Nonetheless, we recognize that we did not mention the differences in visual field properties across areas and what that means. If visual field properties alone drove the functional response then we would expect to see a clustering of areas based on the visual field they represent (e.g., hV4 and V3AB should have similar representations). Since we did not see that, and instead saw organization by visual stream, the result is interesting and thus warrants reporting. We now mention this difference in visual fields in the manuscript to highlight the surprising nature of the result.

      “This separation between streams is striking when considering that it happens despite differences in visual field representations across areas: while dorsal V1 and ventral V1 represent the lower and upper visual field, respectively, V3A/B and hV4 both have full visual field maps. These visual field representations can be detected in adults41; however, they are often not the primary driver of function39. We see that in infants too: hV4 and V3A/B represent the same visual space yet have distinct functional profiles.” Pg. 8

      The reviewer raises a concern that the MDS result may be spurious and caused by noise. Below, we present three reasons why we believe these results are not accounted for by artifacts but instead reflect real functional differentiation in the visual cortex. 

      (1) Figure 2 is a visualization of the similarity matrix presented in Figure S1. In Figure S1, we report the significance testing we performed to confirm that the patterns differentiating dorsal and ventral streams — as well as adjacent areas from distal areas — are statistically reliable across participants. If an artifact accounted for the result then it would have to be a kind of systematic noise that is consistent across participants.

      (2) One of the main sources of noise (both systematic and non-systematic) with infant fMRI is motion. Homotopy is a within-participant analysis that could be biased by motion. To assess whether motion accounts for the results, we took a conservative approach of regressing out the framewise motion (i.e., how much movement there is between fMRI volumes) from the comparisons of the functional activity in regions. Although the correlations numerically decreased with this procedure, they were qualitatively similar to the analysis that does not regress out motion:

      “Additionally, if we control for motion in the correlation between areas --- in case motion transients drive consistent activity across areas --- then the effects described here are negligibly different (Figure S5).” Pg. 7

      (3) We recognize that despite these analyses, it would be helpful to see what this pattern looks like in adults where we know more about the visual field properties and the function of dorsal and ventral streams. This has been done previously (e.g., Haak & Beckmann, 2018), but we have now run those analyses on adults in our sample, as described in the public review. As with infants, there are reliable differences in the homotopy between streams (Figure S1). The MDS results show that the adult data was more complex than the infant data, since it was best described by 3 dimensions rather than 2. Nonetheless, there is a rotation of the MDS such that the structure of the ventral and dorsal streams is also dissociable.
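      The motion control described in point (2) above can be sketched as a partial correlation: each area's time course is residualized against framewise motion before the two areas are correlated. This is a simplified illustration under stated assumptions (a single motion regressor, ordinary least squares with an intercept); the helper names and toy time courses are hypothetical and are not taken from the released analysis code.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    cov = sum(a * b for a, b in zip(dx, dy))
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / denom


def residualize(signal, regressor):
    """Residuals of `signal` after an OLS fit on `regressor` plus an intercept."""
    mean_s = sum(signal) / len(signal)
    mean_r = sum(regressor) / len(regressor)
    ss_reg = sum((v - mean_r) ** 2 for v in regressor)
    if ss_reg == 0:
        raise ValueError("regressor must vary over time")
    slope = sum((v - mean_r) * (s - mean_s)
                for v, s in zip(regressor, signal)) / ss_reg
    intercept = mean_s - slope * mean_r
    return [s - (intercept + slope * v) for v, s in zip(regressor, signal)]


def motion_controlled_r(area_a, area_b, framewise_motion):
    """Correlate two areas after removing variance explained by motion."""
    return pearson_r(residualize(area_a, framewise_motion),
                     residualize(area_b, framewise_motion))


# Toy time courses (hypothetical): both areas share a large motion-driven
# component, plus area-specific signals that are unrelated to each other.
motion = [1, 1, -1, -1, 1, 1, -1, -1]
signal_a = [1, -1, 1, -1, 1, -1, 1, -1]
signal_b = [1, 1, 1, 1, -1, -1, -1, -1]
area_a = [3 * m + s for m, s in zip(motion, signal_a)]
area_b = [3 * m + s for m, s in zip(motion, signal_b)]

raw_r = pearson_r(area_a, area_b)
controlled_r = motion_controlled_r(area_a, area_b, motion)
print(f"raw r = {raw_r:.2f}, motion-controlled r = {controlled_r:.2f}")
```

      In this toy case the raw correlation is driven entirely by the shared motion component, so it disappears after residualization. The result reported above is the opposite pattern: the correlations between areas remained qualitatively similar once motion was regressed out.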

      Figure 3 also raises several alternative interpretations. The spatial frequency component in B has strong activity ONLY at the extreme border of the VF and this is probably the origin of the strong correlation. I understand that it is only one subject, but this brings the need to show all subjects and to report the correlation. Also, it is important to show the putative average ICA for retinotopy and spatial frequencies across subjects and for adults. All methods should be validated on adults where we have clear data for retinotopy and spatial frequency.

      The reviewer notes that the component in Figure 3 shows a strong negative response in the periphery. It is often the case, as reported elsewhere (Moeller et al., 2009), that ICA extracts portions of visual maps. To make a full visual map would require combining components into a composite (e.g., a component that has a high response in the periphery and another component that has a high response in the fovea). If we were to claim that this component, or others like it, could replace the need for retinotopic mapping, then we would want to produce these composite maps; however, our conclusion in this project is that the topographic information of retinotopic maps manifests in individual components of ICA. For this purpose, the analysis we perform adequately assesses this topography.

      Regarding the request to show the results for all subjects, we address this in the public response and repeat it here briefly: we have added 6 new figures to show results akin to Figure 3C and D. It is impractical to show the equivalent of Figure 3A and B for all participants, yet we do release the data necessary to visualize these maps easily.

      Finally, the reviewer suggests that we validate the analyses on adult participants. As shown in Figure S3 and reported in the public response, we now run these analyses on adult participants and observe qualitatively similar results to infants.

      How much was the variation in the presumed spatial frequency map? Is it consistent with the acuity range? 5-month-old infants should have an acuity of around 10c/deg, depending on the mean luminance of the scene.

      The reviewer highlights an important weakness of conducting ICA: we cannot put units on the degree of variation we see in components. We now highlight this weakness in the discussion:

      “Another limitation is that ICA does not provide a scale to the variation: although we find a correlation between gradients of spatial frequency in the ground truth and the selected component, we cannot use the component alone to infer the spatial frequency selectivity of any part of cortex. In other words, we cannot infer units of spatial frequency sensitivity from the components alone.” Pg. 20

      Figure 5 pipeline is totally obscure. I presumed that I understood, but as it is it is useless. All methods should be clearly described, and the intermediate results should be illustrated in figures and appropriately discussed. Using such blind analyses in infants in principle may not be appropriate and this needs to be verified. Overall all these techniques rely on correlation activities that are all biased by head movement, eye movement, and probably the dummy sucking. All those movements need to be estimated and correlated with the variability of the results. It is a strong assumption that the techniques should work in infants, given the presence of movements.

      We recognize that the SRM methods are complex. Given this feedback, we remade Figure 5 with explicit steps for the process and updated the caption (as reported in the public review).

      Regarding the validation of these methods, we have added SRM analyses from adults and find comparable results. This means that using these methods on adults with comparable amounts of data as what we collected from infants can predict maps that are highly similar to the real maps. Even so, it is not a given that these methods are valid in infants. We present two considerations in this regard. 

      First, as part of the SRM analyses reported in the manuscript, we show that control analyses are significantly worse than the real analyses (indicated by the lines on Figure 6). To clarify the control analysis: we break the mapping (i.e., flip the order of the data so that it is backwards) between the test participant and the training participants used to create the SRM. The fact that this control analysis is significantly worse indicates that SRM is learning meaningful representations that matter for retinotopy. 

      Second, we believe that this paper is a validation of SRM for infants. Infant fMRI is a nascent field and SRM has the potential to increase the signal quality in this population. We hope that readers will see these analyses as a proof of concept that SRM can be used in their work with infants. We have stated this contribution in the paper now.

      “Additionally, we wish to test whether methods for functional alignment can be used with infants. Functional alignment finds a mapping between participants using functional activity -- rather than anatomy -- and in adults can improve signal-to-noise, enhance across participant prediction, and enable unique analyses27,32-34.” Pg. 4

      “This is initial evidence that functional alignment may be useful for enhancing signal quality, like it has in adults27,32,33, or revealing changing function over development45.” Pg. 21

      Regarding the reviewer’s concern that motion may bias the results, we wish to emphasize the nature of the analyses being conducted here: we are using data from a group of participants to predict the neural responses in a held-out participant. For motion to explain consistency between participants, the motion would need to be time-locked across participants. Even if motion was time-locked during movie watching, motion will impair the formation of an adequate model that can contain retinotopic information. Thus, motion should only hurt the ability for a shared response to be found that can be used for predicting retinotopic maps. Hence, the results we observed are despite motion and other sources of noise.

      What is M??? is it simply the mean value??? If not, how it is estimated?

      M is an abbreviation for mean. We have now expanded the abbreviation the first time we use it.

      Figure 6 should be integrated with map activity where the individual area correlation should be illustrated. Probably fitting SRM adult works well for early cortical areas, but not for more ventral and associative, and the correlation should be evaluated for the different masks.

      With the addition of plots showing the gradients for each participant and each movie (Figures S10–S13) we hope we have addressed this concern. We additionally want to clarify that the regions we tested in the analysis in Figure 6 are only the early visual areas V1, V2, V3, V3A/B, and hV4. The adult validation analyses show that SRM works well for predicting the visual maps in these areas. Nonetheless, it is an interesting question for future research with more extensive retinotopic mapping in infants to see if SRM can predict maps beyond extrastriate cortex.

      Occipital masks have never been described or shown.

      The occipital mask is from the MNI probabilistic structural atlas (Mazziotta et al., 2001), as reported in the original version and is shared with the public data release. We have added the additional detail that the probabilistic atlas is thresholded at 0% in order to be liberally inclusive. 

      “We used the occipital mask from the MNI structural atlas63 in standard space -- defined liberally to include any voxel with an above zero probability of being labelled as the occipital lobe -- and used the inverted transform to put it into native functional space.” Pg. 27–28
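      The liberal thresholding described in the quote above amounts to keeping every voxel whose atlas probability is strictly above zero. The following minimal sketch shows this on a toy 2D grid; the function name `liberal_mask` and the probability values are hypothetical, and the transform from standard space into native functional space is not shown.

```python
def liberal_mask(prob_map, threshold=0.0):
    """Return a binary mask that is True where probability exceeds `threshold`.

    With the default threshold of 0, any voxel with a non-zero probability
    of belonging to the region is included.
    """
    return [[p > threshold for p in row] for row in prob_map]


# Hypothetical slice of a probabilistic atlas (values in percent).
occipital_prob = [
    [0, 0, 5],
    [0, 12, 80],
    [1, 40, 100],
]

mask = liberal_mask(occipital_prob)
n_voxels = sum(sum(row) for row in mask)

# A stricter threshold, for comparison, keeps far fewer voxels.
strict_mask = liberal_mask(occipital_prob, threshold=50)
n_strict = sum(sum(row) for row in strict_mask)
print(n_voxels, n_strict)
```

      A threshold of zero trades anatomical specificity for coverage, which suits a mask whose purpose is only to restrict the analysis to a broad occipital region.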

      Methods lack the main explanation of the procedures and software description.

      We hope that the additions we have made to address this reviewer’s concerns have provided better explanations for our procedures. Additionally, as part of the data and code release, we thoroughly explain all of the software needed to recreate the results we have observed here.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript addresses two main issues:

      (i) do MAPKs play an important role in SAC regulation in a single-cell organism such as S. pombe?

      (ii) what is the nature of their involvement and what are their molecular targets?

      The authors have extensively used the cold-sensitive β-tubulin mutant to activate or inactivate SAC employing an arrest-release protocol. Localization of Cdc13 (cyclin B) to the SPBs is used as a readout for the SAC activation or inactivation. The roles of two major MAPK pathways i.e. stress-activated pathway (SAP) and cell integrity pathway (CIP), have been explored in this context (with CIP more extensively than SAP). sty1Δ or pmk1Δ mutants were used to inactivate the SAP or CIP pathways and wis1DD or pek1DD expression was utilized to constitutively activate these pathways, respectively. Lowering of Slp1Cdc20 abundance (by phosphorylation of Slp1-Thr 480) is revealed as the main function of MAPK to augment the robustness of the spindle assembly checkpoint.

      Strengths:

      The experiments are generally well-conducted, and the results support the interpretations in various sections. The experimental data clearly supports some of the key conclusions:

      (1) While inactivation of SAP and CIP compromises SAC-imposed arrest, their constitutive activation delays the release from the SAC-imposed arrest.

      (2) CIP signaling, but not SAP signaling, attenuates Slp1Cdc20 levels.

      (3) Pmk1 and Cdc20 physically interact and Pmk1-docking sequences in Slp1 (PDSS) are identified and confirmed by mutational/substitution experiments.

      (4) Thr480 (and also S76) is identified as the residue phosphorylated by Pmk1. S28 and T31 are identified as Cdk1 phosphorylation sites. These are confirmed by mutational and other related analyses.

      (5) Functional aspects of the phosphorylation sites have been elucidated to some extent: (a) Phosphorylation of Slp1-T480 by Pmk1 reduces its abundance thereby augmenting the SAC-induced arrest; (b) S28, T31 (also S59) are phosphorylated by Cdk1; (c) K472 and K479 residues are involved in ubiquitylation of Slp1.

      Weaknesses:

      (1) Cdc13 localization to SPBs has been used as a readout for SAC activation/inactivation throughout the manuscript. However, the only image showing such localization (Figure 1C) is of poor quality where the Cdc13 localization to SPBs is barely visible. This should be replaced by a better image.

      We have replaced those pictures with a new set of representative images, which show clear presence or absence of SPB-localized Cdc13-GFP.

      (2) The overlapping error bars in Cdc13-localization data in some figures (for instance Figure 3E and 4H) make the effect of various mutations on SAC activation/inactivation rather marginal. In some of these cases, Western-blotting data support the authors' conclusions better.

      We agree that the overlapping error bars may look ambiguous in most figures showing time course curves. This is because the data from a group of strains are best presented in a single graph so that the potential effects can be compared directly. We are fully aware of this drawback, which is why we always present the data corresponding to the two major time points (0 and 50 min after release) from all time course analyses in an alternative way, namely as individual histograms for each strain, with the means of repeats, absolute values, error bars and p values clearly labeled. In particular, the data from the 0 min time point provide important information on the SAC activation efficiency. We generally placed those data and graphs in the corresponding supplemental figures, such as: Figure 1-figure supplement 1C, Figure 1-figure supplement 2D, Figure 3-figure supplement 3, Figure 4-figure supplement 6B, Figure 5-figure supplement 1, and Figure 6-figure supplement 2.

      In addition, as you have noticed, almost all time course data were backed up by our Western blotting data.

      (3) This specific point is not really a weakness but rather a loose end:

      One of the conclusions of this study is that MAPK (Pmk1) contributes to the robustness of SAC-induced arrest by lowering the abundance of Slp1Cdc20. The authors have used pmk1Δ or constitutively activating the MAPK pathways (Pek1DD) and documented their effect on SAC activation/inactivation dynamics. It is not clear if SAC activation also leads to activation of MAPK pathways for them to contribute to the SAC robustness. To tie this loose end, the author could have checked if the MAPK pathway is also activated under the conditions when SAC is activated. Unless this is shown, one must assume that the authors are attributing the effect they observe to the basal activity of MAPKs.

      We agree with your concern. We have followed your suggestion and performed further experiments. Please see our more detailed response to your point #ii(a) in your “Recommendations for the authors”.

      (4) This is also a loose end:

      The authors show that activation of stress pathways (by addition of KCl for instance) causes phosphorylation-dependent Slp1Cdc20 downregulation (Figure 6) under the SAC-activating condition. Does activation of the stress pathway cause phosphorylation-dependent Slp1Cdc20 downregulation under the non-SAC-activation condition or does it occur only under the SAC-activating condition?

      We agree with your concern. We have followed your suggestion and performed further experiments. Please see our more detailed response to your point #ii(b) in your “Recommendations for the authors”.

      (5) Although the authors have gone to some length to identify S28 and T31 (also S59) as phosphorylation sites for Cdk1, their functional significance in the context of MAPK involvement is not yet clear. Perhaps it is outside the scope of this study to dig deeper into this aspect more than the authors have.

      Based on our data from mass spectrometry analysis, mutational analysis, and in vitro and in vivo kinase assays using phosphorylation site-specific antibodies, we confirmed that at least S28 and T31 are Cdk1 phosphorylation sites. From our time course analysis of these phosphorylation-deficient mutants, it seems that the mechanisms by which Cdk1 and MAPK regulate Slp1 activity or protein abundance are quite different. How these two or even more kinases coordinate to control Slp1 activity during APC/C activation is a very interesting issue to be investigated; however, as you have realized, it is indeed beyond the scope of our current study.

      (6) In its current state, the Discussion section is quite disjointed. The first section "Involvement of MAPKs in cell cycle regulation" should be in the Introduction section (very briefly, if at all). It certainly does not belong to the Discussion section. In any case, the Discussion section should be more organized with a better flow of arguments/interpretations.

      We have re-organized our “Discussion” section. Please see our more detailed response to your point #iii in your “Recommendations for the authors”.

      Reviewer #2 (Public Review):

      Summary:

      This study by Sun et al. presents a role for the S. pombe MAP kinase Pmk1 in the activation of the Spindle Assembly Checkpoint (SAC) via controlling the protein levels of APC/C activator Cdc20 (Slp1 in S. pombe). The data presented in the manuscript is thorough and convincing. The authors have shown that Pmk1 binds and phosphorylates Slp1, promoting its ubiquitination and subsequent degradation. Since Cdc20 is an activator of APC/C, which promotes anaphase entry, constitutive Pmk1 activation leads to an increased percentage of metaphase-arrested cells. The authors have used genetic and environmental stress conditions to modulate MAP kinase signalling and demonstrate their effect on APC/C activation. This work provides evidence for the role of MAP kinases in cell cycle regulation in S. pombe and opens avenues for exploration of similar regulation in other eukaryotes.

      Strengths:

      The authors have done a very comprehensive experimental analysis to support their hypothesis. The data is well represented, and including a model in every figure summarizes the data well.

      Weaknesses:

      As mentioned in the comments, the manuscript does not establish that MAP kinase activity leads to genome stability when cells are subjected to genotoxic stressors. That would establish the importance of this pathway for checkpoint activation.

      We understand your concern. We have followed your suggestion and performed further experiments to examine whether the absence of Pmk1 causes chromosome segregation defects. Please see our more detailed response to your point #5 in your “Recommendations for the authors”.

      Recommendations for the authors:

      Reviewing Editor

      Please go through the reviews and recommendations and revise the paper accordingly. I think nearly everything is very straightforward and all issues raised by the two expert referees are fully justified. I look forward to seeing an appropriately revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      (i) Cdc13 localization to SPBs has been used as a readout for SAC activation/inactivation throughout the manuscript. However, the only image showing such localization (Figure 1C) is of poor quality where the Cdc13 localization to SPBs is barely visible. This should be replaced by a better image.

      We have replaced those pictures with a new set of representative images, which show clear presence or absence of SPB-localized Cdc13-GFP.

      (ii) I reiterate the loose ends in this manuscript I have mentioned above. If the authors have already conducted these experiments, they should include the results in the manuscript to tighten the story further. (I am not suggesting that the authors must perform these experiments...if they have not).

      (a) One of the conclusions of this study is that MAPK (Pmk1) contributes to the robustness of SAC-induced arrest by lowering the abundance of Slp1Cdc20. The authors have used pmk1Δ or constitutive activation of the MAPK pathway (pek1DD) and documented their effect on SAC activation/inactivation dynamics. It is not clear if SAC activation also leads to activation of MAPK pathways for them to contribute to the SAC robustness. To tie this loose end, the authors could have checked if the MAPK pathway is also activated under the conditions when SAC is activated. Unless this is shown, one must assume that the authors are attributing the effect they observe to the basal activity of MAPKs.

      Actually, our data shown in Figure 6B demonstrated that SAC activation per se cannot trigger activation of the MAPK pathway CIP, because we did not observe any elevated Pmk1 phosphorylation (i.e. Pmk1-P detected by anti-phospho p42/44 antibodies) in nda3-arrested cells (please see “control” samples in Figure 6B).

      To corroborate this observation, we further examined the Pmk1 phosphorylation/activation in Mad2-overexpressing cells, and could not detect elevated Pmk1 phosphorylation. This data again lends support to the notion that SAC activation per se cannot trigger activation of CIP signaling.

      We have added our newly obtained result in Figure 6-figure supplement 1 in our revised manuscript.

      (b) The authors show that activation of stress pathways (by addition of KCl, for instance) causes phosphorylation-dependent Slp1Cdc20 downregulation (Figure 6) under the SAC-activating conditions. Does activation of the stress pathway cause phosphorylation-dependent Slp1Cdc20 downregulation under the non-SAC-activation conditions or does it occur only under the SAC-activating condition?

      As you suggested, we have constructed cdc25-22 background strains with pmk1+ deleted or expressing Padh11-pek1DD to remove or constitutively activate CIP signaling, respectively. By immunoblotting, we followed the Slp1Cdc20 levels when cells went through mitosis after being released at 25 °C from G2/M-arrest at high temperature. We found that Slp1Cdc20 levels in pek1DD cells were only marginally reduced compared to wild-type cells, whereas we failed to observe any elevated Slp1Cdc20 levels in pmk1Δ cells. These results suggested that CIP signaling only plays a negligible role in influencing Slp1Cdc20 levels under the non-SAC-activation conditions.

      We have presented our newly obtained result in Figure 2-figure supplement 1 in our revised manuscript.

      (iii) The Discussion section is quite disjointed. The first section "Involvement of MAPKs in cell cycle regulation" should be in the Introduction section (very briefly, if at all). It certainly does not belong to the Discussion section. In any case, the Discussion section should be more organized with a better flow of arguments/interpretations.

      Thank you for your suggestion on the organization and flow of the “Discussion”. We have reorganized our “Discussion” section, moved the previous “Involvement of MAPKs in cell cycle regulation” to the “Introduction”, and rewritten the corresponding paragraph.

      (iv) A minor point in this context:

      In the cold-sensitive β-tubulin mutant, growth at 18 °C causes loss of kinetochore-microtubule attachments as well as of intra-kinetochore tension. Either perturbation alone can lead to the activation of SAC. This study does not distinguish whether MAPK involvement in SAC dynamics is relevant to one perturbation or another or both. It would be pertinent to briefly mention this point in the Discussion section.

      As you suggested, we have added two sentences to briefly mention this point in our “Discussion” section.

      Reviewer #2 (Recommendations For The Authors):

      This study by Sun et al. presents a role for the S. pombe MAP kinase Pmk1 in the activation of the Spindle Assembly Checkpoint (SAC) via controlling the protein levels of APC/C activator Cdc20 (Slp1 in S. pombe). The data presented in the manuscript is thorough and convincing. The authors have shown that Pmk1 binds and phosphorylates Slp1, promoting its ubiquitination and subsequent degradation. Since Cdc20 is an activator of APC/C, which promotes anaphase entry, constitutive Pmk1 activation leads to an increased percentage of metaphase-arrested cells. The authors have used genetic and environmental stress conditions to modulate MAP kinase signalling and demonstrate their effect on APC/C activation. This work provides evidence for the role of MAP kinases in cell cycle regulation in S. pombe and opens avenues for exploration of similar regulation in other eukaryotes.

      Although the data largely supports the conclusions, a major addition will be testing whether cells accumulate chromosomal or inheritance defects when MAPK Pmk1 is absent. It will be interesting to know that this mechanism of SAC activation contributes to genome integrity.

      Some additions that can improve the manuscript are mentioned below:

      (1) In Figure 1, the authors should also test the effect of constitutive activation of Spk1 to rule out the involvement of the PSP pathway.

      To address your request, we have constructed yeast strains expressing constitutively active byr1DD alleles carrying S214D and T218D point mutations under the control of the adh21 or adh11 promoters (Padh21 or Padh11 in short), i.e. Padh21-6HA-byr1DD and Padh11-6HA-byr1DD, respectively. We examined the expression of these byr1DD alleles by Western blotting, tested the TBZ sensitivity of these alleles, and checked whether they affect the efficiency of SAC activation or inactivation. Our results showed that constitutive activation of Spk1 by overexpressing Byr1DD does not cause yeast cells to be TBZ-sensitive or affect the efficiency of SAC activation or inactivation.

      We have added these new data in Figure 1-figure supplement 2 in our revised manuscript.

      (2) The number of analyzed cells (n) should be mentioned in the figure legends in Figure 1D, and all other figure panels should represent similar data in the consequent figures.

      We have added the information on sample size for all experiments involving time course analyses.

      (3) The authors should also use another arresting mechanism (e.g. nocodazole treatment) and corroborate the result in Figure 1C to rule out any effects due to the mutant.

      Figure 1C in our manuscript actually shows our experimental design and not the result. We assume that you were asking for an alternative strategy to arrest cells at metaphase and confirm our results shown in Figure 1D.

      We need to mention that, as a commonly used inhibitor of microtubule polymerization, Nocodazole is very effective in mammalian and human cells and also in budding yeast cells, but not effective at all in wild-type fission yeast cells. It has been found that Nocodazole is only active in fission yeast α- or β-tubulin mutants (please see Umesono, K., et al., J Mol Biol. 168 (2): 271-284 (1983); PMID: 6887245; DOI: 10.1016/s0022-2836(83)80018-7.) or multidrug resistance (MDR) transporter mutants (please see Kawashima, SA, et al., Chemistry & Biology 19, 893–901 (2012); PMID: 22840777; doi: 10.1016/j.chembiol.2012.06.008.). Therefore, this feature of Nocodazole has limited and restricted its routine use as a metaphase arrest or spindle checkpoint activation drug in fission yeast.

      Instead, in order to achieve the spindle checkpoint activation and metaphase arrest, we took advantage of a metaphase-arresting mechanism involving Mad2 overexpression, which has been described and used previously (Please see He, X., et al., Proc Natl Acad Sci USA. 94 (15): 7965-70 (1997); PMID: 9223296; DOI: 10.1073/pnas.94.15.7965, and May, K.M., et al., Current Biology, 27(8):1221-1228 (2017); PMID: 28366744; DOI: 10.1016/j.cub.2017.03.013). With this strategy, we could analyze the metaphase-arresting and SAC-activation efficiency by counting cells with short spindles as judged by GFP-Atb2 signals. We compared the frequencies of cells with short spindles in wild-type, pmk1Δ, sty1-T97A, and spk1Δ backgrounds after Mad2 has been induced to overexpress for 18 hours, and found that SAC-activating efficiency was compromised in pmk1Δ and sty1-T97A cells, but not in spk1Δ cells. This data indeed corroborated our result shown in Figure 1D and ruled out possible effects due to the nda3-KM311 mutant.

      We have added this new data in Figure 1-figure supplement 3 in our revised manuscript.

      (4) It would also be helpful to assess SAC or APC/C activation with another cellular readout in addition to Cdc13-GFP accumulation on SPBs, at least for initial experiments.

      Actually, Cdc13-GFP accumulation on SPBs has been routinely used as a reliable cellular readout for SAC or APC/C activation in the field. It was first developed and used by Kevin Hardwick lab in their paper (Vanoosthuyse V and Hardwick KG. Curr Biol. 2009, 19(14):1176-81. PMID: 19592249; doi: 10.1016/j.cub.2009.05.060.). This method was also used in a paper by Meadows JC, et al. (2011) (Dev Cell. 20(6):739-50. PMID: 21664573; doi: 10.1016/j.devcel.2011.05.008.).

      In our previous study, we also employed a different strategy to assess SAC inactivation or APC/C activation, in which degradation of nuclear Cut2-GFP was used as a cellular readout (Please see S4 Fig in Bai S, et al., PLoS Genet 18(9): e1010397 (2022); PMID: 36108046; DOI: 10.1371/journal.pgen.1010397.). Cut2 is the securin homologue in S. pombe and therefore also a target of APC/C at anaphase. Our data in the above paper confirmed that the degradation of both nuclear Cut2-GFP and SPB-localized Cdc13-GFP shows similar dynamics when cells are released from metaphase-arrest.

      As we described in our response to your comment #3, we employed short spindles visualized by GFP-Atb2 signals as an alternative readout for metaphase-arrest and SAC-activation in cells overexpressing Mad2. We confirmed that SAC-activation efficiency was compromised in pmk1Δ and sty1-T97A cells, but not in spk1Δ cells.

      (5) The authors have shown a role for Pmk1 in controlling the activation of APC/C and, hence, cell cycle progression through metaphase to anaphase. One crucial experiment would be to check if pmk1Δ cells show an accumulation of chromosomal aberrations or unequal distribution when subjected to genotoxic stressors. That would implicate a direct importance on Pmk1's role in cell cycle arrest and genome maintenance.

      As you suggested, we have constructed cdc25-22 GFP-atb2+ strains with pmk1+ present or deleted, and treated cells with 0.6 M KCl or 2 μg/mL caspofungin to activate MAPKs and checked whether the absence of pmk1 could cause defective chromosome segregation in anaphase cells. Indeed, we found that stressed pmk1Δ cells displayed greatly increased frequency of lagging chromosomes and chromosome mis-segregation at mitotic anaphase compared to similarly treated wild-type cells and also untreated pmk1Δ cells. This new data implicated a direct role for Pmk1 in cell cycle arrest and genome maintenance, especially when cells are exposed to adverse environment.

      We have presented this new data as Figure 7 in our revised manuscript.

      Typos:

      (1) In line 406, "docking" is misspelled as "docing".

      Thank you for pointing this out. We have corrected this mistake.

      (2) In Figure 6, panel "F" is not marked in the figure.

      We mistakenly mentioned and labeled “F” in Figure 6 legend. In our revised manuscript, we have added new results of protein levels of Pmk1 phosphorylation- and ubiquitylation-deficient Slp1Cdc20 mutants upon SAC activation detected by Western blotting in Figure 6-figure supplement 3.

      (3) In Figure S1, panel "D" is not marked.

      We apologize for our previous wording in our former Figure S1 legend, which was misleading. Actually, there was no panel “D” in Figure S1 (now Figure 1-figure supplement 1). We have rewritten the legend to avoid ambiguity.

    1. I think it's really important for us to develop a science of that, like, critically important

      for - answer - Michael Levin - adjacency - hyperobject - cognitive light cone - critically important to develop a science of this

      adjacency - between - multi scale competency architecture - cognitive light cone - hyperobject - awakening / enlightenment - adjacency relationship - At every stage of the multi scale competency architecture, - the living entities at a particular stage may maintain - feedback and - feedforward signals - with any - higher or - lower level systems. - Human INTERbeCOMings and other consciousness are no different - We exist at one level but are both - composed of lower level living parts and - compose larger social superorganism - Indeed, the spiritual acts variously described as - awakening - enlightenment - can be interpreted as transcending level cognitive light cone

    1. During the pandemic, people might have turned to their SNSs to engage with ongoing relationships or reignite old ones. Indeed, in our study, participants reported that they wanted to reconnect with people from their past for several reasons, ranging from checking in on people they cared about to rekindling friendships in order to reminisce

      I think another thing we may want to consider is how much these social networking sites (SNSs) changed and were continually updated as a result of the pandemic, in ways that restored relationships and even virtual communities more immersively. Examples include X Spaces (X, formerly known as Twitter) and Clubhouse.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, the authors established an in vitro triple co-culture BBB model and demonstrated its advantages compared with the mono or double co-culture BBB model. Further, the authors used their established in vitro BBB model and combined it with other methodologies to investigate the specific mechanism that co-culture with astrocytes but also neurons enhanced the integrity of endothelial cells.

      Strengths:

      The results persuasively showed the established triple co-culture BBB model well mimicked several important characteristics of BBB compared with the mono-culture BBB model, including better barrier function and in vivo/in vitro correlation. The human-derived immortalized cells used made the model construction process faster and more efficient, and have a better in vivo correlation without species differences. This model is expected to be a useful high-throughput evaluation tool in the development of CNS drugs.

      Based on the previous experimental results, detailed studies investigated how co-culture with neurons and astrocytes promoted claudin-5 and VE-cadherin in endothelial cells, and the specific signaling mechanisms were also studied. Interestingly, the authors found that neurons also released GDNF to promote barrier properties of brain endothelial cells, as most current research has focused on the promoting effect of astrocytes-derived GDNF on BBB. Meanwhile, the author also validated the functions of GDNF for BBB integrity in vivo by silencing GDNF in mouse brains. Overall, the experiments and data presented support their claim that, in addition to astrocytes, neurons also have a promoting effect on the barrier function of endothelial cells through GDNF secretion.

      Weaknesses:

      Although the authors demonstrated a highly usable model for predicting BBB permeability, the recorded TEER measurements are still far from those reported for the human BBB in vivo, and expression of transporters was not promoted by co-culture, which may make the model unsuitable for studying transporter-mediated drug transport at the BBB.

      We thank the reviewer very much for the opportunity to improve our manuscript. The immortalized human cell line hCMEC/D3 has poor barrier properties and differs from the physiological human BBB in TEER and in the expression of some transporters and metabolic enzymes. However, the use of human primary BMECs may be restricted by the acquisition of materials and ethical approval. Isolation and purification of human primary BMECs are time-consuming and laborious. Moreover, culture conditions can alter transcriptional activity (PMID: 37076016). All of these factors limit the establishment of BBB models based on primary human BMECs for high-throughput screening. Thus, hCMEC/D3 is still widely used to study characteristics of drug transport across the BBB and the effects of certain diseases on the BBB (PMID: 37076016; 38711118; 31163193), as it is easy to culture and can express a large number of transporters and metabolic enzymes in its physiological state. Therefore, hCMEC/D3 cells were selected to develop our in vitro BBB model.

      Reviewer #1 (Recommendations For The Authors):

      Point 1: The authors claim that GDNF is mainly released by human neuroblastoma SH-SY5Y cells in the in vitro BBB model, but there are still some differences between the characteristics of cell lines and neurons. The authors should discuss or provide evidence about the distribution and source of GDNF in the brain to support this conclusion.

      We greatly appreciate your helpful suggestions. According to your advice, we have revised the “Discussion” in the revised manuscript as follows:

      In “Discussion”:

      “GDNF is mainly expressed in astrocytes and neurons (Lonka-Nevalaita et al., 2010; Pochon et al., 1997). In adult animals, GDNF is mainly secreted by striatal neurons rather than astrocytes and microglial cells (Hidalgo-Figueroa et al., 2012). The present study also shows that GDNF mRNA levels in SH-SY5Y cells were significantly higher than that in U251 cells. GDNF was also detected in conditioned medium from SH-SY5Y cells. All these results demonstrate that neurons may secrete GDNF”.

      Point 2: The authors found that co-culture induced the proliferation of endothelial cells (Figure 1H). I suggest the authors discuss whether the proliferation of endothelial cells would affect their permeability.

      Thank you for your suggestion. Following your advice, we have investigated the effect of cell proliferation on the leakage of the cell layer and included the results in Figure 1—figure supplement 1. The present study showed that basic fibroblast growth factor (bFGF) increased proliferation of hCMEC/D3 cells but had little effect on the expression of either claudin-5 or VE-cadherin (Figure 2F). The hCMEC/D3 cells were incubated with different doses of bFGF, and the permeabilities of fluorescein (NaF) and FITC-Dextran 3–5 kDa across the hCMEC/D3 cell monolayer were measured. The results showed that incubation with bFGF increased cell proliferation and reduced the permeabilities of fluorescein and FITC-Dextran across the hCMEC/D3 cell monolayer. However, the permeability reduction was smaller than that produced by double co-culture with U251 cells or by triple co-culture. These results suggest that the contribution of cell proliferation to the barrier function of hCMEC/D3 cells was minor. We have made the following modifications in the “Results” of our manuscript:

      In “Results”:

      “Furthermore, hCMEC/D3 cells were incubated with basic fibroblast growth factor (bFGF), which promotes cell proliferation without affecting either claudin-5 or VE-cadherin expression (Figure 2F). The results showed that incubation with bFGF increased cell proliferation and reduced the permeabilities of fluorescein and FITC-Dex across the hCMEC/D3 cell monolayer. However, the permeability reduction was smaller than that produced by double co-culture with U251 cells or by triple co-culture. These results suggest that the contribution of cell proliferation to the barrier function of hCMEC/D3 was minor (Figure 1—figure supplement 1)”.

      Point 3: The authors claimed that GDNF induced the expression of claudin-5 and VE-cadherin separately. However, Andrea Taddei et al. reported that VE-cadherin itself also regulates claudin-5 through the inhibitory activity of FoxO1 (Andrea Taddei et al., 2008). The authors did not consider whether the upregulation of claudin-5 is associated with the increase of VE-cadherin.

      Thank you for your suggestion. We also investigated whether VE-cadherin affected claudin-5 expression in hCMEC/D3 cells transfected with VE-cadherin siRNA. In contrast to the report by Taddei et al., silencing VE-cadherin only slightly decreased the mRNA level of claudin-5, and the difference was not significant. Furthermore, basal and GDNF-induced claudin-5 protein levels were unaltered by silencing VE-cadherin. The discrepancies may arise from the characteristics of the tested cells. Endothelial cells derived from murine embryonic stem cells with a homozygous null mutation were used in Taddei’s study, while we transfected immortalized brain microvascular endothelial cells with siRNA. Several reports have demonstrated that different mechanisms regulate the expression of claudin-5 and VE-cadherin. In retinal endothelial cells, hyperglycemia remarkably reduced claudin-5 expression (but not VE-cadherin) (PMID: 24594192). However, in hCMEC/D3 cells, hypoglycemia significantly decreased claudin-5 (not VE-cadherin) expression but hyperglycemia increased VE-cadherin expression (not claudin-5) (PMID: 24708805). Therefore, the role of VE-cadherin in the regulation of claudin-5 in the BBB should be further investigated.

      Following your valuable suggestion, we have modified the “Results”, “Discussion” and “Figure 4—figure supplement 1” in the revised manuscript as follows:

      In “Results”:

      “It was reported that VE-cadherin also upregulates claudin-5 via inhibiting FOXO1 activities (Taddei et al, 2008). The effect of VE-cadherin on claudin-5 was studied in hCMEC/D3 cells with VE-cadherin silenced. In contrast to the report by Taddei et al., silencing VE-cadherin only slightly decreased the mRNA level of claudin-5, and the difference was not significant. Furthermore, basal and GDNF-induced claudin-5 protein levels were unaltered by silencing VE-cadherin (Figure 4—figure supplement 1). Thus, the role of VE-cadherin in the regulation of claudin-5 in the BBB should be further investigated.”

      In “Discussion”:

      “Claudin-5 expression is also regulated by VE-cadherin (Taddei et al., 2008). Differing from the previous reports, silencing VE-cadherin with siRNA only slightly affected basal and GDNF-induced claudin-5 expression. The discrepancies may come from different characteristics of the tested cells. Several reports have supported the above deduction. In retinal endothelial cells, hyperglycemia remarkably reduced claudin-5 expression (but not VE-cadherin) (Saker et al., 2014). However, in hCMEC/D3 cells, hypoglycemia significantly decreased claudin-5 expression but hyperglycemia increased VE-cadherin expression (Sajja et al., 2014)”.

      “Figure 4—figure supplement 1: The contribution of VE-Cadherin on the GDNF-induced claudin-5 expression. Effects of the VE-Cadherin siRNA (siVE-Cad) on mRNA expression of VE-cadherin (A) and claudin-5 (B). Effects of siVE-Cad and GDNF on claudin-5 and VE-cadherin protein expression (C). NC: negative control plasmids. The above data are shown as the mean ± SEM. Four biological replicates per group. Two technical replicates for A and B, and one technical replicate for C. Statistical significance was determined using unpaired Student’s t-test or one-way ANOVA test followed by Fisher’s LSD test.”

      Point 4: The annotation of significance with the p-values in the figures might not be visually concise and clear. It is recommended to provide the p-values in the legends or raw data.

      Thank you for your valuable suggestion. We have revised our figures in our revised manuscript. The specific p-values and statistical methods were summarized in the source data files of each figure.

      Point 5: The authors need to note the material of the Transwell membrane used to increase the reproducibility of experiments, because different materials may cause differences in permeability and TEER (Diane M. Wuest et al., 2013).

      We greatly appreciate your valuable suggestions. According to your advice, we have provided the information on the material of the Transwell membrane in the “Materials and Methods” in the revised manuscript as follows:

      In “Materials and Methods”:

      “U251 cells were seeded at 2 × 10⁴ cells/cm² on the bottom of Transwell inserts (PET, 0.4 µm pore size, SPL Life Sciences, Pocheon, Korea) coated with rat-tail collagen (Corning Inc., Corning, NY, USA)”.

      Point 6: It is not necessary to abbreviate "in vitro/in vivo correlation" in the legend of Figure 7 as it was not mentioned again in the following text.

      Thank you for your valuable suggestion. We have deleted the abbreviation from the legend of Figure 7 in the revised manuscript.

      In “Figure 7”:

      “Figure 7. In vitro/in vivo correlation assay of BBB permeability.”

      Reviewer #2 (Public Review):

      Summary:

      Yang and colleagues developed a new in vitro blood-brain barrier model that is relatively simple yet outperforms previous models. By incorporating a neuroblastoma cell line, they demonstrated increased electrical resistance and decreased permeability to small molecules.

      Strengths:

      The authors initially elucidated the soluble mediator responsible for enhancing endothelial functionality, namely GDNF. Subsequently, they elucidated the mechanisms by which GDNF upregulates the expression of VE-cadherin and Claudin-5. They further validated these findings in vivo, and demonstrated predictive value for molecular permeability as well. The study is meticulously conducted and easily comprehensible. The conclusions are firmly supported by the data, and the objectives are successfully achieved. This research is poised to advance future investigations in BBB permeability, leakage, dysfunction, disease modeling, and drug delivery, particularly in high-throughput experiments. I anticipate an enthusiastic reception from the community interested in this area. While other studies have produced similar results with tri-cultures (PMID: 25630899), this study notably enhances electrical resistance compared to previous attempts.

      Weaknesses:

      (A) Considerable effort has been directed towards developing in vitro models that more closely resemble their in vivo counterparts, utilizing stem cell-derived NVU cells. Although these examples are currently rudimentary, they offer better BBB mimicry than Yang's study.

      Thank you very much for your valuable comments. Indeed, hCMEC/D3 cells have poor barrier properties and low TEER compared to the physiological human BBB. Human pluripotent stem cell BBB models (hPSC-BBB models) can provide a robust and scalable cell source for BBB modeling, although many challenges remain, particularly concerning reproducibility and the recreation of multifaceted phenotypes in vitro with increasing complexity. Moreover, hPSC-derived BBB models depend heavily on hPSC-derived BMECs of heterogeneous origin, and cells derived from different protocols are not well validated or standardized in these models. Thus, the hPSC-BBB models are still being developed and their clinical applications are still at an early stage (PMID: 34815809; 35755780). The hCMEC/D3 cell line is still widely used to study characteristics of drug transport across the BBB and the effects of certain diseases on the BBB (PMID: 37076016; 38711118; 31163193), as it is easy to culture and can express a large number of transporters and metabolic enzymes in its physiological state. Therefore, hCMEC/D3 cells were selected to develop our in vitro BBB model.

      (B) Additionally, some instances might benefit from more robust statistical tests; nonetheless, I do not think this would significantly alter the experimental conclusions.

      Thank you for your valuable suggestions on the statistical methods used in our study, which made us realize our lack of rigor in selecting statistical methods. We have modified the statistical methods, and all statistical results shown in the manuscript have been updated accordingly.

      (C) Similar experiments with tri-cultures yielding analogous results have been reported by other authors (PMID: 25630899). TEER values are a bit higher than the aforementioned experiments; however, this study has values at least one order of magnitude lower than physiological levels.

      Thank you for your advice. We also noticed that the TEER values in the present study differed from previous reports, which may result from differences in the types of BMECs, astrocytes, and neurons used.

      Reviewer #2 (Recommendations For The Authors):

      Point 1: If you've already decided to enhance the model by incorporating additional cell types, why not include pericytes as well? As mentioned in the public review, other studies have explored tri-culture models; adding pericytes or other cell types could provide valuable insights.

      We greatly appreciate your helpful suggestions. As you mentioned, the barrier function of our model still needs further improvement, which is also a limitation of our current model. In our future research, we will aim to optimize our model by incorporating other NVU cells. Beyond drug screening, we also hope that our in vitro BBB model can serve as a versatile tool to investigate underlying factors associated with neuropathological disorders. According to your advice, we have modified “Discussion” in the revised manuscript as follows:

      In “Discussion”:

      “However, the study also has some limitations. In addition to neurons and astrocytes, other cells such as microglia, pericytes, and vascular smooth muscle cells, especially pericytes, may also affect BBB function. How pericytes affect BBB function and interaction among neurons, astrocytes, and pericytes needs further investigation.”

      Point 2: The decline in TEER after 6 days is concerning. Have you extended your experiments beyond day 7? If so, what were the outcomes? Did the system degrade, leading to decreased resistance, or did cell death occur?

      We greatly appreciate your helpful recommendation. We also observed that the TEER of our culture system began to decline on day 7. To ensure the reliability of our experiments, they were conducted on day 6 of co-cultivation and did not extend beyond day 7. We speculate that the decrease in TEER values may be due to excessive cell contact, which could inhibit cell proliferation, and that long-term cultivation may lead to cell aging. Similar decreases in the TEER of in vitro BBB models after prolonged culture have been reported in other studies (PMID: 31079318; 8470770). To eliminate misunderstandings, we have made the following modifications to our manuscript:

      In “Results”:

      “TEER values were measured during the co-culture (Figure 1B). TEER values of the four in vitro BBB models gradually increased until day 6. On day 7, the TEER values showed a decreasing trend. Thus, a six-day co-culture period was used for subsequent experiments”.

      In “In vitro BBB permeability study” of “Materials and Methods”:

      “On day 7, the TEER values of BBB models showed a decreasing trend. Therefore, the subsequent experiments were all completed on day 6”.

      Point 3: It is standard practice for figures to be referenced in the order they appear in the manuscript. However, Figures 1A and 1B are not mentioned until the end of the methods section. Adding a brief sentence at the beginning of the main body referencing these figures would improve the clarity of the experimental approach.

      Thank you for your valuable suggestion. We have modified Figure 1, and the details of the cell model establishment process are now included in Figure 9, which is mentioned in the “Materials and Methods” section.

      Point 4: To strengthen the evidence supporting the proliferative effect of GDNF, consider incorporating additional measures beyond cell count alone. While an increase in cell count could be attributed to reduced cell death (given GDNF's pro-survival properties), proliferation effects have also been shown (PMID: 28878618). Demonstrating proliferation with markers or cell cycle analysis would provide more robust evidence.

      Thank you for your helpful suggestion. We used EdU incorporation and CCK-8 assays to further detect the proliferation of hCMEC/D3 cells, and corresponding results were added in the revised Figure 1H and Figure 1I. The description of results is shown as follows:

      In “Results”:

      “Co-culture with SH-SY5Y, U251, and U251 + SH-SY5Y cells also enhanced the proliferation of hCMEC/D3 cells. Moreover, the promoting effect of SH-SY5Y cells was stronger than that of U251 cells (Figure 1G-1I).”

      Point 5: Could you specify the use of technical replicates in your experiments? How many?

      Thank you for your helpful suggestion, and we apologize for the issue you pointed out. We have now specified the number of technical replicates in the legends of the revised manuscript. In general, ELISA and qPCR were run with two technical replicates, and all other experiments with one. We have also made the following modification to our manuscript:

      In “Statistical analyses” of “Materials and Methods”:

      “All results are presented as mean ± SEM. The average of technical replicates generated a single independent value that contributes to the n value used for comparative statistical analysis”.
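
      The replicate handling described above can be illustrated with a minimal sketch. The readings below are hypothetical, and the function name `summarize` is ours; it only shows how technical replicates collapse into one value per independent sample before the mean and SEM are computed.

```python
from math import sqrt
from statistics import mean, stdev


def summarize(biological_samples):
    """Collapse technical replicates, then return (mean, SEM, n)."""
    # One value per independent sample: the average of its technical replicates.
    per_sample = [mean(replicates) for replicates in biological_samples]
    n = len(per_sample)
    # SEM = sample standard deviation (n - 1 denominator) divided by sqrt(n).
    sem = stdev(per_sample) / sqrt(n)
    return mean(per_sample), sem, n


# Hypothetical readings: 4 independent samples, 2 technical replicates each.
readings = [[1.0, 3.0], [2.0, 4.0], [3.0, 5.0], [4.0, 6.0]]
avg, sem, n = summarize(readings)
print(f"{avg:.2f} ± {sem:.2f} (n = {n})")
```

      Note that n counts independent samples (4 here), not individual readings (8), so technical replicates do not inflate the sample size used for statistical comparison.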

      Point 6: Given the sample size of 4 in most experiments, it may be insufficient for passing a normality test. Therefore, it's advisable to employ non-parametric tests such as the Kruskal-Wallis test, followed by appropriate post-hoc tests.

      Thank you for your valuable and useful suggestion. We apologize for our initial oversight regarding statistics. Based on your suggestion, we have thoroughly reviewed and revised the statistical methods and results in the manuscript. According to the GraphPad ‘Statistics Guide’ (H. J. Motulsky, "The power of nonparametric tests", GraphPad Statistics Guide. Accessed 20 June 2024. https://www.graphpad.com/guides/prism/latest/statistics/stat_the_power_of_nonparametric_tes.htm), the Kruskal-Wallis test is more robust when the data do not follow a normal distribution or lack homogeneity of variance. However, because it relies on ranks, it may be less sensitive to small differences. If the total sample size is tiny, the Kruskal-Wallis test will always give a P value greater than 0.05 no matter how much the groups differ. To address this, we first used the Shapiro-Wilk test to assess whether the samples come from Gaussian distributions. For samples meeting this criterion, parametric tests were employed. For samples that do not follow a Gaussian distribution, we used non-parametric tests, as you advised. We have modified the “Statistical analyses” section of the revised manuscript as follows:

      In “Statistical analyses” of “Materials and Methods”:

      “The data were assessed for Gaussian distributions using Shapiro-Wilk test. Brown-Forsythe test was employed to evaluate the homogeneity of variance between groups. For comparisons between two groups, statistical significance was determined by unpaired 2-tailed t-test. The acquired data with significant variation were tested using unpaired t-test with Welch's correction, and non-Gaussian distributed data were tested using Mann-Whitney test. For multiple group comparisons, one-way ANOVA followed by Fisher’s LSD test was used to determine statistical significance. The acquired data with significant variation were tested using Welch's ANOVA test, and non-Gaussian distributed data were tested using Kruskal-Wallis test. P < 0.05 was considered statistically significant. The simple linear regression analysis was used to examine the presence of a linear relationship between two variables. Data were analyzed using GraphPad Prism software version 8.0.2 (GraphPad Software, La Jolla, CA, USA)”.
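
      The test-selection logic in the quoted Methods text can be sketched as a small decision function. This is an illustration, not the Prism workflow itself. The two boolean flags are assumed to come from the Shapiro-Wilk and Brown-Forsythe tests run elsewhere, and we assume that a non-Gaussian result takes precedence over unequal variances, which the text does not state explicitly. The second helper shows, for the simpler two-group exact Mann-Whitney case without ties, why rank tests cannot reach P < 0.05 when samples are tiny.

```python
from math import comb


def choose_test(n_groups, gaussian, equal_variance):
    """Return the name of the test implied by the described decision flow."""
    if n_groups < 2:
        raise ValueError("need at least two groups to compare")
    if n_groups == 2:
        if not gaussian:
            return "Mann-Whitney test"
        if not equal_variance:
            return "unpaired t-test with Welch's correction"
        return "unpaired two-tailed t-test"
    if not gaussian:
        return "Kruskal-Wallis test"
    if not equal_variance:
        return "Welch's ANOVA"
    return "one-way ANOVA followed by Fisher's LSD"


def min_exact_two_sided_p(n1, n2):
    """Smallest two-sided P an exact Mann-Whitney test can give (no ties)."""
    # Complete separation is one of comb(n1 + n2, n1) equally likely rank
    # arrangements under the null; the two-sided P doubles that probability.
    return min(1.0, 2 / comb(n1 + n2, n1))


print(choose_test(3, gaussian=False, equal_variance=True))
print(min_exact_two_sided_p(3, 3))
print(min_exact_two_sided_p(4, 4))
```

      With three values per group the smallest attainable two-sided P is 2/20 = 0.1, so significance at 0.05 is impossible; with four per group it is 2/70, about 0.029.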

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      This is a very nice study of Belidae weevils using anchored phylogenomics that presents a new backbone for the family and explores, despite a limited taxon sampling, several evolutionary aspects of the group. The phylogeny is useful to understand the relationships between major lineages in this group and preliminary estimation of ancestral traits reveals interesting patterns linked to host-plant diet and geographic range evolution. I find that the methodology is appropriate, and all analytical steps are well presented. The paper is well-written and presents interesting aspects of Belidae systematics and evolution. The major weakness of the study is the very limited taxon sampling which has deep implications for the discussion of ancestral estimations.

      Thank you for these comments.

      The taxon sampling only appears limited if counting the number of species. However, 70 % of belid species diversity belongs to just two genera. Moreover, patterns of host plant and host organ usage and distribution are highly conserved within genera and even tribes. Therefore, generic-level sampling is a reasonable measure of completeness. Although 60 % of the generic diversity was sampled in our study, we acknowledge that our discussion of ancestral estimations would be stronger if at least one genus of Afrocorynina and the South American genus of Pachyurini could be included.

      Reviewer #2 (Public Review):

      Summary:

      The authors used a combination of anchored hybrid enrichment and Sanger sequencing to construct a phylogenomic data set for the weevil family Belidae. Using evidence from fossils and previous studies they can estimate a phylogenetic tree with a range of dates for each node - a time tree. They use this to reconstruct the history of the belids' geographic distributions and associations with their host plants. They infer that the belids' association with conifers pre-dates the rise of the angiosperms. They offer an interpretation of belid history in terms of the breakup of Gondwanaland but acknowledge that they cannot rule out alternative interpretations that invoke dispersal.

      Strengths:

      The strength of any molecular-phylogenetic study hinges on four things: the extent of the sampling of taxa; the extent of the sampling of loci (DNA sequences) per genome; the quality of the analysis; and - most subjectively - the importance and interest of the evolutionary questions the study allows the authors to address. The first two of these, sampling of taxa and loci, impose a tradeoff: with finite resources, do you add more taxa or more loci? The authors follow a reasonable compromise here, obtaining a solid anchored-enrichment phylogenomic data set (423 genes, >97 kpb) for 33 taxa, but also doing additional analyses that included 13 additional taxa from which only Sanger sequencing data from 4 genes was available. The taxon sampling was pretty solid, including all 7 tribes and a majority of genera in the group. The analyses also seemed to be solid - exemplary, even, given the data available.

      This leaves the subjective question of how interesting the results are. The very scale of the task that faces systematists in general, and beetle systematists in particular, presents a daunting challenge to the reader's attention: there are so many taxa, and even a sophisticated reader may never have heard of any of them. Thus it's often the case that such studies are ignored by virtually everyone outside a tiny cadre of fellow specialists. The authors of the present study make an unusually strong case for the broader interest and importance of their investigation and its focal taxon, the belid weevils.

      The belids are of special interest because - in a world churning with change and upheaval, geologically and evolutionarily - relatively little seems to have been going on with them, at least with some of them, for the last hundred million years or so. The authors make a good case that the Araucaria-feeding belid lineages found in present-day Australasia and South America have been feeding on Araucaria continuously since the days when it was a dominant tree taxon nearly worldwide before it was largely replaced by angiosperms. Thus these lineages plausibly offer a modern glimpse of an ancient ecological community.

      Weaknesses:

      I didn't find the biogeographical analysis particularly compelling. The promise of vicariance biogeography for understanding Gondwanan taxa seems to have peaked about 3 or 4 decades ago, and since then almost every classic case has been falsified by improved phylogenetic and fossil evidence. I was hopeful, early in my reading of this article, that it would be a counterexample, showing that yes, vicariance really does explain the history of *something*. But the authors don't make a particularly strong claim for their preferred minimum-dispersal scenario; also they don't deal with the fact that the range of Araucaria was vastly greater in the past and included places like North America. Were there belids in what is now Arizona's petrified forest? It seems likely. Ignoring all of that is methodologically reasonable but doesn't yield anything particularly persuasive.

      Thank you for these comments.

      The criticism that the biogeographical analysis is “not very compelling” is true to a degree, but it is only a small part of the discussion and, as stated by the reviewer, cannot be made more “persuasive”, in part because of limitations in taxon sampling but also because of uncertainties of host associations (e.g. with ferns). We tried to draw persuasive conclusions while not being too speculative at the same time. Elaborating on our short section here would only make it much more speculative — and dispersal scenarios more so than vicariance ones (at least in Belinae).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I have a few comments relative to this last point of a more general nature:

      - I think it would be informative in Figure 1 to present family names for the outgroups.

      Family names for outgroups have been added to Figure 1.

      - There is a summary of matrix composition in the results but I think a table would be better listing all necessary information for each dataset (number of taxa, number of taxa with only Sanger data, parsimony informative sites, GC content, missing data, etc...).

      We added Table S4 with detailed information about the matrices.

      - Perhaps I missed it, but I didn't find how fossil calibrations were implemented in BEAST (which prior distribution was chosen and with which parameters).

      We used uniform priors, this has been added to the Methods section.

      - I am worried that the taxon sampling (ca. 10% of the family) is too low to conduct meaningful ancestral estimations, without mentioning the moderately supported relationships among genera and large time credibility intervals. This should be better acknowledged in the paper and perhaps should weigh more into the discussion.

      Belidae in general are a rare group of weevils, and it has been a huge effort and a global collaboration to sample all tribes and over 60 % of the generic diversity in the present study. A high degree of conservation of host plant associations, host plant organ usage and distribution are observed within genera and even tribes. Therefore, we feel strongly that the resulting ancestral states are meaningful.

      Moreover, 70 % of the belid species diversity belongs to only two genera, Rhinotia and Proterhinus. Our species sampling is about 36 % if we disregard the 255 species of these two genera.
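
      The percentages quoted above can be related to each other with a rough back-of-the-envelope calculation. It treats the rounded figures in the response (70 %, 255 species, 36 %) as exact, so the derived totals are approximations and not counts taken from the manuscript.

```python
# Figures stated in the response; treated as exact for this rough estimate.
species_in_two_genera = 255   # Rhinotia + Proterhinus
share_of_family = 0.70        # "70 % of the belid species diversity"
sampling_outside = 0.36       # "about 36 %" once the two genera are set aside

# Implied family size and the number of species outside the two genera.
total_species = species_in_two_genera / share_of_family
remaining_species = total_species - species_in_two_genera

# Implied number of sampled species and their share of the whole family.
sampled_estimate = sampling_outside * remaining_species
fraction_of_family = sampled_estimate / total_species

print(round(total_species), round(remaining_species), round(sampled_estimate))
print(f"{fraction_of_family:.1%}")
```

      Under these assumptions the family holds roughly 364 species, about 109 of them outside the two large genera, and the implied sample of about 39 species is close to 11 % of the family, which is in line with the reviewer's estimate of ca. 10 %.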

      We acknowledge that our results could be improved by sampling more genera of Afrocorynina and Pachyurini, but these taxa are very hard to collect. We have acknowledged the limitations of our taxon sampling, branch supports and timetree credibility intervals in the discussion to minimize speculation in our conclusions.

      - It might be nice to have a more detailed discussion of flanking regions. In my experience and from the literature there seems to be increasing concern about the use of these regions in phylogenomic inferences for multiple solid reasons especially the more you go back in time (complex homology assessment, overall gappyness, difficulty to partition the data, etc...)

      We tested the impact of flanking regions on the results of our analyses and showed that these data did not have a detrimental impact. We added more details about this to the results section of the paper, including information about the cutoffs we used to trim the flanking regions.

      Reviewer #2 (Recommendations For The Authors):

      Line 42, change "recent temporal origins" to "recent origins".

      Modified in the text.

      Line 97-98, "phylogenetic hypotheses have been proposed for all genera" This is ambiguous. The syntax makes it sound like these were separate hypotheses for each genus - the relationships of the species within them, maybe. However, the context implies that the hypotheses relate to the relationships between the genera. Clarify. "A phylogenetic hypothesis is available for generic relationships in each subfamily. . . " or something.

      Modified in the text.

      Line 162, ". . . all three subtribes (Agnesiotinidi, Belini. . . " Something's wrong here. Change "subtribes" to "tribes"?

      Modified in the text.

      Line 219, the comma after "unequivocally" needs to be a semicolon.

      Modified in the text.

      Line 327 and elsewhere, the abbreviation "AHE" is used but never spelled out; spell out what it stands for at first use. Or why not spell it out every single time? You hardly ever use it and scientists' habit of using lots of obscure abbreviations is a bad one that's worth resisting, especially now that it no longer requires extra ink and paper to spell things out.

      Modified in the text.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1:

      Minor

      (MN1) The segregants should be referred to as F2 segregants as they are derived from an F1 cross.

      We thank the reviewer for pointing out this important oversight. We indeed analyzed segregants of an F1 cross and have corrected this in the text.

      (MN2) The connections to eQTLs in other organisms should be addressed in the introduction and conclusion. For example, in humans, there has been little evidence for trans eQTLs in contrast to what has been found in yeast.

      We thank the reviewer for pointing this out and have improved our introduction and conclusion with such connections.

      (MN3) The authors state that an advantage of scRNAseq over bulk is that it captures rare cell populations (line 79), but this advantage is not exploited in this study.

      While we did not explicitly demonstrate the effect of using scRNA-seq on capturing variation in rare cell populations, the referenced literature (21, 40) provides evidence that pooled scRNA-seq captures important expression heterogeneity (which implicitly contains potentially rare expression states). In our study, this is leveraged on F2 segregants to assess expression variation within the same lineage (genotype). This impacts the partitioning of expression variance from genotype.

      Thus, we mentioned this point to further support the choice of using scRNA-seq for this analysis and showed that even a few single cells enable the reconstruction of the genome and expression profile of rare cell types.

      (MN4) The authors use ~5% of the lineages from the original study. There is no rationale for why this is an appropriate sample size. Is there an argument for using more cells in eQTL mapping or conversely could the authors ask if fewer cells would provide similar conclusions by downsampling?

      Although scRNA-seq is highly scalable, it has limitations in terms of throughput. Indeed, a single library with 10x Genomics generates data on the order of 10^4 well-covered cells. Given these limitations, our choice of ~5% of the lineages of the original study stems from the need to recover the same lineage multiple times within these 10^4 cells (in our study, each lineage is recovered on average 4 times).

      While it is possible to run multiple libraries and sequencing lanes, budget limitations prevent us from running more libraries, especially since we expect power to scale with the square root of the number of lineages (there are diminishing returns).
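
      The diminishing returns mentioned above follow directly from the square-root scaling the authors expect. The sketch below takes that scaling as given and uses hypothetical lineage counts; the function name is ours.

```python
from math import sqrt


def relative_gain(n_current, n_new):
    """Gain in a quantity that scales with the square root of sample size."""
    return sqrt(n_new / n_current)


# Hypothetical lineage counts, chosen only to show the shape of the curve.
baseline = 5_000
for multiplier in (2, 4, 10):
    gain = relative_gain(baseline, baseline * multiplier)
    print(f"{multiplier}x lineages -> {gain:.2f}x")
```

      Under this assumption, doubling the number of lineages (and hence roughly the library and sequencing cost) yields only about a 1.41-fold gain, and a two-fold gain requires four times as many lineages.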

      (MN5) I do not agree that the use of UMIs overcomes the challenges of low sequencing depth. UMIs mitigate the possible technical artifacts due to massive PCR amplification.

      We thank the reviewer for this comment and will clarify this in the manuscript. Indeed, we intended to refer to the breadth of coverage (instead of the depth), which would usually manifest with massive PCR amplification of few transcripts.

      (MN6) There is an inadequate reference to prior work on scRNAseq in yeast that established the methods used by the authors and eQTL mapping in human cells using scRNAseq.

      We thank the reviewer for this and have added more context on benchmarks of scRNA-seq methods in yeast (Drop-seq, etc.) and on sc-eQTL mapping in humans. Additionally, we have cited Jariani et al. (2020) in eLife, where similar techniques were employed for scRNA-seq in yeast.

      (MN7) The use of empty quotes in Figure 4A is confusing and an alternative presentation method should be used.

      We will remove these empty quote characters and replace them with a more meaningful representation, such as “none”.

      (MN8) The authors speculate about the use of predicted fitness instead of observed fitness, but this is something they could explicitly address in their current study.

      We thank the reviewer for this comment but have decided not to perform a whole new bulk-segregant analysis experiment (X-QTL) to identify QTL that way. However, we do agree that we could in principle use the QTL that were identified in our previous study (Nguyen Ba et al, 2022). Despite this, we do not see the need for this because the predicted fitness is the overlap between genotype and phenotype (within the variance partitioning framework, it is the ‘narrow-sense heritability’ if one ignores epistasis). Thus, the use of predicted fitness when partitioning for expression variation would be constrained to that overlap (as opposed to the real observed fitness). This means that within the variance partitioning framework, the overlap of genotype, expression, and fitness is fully recapitulated by using predicted fitness instead (given that this predicted fitness is accurate to the narrow-sense heritability). In our previous study, we found that the QTL essentially predict all of the narrow-sense heritability. We believe it is therefore evident that the use of predicted fitness would be sufficient if and only if the expression variation independent of genotype is not associated with observed fitness.

      We note that our study cannot generalize whether the overlap between genotype and expression fully captures fitness variation explained by expression. Indeed, we believe this is not generalizable to many other contexts (for example, in development). Thus, at present, the use of predicted fitness remains a speculation.

      Major:

      (MJ1) There is insufficient information provided about the nature of data. At a minimum, the following information should be provided to enable assessment of the study: What is the total library size, how many genes are identified per cell, how many UMIs are found per cell, what is the doublet rate, and how are doublets identified (e.g. on the basis of heterozygous calls at polymorphic loci?), how many times is each genotype observed, and how many polymorphic sites are identified per cell that are the basis of genotype inferences?

      We understand that these metrics help the reader gauge the power of our approach, and we report them in Table 1 of the manuscript.

      (MJ2) The prior study analyzed 18 different conditions, whereas this study only assays expression in a single condition. However, the power of the authors' approach is that its efficiency enables testing eQTLs in multiple conditions. The study would be greatly strengthened through analysis of at least one more condition, and ideally several more conditions. The previous fitness study would be a useful guide for choosing additional conditions as identifying those conditions that result in the greatest contrasts in fitness QTL would be best suited to testing the generalizations that can be drawn from the study.

      We agree that a major strength of our approach is that it rapidly allows eQTL mapping in several conditions. While the experiments presented here are likely less expensive than the classical eQTL mapping experiments, the cost of 10x genomics and sequencing is still an important consideration. The pleiotropy analysis of the prior study was substantially difficult to interpret and put in context, and thus we decided to focus on a proof of concept and leave room for a more thorough analysis of multiple environments for a future study. We acknowledge that this is a main weakness of our manuscript.

      (MJ3) Alternatively, the authors could demonstrate the power of their approach by applying it to a cross between two other yeast strains. As the cross between BY and RM has been exhaustively studied, applying this approach to a different cross would increase the likelihood of making novel biological discoveries.

      We thank the reviewer for this suggestion, and it is indeed something that our lab is considering. Currently, one of the main points of our manuscript still relies on growth measurements of segregants (the fitness), which we cannot obtain from segregants and scRNA-seq alone.

      Unfortunately, in this experimental design, it is difficult to obtain the fitness of cells and the genotype simultaneously because the barcode of the segregant is not expressed and not frequently read during genotyping. Thus, we still need to perform a whole QTL panel for a new cross without substantial re-engineering. 

      That being said, we are working on this but feel that including a new panel in this study is beyond the scope of our manuscript. 

      (MJ4) Figure 1 is misleading as A presents the original study from 2022 without important details such as how genotypes were identified. It is unclear what the barcode is in this study and how it is used in the analysis. Is the barcode for each lineage transcribed so that it is identified in the scRNA-seq data? Or, does the barcode in B refer to the cell index barcode? A clearer presentation and explanation of terms are needed to understand the method.

      Because F2 segregant lineage barcodes are not expressed, the barcode indicated in Figure 1B refers to cell barcodes from 10x Genomics. Our present study does not make use of the lineage barcode. We clarified this in the figure by stating that panel A refers to the original study from 2022 and by explicitly mentioning ‘cell barcodes’.

      (MJ5) The rationale for the analysis reported in Figure 2B is unclear. The fitness data are from the previous study and the goal is to estimate the heritability using the genotyping data from the scRNA-Seq data. What is the explanation for why the data don't agree for only one condition, i.e. 37C? And, what are we to understand from the overall result?

      The rationale of Figure 2A/B is to show that cell lineage genotyping with scRNA-seq yields results consistent with previous genotype-phenotype analyses of the same cross. While Figure 2A shows that the single-cell imputed genotypes resemble the reference panel (sequenced in the Nguyen Ba 2022 study), Figure 2B shows that the variance partitioning to associate genotype to phenotype can be performed using the single-cell genotypes themselves (bypassing the reference panel). We believe this is an interesting result given that the reads obtained by scRNA-seq are constrained to a subset of SNPs. However, we note that if the imputed single-cell genotypes matched the reference panel perfectly, it would not be surprising that one could do genotype-phenotype mapping from the single-cell genotypes.

      In Figure 2B, we tested whether the similarity of the single-cell imputed genotypes to the reference panel was enough to estimate heritabilities (another summary statistic). 

      In the remaining paragraphs of that result section, we further discuss that the single-cell lineage genotypes can be used for QTL mapping as well, recapitulating many of the QTL identified in the reference panel (provided that one controls for power). This result did not make it as a main Figure but is included in Figure S4.

      That being said, we decided to update the figure by comparing the estimates in subsamples of batch1 scRNA-seq to subsamples of batch 1 reference panel and subsamples of the full reference panel. Subsamples were performed to control for power in the variance partitioning. We also noticed that the fitness of several F2 segregants is missing for the phenotypes 33C, 35C and 37C in the original study so we decided to exclude these environments.

      (MJ6) Figure 3 presents an analysis of variance partitioning as a Venn diagram. This summarized result is very hard to understand in the absence of any examples of what the underlying raw data look like. For example, what does trait variation look like if only genotype explains the variance or if only gene expression explains the variance? The presented highly summarized data is not intuitive and its presentation is poor - the result that is currently provided would be easier to read in a table format, but the reader needs more information to be able to interpret and understand the result.

      The Venn diagram is largely adopted in the context of variance partitioning (see Cohen, Jacob, and Patricia Cohen. 1975. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences.) but we realize that it has not been used often for displaying heritability estimates. To this end, we have added explanatory labels for the biological meaning of the areas or components of the diagram in the Figure and in the text. 
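
      For readers unfamiliar with the Venn-diagram view of variance partitioning, the areas can be derived from three R² values: one from a model with genotype only, one with expression only, and one with both. The sketch below shows this standard two-predictor-set decomposition with hypothetical R² values. It is not the authors' estimation procedure, and the function and key names are ours.

```python
def partition_variance(r2_genotype, r2_expression, r2_both):
    """Split explained trait variance into unique and shared components."""
    return {
        # Variance only genotype accounts for (left-only area of the diagram).
        "genotype_only": r2_both - r2_expression,
        # Variance only expression accounts for (right-only area).
        "expression_only": r2_both - r2_genotype,
        # Overlap: variance either predictor set could account for.
        "shared": r2_genotype + r2_expression - r2_both,
        # Variance explained by neither genotype nor expression.
        "unexplained": 1.0 - r2_both,
    }


# Hypothetical R² values, for illustration only.
parts = partition_variance(r2_genotype=0.5, r2_expression=0.25, r2_both=0.625)
for name, value in parts.items():
    print(f"{name}: {value:.3f}")
```

      The four components sum to one by construction. With real data the shared term can come out negative when one predictor set suppresses the other, which a Venn diagram cannot depict.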

      (MJ7) I am concerned about the conclusions that can be drawn about expression heritability. The authors claim that expression heritability is correlated with expression levels. It seems likely that this reflects differing statistical power. How can this possibility be excluded?

      We thank the reviewer for highlighting this. We now explicitly acknowledge this potential confounding factor in the manuscript.

      (MJ8) Conversely, the authors claim that the genes with the lowest heritability are genes involved in the cell cycle. However, uniquely in scRNA-seq, cell cycle regulated genes appear to have the highest variance in the data as they are only expressed in a subset of cells. Without incorporating this fact one would erroneously conclude that the variation is not heritable. To test the heritability of cell cycle regulation genes the authors should partition the cells into each cell cycle stage based on expression.

      The reviewer is right to say that the low heritability of cell cycle control genes could be explained by the fact that these genes are only expressed in a subset of the dataset. Indeed, a high transcriptomic variance does not necessarily imply a low expression heritability: the cell cycle could be the residual of the expression heritability model, i.e. it explains expression variance with low association to genetic mutation.

      That being said, our result is consistent with results obtained from yeast bulk RNA-seq (Albert et al. 2018), in which cell cycle is averaged out. 

      In our study, we also average out the cell-cycle as we use the consensus expression and the consensus genome to estimate the heritability.

      (MJ9) I do not understand Figure S5 and how eQTL sites are assigned to these specific classes given that the authors say that causative variation cannot be resolved because of linkage disequilibrium.

      The rationale for Figure S5 is to show that the QTL model obtained from single-cell data is consistent with the reference panel QTL mapping experiment. Although there is uncertainty around the exact position of the QTL, we relied on the loci with the highest likelihood and showed that the datasets have consistent features. This is enabled by the fact that the QTL identified using the scRNA-seq genotypes are the ones with largest effect size in the reference panel, and are thus more likely to be mapped accurately.

      (MJ10) The paragraph starting at line 305 is very confusing. In particular, the authors state that they identify a hotspot of regulation at the mating type locus. It is not obvious why this would be the case. Moreover, they claim that they find evidence for both MATa and MATalpha gene expression. Information is not provided about how segregants were isolated, but assuming that the authors did not dissect 25,000 tetrads to obtain 100,000 segregants I would infer that random spore using SGA was used. In that case, all cells should be MATa. The authors should clarify and explain this observation.

      Although most of the cells have the MATa mating type (as selected by random spore using SGA), it is well known, and discussed in the Nguyen Ba et al. paper, that a few lineages have other mating types or are diploid (they are leakers in the selection process).

      Indeed, we verified that we can detect a small number of MATalpha cells or diploids within this pool.

      (MJ11) Ultimately, it is not clear what new biological findings the authors have made. There are no novel findings with respect to causative variation underlying eQTLs and I would encourage the authors to make clearer statements in their abstract, introduction, and conclusion about the key discoveries. E.g. What are the "new associations between phenotypic and transcriptomic variations" mentioned in the abstract?

      This paper focuses more on the proof of concept that scRNA-seq can help integrate expression data in GPM analysis to reveal broad-scale associations between fitness and expression. Novel findings include new hotspots of expression regulation in the RM/BY genetic background. We also find that trans-regulation of expression has more impact on fitness than cis-regulation, and we evaluate the strength of the association between the genome, the transcriptome and fitness (in one environment). Additionally, the analysis reveals biological questions that cannot be answered even by increasing the experimental scale of eQTL mapping experiments. For example, we find that most of the missing heritability is not explained by expression. These key points will be clarified in the abstract, introduction and conclusion as suggested by the editors.

      Reviewer #2:

      (MJ1) Most of the figures center on methods development and validation for the authors' single-cell RNA-seq in the yeast cross […] One potential novelty of the study is the methods per se: that is, showing that scRNA-seq works for concomitant genotyping and gene expression profiling in the natural variation context. The authors' rigor and effort notwithstanding: in my view, this can be described as modest in terms of principles. That is, the authors did a good job putting the scRNA-seq idea into practice, but their success is perhaps not surprising or highly relevant for work outside of yeast (as the discussion says).

      Although the scope of the method is limited, we think that it can apply to any large-scale dataset in which transcription variance and genetic diversity are not small. This can help reduce the lack of associations between trait heritability and expression regulation, which is frequent as these two parameters are often not measured within the same dataset.

      We can, however, think of some other settings where a similar experiment may be interesting. This includes, for example, pooling cells from different human individuals (with enough genetic diversity) and applying the same scRNA-seq method to back-identify the individuals and matching them to a particular phenotype. We believe our proof of concept is therefore an important contribution as these other experiments might have broad implications.

      (MJ2) The more substantive claim by the authors for the impact of the study is that they make new observations about the role of expression in phenotype (lines 333-335). The major display item of the manuscript on this theme is Figure 4A, reporting which loci that control growth phenotype (from an earlier paper) also control expression. This is solid but I regret to say that the results strike me as modest.

      This paper focuses more on the proof of concept that scRNA-seq can help integrate expression data in GPM analysis to reveal broad-scale associations between fitness and expression. Novel findings include new hotspots of expression regulation in the RM/BY genetic background. We also find that trans-regulation of expression has more impact on fitness than cis-regulation, and we evaluate the strength of the association between the genome, the transcriptome and fitness (in one environment). Additionally, the analysis reveals biological questions that cannot be answered even by increasing the experimental scale of eQTL mapping experiments. For example, we find that most of the missing heritability is not explained by expression. These key points will be clarified in the abstract, introduction and conclusion as suggested by the editors.

      (MJ3) The discussion makes some perhaps fairly big claims that the work has helped "bridge understanding of how genetic variation influences transcriptomic variation" and ultimately cellular phenotype. But with the data as they stand, the authors have missed an opportunity to crystallize exactly how a given variant affects expression (perhaps in waves of regulators affecting targets that affect more regulators) and then phenotype, except for the speculations in the text on lines 305-319. The field started down this road years ago with Bayesian causality inference methods applied to eQTL and phenotype mapping (via e.g. the work of Eric Schadt). The authors could now try Mendelian randomization-type fine-grained detailed models for more firepower toward the same end, and/or experimental tests of the genotype-to-expression-to-phenotype relationship. I would see these directions, motivated by fundamental questions that are relevant to the field at large, as leading to a major advance for this very crowded field. As it stands, I felt their absence in this manuscript especially if the authors are selling principles about linking expression and phenotype as their take-home.

      We thank the reviewer for this suggestion and agree that the analysis of the genotype-to-expression-to-phenotype relationship would benefit from a more fine-grained model. While we are interested in exploring this, we decided to limit the scope of this manuscript to the proof of concept that scRNA-seq can help gain insights about the genotype-phenotype map at a broader scale.

      (MN1) I also wonder whether the co-mapping of expression and growth traits in Figure 4A would have been possible with e.g. the bulk RNA-seq from Albert et al., 2018, and I recommend that the authors repeat the Figure 4A-type analyses with the latter to justify their statement that their massive scRNA data set would actually be necessary for them to bear fruit (lines 386-388).

      By repeating our eQTL hotspot analysis with the Albert et al. (2018) data, we observed a non-significant association between eQTL hotspots and QTL (χ² p = 0.50). That being said, there are some differences in the Albert et al. experiment that preclude us from conclusively saying whether the bulk RNA-seq experiments by Albert et al. would not bear fruit. Indeed, that experiment is only 4 times smaller in scale and so we would not expect dramatic differences. To highlight power differences, the Albert et al. paper identified about 6 eQTL per gene, while our study identified about 21, which is consistent with the power differences.

      This highlights that this scRNA-seq experiment is scalable, so the technique may be useful for further studies. In addition, this pooled scRNA-seq strategy enables analysis of the association of transcription with phenotype.

      (MN2) I also read the discussion of the manuscript as bringing to the fore some of the challenges a reader has in judging the current state of the results to be of actionable impact. The discussion, and the manuscript, will be improved if the authors can put the work in context, posing concrete questions from the field and stating how they are addressed here and what's left to do.

      We agree with the reviewer and have summarized our answers to some of the questions in the field in the discussion section.

      All that being said, we acknowledge the limitations of our study.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The study investigated how root cap cell corpse removal affects the ability of microbes to colonize Arabidopsis thaliana plants. The findings demonstrate how programmed cell death and its control in root cap cells affect the establishment of symbiotic relationships between plants and fungi. Key details on molecular mechanisms and transcription factors involved are also given. The study suggests reevaluating microbiome assembly from the root tip, thus challenging traditional ideas about this process. While the work presents a key foundation, more research along the root axis is recommended to gain a better understanding of the spatial and temporal aspects of microbiome recruitment.

      We thank Reviewer #1 for their positive evaluation and critical feedback.

      Reviewer #2 (Public Review):

      Summary:

      The authors identify the root cap as an important key region for establishing microbial symbioses with roots. By highlighting for the first time the crucial importance of tight regulation of a specific form of programmed cell death of root cap cells and the clearance of their cell corpses, they start unraveling the molecular mechanisms and its regulation at the root cap (e.g. by identifying an important transcription factor) for the establishment of symbioses with fungi (and potentially also bacterial microbiomes).

      Strengths:

      It is often believed that the recruitment of plant microbiomes occurs from bulk soil to rhizosphere to endosphere. These authors demonstrate that we have to re-think microbiome assembly as a process starting and regulated at the root tip and proceeding along the root axis.

      Weaknesses:

      The study is a first crucial starting point to investigate the spatial recruitment of beneficial microorganisms along the root axis of plants. It identifies e.g. an important transcription factor for programmed cell death, but more detailed investigations along the root axis are now needed to better understand - spatially and temporally - the orchestration of microbiome recruitment.

      We appreciate Reviewer #2's insightful comments and agree that further investigations are needed to gain a deeper understanding of the intricate interplay between the spatial and temporal recruitment of the microbiome and developmental cell death in future studies.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Given that the smb-3 altered PCD phenotype has already been reported in several publications, the aim of using Evans blue staining to highlight LRC cell corpses along the root surface of smb-3 is not clear. Maybe S1 would be more informative as main figure.

      As an indicator of membrane integrity loss and cell death, Evans blue staining was used to characterize all dPCD mutants described in this study and their interactions with S. indica. To avoid redundancies with other publications, we restructured Figure 1, incorporating panel S1A to provide an introductory overview of the smb-3 phenotype. The former Figure 1B is now located in Figure S1.

      - It is not clear how the analysis of protein aggregates fits into the rationale, why analyze these formations? What role should they have in the process of PCD or interaction with microbes?

      The manuscript has been modified as follows to clarify the analysis of protein aggregates in the dPCD mutants: “The transcription factor SMB promotes the expression of various dPCD executor genes, including proteases that break down and clear cellular debris and protein aggregates following cell death induction. In the LRCs of smb-3 mutants, the absence of induction of these proteases potentially explains the accumulation of protein aggregates in uncleared dead LRC cells.”

      - Is the accumulation of misfolded and aggregated proteins also present during physiological PCD of LRC cells in the WT?

      The biochemical mechanisms underlying PCD can vary depending on the affected cell types and tissues. Within the root tip of Arabidopsis, two different modes of PCD have been described, differentiating between columella root cap cells and LRC cells. For clarification, the manuscript has been adjusted as follows: “Under physiological conditions in WT roots, we previously observed protein aggregate accumulation in sloughed columella cell packages, but not during dPCD of distal LRC clearance (Llamas et al., 2021). This aligns with the findings that dPCD of the columella is affected by the loss of autophagy, while dPCD of the LRC is not (Feng et al., 2022).”

      - I suggest being more careful when using the term "root cap" instead of "LRC" to reduce ambiguity (i.e. lines 56; 137), maybe you need to double-check the text.

      We agree with the reviewer that a clear distinction between “root cap” and “LRC” is very important. We have adjusted the manuscript to avoid any misunderstandings.

      - A technical question regarding qPCR sample preparation: doesn't washing the smb-3 roots cause a loss of LRC stretched cells and would it therefore lead to an alteration of the results?

      The mechanical washing of roots is essential to ensure a clear distinction between intraradical fungal growth and accommodation around roots. While we cannot exclude the possibility that mechanical washing removes LRC cells, intraradical quantification of fungal biomass aims to measure S. indica growth in the epidermal and cortical cell layers, underneath the uncleared LRC cells. Thus, we complemented this assay with extraradical colonization assays to quantify external fungal biomass with intact LRC cells.

      - It is not clear if S. indica promotes PCD in wt and/or in smb-3, could you comment on it?

      It remains an open question whether and to what extent S. indica promotes PCD, although there are strong indications that this fungus activates different cell death pathways at various developmental stages, including dAdo-mediated cell death. We posit that certain microbes have evolved to regulate and manipulate different dPCD processes to enhance colonization, implicating a complex crosstalk between various PCD pathways. We have adjusted the manuscript to underscore this perspective as follows: “Transcriptomic analysis of both established and predicted key dPCD marker genes revealed diverse patterns of upregulation and downregulation during S. indica colonization. These findings provide a valuable foundation for future studies investigating the dynamics of dPCD processes during beneficial symbiotic interactions and the potential manipulation of these processes by symbiotic partners.”

      - How analysis of BFN1 expression in whole root confirms its downregulation at the onset of cell death in S. indica-colonized plants. Moreover, is the transcriptional regulation of BFN1 important for PCD, or is the BFN1 protein level correlated with the establishment of cell death?

      BFN1 gene expression in Arabidopsis shows a transient decrease around 6–8 days after S. indica inoculation, coinciding with the proposed onset of S. indica-induced cell death. While we can only speculate on a potential correlation between BFN1 downregulation and the onset of S. indica-induced cell death, we have described other pathways through which S. indica induces cell death. For example, it produces small metabolites such as dAdo through the synergistic activity of two secreted fungal effector proteins (Dunken et al., 2023). This suggests that S. indica recruits different pathways to induce cell death, which may vary depending on the host plant and interact with each other as shown for many other immunity related cell death pathways which share some components.

      Regarding the second part of the question, BFN1 expression correlates positively with cells primed for dPCD (Olvera-Carrillo et al., 2015). BFN1 protein accumulates in the ER lumen and is released into the cytoplasm upon cell death induction to exert its DNase functions (Fendrych et al., 2014). Whether accumulation of BFN1 is a cause or a consequence of cell death remains to be validated.

      - Line 190: there is a typo "in the nucleus", this is superfluous given that the reporter is nuclear.

      The manuscript has been adjusted accordingly; see line L208. However, we consider the distinction important as we aim to emphasize the difference between the nuclear localization of the fluorescent signal in "healthy" cells and the dispersed fluorescent signal spreading in the cytoplasm of cells priming or undergoing dPCD.

      - Line 255: there is a typo, stem cells can not differentiate.

      The manuscript has been adjusted.

      - During root hair development some epidermal cells undergo PCD to allow the emergence of root hairs. Furthermore, during plant defense against pathogens, epidermal cells undergo cell death to prevent further colonization. Have these cell death events been reported to occur under physiological conditions during development?

      Plant defence responses in roots and the hypersensitive response (HR) still remain largely unexplored. The HR is a defence mechanism that consists of a localized and rapid cell death at the site of pathogen invasion. It is triggered by pathogenic effector proteins, usually recognized by intracellular immune receptors (NLRs), and accompanied by other features such as ROS signalling, Ca2+ bursts and cell wall modifications (Balint-Kurti, 2019). Notably, HR has been widely described in leaves, but no strong evidence has been shown for the occurrence of HR in plant roots (Hermanns et al., 2003, Radwan et al., 2005). Additionally, previous studies have not shown any transcriptional parallels between common dPCD marker genes and HR PCD in Arabidopsis (Olvera-Carrillo et al., 2015; Salguero-Linares et al., 2022).

      While S. indica is a beneficial root endophyte that does not induce classical hypersensitive response (HR) in host plants, the impact of dPCD on S. indica colonization should not be overlooked. S. indica promotes root hair formation in its hosts (Saleem et al., 2022), and in Arabidopsis, root hair cells naturally undergo cell death 2–3 weeks after emergence (Tan et al., 2016). This aspect could be particularly relevant for understanding the dynamics of S. indica colonization.

      - Showing the analysis of pBFN1 in smb-3 would help in validating the idea that the downregulation of BFN1 by S. indica is regulated independently of SMB.

      SMB is known to be a root cap specific transcription factor (Willemsen et al., 2008; Fendrych et al., 2014). The pBFN1:tdTOMATO reporter line shows that BFN1 expression occurs in many different tissues undergoing dPCD, above and below ground, where SMB is not expressed or present. Therefore, we can postulate that the downregulation of BFN1 by S. indica in the differentiation zone is regulated independently of SMB.

      - A question of great interest still remains open: is it the microbe that induces the regulation of BFN1 causing a delay in cell clearance and favoring the infection or is it the plant that reduces BFN1 to favor the interaction with the microbe? In other words, is the mechanism a response to stress or a consolidation of the interaction with the host?

      We agree with this reviewer that this question remains open. Whether active interference by fungal effector proteins, fungal-derived signaling molecules, or a systemic response of Arabidopsis roots underlies BFN1 downregulation during S. indica colonization remains to be investigated. Yet, it is noteworthy that the downregulation of BFN1 in Arabidopsis is not specific to S. indica but also occurs during interactions with other beneficial microbes such as S. vermifera and two bacterial synthetic communities. This suggests that it could be a broader plant response to microbial presence. However, at this stage, we can only speculate on these possibilities. We therefore changed some of the statements in the paper to moderate our conclusions: e.g. “Expression of plant nuclease BFN1, which is associated with senescence, is modulated to facilitate root accommodation of beneficial microbes” to leave open who exactly is controlling BFN1, the plant or the microbes.

      Reviewer #2 (Recommendations For The Authors):

      This is a straightforward study, well executed and well written. I have only a few specific comments, and some concern the statistics which is a bit more serious and where I would like to get answers first. Looking at the figures, I am sure that the authors can easily clarify the issues in the manuscript.

      We appreciate the positive feedback and included clarifications in the statistical section in the material and methods.

      Statistics:

      - The statistics are not detailed in Material and Methods, but are only briefly indicated in the headings of the figures. Include a statistics section in Material and Methods.

      We added an extra paragraph with statistical analysis in the Material and Method section for clarification, which reads as follows: “All statistical analyses, except for the transcriptomic analysis, were performed using Prism8. Individual figures state the applied statistical methods, as well as p and F values. p-values and corresponding asterisks are defined as following, p<0.05 *, p<0.01**, p<0.001***.”
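
      The asterisk convention quoted above can be written down as a small lookup. This is only an illustrative sketch, not part of the analysis pipeline (the analyses were run in Prism8); the function name `significance_stars` and the label "ns" for non-significant results are assumptions introduced here.

```python
def significance_stars(p: float) -> str:
    """Map a p-value to the three-level asterisk notation.

    Thresholds follow the quoted convention: p<0.05 *, p<0.01 **, p<0.001 ***.
    The "ns" label for p >= 0.05 is an assumption made for this sketch.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("a p-value must lie in [0, 1]")
    if p < 0.001:
        return "***"
    if p < 0.01:
        return "**"
    if p < 0.05:
        return "*"
    return "ns"
```

      Under this mapping a p-value of 0.000001 still receives three asterisks, so no fourth level is needed.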

      - Figure 1/ Figure S3, etc: First of all, a **** with p< 0.00001 does not exist! Significance in statistics just means that we assume that there is a difference with some kind of probability that has been defined as p<0.05 *, p<0.01**, p<0.001***, and NOT more! Even if p<0.000001, it is still p<0.001***. Stating the meaning of asterisks in a separate Statistics section in Materials and Methods would also avoid repetitive explanations (e.g. Figure 4, L68: 'Asterisk indicates significantly different...').

      We agree and have updated the manuscript accordingly. See comment above.  

      - Also, it is advisable to reduce the digits of the p-values to a meaningful length (e.g. Figure 2 L 36: (*P<0.0466) should be (F[1, ?] = ?; p<0.047). The * is not necessary in the text, as p<0.05 is already given. We do not obtain more information by a more exact p-value, because all we need to know is that p<0.05.

      We adjusted the p-values accordingly throughout the manuscript.

      - It is NOT sufficient to communicate just the p-value of a statistical analysis. What is always needed is the F-value (student test and ANOVA) with both nominator and denominator degrees of freedom (e.g. F[2, 10] =) AND the p-value.

      We included F-values throughout the manuscript in all main and supplemental figures to provide more clarity for the readers.

      - The reason becomes clear in Fig. 2D where the authors state that they used 3 biological replicates, each with 40 plants. I assume the statistics was wrongly based on calculating with 120 plants (F[1,120] =) as technical replicates instead of correctly the biological replicates (3 means of 40 technical replicates each, (F[1,3] =))?? If F-value and df had been given, errors like this would be immediately visible - for any reviewer/reader, but also to the authors.

      => Please re-analyze the statistics correctly.

      To assess S. indica-induced growth promotion, we measured and compared the root length of Arabidopsis plants under S. indica colonization or mock conditions at three different time points. Each genotype and treatment combination involved measuring 50 plants, with each plant serving as an independent biological replicate inoculated with the same S. indica spore solution. For comprehensive statistical analysis, we conducted the experiment a total of 3 times, using fresh fungal inoculum each time, originally referred to as "three biological replicates." We maintain that including all plant measurements is essential for a thorough statistical analysis of our growth promotion experiment. However, in order to avoid confusion, we have updated the figure legend to clarify the experimental set-up as follows: “(D) Root length measurements of WT plants and smb-3 mutant plants, during S. indica colonization (seed inoculated) or mock treatment. 50 plants for each genotype and treatment combination were observed and individually measured over a time period of two weeks. WT roots show S. indica-induced growth promotion, while growth promotion of smb-3 mutants was delayed and only observed at later stages of colonization. This experiment was repeated 2 more independent times, each time with fresh fungal material. Statistical analysis was performed via one-way ANOVA and Tukey’s post hoc test (F [11, 1785] = 1149; p < 0.001). For visual representation of statistical relevance each time point was additionally evaluated via one-way ANOVA and Tukey’s post hoc test at 8dpi (F [3, 593] = 69.24; p < 0.001), 10dpi (F [3, 596] = 47.59; p < 0.001) and 14dpi (F [3, 596] = 154.3; p < 0.001).”
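
      The degrees of freedom reported in the legend show which unit was treated as the replicate. The sketch below applies the standard one-way ANOVA formulas (k - 1 between groups, N - k within groups). It assumes four groups per time point (two genotypes times two treatments) with 150 plants each (50 plants times 3 repetitions); the helper name `one_way_anova_df` is hypothetical.

```python
def one_way_anova_df(group_sizes):
    """Return (between-group df, within-group df) for a one-way ANOVA."""
    k = len(group_sizes)   # number of groups
    n = sum(group_sizes)   # total number of observations
    if k < 2 or n <= k:
        raise ValueError("need at least 2 groups and more observations than groups")
    return k - 1, n - k

# Every plant counted as a replicate: 4 groups x 150 plants (assumed layout).
df_per_plant = one_way_anova_df([150] * 4)

# Only the means of the 3 independent repetitions counted as replicates.
df_per_repetition = one_way_anova_df([3] * 4)
```

      With every plant counted, the result is (3, 596), which matches the values reported for 10 dpi and 14 dpi. Counting only the three repetition means per group would give (3, 8), which is why reporting df makes the chosen unit of replication visible.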

      - Figure 2, L 18; Figure 5, L 95, Figure S5 L53, etc: I am worried about executing a statistical test 'before normalization' - what does it mean?? WHY was a normalization necessary, WHAT EXACTLY was normalized and do we see normalized plots that do NOT correspond to the data on which the statistics was based? At least this implies 'before normalization'! Please explain, and/or re-analyze the statistics correctly.

      We agree that the phrasing “before normalization” may lead to confusion, as the normalization of data to the mean of the control group does not alter the statistical analysis. Normalization was performed to achieve a clearer visual representation. Additionally, Evans blue staining is quantified by measuring the mean grey value, which does not correspond to a specific unit. Normalizing the data allows for the representation of relative staining intensities. The manuscript has been adjusted accordingly throughout.
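
      A minimal sketch of why dividing all values by the control mean leaves a one-way ANOVA unchanged: the F statistic is a ratio of two variance estimates, so a common scale factor cancels. The grey values below are hypothetical and the function names are introduced only for this illustration.

```python
from math import isclose
from statistics import mean

def f_statistic(groups):
    """One-way ANOVA F value for a list of groups (lists of numbers)."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

def normalize_to_control(values, control):
    """Divide every value by the mean of the control group."""
    reference = mean(control)
    if reference == 0:
        raise ValueError("control mean is zero; cannot normalize")
    return [v / reference for v in values]

# Hypothetical mean grey values (arbitrary units).
control = [10.0, 12.0, 14.0]
treated = [18.0, 24.0, 21.0]

f_raw = f_statistic([control, treated])
f_normalized = f_statistic([
    normalize_to_control(control, control),
    normalize_to_control(treated, control),
])
same_f = isclose(f_raw, f_normalized)
```

      After normalization the control values scatter around 1 rather than all being exactly 1, because each value is divided by the group mean and not by itself.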

      - Statistics in Figure 1: L8/9: 'in reference to B' is unclear, I guess the mean of the control was used as a reference? This would also explain the variation in relative staining intensity (Figure 1C). if normalization was carried out (see above) all control (WT) values should be exactly 1, but they are not. I guess it was normalized to the mean of the control?

      “In reference to X” or “corresponding to X” typically means that Figure X shows an example image from the dataset on which the statistical quantification is based. We have updated the manuscript throughout the main and supplemental figure legends to use “refers to image shown in X” to avoid confusion.  

      Figure S4, L 42: '(corresponding to A)', see comment above.

      See comment above.

      Figure 5B, L 87: '(in reference to A)'; L93: (in reference to C), etc. - see above. Unclear how A was used as a reference. Was it the mean of A? BUT again only 3 biological replicates! So it has to be the mean of 3 reps that was used as control! OR can we at least say that the 10 measured roots were independent of each other (crucial (!) precondition for executing student's test or ANOVA? Then you would have at least 10 replicates (mean of 4 pictures taken per root for each).

      Quantification of Evans blue staining intensity involved taking 4 pictures along the main root axis of each plant. We re-evaluated the statistical analysis correctly with the averaged datapoints for each plant root. We adjusted main figures (Fig. 1C and 5B) and supplementary figures (Fig. S1C and S4B) and changed the material and methods section of the manuscript as follows: “4 pictures were taken along the main root axis of each plant and averaged together, for an overview of cell death in the differentiation zone.”
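
      The averaging step described above can be sketched as follows: the 4 pictures of one root are collapsed into a single value, so that the number of data points entering the test equals the number of roots. The root IDs and grey values are hypothetical.

```python
from statistics import mean

# Hypothetical mean grey values: 4 pictures per root, keyed by root ID.
pictures_per_root = {
    "root_1": [10.0, 12.0, 14.0, 16.0],
    "root_2": [20.0, 20.0, 22.0, 22.0],
    "root_3": [30.0, 28.0, 32.0, 30.0],
}

def average_per_root(pictures):
    """Collapse the pictures of each root into one value per root."""
    return {root: mean(values) for root, values in pictures.items()}

per_root = average_per_root(pictures_per_root)
n_replicates = len(per_root)  # roots, not pictures, are the replicates
```

      Here 12 pictures yield 3 data points, one per root.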

      - Statistics in Figure 4, L 69: what means 'adjusted p-value'? Which analysis?

      The material and method section of the manuscript has been adjusted as follows for clarification: “Differential gene expression analysis was performed using the R package DESeq2 (Love et al., 2014). Genes with an FDR adjusted p-value < 0.05 were considered as differentially expressed genes (DEGs). The adjusted p-value refers to the transformation of the p-value obtained with the Wald test after considering multiple testing. To visualize gene expression, genes expression levels were normalized as Transcript Per kilobase million (TPM).”
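
      To illustrate what an FDR adjustment does, the sketch below implements the Benjamini-Hochberg procedure in plain Python. That this is the adjustment used is an assumption: the text above only says "FDR adjusted", although Benjamini-Hochberg is the DESeq2 default. The p-values are hypothetical and this is not the DESeq2 code.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical raw p-values for four genes.
raw = [0.01, 0.04, 0.03, 0.005]
adjusted = benjamini_hochberg(raw)
degs = [i for i, p in enumerate(adjusted) if p < 0.05]
```

      Each raw p-value is scaled by the number of tests divided by its rank, then capped by the adjusted value of the next larger p-value, so adjusted values never decrease with rank.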

      - Statistics in Figure 5, L102-105: see above! Were the statistics correctly calculated with 7 reps, or wrongly with 30? # I guess each time point was normalized to the mean of WT? By the way, it is not clear if repeated measurements were done on the same plants. If repeated measurements were done on the SAME plants, then these data are statistically not independent anymore (time-series analysis), and e.g. MANOVA must be used and significant (!) before proceeding to ANOVA and Tukey.

      The statistics for quantifying intraradical colonization of Arabidopsis roots were calculated with 7 replicates. For each replicate, 30 plants were pooled to obtain sufficient material for RNA extraction and cDNA synthesis. Plants from the same genotype were harvested separately for each time point, ensuring that the time points are statistically independent from one another.

      Statistics Fig. S1, L 11-12: see above, '5 plants were imaged for each mock and ..., evaluating 4 pictures ...' That means you have means of 4 pictures for 5 biological replicates - the figure shows 20 replicates. However, the statistics must be based on 5 reps! You may indicate the 4 pictures per root by different colours. Change throughout all figures and calculate the statistics correctly (show this by indicating the correct df in your statistics as discussed above).

      We have conducted a re-evaluation of the statistical analysis of Evans blue staining for all figures presented throughout the manuscript. See comment above.

      Statistics Fig. S3, L 31: 'Relative quantification of ...' see above, relative to what? Explain this also clearly in Statistics in Materials and Methods.

      Relative quantification refers to normalizing data to the mean of the corresponding control group. Figure legends have been revised to clarify this point.

      Statistics Fig. S5, L 57/58: 'Genes are clustered using spearmen correlation as distance measure'. If I understand it correctly, Spearman correlation is NOT a distance measure. You used Spearman correlation to cluster gene expression. Now it would be interesting to know WHICH clustering method was used, e.g. a hierarchical or non-hierarchical clustering method? and which one, e.g. single linkage, complete linkage? The outcome depends very much on the clustering method. Therefore, this information is important.

      To perform gene clustering, we set the option clustering_distance_rows = "spearman" of the Heatmap function included in the ComplexHeatmap package. The function first computes the distance matrix using the formula 1 - cor(x, y, method) with Spearman as correlation method. It then performs hierarchical clustering using the complete linkage method by default.
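
      The two steps described above (a distance of 1 minus the Spearman correlation, followed by complete-linkage hierarchical clustering) can be sketched in plain Python. This is an illustration of the technique, not the ComplexHeatmap implementation; the gene names and expression profiles are hypothetical, and constant profiles (zero rank variance) are not handled.

```python
from statistics import mean

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            out[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

def complete_linkage(profiles):
    """Agglomerative clustering; returns merges as (cluster, cluster, distance)."""
    names = list(profiles)
    dist = {frozenset((a, b)): 1 - spearman(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}
    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Complete linkage: distance of the farthest pair of members.
                d = max(dist[frozenset((a, b))]
                        for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges

# Hypothetical expression profiles across four samples.
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 9.0],   # same ordering as geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # reversed ordering
}
merges = complete_linkage(profiles)
```

      Because the distance depends only on ranks, geneA and geneB are merged first at a distance close to 0, and geneC joins last at a distance close to 2. A different linkage method could produce a different tree from the same distance matrix.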

      # Arabidopsis is a genus name and by convention, this has to be written throughout the MS in italics - even if the authors define Arabidopsis thaliana (in italics) = Arabidopsis (without).

      # typos

      L 24: smb-3 mutants (must be explained)

      L 83 insert: ...two well-characterized SMB loss-of-function ...

      While smb-3 is an SMB loss-of-function mutant, bfn1-1 is a BFN1 loss-of-function mutant, independent of SMB.

      L 93: The switch between the biotrophic..

      L 119: distal border

      L 125: aggregates in the smb-3 mutant

      L 132: between the meristematic

      L 177/178: was observed at 6 dpi in Arabidopsis colonized by S. indica.

      L 250: colonization stages by S. indica.

      L 288: and root cell death (RCD)

      L 289: and towards...

      L 296: dPCD protects the

      L 304: This raises the

      L 351: to remove loose

      All the above-mentioned typos have been addressed in the manuscript.

      Materials and Methods

      L 327: give composition and supplier of MYP medium

      L 344 name supplier of MS medium

      L 338 name supplier of PNM medium

      L 353: replace 'Following,..' with 'Subsequently, ..'

      L 360: replace 'on plate' with 'on the agar plate' - change throughout the Materials and methods!

      L 360: name supplier of Alexa Fluor 488

      L 363: name supplier of (MS) square plate

      L 377: insert comma: After cleaning, the roots...

      L 394: explain the acronym and name supplier of PBS

      L 399: explain the acronym and name supplier of TBST

      All the above-mentioned comments in the material and methods have been addressed in the manuscript.  

      Figure 2G) x-axis, change order: Hoechst/Proteostat

      Figure 3, L53: propidium iodide: name supplier

      Figure 4, L68: Asterisks

      L 60: explain LRC

      L 67, L69, L70: explain the acronym TPM and how expression values were measured in Materials and Methods, the brief explanation in the figure is unclear and not sufficient

      All the above-mentioned comments in the figure legends have been addressed.

      Figure S5, L50: explain 'SynComs'

      L 51: corrects 30 plans => 30 plants

      L 56: vaules => values

      L 57: use capital letter: Spearman correlation

      All the above-mentioned comments in the supplemental figure legends have been addressed.

    1. Readers come to digital work with expectations formed by print, including extensive and deep tacit knowledge of letter forms, print conventions, and print literary modes. Of necessity, electronic literature must build on these expectations even as it modifies and transforms them. At the same time, because electronic literature is normally created and performed within a context of networked and programmable media, it is also informed by the powerhouses of contemporary culture, particularly computer games, films, animations, digital arts, graphic design, and electronic visual culture. In this sense electronic literature is a "hopeful monster" (as geneticists call adaptive mutations) composed of parts taken from diverse traditions that may not always fit neatly together. Hybrid by nature, it comprises a trading zone (as Peter Galison calls it in a different context) in which different vocabularies, expertises and expectations come together to see what might come from their intercourse. (Note 2) Electronic literature tests the boundaries of the literary and challenges us to re-think our assumptions of what literature can do and be

      This thought beautifully captures the complex nature of electronic literature. It highlights how this new form builds upon existing expectations from print while simultaneously embracing the possibilities of the digital world. The "hopeful monster" analogy is apt, suggesting a hybrid creation born from diverse influences. By drawing on the powerhouses of contemporary culture, it pushes the boundaries of what we consider "literary," challenging us to rethink our assumptions about its forms and functions. Electronic literature thrives in this "trading zone," where different languages, expertises, and expectations meet and converge, creating something entirely new.

    1. Even without empirical evidence, one might find support for one or both methods from other studies conducted on similar strategies.

      Lesson 2: Critical Discussion: What is Scientific Based Research?

      This sentence stood out to me due to the use of social media nowadays. There are many pages/groups that provide an area to share and collaborate with other professionals in the field and some may think that this is a great place to find ideas and supports. However, as professionals we should also be doing research behind these ideas/collaborations. This statement stood out because you may have an idea and go onto social media and see that someone else has also done something similar that has worked, but it may not be the best practice. Social media has allowed for multiple opportunities for educators to collaborate but it truly lacks the research and professionalism behind it. If you are to see an idea/practice online, as a professional, you should be doing research on it to ensure that it is scientifically reasonable and that it will provide an appropriate and positive learning approach to the centre and child.

    1. Why should one go to the trouble of growing a crop when, like the state (!), one can simply confiscate it from the granary.

      The development of farming and creation of states may not have been good for everyone. Scott compares collecting taxes, which we usually see as a normal part of running a state, to raiding, which we think of as a violent act. This makes one question whether early states were really that different or better than groups that raided. The following sentence "Raiding is our agriculture," shows that different cultures have their own ways of getting what they need, and raiding was just as valid as farming for some people. This idea challenges the usual story that farming was always a step forward.

    1. Author response:

      The following is the authors’ response to the original reviews.

      In this useful study, a solid machine learning approach based on a broad set of systems to predict the R2 relaxation rates of residues in intrinsically disordered proteins (IDPs) is described. The ability to predict the patterns of R2 will be helpful to guide experimental studies of IDPs. A potential weakness is that the predicted R2 values may include both fast and slow motions, thus the predictions provide only limited new physical insights into the nature of the relevant protein dynamics.

      Fast motions are less sequence-dependent (e.g., as shown by R1). Hence the sequence-dependent part of R2 singles out slow motion.

      Public Reviews:

      Reviewer #1 (Public Review):

      Solution state 15N backbone NMR relaxation from proteins reports on the reorientational properties of the N-H bonds distributed throughout the peptide chain. This information is crucial to understanding the motions of intrinsically disordered proteins and as such has focussed the attention of many researchers over the last 20-30 years, both experimentally, analytically and using numerical simulation.

      This manuscript proposes an empirical approach to the prediction of transverse 15N relaxation rates, using a simple formula that is parameterised against a set of 45 proteins. Relaxation rates measured under a wide range of experimental conditions are combined to optimize residue-specific parameters such that they reproduce the overall shape of the relaxation profile. The purely empirical study essentially ignores NMR relaxation theory, which is unfortunate, because it is likely that more insight could have been derived if theoretical aspects had been considered at any level of detail.

      NMR relaxation theory is very valuable in particular regarding motions on different timescales. However, it has very little to say about the sequence dependence of slow motions, which is the focus of our work.

      Despite some novel aspects, in particular the diversity of the relaxation data sets, the residue-specific parameters do not provide much new insight beyond earlier work that has also noted that sidechain bulkiness correlated with the profile of R2 in disordered proteins.

      The novel insight from our work is that R2 can mostly be predicted based on the local sequence.

      Nevertheless, the manuscript provides an interesting statistical analysis of a diverse set of deposited transverse relaxation rates that could be useful to the community.

      Thank you!

      Crucially, and somewhat in contradiction to the authors' stated aims in the introduction, I do not feel that the article delivers real insight into the nature of IDP dynamics. Related to this, I have difficulty understanding how an approximate prediction of the overall trend of expected transverse relaxation rates will be of further use to scientists working on IDPs. We already know where the secondary structural elements are (from 13C chemical shifts which are essential for backbone assignment) and the necessary 'scaling' of the profile to match experimental data actually contains a lot of the information that researchers seek.

      Again, the novel insight is that slow motions that dictate the sequence dependence of R2 can mostly be predicted based on the local sequence. The scaling factor may contain useful information but does not tell us anything about the sequence dependence of IDP dynamics.

      This reviewer brings up a lot of valuable points, clearly from an NMR spectroscopist’s perspective. The emphasis of our paper is somewhat different from that perspective. For example, we were interested in whether tertiary contacts make significant contributions to R2, as sometimes claimed. Our results show that, in general, they do not; instead local contacts dominate the sequence dependence of R2.

      (1) The introduction is confusing, mixing different contributions to R2 as if they emanated from the same physics, which is not necessarily true. 15N transverse relaxation is said to report on 'slower' dynamics from 10s of nanoseconds up to 1 microsecond. Semi-classical Redfield theory shows that transverse relaxation is sensitive to both adiabatic and non-adiabatic terms, due to spin state transitions induced by stochastic motions, and dephasing of coherence due to local field changes, again induced by stochastic motions. These are faster than the relaxation limit dictated by the angular correlation function. Beyond this, exchange effects can also contribute to measured R2. The extent and timescale limit of this contribution depends on the particular pulse sequence used to measure the relaxation. The differences in the pulse sequences used could be presented, and the implications of these differences for the accuracy of the predictive algorithm discussed.

      Indeed pulse sequences affect the measured R2 values. We make the modest assumption that such experimental idiosyncrasy would not corrupt the sequence dependence of IDP dynamics. As for exchange effects, our expectation is that the current SeqDYN may not do well for R2s where slow exchange plays a dominant role in generating sequence dependence, as tertiary contacts would be prominent in those cases; we now present one such case (new Fig. S5).

      (2) Previous authors have noted the correlation between observed transverse relaxation rates and amino acid sidechain bulkiness. Apart from repeating this observation and optimizing an apparently bulkiness-related parameter on the basis of R2 profiles, I am not clear what more we learn, or what can be derived from such an analysis. If one can possibly identify a motif of secondary structure because raised R2 values in a helix, for example, are missed from the prediction, surely the authors would know about the helix anyway, because they will have assigned the 13C backbone resonances, from which helical propensity can be readily calculated.

      We think that a sequence-based method that is demonstrated to predict well R2 values from expensive NMR experiments is significant. That pi-pi and cation-pi interactions are prominent features of local contacts and may seed tertiary contacts and mediate inter-chain contacts that drive phase separation is a valuable insight.

      (3) Transverse relaxation rates in IDPs are often measured to a precision of 0.1 s–1 or less. This level of precision is achieved because the line-shapes of the resonances are very narrow and high resolution and sensitivity are commonly measurable. The predictions of relaxation rates, even when applying uniform scaling to optimize best-agreement, are often different to experimental measurement by 10 or 20 times the measured accuracy. There are no experimental errors in the figures. These are essential and should be shown for ease of comparison between experiment and prediction.

      Again, our focus is not the precision of the absolute R2 values, but rather the sequence dependence of R2.

      (4) The impact of structured elements on the dynamic properties of IDPs tethered to them is very well studied in the literature. Slower motions are also increased when, for example the unfolded domain binds a partner, because of the increased slow correlation time. The ad hoc 'helical boosting' proposed by the authors seems to have the opposite effect. When the helical rates are higher, the other rates are significantly reduced. I guess that this is simply a scaling problem. This highlights the limitation of scaling the rates in the secondary structural element by the same value as the rest of the protein, because the timescales of the motion are very different in these regions. In fact the scaling applied by the authors contains very important information. It is also not correct to compare the RMSD of the proposed method with MD, when MD has not applied a 'scaling'. This scaling contains all the information about relative importance of different components to the motion and their timescales, and here it is simply applied and not further analysed.

      Actually, applying the boost factor achieves the effect of a different scaling factor for the secondary structure element than for the rest of the protein.

      Regarding comparing RMSEs of SeqDYN and MD, it is true that SeqDYN applies a scaling factor whereas MD does not. However, even if we apply scaling to MD results it will not change the basic conclusion that “SeqDYN is very competitive against MD in predicting R2, but without the significant computational cost.”

      (5) Generally, the uniform scaling of all values by the same number is a serious oversimplification. Motions are happening on all timescales, and they give rise to different transverse relaxation. It is not possible to describe IDP relaxation in terms of one single motion. Detailed studies over more than 30 years have demonstrated that more than one component to the autocorrelation function is essential in order to account for motions on different timescales in denatured, partially disordered or intrinsically unfolded states. If one could 'scale' everything by the same number, this would imply that only one timescale of motion were important and that all others could be neglected, and this at every site in the protein. This is not expected to be the case, and in fact in the examples shown by the authors it is also never the case. There are always regions where the predicted rates are very different from experiment (with respect to experimental error), presumably because local dynamics are occurring on different timescales to the majority of the molecule. These observations contain useful information, and the observation that a single scaling works quite well probably tells us that one component of the motion is dominant, but not universally. This could be discussed.

      The reviewer appears to equate a single scaling factor with a single type of motion -- this is not correct. A single scaling factor just means that we factor out effects (e.g., temperature or magnetic field) that are uniform across the IDP sequence.
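      The point about a uniform factor can be illustrated with a minimal sketch. It assumes a least-squares fit of one scale factor (the fitting procedure actually used for SeqDYN is not stated here), and the numbers are hypothetical, not data from the paper. Multiplying a predicted profile by a single positive constant changes its overall level but leaves its sequence dependence, measured here by the Pearson correlation with experiment, unchanged.

```python
import math

def fit_scale(pred, expt):
    """Least-squares scale s minimizing sum((s * pred - expt)^2)."""
    num = sum(p * e for p, e in zip(pred, expt))
    den = sum(p * p for p in pred)
    return num / den

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical per-residue values, for illustration only.
pred = [0.8, 1.0, 1.5, 1.2, 0.9]   # unscaled predicted profile
expt = [1.7, 2.1, 2.9, 2.5, 1.8]   # "measured" R2 in s^-1

scale = fit_scale(pred, expt)
scaled = [scale * p for p in pred]

# The correlation with experiment is the same before and after scaling.
print(round(pearson(pred, expt), 6), round(pearson(scaled, expt), 6))
```

      The scale factor absorbs anything that is uniform along the chain; differences between residues come entirely from the unscaled profile.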

      (6) With respect to the accuracy of the prediction, discussion about molecular detail such as pi-pi interactions and phase separation propensity is possibly a little speculative.

      It is speculative; we now add more support to this speculation (p. 18 and new Fig. S6).

      (7) The authors often declare that the prediction reproduces the experimental data. The comparisons with experimental data need to be presented in terms of the chi2 per residue, using the experimentally measured precision which as mentioned, is often very high.

      Again, our interest is the sequence dependence of R2, not the absolute R2 value and its measurement precision.
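      The two ways of judging agreement discussed in this exchange can be put side by side in a short sketch: the RMSE, which ignores measurement precision, and the chi-squared per residue requested by the reviewer, which weights each deviation by the experimental uncertainty. The function names and the numbers are hypothetical and are not taken from the paper.

```python
import math

def rmse(pred, expt):
    """Root-mean-square error between prediction and experiment."""
    n = len(pred)
    return math.sqrt(sum((p - e) ** 2 for p, e in zip(pred, expt)) / n)

def chi2_per_residue(pred, expt, sigma):
    """Mean squared deviation in units of the experimental uncertainty."""
    if not (len(pred) == len(expt) == len(sigma)):
        raise ValueError("pred, expt and sigma must have the same length")
    total = sum(((p - e) / s) ** 2 for p, e, s in zip(pred, expt, sigma))
    return total / len(pred)

# Hypothetical values: deviations of 0.1 and 0.2 s^-1, uncertainty 0.1 s^-1.
pred = [1.0, 2.0]
expt = [1.1, 1.8]
sigma = [0.1, 0.1]

print(rmse(pred, expt))
print(chi2_per_residue(pred, expt, sigma))
```

      A small RMSE can still correspond to a large chi-squared per residue when the measurement uncertainty is very small, which is the crux of the disagreement above.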

      Reviewer #2 (Public Review):

      Qin, Sanbo and Zhou, Huan-Xiang created a model, SeqDYN, to predict nuclear magnetic resonance (NMR) spin relaxation spectra of intrinsically disordered proteins (IDPs), based primarily on amino acid sequence. To fit NMR data, SeqDYN uses 21 parameters, 20 that correspond to each amino acid, and a sequence correlation length for interactions. The model demonstrates that local sequence features impact the dynamics of the IDP, as SeqDYN performs better than a one residue predictor, despite having similar numbers of parameters. SeqDYN is trained using 45 IDP sequences and is retrained using both leave-one-out cross validation and five-fold cross validation, ensuring the model's robustness. While SeqDYN can provide reasonably accurate predictions in many cases, the authors note that improvements can be made by incorporating secondary structure predictions, especially for alpha-helices that exceed the correlation length of the model. The authors apply SeqDYN to study nine IDPs and a denatured ordered protein, demonstrating its predictive power. The model can be easily accessed via the website mentioned in the text.

      While the conclusions of the paper are primarily supported by the data, there are some points that could be extended or clarified.

      (1) The authors state that the model includes 21 parameters. However, they exclude a free parameter that acts as a scaling factor and is necessary to fit the experimental data (lambda). As a result, SeqDYN does not predict the spectrum from the sequence de-novo, but requires a one parameter fitting. The authors mention that this factor is necessary due to non-sequence dependent factors such as the temperature and magnetic field strength used in the experiment.

      Given these considerations, would it be possible to predict what this scaling factor should be based on such factors?

      There are still too few data to make such a prediction.

      (2) The authors mention that the Lorentzian functional form fits the data better than a Gaussian functional form, but do not present these results.

      We tested the different functional forms at the early stage of the method development. The improvement of the Lorentzian over the Gaussian was slight and we simply decided on the Lorentzian and did not go back and do a systematic analysis.
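      For readers unfamiliar with the two candidate forms, the sketch below compares a Lorentzian and a Gaussian decay as a function of sequence separation. The parameterization and the width of 5 residues are assumptions made for illustration; they are not the fitted SeqDYN values, and the code is not the authors' implementation.

```python
import math

def lorentzian(sep, width):
    """Lorentzian decay: 1 at sep = 0, 1/2 at sep = width, algebraic tail."""
    return 1.0 / (1.0 + (sep / width) ** 2)

def gaussian(sep, width):
    """Gaussian decay: 1 at sep = 0, exponentially small at large sep."""
    return math.exp(-0.5 * (sep / width) ** 2)

WIDTH = 5.0  # hypothetical correlation length, in residues

for sep in (0, 2, 5, 10, 20):
    print(sep, round(lorentzian(sep, WIDTH), 4), round(gaussian(sep, WIDTH), 4))
```

      With these definitions the two forms are similar at short separations, while the Lorentzian retains a much heavier tail, so residues far apart in sequence still contribute a little.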

      (3) The authors mention that they conducted five-fold cross validation to determine if differences between amino acid parameters are statistically significant. While two pairs are mentioned in the text, there are 190 possible pairs, and it would be informative to more rigorously examine the differences between all such pairs.

      We now present t-test results for other pairs in new Fig. S3.
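      The count of 190 pairs and a pairwise comparison can be sketched as follows. The response does not say which t-test variant was used, so Welch's statistic is an assumption here, and the per-fold parameter values are hypothetical.

```python
from itertools import combinations
from math import comb, sqrt
from statistics import mean, variance

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

# Every unordered pair of distinct amino acids: C(20, 2) = 190.
pairs = list(combinations(AMINO_ACIDS, 2))
print(len(pairs), comb(20, 2))

def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    return (mean(a) - mean(b)) / sqrt(variance(a) / len(a) + variance(b) / len(b))

# Hypothetical parameter values from five cross-validation folds.
q_first = [1.10, 1.12, 1.08, 1.11, 1.09]
q_second = [0.95, 0.97, 0.94, 0.96, 0.98]
print(welch_t(q_first, q_second))
```

      Running all 190 comparisons also raises a multiple-testing question, which any tabulated significance levels would need to account for.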

      Reviewer #3 (Public Review):

      The manuscript by Qin and Zhou presents an approach to predict dynamical properties of an intrinsically disordered protein (IDP) from sequence alone. In particular, the authors train a simple (but useful) machine learning model to predict (rescaled) NMR R2 values from sequence. Although these R2 rates only probe some aspects of IDR dynamics and the method does not provide insight into the molecular aspects of processes that lead to perturbed dynamics, the method can be useful to guide experiments.

      A strength of the work is that the authors train their model on an observable that directly relates to protein dynamics. They also analyse a relatively broad set of proteins which means that one can see actual variation in accuracy across the proteins.

      A weakness of the work is that it is not always clear what the measured R2 rates mean. In some cases, these may include both fast and slow motions (intrinsic R2 rates and exchange contributions). This in turn means that it is actually not clear what the authors are predicting. The work would also be strengthened by making the code available (in addition to the webservice), and by making it easier to compare the accuracy on the training and testing data.

      Our method predicts the sequence dependence of R2, which is dominated by slower dynamics.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Should make sure to define abbreviations such as NMR and SeqDYN.

      We now spell out NMR at first use. SeqDYN is the name of our method and is not an abbreviation.

      (2) The authors do not mention how the curves in Figure 2A are calculated.

      As we stated in the figure caption, these curves are drawn to guide the eye.

      (3) May be interesting to explore how the model parameters (q) correlate with different measures of hydrophobicity (especially those derived for IDPs like Urry). This may point to a relationship between amino acid interactions and amino acid dynamics

      We now present the correlation between q and a stickiness parameter refined by Tesei et al. (new ref 45) and used for predicting phase separation equilibrium (new Fig. S6).

      (4) The authors demonstrate that secondary structure cannot be fully accounted for by their model. They make a correction for extended alpha-helices, but the strength of this correction seems to only be based on one sequence. Would a more rigorous secondary structure correction further improve the model and perhaps allow its transferability to ordered proteins?

      We have four test cases (Figs. 4E, F and 5H, I). However, we doubt that the SeqDYN method will be transferable to ordered proteins.

      Reviewer #3 (Recommendations For The Authors):

      Changes that could strengthen the manuscript substantially.

      (1) The authors do not really define what they mean by dynamics, but given that they train and benchmark on R2 measurements, they directly probe whatever goes into the measured R2. Using a direct measurement is a strength since it makes it clear what they are predicting. It also, however, makes it difficult to interpret. This is made clear in the text when the authors, for example, write "𝑅2 is the one most affected by slower dynamics (10s of ns to 1 μs and beyond)." First, with the "and beyond" it could literally mean anything. Second, the "normal" R2 rate is limited up to motions up to the (local) "tumbling/reorganization" time (which is much faster), so any slow motions that go into R2 would be what one would normally call "exchange". The authors should thus make it clearer what exactly it is they are probing. In the end, this also depends on the origin of the experimental data, and whether the "R2" measurements are exchange-free or not. This may be a mixture, which hampers interpretations and which may also explain some of the rescaling that needs to be done.

      We now remove “and beyond”, and also raise the possibility that R2 measurements based on 15N relaxation may have relatively small exchange contributions (p. 17).

      (2) Related to the above, the authors might consider comparing their predictions to the relaxation experiments from Kriwacki and colleagues on a fragment of p27. In that work, the authors used dispersion experiments to probe the dynamics on different timescales. The authors would here be able to compare both to the intrinsic R2 rates (when slow motions are pulsed away) as well as the effective R2 rates (which would be the most common measurement). This would help shed light on (at least in one case) which type of R2 the prediction model captures. https://doi.org/10.1021/jacs.7b01380

      We now report this comparison in new Fig. S5 and discuss its implications (p. 17-18).

      (3) In some cases, disagreement between prediction and experiments is suggested to be due to differences in temperature, and hence is used as an argument for the rescaling done. Here, the authors use a factor of 2.0 to explain a difference between 278K and 298K, and a factor of 2.4 to explain the difference between 288K and 298K. It would be surprising if the temperature effect from 288K->298K is larger than from 278K->298K. Does this not suggest that the differences come as much from other sources?

      Note that the scaling factors 2.0 and 2.4 were obtained on two different IDPs. It is most likely that different IDPs have different scaling factors for temperature change. As a simple model, the tumbling time for a spherical particle scales with viscosity and the particle volume; correspondingly the scaling factor for temperature change should be greater for a larger particle than for a smaller particle.

      (4) The authors find (as have others before) aromatic residues to be common at/near R2 peaks. They suggest this to be indicative for Pi-Pi interactions. Could this not be other types of interactions since these residues are also "just" more hydrophobic? Also, can the authors rule out that the increased R2 rates near aromatic residues are not due to increased dynamics, but simply due to increased Rex-terms due to greater fluctuations in the chemical shifts near these residues (due to the large ring current effects)?

      We noted both pi-pi and cation-pi as possible interactions that raise R2. There can be other interactions involving aromatic residues, but it’s unlikely to be only hydrophobic as Arg is also in the high-q end. For the same reason, a ring-current based explanation would be inadequate.

      (5) The authors write: "We found that, by filtering PsiPred (http://bioinf.cs.ucl.ac.uk/psipred) (35) helix propensity scores (𝑝,-.) with a very high cutoff of 0.99, the surviving helix predictions usually correspond well with residues identified by NMR as having high helix propensities." It would be good to show the evidence for this in the paper, and quantify this statement.

      The cases of most interest are the ones with long predicted helices, of which there are only 3 in the training set. For Sev-NT and CBP-ID4, we already summarize the NMR data for helix identification in the first paragraph of Results; the third case is KRS-NT, which we elaborate in p. 14.
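      The filtering step quoted by the reviewer amounts to thresholding per-residue helix scores and keeping contiguous runs. A minimal sketch follows; it does not parse PsiPred output, the scores are hypothetical, and treating the cutoff as inclusive and adding a `min_len` option are assumptions made for illustration.

```python
def helix_segments(scores, cutoff=0.99, min_len=1):
    """Return 1-based inclusive (start, end) runs where score >= cutoff."""
    segments = []
    start = None
    for i, score in enumerate(scores, start=1):
        if score >= cutoff:
            if start is None:
                start = i          # open a new run
        elif start is not None:
            if i - start >= min_len:
                segments.append((start, i - 1))
            start = None           # close the current run
    # Close a run that extends to the last residue.
    if start is not None and len(scores) - start + 1 >= min_len:
        segments.append((start, len(scores)))
    return segments

# Hypothetical per-residue helix propensity scores.
scores = [0.20, 0.995, 0.999, 0.993, 0.50, 0.99, 0.10]
print(helix_segments(scores))
print(helix_segments(scores, min_len=2))
```

      Requiring a minimum run length is one simple way to focus on the long predicted helices that the response singles out as the cases of most interest.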

      (6) When analysing the nine test proteins, it would be very useful for the reader to get a number for the average accuracy on the nine proteins and a corresponding number for the training proteins. The numbers are maybe there, but hard to find/compare. This would be important so that one can understand how well the model works on the training vs testing data.

      We now present the mean RMSE comparison in p. 14.

      (7) The authors write: "The 𝑞 parameters, while introduced here to characterize the propensities of amino acids to participate in local interactions, appear to correlate with the tendencies of amino acids to drive liquid-liquid phase separation." It would be good to show this data and quantify this.

      We now list supporting data in p. 18 and present new Fig. S6 for further support.

      (8) It is great that the authors have made a webservice available for easy access to the work. They should in my opinion also make the training code and data available, as well as the final trained model. Here it would also be useful to show the results from the use of a Gaussian that was also tested, and also state whether this model was discarded before or after examining the testing data.

      We have listed the IDP characteristics and sequences in Tables S1 and S2. We’re unsure whether we can disseminate the experimental R2 data without the permission of the original authors. As for the Gaussian function, as stated above, it was abandoned at an early stage, before examining the testing data.

      Changes that would also be useful

      (1) The authors should make it clearer what they predict and what they don't. They mention transient helix formation and various contacts, but there isn't a one-to-one relationship between these structural features and R2 rates. Hence, they should make it clearer that they don't predict secondary structure and that an increased R2 rate may be indicative of many different structural/dynamical features on many different time scales.

      We clearly state that we apply a helix boost after the regular SeqDYN prediction.

      (2) The authors write "Instead, dynamics has emerged as a crucial link between sequence and function for IDPs" and cite their own work (reference 1) as reference for this statement. As far as I can see, that work does not study function of IDPs. Maybe the authors could cite additional work showing that the dynamics (time scales) affects function of IDPs beyond "just" structure? Otherwise, the functional consequences are not clear. Maybe the authors mean that R2 rates are indicative of (residual) structure, but that is not quite the same. Also, even in that case, there are likely more appropriate references.

      Ref. 1 summarized a number of scenarios where dynamics is related to function.

      (3) The authors might want to look at some of the older literature on interpreting NMR relaxation rates and consider whether some of it is worth citing.

      Fitting/understanding R2 profiles https://doi.org/10.1021/bi020381o https://doi.org/10.1007/s10858-006-9026-9

      MD simulations and comparisons to R2 rates without ad hoc reweighting (in addition to the papers from the authors themselves). https://doi.org/10.1021/ja710366c https://doi.org/10.1021/ja209931w

      The R2 data for the two unfolded proteins are very helpful! We now present the comparison of these data to SeqDYN prediction in Fig. 6C, D. The MD papers are superseded by more recent studies (e.g., refs. 1 and 14).

      There are more like these.

      (4) In the analysis of unfolded lysozyme, I assume that the authors are treating the methylated cysteines (which are used in the experiments) simply as cysteine. If that is the case, the authors should ideally mention this specifically.

      Treatment of methylated cysteines is now stated in the Fig. 6 caption.

      (5) The authors write "Pro has an excessively low ms𝑅2 [with data from only two IDPs (32, 33)], but that is due to the absence of an amide proton." It would be useful with an explanation why lacking a proton gives rise to low 15N R2 rates.

      That assertion originated from ref. 32.

      (6) When applying the model, the authors predict msR2 and then compare to experimental R2 by rescaling with a factor gamma. It would be good to make it clearer whether this parameter is always fitted to the experiments in all the comparisons. It would be useful to list the fitted gamma values for all the proteins (e.g. in Table S1).

      We already give a summary of the scaling factors (“For 39 of the 45 IDPs, Υ values fall in the range of 0.8 to 2.0 s–1”, p. 10).

      (7) p. 14 "nineth" -> "ninth"

      Corrected

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Strengths: 

      The paper clearly presents the resource, including the testing of candidate enhancers identified from various insects in Drosophila. This cross-species analysis, and the inherent suggestion that training datasets generated in flies can predict a cis-regulatory activity in distant insects, is interesting. While I cannot be sure this approach will prevail in the future, for example with approaches that leverage the prediction of TF binding motifs, the SCRMshaw tool is certainly useful and worth consideration for the large community of genome scientists working on insects. 

      We thank the reviewer for the positive comments, and would just like to point out that we agree: while we cannot of course know if other methods will overtake SCRMshaw for enhancer prediction—we assume they will, at some point (although motif-based approaches have not fared as well in the past)—for now, SCRMshaw provides strong performance and is a useful part of the current toolkit.

      Weaknesses: 

      While the authors made the effort to provide access to the SCRMshaw annotations via the REDfly database, the usefulness of this resource is somewhat limited at the moment. First, it is possible to generate tables of annotated elements with coordinates, but it would be more useful to allow downloads of the 33 genome annotations in GFF (or equivalent) format, with SCRMshaw predictions appearing as a new feature. Also, I should note that unlike most species some annotations seem to have issues in the current REDfly implementation. For example, Vcar and Jcoen turn empty. 

      We have addressed these weaknesses in several ways:

      (1) We have created GFF versions of the SCRMshaw predictions and provide them standalone and also merged into the available annotation GFFs for each of the 33 species.

      (2) We have made these GFF files, and also the original SCRMshaw output files, available for download in a Dryad repository linked to the publication (https://doi.org/10.5061/dryad.3j9kd51t0).

      (3) We have added the inadvertently omitted species to the REDfly/SCRMshaw database.

      We agree that the database functions are still somewhat limited, but note that database development is ongoing and we expect functionality to increase over time. In the meantime, the Dryad repository ensures that all results reported in this paper are directly available.
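      For readers who want to work with such files, the sketch below shows how a predicted region could be written as one GFF3 feature line (nine tab-separated columns, 1-based inclusive coordinates). The function name, the `enhancer` feature type, the source label, and the example values are assumptions for illustration; they do not describe the actual contents of the files deposited in Dryad, and attribute escaping is omitted.

```python
def gff3_line(seqid, start, end, score, source="SCRMshaw",
              feature="enhancer", strand=".", attributes=None):
    """Format one GFF3 feature line from a predicted region."""
    if start < 1 or end < start:
        raise ValueError("GFF3 coordinates are 1-based with start <= end")
    # Column 9 holds key=value pairs separated by semicolons, or "." if empty.
    attrs = ";".join(f"{k}={v}" for k, v in (attributes or {}).items()) or "."
    columns = [seqid, source, feature, str(start), str(end),
               str(score), strand, ".", attrs]  # phase is "." for non-CDS
    return "\t".join(columns)

# Hypothetical prediction on a hypothetical scaffold.
line = gff3_line("scaffold_12", 1000, 1500, 12.5,
                 attributes={"ID": "pred_0001"})
print("##gff-version 3")
print(line)
```

      Keeping predictions as ordinary feature lines is what allows them to be merged into an existing annotation GFF and loaded in a genome browser alongside gene models.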

      Reviewer #2 (Public Review): 

      Summary: 

      … Upon identification of predicted enhancer regions, the authors perform post-processing step filtering and identify the most likely predicted enhancer candidates based on the proximity of an orthologous target gene. …

      We respectfully point out a small misunderstanding here on the part of the reviewer. We stress that putative target gene assignments and identities have no impact at all on our prediction of regulatory sequences, i.e., they are not “based on the proximity of an orthologous target gene.” Predictions are solely based on sequence-dependent SCRMshaw scores, with no regard to the nature or identities of nearby annotated features. Putative target genes are mapped to Drosophila orthologs purely as a convenience to aid in interpreting and prioritizing the predicted regulatory elements. We have added language on page 8 (lines 189ff) to make this more clear in the text.

      Weaknesses:

      This work provides predicted enhancer annotations across many insect species, with reporter gene analysis being conducted on selected regions to test the predictions. However, the code for the SCRMshaw analysis pipeline used in this work is not made available, making reproducibility of this work difficult. Additionally, while the authors claim the predicted enhancers are available within the REDfly database, the predicted enhancer coordinates are currently not downloadable as Supplementary Material or from a linked resource. 

      We have placed all the code for this paper into a GitHub repository “Asma_etal_2024_eLife” (https://github.com/HalfonLab/Asma_etal_2024_eLife) to address this concern. As described in our response to Reviewer 1, above, all results are now available in multiple formats in a linked Dryad repository in addition to the REDfly/SCRMshaw database.

      The authors do not validate or benchmark the application of SCRMshaw against other published methods, nor do they seek to apply SCRMshaw under a variety of conditions to confirm the robustness of the returned predicted enhancers across species. Since SCRMshaw relies on an established k-mer enrichment of the training loci, its performance is presumably highly sensitive to the selection of training regions as well as the statistical power of the given k-mer counts. The authors do not justify their selection of training regions by which they perform predictions. 

      Our objective in this study was not to provide proof-of-principle for the SCRMshaw method, as we have established the efficacy of the approach at this point in several previous publications. Rather, the objective here was to make use of SCRMshaw to provide an annotation resource for insect regulatory genomics. Note that the training regions we used here are the same as those we have used in earlier work. Naturally, we performed various assessments to establish that the method was working here, but we make no claims in this work about SCRMshaw’s relative efficiency compared to other methods. Some of our prior publications include assessments of the sort the reviewer references, which suggest that SCRMshaw is at least comparable to other enhancer discovery approaches. We note that benchmarking of such methods is in fact extremely complicated due to the fact that there are no established true positive/true negative data sets against which to benchmark (we have explored this in Asma et al. 2019 BMC Bioinformatics).

      While there is an attempt made to report and validate the annotated predicted enhancers using previously published data and tools, the validation lacks the depth to conclude with confidence that the predicted set of regions across each species is of high quality. In vivo, reporter assays were conducted to anecdotally confirm the validity of a few selected regions experimentally, but even these results are difficult to interpret. There is no large-scale attempt to assess the conservation of enhancer function across all annotated species. 

      We respectfully disagree that there is insufficient validation. We bring several different lines of evidence to bear suggesting that our results fall into the accuracy range, roughly 75%, established both here and in previous work. We are also clear about the fact that these are predictions only and need to be viewed as such (e.g. line 638). Although “large-scale” in vivo validation assays would certainly be both interesting and worthwhile, the necessary resources for such an assessment place it beyond our present capability.

      Lastly, it is suggested that predicted regions are derived from the shared presence of sequence features such as transcription factor binding motifs, detected through k-mer enrichment via SCRMshaw. This assumption has not been examined, although there are public motif discovery tools that would be appropriate to discover whether SCRMshaw is assigning predicted regions based on previously understood motif grammar, or due to other sequence patterns captured by k-mer count distributions. Understanding the sequence-derived nature of what drives predictions is within the scope of this work and would boost confidence in the predicted enhancers, even if it is limited to a few training examples for the sake of clarity of interpretation. 

      Again, we respectfully disagree that “this assumption has not been examined.” Although we did not undertake this analysis here, we have in the past, where we have shown that known TFBS motifs can be recovered from sets of SCRMshaw predictions (e.g., Kazemian et al. 2014 Genome Biology and Evolution). We return to this point when we address the Comments to Authors, below.

      Reviewer #3 (Public Review): 

      Weaknesses:  

      The rates of predicted true positive enhancer identification vary widely across the genomes included here based on the simulations and comparison to datasets of accessible chromatin in a manner that doesn't map neatly onto phylogenetic distance. At this point, it is unclear why these patterns may arise, although this may become more clear as regulatory annotation is undertaken for more genomes. 

      We agree that we do not see clear patterns with respect to phylogenetic distance in our results. However, we note that this initial data set is still fairly small, and not carefully phylogenetically distributed. We are hoping that, as the reviewer suggests, some of these questions become more clear as we add more genomes to our analysis. Fortunately, the list of available genomes with chromosome-level assembly is growing rapidly, and as we move ahead we should have much greater ability to choose informative species.

      Functional assessment of predicted enhancers was performed through reporter gene assays primarily in Drosophila melanogaster imaginal discs, a system amenable to transgenics. Unfortunately, this mode of canonical imaginal disc development is only representative of a subset of all holometabolous insects; therefore, it is difficult to interpret reporter gene expression in a fly imaginal disc as evidence of a true positive enhancer that would be active in its native species whose adult appendages develop differently through the larval stage (for example, Coleopteran and Lepidopteran legs). However, the reporter gene assays from other tissues do offer strong evidence of true positive enhancer detection, and constraints on transgenic experiments in other systems mean that this approach is the best available. 

      Please see an extensive discussion of this point in our response to Reviewer 3, below.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors): 

      Major Concerns: 

      (1) While the GitHub source code for SCRMshaw is provided, the authors do not provide a repository of manuscript-specific code and scripts for readers. This is a barrier to reproducibility and the code used to perform the analysis should be made available. Additionally, links to available scripts do not work, see Line 690. Post-processing scripts point to a general lab folder, but again, no specific analysis or code is sourced for the work in this specific manuscript (e.g. Line 637).

      As noted above, we have corrected this oversight and established a specific GitHub repository for this manuscript “Asma_etal_2024_eLife” (https://github.com/HalfonLab/Asma_etal_2024_eLife). 

      (2) On lines 479-488, there is a discussion about the annotations being provided on REDfly, though no link is provided. 

      We have included a link in the text at this point (now line 515).

      Additionally, for transparency, it would be valuable to provide in Supplementary Table 1 the genomic coordinates of the original training sets in addition to their identity. 

      These coordinates have been added to Supplementary Table 1 as suggested.

      Also, it is suggested to provide genomic coordinates of the predicted enhancers for each training set across all species, perhaps with a column denoting a linked ID of one genomic coordinate in a species to another species (i.e. if there is a linked region found from D. melanogaster to J. coenia, labeling this column in both coordinate sets as blastoderm.mapping1_region1). Providing these annotations directly in the work enhances the transparency of the results. 

      We are unsure exactly what the reviewer means here by “a linked region.” It is critical to understanding our approach to recognize that the genome sequences have diverged to the point where there is no alignment of non-coding regions possible. Thus there is no way to directly “link” coordinates of a predicted enhancer from one species to those of a predicted enhancer in another species. The coordinates for each prediction are available on a per-species basis either through the database or in the files now available in the linked Dryad repository; these can be filtered for results from a specific training set. The database will allow users to select all results for a given orthologous locus, from any subset of species. More complex searches will continue to become available as we improve functionality of the database, an ongoing project in collaboration with the REDfly team.

      (3) Figure 2B: It is unclear what this figure shows. Are the No Fly Orthologs false positives, Orthology pipeline issues, or interesting biology? 

      We have clarified this in the Figure 2 legend. “No Mapped Fly Orthologs” indicates that our orthology mapping pipeline did not identify clear D. melanogaster orthologs. For any given gene, this could reflect either a true lack of a respective ortholog, or failure of our procedure to accurately identify an existing ortholog.

      (4) SCRMshaw appears to be a versatile tool, previously published in a variety of works. However, in this manuscript, there is little discussion of the sensitivity of SCRMshaw to different initial parameters, how the selection of training loci can impact outcomes, or how SCRMshaw k-mer discovery methods compare to other similar tools.

      - This paper would be strengthened by addressing this weakness. Some specific suggestions below: 

      In order to strengthen confidence that SCRMshaw is a reliable predictor of enhancer regions in other species, it is suggested that you benchmark against other k-mer-derived methods to assign enhancers, such as GSK-SVM developed by the Beer Lab in 2016  (https://www.beerlab.org/gkmsvm/, https://www.biorxiv.org/content/10.1101/2023.10.06.561128v1). 

      We have established the effectiveness of SCRMshaw as an enhancer discovery method in previous work, and the main goal of this study was to make use of the established method to annotate numerous insect genomes as a community resource. Our claim here is that SCRMshaw works well for this purpose; we do not attempt a strong claim about whether other approaches may work equally well or marginally better (although we do not believe this is the case, based on prior work). Benchmarking enhancer discovery is challenging, as we point out in Asma et al. 2019 (BMC Bioinformatics), and, while important, best left for a dedicated comprehensive study. A major problem is that there are no independent objective “truth” sets for enhancers from the various species we interrogate here. Thus, while we could also run, e.g., GSK-SVM, what criteria would we use to establish which method had better accuracy for a given species? Note that the work from Beer’s lab took advantage of the ability to match human-mouse orthologous (or syntenic) regions and available open-chromatin data to assess whether conserved enhancers were discovered, but this is not possible given the degree of divergence, limited synteny, and relative lack of additional data for the insect genomes we are annotating.

      - In Table S1, we see that 7-146 regions are used as training sets, which is a huge variety. Does an increase in training set size provide a greater "rate of return" for predicted regions? Is the opposite true? Addressing this question would allow readers to understand if they wish to use SCRMshaw, a reasonable scope for their own training region selections. 

      - Within a training set, does subsampling provide the same outcomes in terms of prediction rates? There is no exploration of how "brittle" the training sets are, and whether the generalized k-mer count distributions that are established in a training set are consistent across randomly selected subgroups. Performing this analysis would raise confidence in the method applied and the resulting annotations. 

      These are interesting and important questions, but again we feel they are beyond the scope of this particular study, which is focused primarily on using SCRMshaw and not on optimizing various search parameters. That said, this is of course something we have investigated, although as with other aspects of enhancer discovery, the absence of a true gold standard enhancer set makes evaluation difficult. We have not found a clear correlation between training set size and performance beyond the very general finding that performance appears to be best when training set size is moderate, e.g. 20-40 initial enhancers. We suspect that larger training sets often contain too many members that don’t fit the core regulatory model and thus add noise, whereas sets that are too small may not contain enough signal for best performance (although small sets can still be useful, especially if used in an iterative cycle; see Weinstein et al. 2023 PLoS Genetics). However, establishing this rigorously is highly challenging given the limitations with assessing true and false positive rates at scale.

      (5) In Figure 2C, when plotting hexMCD, IMM, pacRC, and then the merged set, it is unclear whether the score-specific bar allows coordinate redundancy, though this is implied. What might be more useful is a revision of this plot where the hexMCD/IMM/pacRC-specific loci are plotted, with the merged set alongside as is currently reported. This would give the reader a clearer understanding of the variability between these scoring methods and why this variability occurs.

      We have added the breakdowns between IMM, hexMCD, and pacRC in Supplementary Table S2, and made more complete reference to this in the text (lines 682ff). Both the database and the data files in the Dryad repository allow exploration of the overlap between the different methods and contain both separate and merged (for overlap and redundancy) results.

      Additionally, there is no information in the Methods section of these three SCRMshaw scores and what they represent, even colloquially. While SCRMshaw has been applied in several papers previously, it would help with scientific clarity to describe in a sentence or two what each score is meant to represent and why one is different from another. 

      We had chosen to err on the side of brevity given prior publication of the SCRMshaw methodology, but we recognize now that we went too far in that direction. We have added more complete descriptions of the methods in both the Results (lines 164-167) and the Methods (lines 667-681) sections.

      (6) When describing results in Figure 2, an important question arises: "Is there an anti-correlation between the number of predicted regions and evolutionary distance?" This would be an expected result that could complement Figure 4's point that shared orthology across 16 species is rarer than across 10 species. Visualizing and adding this to Figure 2 or Figure 4 would be a powerful statement that would boost confidence in the returned predicted enhancers and/or orthologous regions. 

      This is an important question and one in which we are very interested. Unfortunately, we do not have sufficient data at this time to address this with proper statistical rigor. As we remarked above in response to Reviewer 3, “We agree that we do not see clear patterns with respect to phylogenetic distance in our results. However, we note that this initial data set is still fairly small, and not carefully phylogenetically distributed. We are hoping that, as the reviewer suggests, some of these questions become more clear as we add more genomes to our analysis. Fortunately, the list of available genomes with chromosome-level assembly is growing rapidly, and as we move ahead we should have much greater ability to choose informative species.”

      (7) In Figure 3, the authors seek to convey that SCRMshaw predicts enhancer regions that are mapped nearby one another, across different loci widths, and that this occurrence of nearby predicted regions occurs more than a randomly selected control. This is presumably meant to validate that SCRMshaw is not providing predictions with low specificity, but rather to highlight the possibility that SCRMshaw is identifying groups of shadow enhancers. However, these plots are extremely difficult to decipher and do not strongly support the claims due to the low resolution and difficult interpretability of the boxplot interquartile distributions.

      Additionally, as the majority of predicted regions are around ~750bp, how does that address loci groups of <1000bp? This suggests that predicted regions are overlapping, and therefore cannot be meaningfully interpreted as shadow enhancers. This plot should either be moved to the supplements or reworked to more effectively convey the point that "SCRMshaw is detecting predicted regions that are proximal to one another and that this proximity is not due to chance". 

      - A suggestion to rework this plot is to change this instead to a bar plot, where the y-axis instead represents "number of predictions with at least 2 predicted regions proximal to one another" divided by "total number of predictions", separating bar color by simulated/observed values. The x-axis grouping can remain the same. Because this plot is a broad generalization of the statement you're trying to make above, knowing whether a few loci have 2 versus 4 proximal predicted enhancers doesn't enhance your point. 

      We agree with the reviewer that these are not the clearest plots, and thank them for the suggestions regarding revision. We tried many variations on visualizing these complex data, including those suggested by the reviewer, and have concluded that despite their weaknesses, these plots are still the best visualization. The main problem is that the observed data cluster heavily around zero, so that the box plots are very squat and mainly only the outlier large values are observed. The key point, however, is that the expected values almost never give values much greater than one, so that the observed outlier points are the only points seen in the upper ranges of the y-axis. This is true across the three species, across the bins of locus sizes, and across training sets (averaged into the box plots). The reviewer is correct as well about the bins where locus size is < 1000 bp. However, inspection of the data shows that this is not a large concern, as very few data points lie in this range and we never see multiple predicted enhancers there. Thus we believe that, while not the prettiest of graphs, Figure 3 does effectively support the claims made in the text. In keeping with our view that it is preferable to have data in the main paper whenever possible, we choose to keep the figure in place rather than move it to the Supplement.

      - Label the species for the reader's understanding of each subplot on the plot. 

      We apologize for this oversight and have now labeled each plot with its relevant species.

      (8) SCRMshaw operates on k-mer count distributions compared to a genomic background across different species, allowing it to assign predicted regions without prior knowledge of an organism's cis-regulatory sequences. This is powerful and boosts the versatility of the method. However, understanding the cis-regulatory origins of the kinds of k-mers that are driving the detection of orthologous regions across species is crucial and absolutely within the scope of the paper, particularly for the justification of the provided annotations. Is SCRMshaw making use of enriched motifs within the training region set to assign regions in other species? One would presume so, but it is necessary to show this. There are many motif discovery tools that are readily available and require little up-front knowledge and little to no use of a CLI, such as MEMESuite (https://meme-suite.org/meme/tools/meme). It is highly recommended that the authors, even for a few training pairs that are well understood (e.g. mesoderm.mapping1, dorsal_ectoderm.mapping1), assess the motif enrichment within the original sequence set, then see whether motif enrichments are reflected in the predicted enhancers. As evolutionary distance increases between D. melanogaster and the species of interest, is the assignment of enriched motifs more sparse? Is there a loss of a key motif? These are the kinds of questions that will allow readers to understand how these annotations are assigned as well as boost confidence in their usage.

      This is a very important point and a subject of significant interest to us. We have demonstrated in earlier work (e.g., Kazemian et al. 2014 Genome Biol. Evol.) that SCRMshaw-predicted enhancers do contain expected TFBS motifs, across multiple species—and that even an overall arrangement of sites is sometimes conserved. Thus we have previously answered, in part, the reviewer’s question. 

      What we also learned from our previous work is that filtering out relevant motifs from the noise inherent in motif-finding is both arduous and challenging. As the reviewer is no doubt aware, while using motif discovery tools is simple, interpreting the output is much less so. In response to the reviewer’s comments, we revisited this issue with data from a small sample of training sets. We can discover motifs; we can see that the motif profiles are different between different training sets; and we can observe the presence of expected motifs based on the activity profile of the enhancers (e.g., Single-minded binding sites in our mesectoderm/midline training and result data). However, to do this cleanly and with appropriate statistical rigor is beyond what we feel would be practical for this paper. We hope to return to this important question in the future when we have a larger and phylogenetically more evenly-distributed set of species, and the time and resources to address it appropriately.

      (9) Figures 5-7 need to have better descriptions. 

      We have added to the Figure 6 and 7 legends in response to this comment; please note as well that there is substantial detail provided in the text. If there are specific aspects of the figures that are not clear or which lack sufficient description, we are happy to make additional changes.

      Minor Concerns 

      (1)  In Figure 1A, it is implied that "k-mer count distributions" are actually only "5-mer count distributions". However, in the published documentation of SCRMshaw, it is suggested that k-mers between 1-6 bp are involved in establishing sequence distributions. Please add a justification for the selection of these criteria. It would be helpful to understand the implications of using up to a 3-mer versus a 12-mer when assessing k-mer counts using SCRMshaw.

      We have clarified in the Figure 1 legend that this is just an example, and that k-mers of different sizes are used in the IMM method; we have also expanded the description of the basic method in the Methods section. To be clear, the hexMCD sub-method is 6-mer based (5th-order Markov chain), as is pacRC, while the IMM method considers Markov chains of orders 0-5.
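
      To make the relationship between k-mer size and Markov chain order concrete, the following Python sketch counts (order + 1)-mers and uses them to score a sequence by a log-likelihood ratio between a training-set model and a background model. It is an illustration only and is not the SCRMshaw implementation: the function names, the pseudocount, and the simple ratio score are assumptions made for this example.

```python
import math
from collections import Counter


def count_kmers(seq, k):
    """Count every overlapping k-mer in a sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def markov_model(seqs, order, pseudocount=1.0):
    """Build P(base | preceding `order` bases) from (order + 1)-mer counts.

    A 5th-order chain is therefore estimated from 6-mer counts.
    The pseudocount is an illustrative choice, not a SCRMshaw parameter.
    """
    kmers = Counter()
    for s in seqs:
        kmers.update(count_kmers(s, order + 1))

    def prob(context, base):
        total = sum(kmers[context + b] for b in "ACGT")
        return (kmers[context + base] + pseudocount) / (total + 4 * pseudocount)

    return prob


def log_likelihood_ratio(seq, fg, bg, order):
    """Sum of log(P_training / P_background) over each base in `seq`."""
    score = 0.0
    for i in range(order, len(seq)):
        context, base = seq[i - order:i], seq[i]
        score += math.log(fg(context, base) / bg(context, base))
    return score
```

      With `order=5` each conditional probability is estimated from 6-mer counts, which is the sense in which a 6-mer based score corresponds to a 5th-order Markov chain; a method that considers orders 0-5 would combine six such models.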

      (2) Control the y-axis to remove white space from Figure 2D. 

      We have amended the figure as suggested.

      Additionally, expand in the manuscript on expected results from SCRMshaw. Given training regions of 750 bp, is the expectation that you return predicted enhancers of the same length? This is not explicitly stated, only a description of outliers. 

      The scoring is not dependent on the length of the training sequences, and there is no direct expectation of predicted enhancer length. Scores are calculated on 10-bp intervals, and a peak-calling algorithm is used to determine the endpoints of each prediction based on where the scores drop below a cutoff value. Thus there is no explicit minimum prediction length beyond the smallest possible length of 10 bp. That said, the initial scoring takes place over a 500-bp sequence window (for reasons of computational efficiency), which does influence scores away from the smaller end of the possible range. We correct for this in part by reducing scores below a certain threshold to zero, to prevent multiple low-scoring regions from combining to give a low but positive score over a long interval. Indeed, we found that in the original version of SCRMshawHD (Asma et al. 2019), multiple low-scoring but above-threshold intervals would get concatenated together in broad peaks, leading to an unrealistically large average prediction length. In the version used here, described in Supplementary Figure S6, low-scoring windows are now first reset to zero and a new threshold is calculated before overlapping scores are summed. This helps to prevent the broad peak problem, and we find that it results in a median prediction length of ~750 bp, more in line with expected enhancer sizes.
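
      The effect of resetting low scores to zero before calling peaks can be illustrated with a small Python sketch. This is a simplified stand-in, not the SCRMshaw_HD peak caller: it omits the summing of overlapping 500-bp windows and the recalculation of the threshold, and the function names, the `floor` and `cutoff` parameters, and the example scores are all hypothetical.

```python
def zero_low_scores(scores, floor):
    """Reset every per-interval score below `floor` to zero."""
    return [s if s >= floor else 0.0 for s in scores]


def call_peaks(scores, cutoff, step=10):
    """Return (start, end) coordinates of runs of intervals scoring above `cutoff`.

    `scores` holds one value per `step`-bp interval; a peak ends at the
    first interval whose score is at or below the cutoff.
    """
    peaks = []
    start = None
    for i, s in enumerate(scores):
        if s > cutoff and start is None:
            start = i                      # a peak opens here
        elif s <= cutoff and start is not None:
            peaks.append((start * step, i * step))
            start = None                   # the peak closes here
    if start is not None:                  # peak runs to the end of the scores
        peaks.append((start * step, len(scores) * step))
    return peaks


# Hypothetical scores for seven consecutive 10-bp intervals.
raw = [0.1, 0.2, 2.0, 3.0, 0.1, 0.0, 4.0]
cleaned = zero_low_scores(raw, floor=1.0)
peaks = call_peaks(cleaned, cutoff=0.0)
```

      Without the reset step, the weak positive scores on either side of the strong intervals would merge them into a single broad peak, which is the behaviour the revised procedure is meant to avoid.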

      Reviewer #3 (Recommendations For The Authors): 

      Line 161: Given that the SCRMshaw HD method is the basis for the pipeline, the methodology deserves at least an "in brief" recapitulation in this manuscript. 

      As we remark in our response to Reviewer 2, above, “We had chosen to err on the side of brevity given prior publication of the SCRMshaw methodology, but we recognize now that we went too far in that direction. We have added more complete descriptions of the methods in both the Results (lines 164-167) and the Methods (lines 667-681) sections.” 

      Line 219: Throughout the reporting of the results, there appeared to be a bit of inconsistency/potential typos regarding whether threshold or exact P values were reported. In lines 219, 222, 265, 696, and 811, the reported values seem to clearly be thresholds (< a standard cutoff), while in lines 291, 293, 297, 300, values appear to be exact but are reported as thresholds (<).

      This is not an error but rather reflects two different types of analysis. The predictions per locus (originally lines 219, 222 etc) are evaluated using an empirical P-value based on 1000 permutations. As such, they are thresholded at 1/1000. The overlap with open chromatin regions, on the other hand, is based on a z-score with the P-values taken from a standard conversion of z-scores to P-values.
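
      The difference between the two kinds of P-value can be sketched in Python as follows. This is a generic illustration rather than the code used in the manuscript: the function names are hypothetical, the empirical estimator shown is the plain fraction of permutations (other conventions add one to the numerator and denominator), and the z-score conversion assumes a one-sided upper-tail test against a standard normal.

```python
import math


def empirical_p_value(observed, null_values):
    """Fraction of permuted values at least as large as the observed value."""
    hits = sum(1 for v in null_values if v >= observed)
    return hits / len(null_values)


def report_empirical(p, n_permutations):
    """Format an empirical P-value.

    When no permutation matches the observed value, the estimate can only
    be reported as a bound set by the number of permutations.
    """
    if p == 0:
        return f"P < {1 / n_permutations:g}"
    return f"P = {p:g}"


def z_to_p_one_sided(z):
    """Upper-tail probability of a standard normal at z-score `z`."""
    return 0.5 * math.erfc(z / math.sqrt(2))
```

      With 1000 permutations, the smallest reportable empirical result is "P < 0.001", whereas the z-score conversion returns a value for any z, however small the resulting P-value.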

      Page 13/Table 2: At face value, it seems surprising that the overlap between Dmel SCRMshaw predictions with open chromatin is so much smaller than the overlap between predictions and open chromatin in other species, both in raw % (Tcas, D plexippus, H. himera) and fold enrichment (Tcas), given that the training sets for SCRMshaw are all derived from Dmel data. The discussion here does not touch on this aspect of the results, and the interpretation of this approach, in general, would be strengthened if the authors could comment on potential reasons why this pattern may be arising here, or at least acknowledge that this is an open question.

      There are many variables at play here, as the data are from different species, from different tissues, and from different methods. Thus we think it is difficult to read too much into the precise results from these comparisons—the main take-home is really just that there is a significant amount of overlap. In acknowledgment of this, we have slightly modified the text in this section so that it now notes (line 302ff): “These comparisons are imperfect, as the tissues used to obtain the chromatin data do not precisely correspond to the training sequences used for SCRMshaw, and the data were obtained using a variety of methods.”

      Line 318-329: The inferences from the reporter gene assay deserve a more nuanced treatment than they are given here. The important nuance that was not addressed by the discussion here is that the imaginal disc mode of development in Drosophila is not broadly representative of the development of larval/adult epithelial tissues across Holometabola; thus, inference of a true positive validation becomes complicated in cases where predicted enhancers from a species were tested and shown to drive expression in a fly imaginal disc that the native species have no direct disc counterpart to. For example, in line 388 a Tcas enhancer is reported to drive expression in the eye-antennal disc, and in lines 404 and 423 additional Tcas enhancers were reported to drive expression in the leg discs; however, Tribolium larvae do not possess antennal discs or leg discs set aside during embryogenesis in the sense that flies do - instead the homologous epithelial tissues form larval antennae and larval legs external to the body wall that are actively used at this life stage and are starkly different in morphology than an internally invaginated epithelial disc, that will directly give rise to adult tissues in subsequent molts. Is the interpretation of an expression pattern driven in a fly disc as a true positive really as straightforward as it was presented here, when in the native species the expression pattern driven by the enhancer in question would be in the context of an extremely different tissue morphology? That said, I understand and am deeply sympathetic to the constraints on the authors in performing transgenic experiments outside of the model fly; but these divergent modes of development across Holometabola deserve a mention and nuance in the interpretation here. 

      This is indeed a very important point, and we greatly appreciate Reviewer 3 pointing out this caveat when interpreting the outcomes of our cross-species reporter assay. Reviewer 3 is correct that the imaginal disc mode of adult tissue (i.e. imaginal) development found in Diptera does not represent the imaginal development across Holometabola. 

      In fact, imaginal development is quite diverse among Holometabola. For instance, larval leg and antennal cells appear to directly develop into the adult legs and antennae in Coleoptera (i.e. primordial imaginal cells function as larval appendage cells), while some cells within the larval legs and antennae are set aside during larval development specifically for adult appendages in Lepidopteran species (i.e. imaginal cells exist within the larval appendages but do not contribute to the formation of larval appendages). In contrast, almost the entire set of cells that develop into adult epithelia is set aside as imaginal discs during embryogenesis in Diptera. Furthermore, the imaginal disc mode of development appears to have evolved independently in Hymenoptera. Therefore, determining how imaginal primordial tissues correspond to each other among Holometabola has been a challenging task and a topic of high interest within the evo-devo and entomology communities.

      Nevertheless, despite these differences in mode of imaginal development, decades of evo-devo studies suggest that the gene regulatory networks (GRNs) operating in imaginal primordial tissues appear to be fairly well conserved among holometabolan species (for example, see Tomoyasu et al. 2009 regarding wing development and Angelini et al. 2012 regarding leg development between flies and beetles). These outcomes imply that a significant portion of the transcriptional landscape might be conserved across different modes of imaginal development. Therefore, an enhancer functioning in the Tribolium larval leg tissue (which also functions as adult leg primordium) could be active even in the leg imaginal disc of Drosophila, if the trans factors essential for the activation of the enhancer are conserved between the two imaginal tissues. 

      That being said, we fully expect there to be both false negative and false positive results in our cross-species reporter assay. We are optimistic about the biological relevance of the positive outcomes of our cross-species reporter assay, especially when the enhancer activity recapitulates the expression of the corresponding gene in Drosophila (for example, Am_ex Fig6B and Tc_hth Fig7B). Nonetheless, the biological relevance of these enhancer activities needs to be further verified in the native species through reporter assays, enhancer knock-outs, or similar experiments.

      In recognition of the Reviewer’s important point, we added the following caveat in our Discussion (lines 549-553): “Furthermore, the unique imaginal disc mode of adult epithelial development in D. melanogaster might have prevented some enhancers of other species from working properly in D. melanogaster imaginal discs, likely producing additional false negative results. Evaluating enhancer activities in the native species will allow us to address the degree of false negatives produced by the cross-species setting.” We moreover mention this caveat in the Results section when we first introduce the reporter assays (line 342).

      Line 580: This is the first time that the weakness of the closest-gene pairing approach is mentioned. This deserves mention earlier in the manuscript, as unfortunately, this is one of the major bottlenecks to this and any other approaches to investigating enhancer function. Could the authors address this earlier, perhaps pages 7-8, and provide citations for current understanding in the field of how often closest-gene pairing approaches correctly match enhancers to target genes? 

      We have added text as suggested on p.7-8 acknowledging the shortcomings of the closest-gene approach. We also clarify at the end of that section (lines 173-181) that target gene assignments, while useful for interpretation, have no bearing on the enhancer predictions themselves (which are generated prior to the target gene assignment steps).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public review):

      Summary:

      The authors demonstrate impairments induced by a high cholesterol diet on GLP-1R dependent glucoregulation in vivo as well as an improvement after reduction in cholesterol synthesis with simvastatin in pancreatic islets. They also map sites of cholesterol high occupancy and residence time on active versus inactive GLP-1Rs using coarse-grained molecular dynamics (cgMD) simulations and screened for key residues selected from these sites and performed detailed analyses of the effects of mutating one of these residues, Val229, to alanine on GLP-1R interactions with cholesterol, plasma membrane behaviour, clustering, trafficking and signalling in pancreatic beta cells and primary islets, and describe an improved insulin secretion profile for the V229A mutant receptor.

      These are extensive and very impressive studies indeed. I am impressed with the tireless effort exerted to understand the details of molecular mechanisms involved in the effects of cholesterol for GLP-1 activation of its receptor. In general the study is convincing, the manuscript well written and the data well presented.

      Some of the changes are small and insignificant which makes one wonder how important the observations are. For instance in figure 2 E (which is difficult to interpret anyway because the data are presented in percent, conveniently hiding the absolute results) does not show a significant result of the cyclodextrin except for insignificant increases in basal secretion. That is not identical to impairment of GLP-1 receptor signaling!

      We assume that the reviewer refers to Fig. 1E, where we show the percentage of insulin secretion in response to 11 mM glucose +/- exendin-4 stimulation in mouse islets pretreated with vehicle or MβCD loaded with 20 mM cholesterol. While we concur with the reviewer that the effect in this case is triggered by increased basal insulin secretion at 11 mM glucose, exendin-4 can no longer compensate for this increase by proportionally amplifying insulin responses in cholesterol-loaded islets, leading to a significantly decreased exendin-4-induced insulin secretion fold increase under these circumstances, as shown in Fig. 1F. We interpret these results as a defect in the GLP-1R capacity to amplify insulin secretion beyond the basal level to the same extent as in vehicle conditions. An alternative explanation is that there is a maximum level of insulin secretion in our cells, and 11 mM glucose + exendin-4 stimulation gets close to that value. With the increasing effect of cholesterol-loaded MβCD on basal secretion at 11 mM glucose, exendin-4 stimulation appears to work less well. A simple experiment to rule out this possibility would be to test insulin secretion following KCl stimulation under these conditions to determine whether maximal stimulation has been reached. We will perform this control experiment in the revised manuscript to clarify this point. We will also include absolute insulin results as well as percentages of secretion to improve the completeness of the report.

      To me the most important experiment of them all is the simvastatin experiment, but the results rest on very few numbers and there is a large variation. Apparently, in a previous study using more extensive reduction in cholesterol the opposite response was detected casting doubt on the significance of the current observation. I agree with the authors that the use of cyclodextrin may have been associated with other changes in plasma membrane structure than cholesterol depletion at the GLP-1 receptor.

      We agree with the reviewer that the insulin secretion results in vehicle versus LPDS/simvastatin treated mouse islets (Fig. 1H, I) are relatively variable and we therefore plan to perform further biological repeats of this experiment for the paper revision to consolidate our current findings. 

      The entire discussion regarding the importance of cholesterol would benefit tremendously from studies of GLP-1 induced insulin secretion in people with different cholesterol levels before and after treatment with cholesterol-lowering agents. I suspect that such a study would not reveal major differences.

      We agree with the reviewer that such a study would be highly relevant. While this falls outside the scope of the present paper, we encourage other researchers with access to clinical data on GLP-1RA responses in individuals taking cholesterol-lowering agents to share their results with the scientific community. We will highlight this point in the paper discussion to emphasise the importance of more research in this area.

      Reviewer #2 (Public review):

      Summary:

      In this manuscript the authors provided a proof of concept that they can identify and mutate a cholesterol-binding site of a high-interest class B receptor, the GLP-1R, and functionally characterize the impact of this mutation on receptor behavior in the membrane and downstream signaling with the intent that similar methods can be useful to optimize small molecules that as ligands or allosteric modulators of GLP-1R can improve the therapeutic tools targeting this signaling system.

      Strengths:

      The majority of results on receptor behavior are elucidated in INS-1 cells expressing the wt or mutant GLP-1R, with one experiment translating the findings to primary mouse beta-cells. I think this paper lays a very strong foundation to characterize this mutation and does a good job discussing how complex cholesterol-receptor interactions can be (ie lower cholesterol binding to V229A GLP-1R, yet increased segregation to lipid rafts). Table 1 and Figure 9 are very beneficial to summarize the findings. The lower interaction with cholesterol and lower membrane diffusion in V229A GLP-1R resembles the reduced diffusion of wt GLP-1R with simv-induced cholesterol reductions, although by presumably decreasing the cholesterol available to interact with wt GLP-1R. This could be interesting to see if lowering cholesterol alters other behaviors of wt GLP-1R that look similar to V229A GLP-1R. I further wonder if the authors expect that increased cholesterol content of islets (with loading of MβCD saturated with cholesterol or high-cholesterol diets) would elevate baseline GLP-1R membrane diffusion, and if a more broad relationship can be drawn between GLP-1R membrane movement and downstream signaling.

      Membrane diffusion experiments are difficult to perform in intact islets as our method requires cell monolayers for RICS analysis. We do however agree that it would be interesting to perform further RICS analysis in INS-1 832/3 SNAP/FLAG-hGLP-1R cells pretreated with vehicle or MβCD loaded with 20 mM cholesterol, and we will therefore add this experiment to the paper revisions.

      Weaknesses:

      I think there are no obvious weaknesses in this manuscript and overall, I believe the authors achieved their aims and have demonstrated the importance of cholesterol interactions on GLP-1R functioning in beta-cells. I think this paper will be of interest to many physiologists who may not be familiar with many of the techniques used in this paper and the authors largely do a good job explaining the goals of using each method in the results section.

      The intent of some methods, for example the Laurdan probe studies, are better expanded in the discussion.

      To clarify the intent of the Laurdan experiments early in the manuscript, we will add the following text to the methods section in the paper revisions: “Laurdan, 6-dodecanoyl-2-dimethylaminonaphthalene (product D250) was purchased from ThermoFisher. Laurdan (40 μM) was excited using a 405 nm solid state laser and SNAP/FLAG-hGLP-1R labelled with SNAP-Surface Alexa Fluor 647 with a pulsed (80 MHz) super-continuum white light laser at 647 nm. Laurdan emission was recorded in the ranges of 420–460 nm (IB) and 470–510 nm (IR), and the general polarisation (GP) formula, GP = (IB - IR)/(IB + IR), used to retrieve the relative lateral packing order of lipids at the plasma membrane. Values of GP vary from 1 to −1, where higher numbers reflect lower fluidity or higher lateral lipid order, whereas lower numbers indicate increasing fluidity.”
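
      As a worked illustration of the GP calculation, the ratio can be computed per pixel from the two emission channels: the difference between the 420–460 nm and 470–510 nm intensities divided by their sum. The sketch below is only an example under stated assumptions, not the analysis code used for the manuscript; the function name `generalised_polarisation` and the choice to return NaN for pixels with no signal are hypothetical.

```python
import numpy as np

def generalised_polarisation(i_blue, i_red):
    """Element-wise GP = (I_B - I_R) / (I_B + I_R).

    i_blue: intensities in the 420-460 nm channel (I_B).
    i_red:  intensities in the 470-510 nm channel (I_R).
    Pixels with zero total intensity are returned as NaN (an assumption
    made for this sketch, since GP is undefined there).
    """
    i_blue = np.asarray(i_blue, dtype=float)
    i_red = np.asarray(i_red, dtype=float)
    total = i_blue + i_red
    gp = np.full(total.shape, np.nan)
    # Divide only where there is signal; other pixels stay NaN.
    np.divide(i_blue - i_red, total, out=gp, where=total > 0)
    return gp
```

      With this definition, a pixel emitting only in the 420–460 nm channel gives GP = 1 (highest lipid order) and a pixel emitting only in the 470–510 nm channel gives GP = −1 (highest fluidity).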

      I found it unclear what exactly was being measured to assess 'receptor activity' in Fig 7E and F. 

      Figs. 7E and F refer to bystander complementation assays measuring the recruitment of nanobody 37 (Nb37)-SmBiT, which binds to active Gαs, to either the plasma membrane (labelled with KRAS CAAX motif-LgBiT), or to endosomes (labelled with Endofin FYVE domain-LgBiT) in response to GLP-1R stimulation with exendin-4. This assay therefore measures GLP-1R activation specifically at each of these two subcellular locations. We will add a schematic of this assay to the methods section in the paper revisions to clarify the aim of these experiments.

      Certainly many follow-up experiments are possible from these initial findings and of primary interest is how this mutation affects insulin homeostasis in vivo under different physiological conditions. One of the biggest pathologies in insulin homeostasis in obesity/t2d is an elevation of baseline insulin release (as modeled in Fig 1E) that renders the fold-change in glucose stimulated insulin levels lower and physiologically less effective. No difference in primary mouse islet baseline insulin secretion was seen here but I wonder if this mutation would ameliorate diet-induced baseline hyperinsulinemia.

      We concur with the reviewer that it would be interesting to determine the effects of the GLP-1R V229A mutation on insulin secretion responses under diet-induced metabolic stress conditions. While performing in vivo experiments on glucoregulation in mice harbouring the V229A mutation falls outside the scope of the present study, in the paper revisions we will include ex vivo insulin secretion experiments in islets from GLP-1R KO mice transduced with adenoviruses expressing SNAP/FLAG-hGLP-1R WT or V229A and subsequently treated with vehicle versus MβCD loaded with 20 mM cholesterol to replicate the conditions of Fig. 1E.

      I would have liked to see the actual islet cholesterol content after 5wks high-cholesterol diet measured to correlate increased cholesterol load with diminished glucose-stimulated insulin. While not necessary for this paper, a comparison of islet cholesterol content after this cholesterol diet vs the more typical 60% HFD used in obesity research would be beneficial for GLP-1 physiology research broadly to take these findings into consideration with model choice.

      We will include these data and compare islet cholesterol levels after the high cholesterol diet with those of HFD-fed mouse islets in the paper revisions.

      Another area to further investigate is does this mutation alter ex4 interaction/affinity/time of binding to GLP-1 or are all of the described findings due to changes in behavior and function of the receptor?

      To answer this question, we will perform exendin-4 binding affinity experiments in INS-1 832/3 SNAP/FLAG-hGLP-1R WT versus V229A cells for the paper revisions.

      Lastly, I wonder if V229A would have the same impact in a different cell type, especially in neurons? How similar are the cholesterol profiles of beta-cells and neurons? How this mutation (and future developed small molecules) may affect satiation, gut motility, and especially nausea, are of high translational interest. The comparison is drawn in the discussion between this mutation and ex4-phe1 to have biased agonism towards Gs over beta-arrestin signaling. Ex4-phe1 lowered pica behavior (a proxy for nausea) in the authors previously co-authored paper on ex4-phe1 (PMID 29686402) and I think drawing a parallel for this mutation or modification of cholesterol binding to potentially mitigate nausea is worth highlighting.

      While experiments in neurons are outside the scope of the present study, we will add this worthy point to the discussion and hypothesise on possible effects of the V229A mutation on central GLP-1R effects in the revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      We thank the reviewers and the editorial team for a thoughtful and constructive assessment. We appreciate all comments, and we try our best to respond appropriately to every reviewer’s queries below. It appears to us that one main worry was regarding appropriate modelling of the complex and rich structure of confounding variables in our movie task. 

      One recent approach fits large feature vectors that include confounding variables alongside the variable(s) of interest to the activity of each voxel in the brain to disentangle the contributions of each variable to the total recorded brain response. While these encoding models have yielded some interesting results, they have two major drawbacks which make using them unfeasible for our purposes (as we explain in more detail below): first, by fitting large vectors to individual voxels, they tend to over-estimate effect size; second, they are very ineffective at unveiling group-level effects due to high variability between subjects. Another approach able to deal with at least the second of these worries is “inter-subject-correlation”. In this technique brain responses are recorded from multiple subjects while they are presented with natural stimuli. For each brain area, response time courses from different subjects are correlated to determine whether the responses are similar across subjects. Our “peak and valley” analysis is a special case of this analysis technique, as we explain in the manuscript and below. 
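
      The inter-subject correlation idea can be sketched in a few lines. This is a minimal illustration rather than the pipeline used in the manuscript: the function name `intersubject_correlation`, the subjects-by-timepoints array layout for a single brain area, and the use of the mean pairwise Pearson correlation as the summary are all assumptions made for the example.

```python
import numpy as np

def intersubject_correlation(timecourses):
    """Mean pairwise Pearson correlation across subjects for one brain area.

    timecourses: array-like of shape (n_subjects, n_timepoints), one
    response time course per subject (hypothetical layout).
    """
    data = np.asarray(timecourses, dtype=float)
    if data.ndim != 2 or data.shape[0] < 2:
        raise ValueError("need time courses from at least two subjects")
    corr = np.corrcoef(data)                      # subject-by-subject matrix
    upper = np.triu_indices(data.shape[0], k=1)   # each subject pair once
    return float(corr[upper].mean())
```

      A value near 1 means that the area responds similarly across subjects to the same stimulus, while a value near 0 means that the responses are idiosyncratic.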

      For estimating individual-level brain-activation, we opted for an approach that adapts a classical method of analysing brain data, convolution, to naturalistic settings. Amplitude modulated deconvolution extends classical brain analysis tools in several ways to handle naturalistic data:

      (1) The method does not assume a fixed hemodynamic response function (HRF). Instead, it estimates the HRF over a specified time window from the data, allowing it to vary in amplitude based on the stimulus. This flexibility is crucial for naturalistic stimuli, where the timing and nature of brain responses can vary widely. 

      (2) The method only models the modulation of the amplitude of the HRF above its average with respect to the intensity or characteristics of the stimulus. 

      (3) By allowing variation in the response amplitude, non-linear relationships between the stimulus and brain-response can be captured. 
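
      The three points above can be illustrated with a simplified design matrix. The sketch below is not the software or basis set used in the manuscript: it assumes a finite impulse response (one free parameter per post-stimulus lag) in place of whatever basis functions the actual analysis used, onsets expressed in whole TRs, and a hypothetical function name `am_fir_design`. The first block of columns estimates the average response shape, and the second block estimates how that response scales with the mean-centred stimulus property.

```python
import numpy as np

def am_fir_design(onsets_tr, modulator, n_scans, n_lags):
    """Design matrix for a FIR model with amplitude modulation.

    onsets_tr: event onsets in units of TR (integers).
    modulator: one value per event (e.g. a word's concreteness rating).
    Returns an (n_scans, 2 * n_lags) array. Columns [0, n_lags) model the
    average response at each lag; columns [n_lags, 2 * n_lags) model the
    modulation of that response by the mean-centred modulator.
    """
    onsets_tr = np.asarray(onsets_tr, dtype=int)
    centred = np.asarray(modulator, dtype=float) - np.mean(modulator)
    design = np.zeros((n_scans, 2 * n_lags))
    for onset, m in zip(onsets_tr, centred):
        for lag in range(n_lags):
            t = onset + lag
            if 0 <= t < n_scans:
                design[t, lag] += 1.0          # average response at this lag
                design[t, n_lags + lag] += m   # modulation above/below average
    return design
```

      Under these assumptions, the response shapes could be estimated by ordinary least squares, for example with `np.linalg.lstsq(design, bold, rcond=None)`, where `bold` is a hypothetical voxel or cluster time series. Because the modulator is mean-centred, the second block captures only the deviation of the response from its average, as described in point (2).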

      It is true that amplitude modulated deconvolution does not come without its flaws; for example, including more than a few nuisance regressors becomes computationally very costly. Getting to grips with naturalistic data (especially with fMRI recordings) continues to be an active area of research and presents a new and exciting challenge. We hope that we can convince reviewers and editors with this response and the additional analyses and controls performed, that the evidence presented for the visual context dependent recruitment of brain areas for abstract and concrete conceptual processing is not incomplete. 

      Overview of Additional Analyses and Controls Performed by the Authors:

      (1) Individual-Level Peaks and Valleys Analysis (Supplementary Material, Figures S3, S4, and S5)

      (2) Test of non-linear correlations of BOLD responses related to features used in the Peak and Valley Analysis (Supplementary Material, Figures S6, S7)

      (3) Comparison of Psycholinguistic Variables Surprisal and Semantic Diversity between groups of words analysed (no significant differences found)  

      (4) Comparison of Visual Variables Optical Flow, Colour Saturation, and Spatial Frequency for 2s Context Window between groups of words analysed (no significant differences found)

      These controls are in addition to the five low-level nuisance regressors included in our model, which are luminance, loudness, duration, word frequency, and speaking rate (calculated as the number of phonemes divided by duration) associated with each analysed word. 

      Public Reviews:

      Reviewer #1 (Public Review):

      Peaks and Valleys Analysis: 

      (1) Doesn't this method assume that the features used to describe each word, like valence or arousal, will be linearly different for the peaks and valleys? What about non-linear interactions between the features and how they might modulate the response? 

      Within-subject variability in BOLD response delays is typically about 1 second at most (Neumann et al., 2003). As individual words are presented briefly (a few hundred ms at most) and the BOLD response to these stimuli falls within that window (1s/TR), any nonlinear interactions between word features and a participant’s BOLD response within that window are unlikely to significantly affect the detection of peaks and valleys.

      To quantitatively address the concern that non-linear modulations could manifest outside of that window, we include a new analysis in Figure S6, which compares the average BOLD responses of each participant in each cluster and each combination of features, showing that only very few of all possible comparisons differ significantly from each other (~5,000 combinations of features were significantly different from each other out of an overall number of ~130,000 comparisons between BOLD responses to features, which amounts to 3.85%), suggesting that there are no relevant non-linear interactions between features. For a full list of the most non-linearly interacting features see Figure S7. 

      (2) Doesn't it also assume that the response to a word is infinitesimal and not spread across time? How does the chosen time window of analysis interact with the HRF? From the main figures and Figures S2-S3 there seem to be differences based on the timelag. 

      The Peak and Valley (P&V) method does not assume that the response to a word is infinitesimal or confined to an instantaneous moment. The units of analysis (words) fall within one TR, as they are at most hundreds of ms long; for this reason, we are looking at one TR only. The response of each voxel at that TR will be influenced by the word of interest, as well as all other words that have been uttered within the 1s TR, and the multimodal features of the video stimulus that fall within that timeframe. So, in our P&V, we are not looking for an instantaneous response but rather changes in the BOLD signal that correspond to the presence of linguistic features within the stimuli. 

      The chosen time window of analysis interacts with the hemodynamic response function (HRF) in the following way: the HRF unfolds over several seconds, typically peaking around 5-6 seconds after stimulus onset and returning to baseline within 20-30 seconds (Handwerker et al., 2004).

      Our P&V is designed to match these dynamics of fMRI data with the timing of word stimuli. We apply different lags (4s, 5s, and 6s) to account for the delayed nature of the HRF, ensuring that we capture the brain's response to the stimuli as it unfolds over time, rather than assuming an immediate or infinitesimal effect. We find that the P&V yields our expected results for a 5s and a 6s lag, but not a 4s lag. This is in line with literature suggesting that the HRF for a given stimulus peaks around 5-6s after stimulus onset (Handwerker et al., 2004). As we are looking at very short stimuli (a few hundred ms) it makes sense that the distribution of features would significantly change with different lags. The fact that we find converging results for both a 5s and a 6s lag suggests that the delay is somewhere between 5s and 6s. However, the temporal resolution of our brain data (1 TR) does not allow us to test this hypothesis. 
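
      The lagging step can be made concrete with a small sketch. It is an illustration under stated assumptions rather than the code used for the analysis: peaks and valleys are defined here as local maxima and minima of the time series (sign changes of the discrete difference), which may differ from the manuscript's exact definition, and the function name `classify_words_by_lag` is hypothetical.

```python
import numpy as np

def classify_words_by_lag(bold, word_onsets_s, lag_s, tr_s=1.0):
    """Label each word by whether its lagged TR is a peak, a valley, or neither.

    bold: 1-D time series for one cluster, sampled every `tr_s` seconds.
    word_onsets_s: word onset times in seconds.
    lag_s: assumed hemodynamic delay in seconds (e.g. 4, 5, or 6).
    """
    bold = np.asarray(bold, dtype=float)
    diff = np.diff(bold)
    is_peak = np.zeros(bold.size, dtype=bool)
    is_valley = np.zeros(bold.size, dtype=bool)
    # Assumed definition: a peak rises into a TR and falls out of it;
    # a valley is the reverse.
    is_peak[1:-1] = (diff[:-1] > 0) & (diff[1:] < 0)
    is_valley[1:-1] = (diff[:-1] < 0) & (diff[1:] > 0)

    labels = []
    for onset in word_onsets_s:
        idx = int((onset + lag_s) // tr_s)  # TR containing the lagged onset
        if idx < 0 or idx >= bold.size:
            labels.append("out_of_range")
        elif is_peak[idx]:
            labels.append("peak")
        elif is_valley[idx]:
            labels.append("valley")
        else:
            labels.append("neither")
    return labels
```

      Because each word is mapped to a single TR, shifting the lag by one second moves every word to a neighbouring TR, which is why the distribution of features at peaks and valleys can differ between the 4s, 5s, and 6s lags.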

      (3) Were the group-averaged responses used for this analysis? 

      Yes, the response for each cluster was averaged across participants. We now report a participant-level overview of the Peak and Valley analysis (lagged at 5s), with similar results to the main analysis, in the supplementary material (see Figures S3, S4, and S5).

      (4) Why don't the other terms identified in Figure 5 show any correspondence to the expected categories? What does this mean? Can the authors also situate their results with respect to prior findings as well as visualize how stable these results are at the individual voxel or participant level? It would also be useful to visualize example time courses that demonstrate the peaks and valleys. 

      The terms identified in figure 5 are sensorimotor and affective features from the combined Lancaster and Brysbaert norms. As for the main P&V analysis, we only recorded a cluster as processing a given feature (or term) when there were significantly more instances of words highly rated in that dimension occurring at peaks rather than valleys in the HRF. For some features/terms, there were never significantly more words highly rated on that dimension occurring at peaks compared to valleys, which is why some terms identified in figure 5 do not show any significant clusters.  We have now also clarified this in the figure caption. 

      We situate the method in previous literature in lines 289 – 296. In essence, it is a variant of the well-known method called “reverse correlation” first detailed in Hasson et al., 2004 (reference from the manuscript) and later adapted to a peak and valley analysis in Skipper et al., 2009 (reference from the manuscript). 

      We now present a more fine-grained characterisation of each cluster on an individual participant level in the supplementary material. We doubt that it would be useful to present an actual example time-course as it would only represent a fraction of over one hundred thousand analysed time-series. We do already present an exemplary time-course to demonstrate the method in Figure 1. 

      Estimating contextual situatedness: 

      (1) Doesn't this limit the analyses to "visual" contexts only? And more so, frequently recognized visual objects? 

      Yes, it was the point of this analysis to focus on visual context only, and it may be true that conducting the analysis in this way results in limiting it to objects that are frequently recognized by visual convolutional neural networks. However, the state-of-the-art strength of visual CNNs in recognising many different types of objects has been attested in several ways (He et al., 2015). Therefore, it is unlikely that the use of CNNs would bias the analysis towards any specific “frequently recognised” objects. 

      (2) The measure of situatedness is the cosine similarity of GloVe vectors that depend on word co-occurrence while the vectors themselves represent objects isolated by the visual recognition models. Expectedly, "science" and the label "book" or "animal" and the label "dog" will be close. But can the authors provide examples of context displacement? I wonder if this just picks up on instances where the identified object in the scene is unrelated to the word. How do the authors ensure that it is a displacement of context as opposed to the two words just being unrelated? This also has a consequence on deciding the temporal cutoff for consideration (2 seconds). 

      The cosine similarity is between the GloVe vectors of the word (that is situated or displaced) and the words referring to the objects identified by the visual recognition model. Therefore, the correlation is between more than just two vectors and both correlated representations depend on co-occurrence. The cosine similarity value reported is not from a comparison between GloVe vectors and vectors that are (visual) representations of objects from the visual recognition model. 

      A word is displaced if all the identified object-words in the defined context window (2s before word-onset) are unrelated to the word (see lines 105-110 (pg. 5), lines 371-380 (pg. 15-16), and the Figure 2 caption). Thus, a word is considered to be displaced if all identified objects (not just two as claimed by the reviewer) in the scene are unrelated to the word. Given a context of 60 frames and an average of 5 identified objects per frame (i.e. an average candidate set of 300 objects that could be related) per word, the bar for “displacement” is set high. We provide some further considerations justifying the context window below in our responses to reviewers 2 and 3. 
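
      The displacement criterion can be written out as a short sketch. It is illustrative only: the vectors in a real analysis would be GloVe embeddings of the spoken word and of the object labels detected in the 2s context window, whereas the function names and the `threshold` parameter below are hypothetical, and the cut-off actually used for "unrelated" is not specified here.

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    if denom == 0:
        return 0.0
    return float(np.dot(u, v) / denom)

def is_displaced(word_vec, object_vecs, threshold):
    """True if every object label in the context window is unrelated to the word.

    word_vec: embedding of the spoken word.
    object_vecs: embeddings of all object labels detected in the window.
    threshold: hypothetical similarity cut-off below which a label
    counts as unrelated. An empty window counts as displaced.
    """
    return all(cosine_similarity(word_vec, obj) < threshold for obj in object_vecs)
```

      Because the test is an "all" condition over roughly 300 candidate labels per word, a single related object in the window is enough to make the word situated.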

      (3) While the introduction motivated the problem of context situatedness purely linguistically, the actual methods look at the relationship between recognized objects in the visual scene and the words. Can word surprisal or another language-based metric be used in place of the visual labeling? Also, it is not clear how the process identified in (2) above would come up with a high situatedness score for abstract concepts like "truth". 

      We disagree with the reviewer that the introduction motivated the problem of context situatedness purely linguistically, as we explicitly consider visual context in the abstract as well as the introduction. Examples in text include lines 71-74 and lines 105-115. This is also reflected in the cited studies that use visual context, including Kalenine et al., 2014; Hoffmann et al., 2013; Yee & Thompson-Schill, 2016; Hsu et al., 2011. However, we appreciate the importance of being very clear about this point, so we added various mentions of this fact at the beginning of the introduction to avoid confusion.

      We know that prior linguistic context (e.g. measured by surprisal) does affect processing. The point of the analysis was to use a non-language-based metric of visual context to understand how this affects conceptual representation in naturalistic settings. Therefore, it is not clear to us why replacing this with a language-based metric such as surprisal would be an adequate substitution. However, the reviewer is correct that we did not control for the influence of prior context. We obtained surprisal values for each of our words but could not find any significant differences between conditions and therefore did not include this factor in the analyses conducted. For considerations of differences in surprisal between each of the analysed sets of words, see the supplementary material.

      The method would yield a high score of contextual situatedness for abstract concepts if there were objects in the scene whose GloVe embeddings have a close cosine distance to the GloVe embedding of that abstract word (e.g., “truth” and “book”). We believe this comment from the reviewer is rooted in a misconception of our method. They seem to think we compared GloVe vectors for the spoken word with vectors from a visual recognition model directly (in which case it is true that there would be a concern about how an abstract concept like “truth” could have a high situatedness). Apart from the fact that there would be concerns about the comparability of vectors derived from GloVe and a visual recognition model more generally, this present concern is unwarranted in our case, as we are comparing GloVe embeddings.  

      (4) It is a bit hard to see the overlapping regions in Figures 6A-C. Would it be possible to show pairs instead of triples? Like "abstract across context" vs. "abstract displaced"? Without that, and given (2) above, the results are not yet clear. Moreover, what happens in the "overlapping" regions of Figure 3? 

      To make this clearer, we introduced the contrasts (abstract situated vs displaced and concrete situated vs displaced) that were previously in the supplementary materials in the main text (now Figure 6, this was also requested by reviewer 2). We now show the overlap between the abstract situated (from the contrast in Figure 6) with concrete across context and the overlap between concrete displaced (from the contrast in Figure 6) with abstract across context separately in Figure 7. 

      The overlapping regions of Figure 3 indicate that both concrete and abstract concepts are processed in these regions (though at different time-points). We explain why this is a result of our deconvolution analysis on page 23:  

      “Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap activity is due to the fact that we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer time-frame. In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus.”

      Miscellaneous comments: 

      (1) In Figure 3, it is surprising that the "concrete-only" regions dominate the angular gyrus and we see an overrepresentation of this category over "abstract-only". Can the authors place their findings in the context of other studies? 

      The Angular Gyrus (AG) is hypothesised to be a general semantic hub; therefore it is not surprising that it should be active for general conceptual processing (and there is some overlap activation in posterior regions). We now situate our results in a wider range of previous findings in the results section under “Conceptual Processing Across Context”. 

      “Consistent with previous studies, we predicted that across naturalistic contexts, concrete and abstract concepts are processed in a separable set of brain regions. To test this, we contrasted concrete and abstract modulators at each time point of the IRF (Figure 3). This showed that concrete produced more modulation than abstract processing in parts of the frontal lobes, including the right posterior inferior frontal gyrus (IFG) and the precentral sulcus (Figure 3, red). Known for its role in language processing and semantic retrieval, the IFG has been hypothesised to be involved in the processing of action-related words and sentences, supporting both semantic decision tasks and the retrieval of lexical semantic information (Bookheimer, 2002; Hagoort, 2005). The precentral sulcus is similarly linked to the processing of action verbs and motor-related words (Pulvermüller, 2005). In the temporal lobes, greater modulation occurred in the bilateral transverse temporal gyrus and sulcus, planum polare and temporale. These areas, including primary and secondary auditory cortices, are crucial for phonological and auditory processing, with implications for the processing of sound-related words and environmental sounds (Binder et al., 2000). The superior temporal gyrus (STG) and sulcus (STS) also showed greater modulation for concrete words and these are said to be central to auditory processing and the integration of phonological, syntactic, and semantic information, with a particular role in processing meaningful speech and narratives (Hickok & Poeppel, 2007). In the parietal and occipital lobes, more concrete modulated activity was found bilaterally in the precuneus, which has been associated with visuospatial imagery, episodic memory retrieval, and self-processing operations and has been said to contribute to the visualisation aspects of concrete concepts (Cavanna & Trimble, 2006). 
More activation was also found in large swaths of the occipital cortices (running into the inferior temporal lobe), and the ventral visual stream. These regions are integral to visual processing, with the ventral stream (including areas like the fusiform gyrus) particularly involved in object recognition and categorization, linking directly to the visual representation of concrete concepts (Martin, 2007). Finally, subcortically, the dorsal and posterior medial cerebellum were more active bilaterally for concrete modulation. Traditionally associated with motor function, some studies also implicate the cerebellum in cognitive and linguistic processing, including the modulation of language and semantic processing through its connections with cerebral cortical areas (Stoodley & Schmahmann, 2009).

      Conversely, activation for abstract was greater than concrete words in the following regions (Figure 3, blue): In the frontal lobes, this included right anterior cingulate gyrus, lateral and medial aspects of the superior frontal gyrus. Being involved in cognitive control, decision-making, and emotional processing, these areas may contribute to abstract conceptualization by integrating affective and cognitive components (Shenhav et al., 2013). More left frontal activity was found in both lateral and medial prefrontal cortices, and in the orbital gyrus, regions which are key to social cognition, valuation, and decision-making, all domains rich in abstract concepts (Amodio & Frith, 2006). In the parietal lobes, bilateral activity was greater in the angular gyri (AG) and inferior parietal lobules, including the postcentral gyrus. Central to the default mode network, these regions are implicated in a wide range of complex cognitive functions, including semantic processing, abstract thinking, and integrating sensory information with autobiographical memory (Seghier, 2013). In the temporal lobes, activity was restricted to the STS bilaterally, which plays a critical role in the perception of intentionality and social interactions, essential for understanding abstract social concepts (Frith & Frith, 2003). Subcortically, activity was greater, bilaterally, in the anterior thalamus, nucleus accumbens, and left amygdala for abstract modulation. These areas are involved in motivation, reward processing, and the integration of emotional information with memory, relevant for abstract concepts related to emotions and social relations (Haber & Knutson, 2010, Phelps & LeDoux, 2005).

      Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap activity is due to the fact that we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer time-frame (for a comparison of significant timing differences see figure S9). In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. Left IFG is prominently involved in semantic processing, particularly in tasks requiring semantic selection and retrieval and has been shown to play a critical role in accessing semantic memory and resolving semantic ambiguities, processes that are inherently time-consuming and reflective of the extended processing time for abstract concepts (Thompson-Schill et al., 1997; Wagner et al., 2001; Hofman et al., 2015). The STG, particularly its posterior portion, is critical for the comprehension of complex linguistic structures, including narrative and discourse processing. The processing of abstract concepts often necessitates the integration of contextual cues and inferential processing, tasks that engage the STG and may extend the temporal dynamics of semantic processing (Ferstl et al., 2008; Vandenberghe et al., 2002). In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus, which is associated with primary visual processing (Kanwisher et al., 1997; Kosslyn et al., 2001).”

      The finding that concrete concepts activate more brain voxels than abstract concepts is broadly in line with existing research, which often reports more extensive brain activation for concrete than for abstract words. This is primarily due to the richer sensory and perceptual associations tied to concrete concepts; see, for example, Binder et al., 2005 (figure 2 in the paper). Similarly, a recent meta-analysis by Bucur & Pagano (2021) consistently found wider activation networks for the “concrete > abstract” contrast than for the “abstract > concrete” contrast.

      (2) The following line (Pg 21) regarding the necessary differences in time for the two categories was not clear. How does this fall out from the analysis method? 

      - Both categories overlap **(though necessarily at different time points)** in regions typically associated with word processing - 

      This is answered in our response above to point (4) in the reviewer’s comments. We now also provide more information on the temporal differences in the supplementary material (Figure S9). 

      Reviewer #2 (Public Review):

      The critical contrasts needed to test the key hypothesis are not presented or not presented in full within the core text. To test whether abstract processing changes when in a situated context, the situated abstract condition would first need to be compared with the displaced abstract condition as in Supplementary Figure 6. Then to test whether this change makes the result closer to the processing of concrete words, this result should be compared to the concrete result. The correlations shown in Figure 6 in the main text are not focused on the differences in activity between the situated and displaced words or comparing the correlation of these two conditions with the other (concrete/abstract) condition. As such they cannot provide conclusive evidence as to whether the context is changing the processing of concrete/abstract words to be closer to the other condition. Additionally, it should be considered whether any effects reflect the current visual processing only or more general sensory processing. 

      The reviewer identifies the critical contrast as follows:

      “The situated abstract condition would first need to be contrasted with the displaced abstract condition. Then, these results should be compared to the concrete result.” 

      We can confirm that this is indeed what had been done and we believe the reviewer’s confusion stems from a lack of clarity on our part. We have now made various clarifications on this point in the manuscript, and we changed the figures to make clear that our results are indeed based on the contrasts identified by this reviewer as the essential ones.

      Figure 6 in the main text now reflects the contrast between situated and displaced abstract and concrete conditions (as requested by the reviewer, this was previously Figure S7 from the supplementary material). To compare the results from this contrast to conceptual processing across context, we use cosine similarity, and we mention these results in the text. We furthermore show the overlap between the conditions of interest (abstract situated x concrete across context; concrete displaced x abstract across context) in a new figure (Figure 7) to bring out the spatial distribution of overlap more clearly.

      We also discussed the extent to which these effects reflect current visual processing only or more general sensory processing in lines 863 – 875 (pg. 33 and 34).   

      “In considering the impact of visual context on the neural encoding of concepts generally, it is furthermore essential to recognize that the mechanisms observed may extend beyond visual processing to encompass more general sensory processing mechanisms. The human brain is adept at integrating information across sensory modalities to form coherent conceptual representations, a process that is critical for navigating the multimodal nature of real-world experiences (Barsalou, 2008; Smith & Kosslyn, 2007). While our findings highlight the role of visual context in modulating the neural representation of abstract and concrete words, similar effects may be observed in contexts that engage other sensory modalities. For instance, auditory contexts that provide relevant sound cues for certain concepts could potentially influence their neural representation in a manner akin to the visual contexts examined in this study. Future research could explore how different sensory contexts, individually or in combination, contribute to the dynamic neural encoding of concepts, further elucidating the multimodal foundation of semantic processing.”

      Overall, the study would benefit from being situated in the literature more, including a) a more general understanding of the areas involved in semantic processing (including areas proposed to be involved across different sensory modalities and for verbal and nonverbal stimuli), and b) other differences between abstract and concrete words and whether they can explain the current findings, including other psycholinguistic variables which could be included in the model and the concept of semantic diversity (Hoffman et al.,). It would also be useful to consider whether difficulty effects (or processing effort) could explain some of the regional differences between abstract and concrete words (e.g., the language areas may simply require more of the same processing not more linguistic processing due to their greater reliance on word co-occurrence). Similarly, the findings are not considered in relation to prior comparisons of abstract and concrete words at the level of specific brain regions. 

      We now present an overview of the areas involved in semantic processing (across different sensory modalities for verbal and nonverbal stimuli) when we first present our results (section: “Conceptual Processing Across Context”).

      We looked at surprisal as a potential confound and found no significant differences between any of the sets of words analysed. Mean surprisal of concrete words is 22.19, mean surprisal of abstract words is 21.86. Mean surprisal ratings for concrete situated words are 21.98 bits, 22.02 bits for the displaced concrete words, 22.10 for the situated abstract words and 22.25 for the abstract displaced words. We also calculated the semantic diversity of all sets of words and found no significant differences between the sets. The mean values for each condition are: abstract_high (2.02); abstract_low (1.95); concrete_high (1.88); concrete_low (2.19); abstract_original (1.96); concrete_original (1.92). Hence processing effort related to different predictability (surprisal), or greater semantic diversity cannot explain our findings. 
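
      For readers unfamiliar with the measure, surprisal (Hale, 2001) is the negative log probability of a word given its context; with a base-2 logarithm it is expressed in bits. The following is a minimal sketch of that definition only. The function names are hypothetical, and the language model that supplied the word probabilities in the study is not specified here, so the probabilities are treated as given inputs.

```python
import math


def surprisal_bits(probability):
    """Surprisal of one word in bits: -log2 P(word | context)."""
    if not 0.0 < probability <= 1.0:
        raise ValueError("probability must lie in (0, 1]")
    return -math.log2(probability)


def mean_surprisal_bits(probabilities):
    """Average surprisal over a set of words (e.g., one condition)."""
    values = [surprisal_bits(p) for p in probabilities]
    if not values:
        raise ValueError("need at least one probability")
    return sum(values) / len(values)
```

      Comparing such per-condition means (and testing the differences for significance) is the kind of check described above; the sketch does not reproduce the reported values.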

      We submit that difficulty effects do not explain any aspects of the activation found for conceptual processing, because we included word frequency in our model as a nuisance regressor and found no significant differences associated with surprisal. Previous work shows that surprisal (Hale, 2001) and word frequency (Brysbaert & New, 2009) are good controls for processing difficulty.

      Finally, we added considerations of prior findings comparing abstract and concrete words at the level of specific brain regions to the discussion (section: Conceptual Processing Across Context). 

      The authors use multiple methods to provide a post hoc interpretation of the areas identified as more involved in concrete, abstract, or both (at different times) words. These are designed to reduce the interpretation bias and improve interpretation, yet they may not successfully do so. These methods do give some evidence that sensory areas are more involved in concrete word processing. However, they are still open to interpretation bias as it is not clear whether all the evidence is consistent with the hypotheses or if this is the best interpretation of individual regions' involvement. This is because the hypotheses are provided at the level of 'sensory' and 'language' areas without further clarification and areas and terms found are simply interpreted as fitting these definitions. For instance, the right IFG is interpreted as a motor area, and therefore sensory as predicted, and the term 'autobiographical memory' is argued to be interoceptive. Language is associated with the 'both' cluster, not the abstract cluster, when abstract > concrete is expected to engage language more. The areas identified for both vs. abstract > concrete are distinguished in the Discussion through the description as semantic vs. language areas, but it is not clear how these are different or defined. Auditory areas appear to be included in the sensory prediction at times and not at others. When they are excluded, the rationale for this is not given. Overall, it is not clear whether all these areas and terms are expected and support the hypotheses. It should be possible to specify specific sensory areas where concrete and abstract words are predicted to be different based on a) prior comparisons and/or b) the known locations of sensory areas. Similarly, language or semantic areas could be identified using masks from NeuroSynth or traditional meta-analyses. A language network is presented in Supplementary Figure 7 but not interpreted, and its source is not given. 

      “The language network” was extracted through NeuroSynth and projected onto the “overlap” activation map with AFNI. We now specify this in the figure caption. 

      Alternatively, there could be a greater interpretation of different possible explanations of the regions found with a more comprehensive assessment of the literature. The function of individual regions and the explanation of why many of these areas are interpreted as sensory or language areas are only considered in the Discussion when it could inform whether the hypotheses have been evidenced in the results section. 

      We added extended considerations of this to the results (as requested by the reviewer) in the section “Conceptual Processing Across Contexts”. 

      “Consistent with previous studies, we predicted that across naturalistic contexts, concrete and abstract concepts are processed in a separable set of brain regions. To test this, we contrasted concrete and abstract modulators at each time point of the IRF (Figure 3). This showed that concrete produced more modulation than abstract processing in parts of the frontal lobes, including the right posterior inferior frontal gyrus (IFG) and the precentral sulcus (Figure 3, red). Known for its role in language processing and semantic retrieval, the IFG has been hypothesised to be involved in the processing of action-related words and sentences, supporting both semantic decision tasks and the retrieval of lexical semantic information (Bookheimer, 2002; Hagoort, 2005). The precentral sulcus is similarly linked to the processing of action verbs and motor-related words (Pulvermüller, 2005). In the temporal lobes, greater modulation occurred in the bilateral transverse temporal gyrus and sulcus, planum polare and temporale. These areas, including primary and secondary auditory cortices, are crucial for phonological and auditory processing, with implications for the processing of sound-related words and environmental sounds (Binder et al., 2000). The superior temporal gyrus (STG) and sulcus (STS) also showed greater modulation for concrete words and these are said to be central to auditory processing and the integration of phonological, syntactic, and semantic information, with a particular role in processing meaningful speech and narratives (Hickok & Poeppel, 2007). In the parietal and occipital lobes, more concrete modulated activity was found bilaterally in the precuneus, which has been associated with visuospatial imagery, episodic memory retrieval, and self-processing operations and has been said to contribute to the visualisation aspects of concrete concepts (Cavanna & Trimble, 2006). 
More activation was also found in large swaths of the occipital cortices (running into the inferior temporal lobe), and the ventral visual stream. These regions are integral to visual processing, with the ventral stream (including areas like the fusiform gyrus) particularly involved in object recognition and categorization, linking directly to the visual representation of concrete concepts (Martin, 2007). Finally, subcortically, the dorsal and posterior medial cerebellum were more active bilaterally for concrete modulation. Traditionally associated with motor function, some studies also implicate the cerebellum in cognitive and linguistic processing, including the modulation of language and semantic processing through its connections with cerebral cortical areas (Stoodley & Schmahmann, 2009).

      Conversely, activation for abstract was greater than concrete words in the following regions (Figure 3, blue): In the frontal lobes, this included right anterior cingulate gyrus, lateral and medial aspects of the superior frontal gyrus. Being involved in cognitive control, decision-making, and emotional processing, these areas may contribute to abstract conceptualization by integrating affective and cognitive components (Shenhav et al., 2013). More left frontal activity was found in both lateral and medial prefrontal cortices, and in the orbital gyrus, regions which are key to social cognition, valuation, and decision-making, all domains rich in abstract concepts (Amodio & Frith, 2006). In the parietal lobes, bilateral activity was greater in the angular gyri (AG) and inferior parietal lobules, including the postcentral gyrus. Central to the default mode network, these regions are implicated in a wide range of complex cognitive functions, including semantic processing, abstract thinking, and integrating sensory information with autobiographical memory (Seghier, 2013). In the temporal lobes, activity was restricted to the STS bilaterally, which plays a critical role in the perception of intentionality and social interactions, essential for understanding abstract social concepts (Frith & Frith, 2003). Subcortically, activity was greater, bilaterally, in the anterior thalamus, nucleus accumbens, and left amygdala for abstract modulation. These areas are involved in motivation, reward processing, and the integration of emotional information with memory, relevant for abstract concepts related to emotions and social relations (Haber & Knutson, 2010, Phelps & LeDoux, 2005).

      Finally, there was overlap in activity between modulation of both concreteness and abstractness (Figure 3, yellow). The overlap activity is due to the fact that we performed general linear tests for the abstract/concrete contrast at each of the 20 timepoints in our group analysis. Consequently, overlap means that activation in these regions is modulated by both concrete and abstract word processing but at different time-scales. In particular, we find that activity modulation associated with abstractness is generally processed over a longer time-frame (for a comparison of significant timing differences see figure S9). In the frontal, parietal, and temporal lobes, this was primarily in the left IFG, AG, and STG, respectively. Left IFG is prominently involved in semantic processing, particularly in tasks requiring semantic selection and retrieval and has been shown to play a critical role in accessing semantic memory and resolving semantic ambiguities, processes that are inherently time-consuming and reflective of the extended processing time for abstract concepts (Thompson-Schill et al., 1997; Wagner et al., 2001; Hofman et al., 2015). The STG, particularly its posterior portion, is critical for the comprehension of complex linguistic structures, including narrative and discourse processing. The processing of abstract concepts often necessitates the integration of contextual cues and inferential processing, tasks that engage the STG and may extend the temporal dynamics of semantic processing (Ferstl et al., 2008; Vandenberghe et al., 2002). In the occipital lobe, processing overlapped bilaterally around the calcarine sulcus, which is associated with primary visual processing (Kanwisher et al., 1997; Kosslyn et al., 2001).”

      Additionally, these methods attempt to interpret all the clusters found for each contrast in the same way when they may have different roles (e.g., relate to different senses). This is a particular issue for the peaks and valleys method which assesses whether a significantly larger number of clusters is associated with each sensory term for the abstract, concrete, or both conditions than the other conditions. The number of clusters does not seem to be the right measure to compare. Clusters differ in size so the number of clusters does not represent the area within the brain well. Nor is it clear that many brain regions should respond to each sensory term, and not just one per term (whether that is V1 or the entire occipital lobe, for instance). The number of clusters is therefore somewhat arbitrary. This is further complicated by the assessment across 20 time points and the inclusion of the 'both' categories. It would seem more appropriate to see whether each abstract and concrete cluster could be associated with each different sensory term and then summarise these findings rather than assess the number of abstract or concrete clusters found for each independent sensory term. In general, the rationale for the methods used should be provided (including the peak and valley method instead of other possible options e.g., linear regression). 

      We included an assessment of whether each abstract and concrete cluster could be associated with each different sensory term and then summarised these findings on a participant level in the supplementary material (Figures S3, S4, and S5). 

      Rationales for the Amplitude Modulated Deconvolution are now provided on page 10 (specifically the first paragraph under “Deconvolution Analysis” in the Methods section) and for the P&V on pages 13, 14 and 15 (under “Peaks and Valley” (particularly the first paragraph) in the Methods section). 

      The measure of contextual situatedness (how related a spoken word is to the average of the visually presented objects in a scene) is an interesting approach that allows parametric variation within naturalistic stimuli, which is a potential strength of the study. This measure appears to vary little between objects that are present (e.g., animal or room), and those that are strongly (e.g., monitor) or weakly related (e.g., science). Additional information validating this measure may be useful, as would consideration of the range of values and whether the split between situated (c > 0.6) and displaced words (c < 0.4) is sufficient.  

      The main validation of our measure of contextual situatedness derives from the high accuracy and reliability of CNNs in object detection and recognition tasks, as demonstrated in numerous benchmarks and real-world applications. 

      One reason for low variability in our measure of contextual situatedness is the fact that we compared the GloVe vector of each word of interest with an average GloVe vector of all object-words referring to objects present in 56 frames (~300 objects on average). This means that a lot of variability in similarity measures between individual object-words and the word of interest is averaged out. Notwithstanding the resulting low variability of our measure, we thought that this would be the more conservative approach, as even small differences between individual measures (e.g. 0.4 vs 0.6) would constitute a strong difference on average (across the 300 objects per context window).  Therefore, this split ensures a sufficient distinction between words that are strongly related to their visual context and those that are not – which in turn allows us to properly investigate the impact of contextual relevance on conceptual processing.
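
      The measure described above can be sketched as a cosine similarity between a word vector and the mean of the object-word vectors in its context window. This is an illustrative sketch, not the study's code: the function names are hypothetical, the vectors stand in for GloVe embeddings, and the thresholds are those quoted by the reviewer (c > 0.6 situated, c < 0.4 displaced). How words falling between the two thresholds were handled is not stated here, so the sketch simply labels them "intermediate".

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    if denom == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return float(np.dot(a, b) / denom)


def situatedness(word_vec, object_vecs):
    """Similarity of a word to the average object vector of its context."""
    context_vec = np.mean(np.asarray(object_vecs, dtype=float), axis=0)
    return cosine_similarity(word_vec, context_vec)


def label(c, situated_above=0.6, displaced_below=0.4):
    """Map a similarity score to a condition label (thresholds assumed)."""
    if c > situated_above:
        return "situated"
    if c < displaced_below:
        return "displaced"
    return "intermediate"
```

      Because the context vector is an average over many objects, individual word-object similarities are smoothed out, which is the source of the low variability discussed above.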

      Finally, the study assessed the relation of spoken concrete or abstract words to brain activity at different time points. The visual scene was always assessed using the 2 seconds before the word, while the neural effects of the word were assessed every second after the presentation for 20 seconds. This could be a strength of the study, however almost no temporal information was provided. The clusters shown have different timings, but this information is not presented in any way. Giving more temporal information in the results could help to both validate this approach and show when these areas are involved in abstract or concrete word processing. 

      We provide more information on the temporal differences of when clusters are involved in processing concrete and abstract concepts in the supplementary material (Figure S9) and refer to this information where relevant in the Methods and Results sections. 

      Additionally, no rationale was given for this long timeframe which is far greater than the time needed to process the word, and long after the presence of the visual context assessed (and therefore ignores the present visual context). 

      The 20-second timeframe for our deconvolution analysis is justified by several considerations. Firstly, the hemodynamic response function (HRF) is known to vary both across individuals and across different regions of the brain. To accommodate this variability and capture the full breadth of the HRF, including its rise, peak, and return to baseline, a longer timeframe is often necessary. The 20-second window ensures that we do not prematurely truncate the HRF, which could lead to inaccurate estimations of neural activity related to the processing of words. Secondly, and relatedly, unlike model-based approaches that assume a canonical HRF shape, our deconvolution analysis does not impose a predefined form on the HRF; instead, it reconstructs the HRF from the data itself, and a longer timeframe gives a better estimate of the true HRF. Finally, the 'Csplin' function in our analysis provides a flexible set of basis functions for deconvolution, allowing for a more fine-grained and precise estimation of the HRF across this extended timeframe. The 'Csplin' function offers more interpolation between time points, which is particularly advantageous for capturing the nuances of the HRF as it unfolds over a longer timeframe. 
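
      The idea of estimating the response shape from the data, rather than assuming a canonical HRF, can be illustrated with a simplified sketch. Note the substitution: the analysis described above uses a 'Csplin' (cubic spline) basis, whereas this sketch swaps in plain lag-indicator (finite impulse response) regressors, one per post-onset scan. It also assumes one scan per second, omits amplitude modulation and nuisance regressors, and uses hypothetical function names.

```python
import numpy as np


def fir_design_matrix(onsets, n_scans, n_lags=20):
    """One column per post-onset lag, in scans (assumes one scan per second)."""
    X = np.zeros((n_scans, n_lags))
    for onset in onsets:
        for lag in range(n_lags):
            t = onset + lag
            if t < n_scans:
                X[t, lag] += 1.0
    return X


def estimate_response(bold, onsets, n_lags=20):
    """Least-squares estimate of the response at each lag, plus a baseline."""
    bold = np.asarray(bold, dtype=float)
    X = fir_design_matrix(onsets, len(bold), n_lags)
    X = np.column_stack([X, np.ones(len(bold))])  # constant baseline term
    beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
    return beta[:n_lags]  # drop the baseline coefficient
```

      With a 20-lag window the estimate covers the rise, peak, and return to baseline even when responses to nearby words overlap; a shorter window would force the later part of the response into the residual.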

      Although we use a 20-second timeframe for the deconvolution analysis to capture the full HRF, the analysis is still time-locked to the onset of each visual stimulus. This ensures that the initial stages of the HRF are directly tied to the moment the word is presented, thus incorporating the immediate visual context. We furthermore include variables that represent aspects of the visual context at the time of word presentation in our models (e.g., luminance) and control for motion (optical flow), colour saturation and spatial frequency of the immediate visual context. 

      Reviewer #3 (Public Review):

      The context measure is interesting, but I'm not convinced that it's capturing what the authors intended. In analysing the neural response to a single word, the authors are presuming that they have isolated the window in which that concept is processed and the observed activation corresponds to the neural representation of that word given the prior context. I question to what extent this assumption holds true in a narrative when co-articulation blurs the boundaries between words and when rapid context integration is occurring. 

      We appreciate the reviewer's critical perspective on the contextual measure employed in our study. We agree that the dynamic and continuous nature of narrative comprehension poses challenges for isolating the neural response to individual words. However, the use of an amplitude-modulated deconvolution analysis, particularly with the CSPLIN function, is a methodological choice to specifically address these challenges. Deconvolution allows us to estimate the hemodynamic response function (HRF) without assuming its canonical shape, capturing nuances in the BOLD signal that may reflect the integration of rapid contextual shifts (only beyond the average modulation of the BOLD signal). The CSPLIN function further refines this approach by offering a flexible basis set for modelling the HRF and by providing a detailed temporal resolution that can adapt to the variance in individual responses. 

      Our choice of a 20-second window is informed by the need to encompass not just the immediate response to a word but also the extended integration of the contextual information. This is consistent with evidence indicating that the brain integrates information over longer timescales when processing language in context (Hasson et al., 2015). The neural representation of a word is not a static snapshot but a dynamic process that evolves with the unfolding narrative. 

      Further, the authors define context based on the preceding visual information. I'm not sure that this is a strong manipulation of the narrative context, although I agree that it captures some of the local context. It is maybe not surprising that if a word, abstract or concrete, has a strong association with the preceding visual information then activation in the occipital cortex is observed. I also wonder if the effects being captured have less to do with concrete and abstract concepts and more to do with the specific context the displaced condition captures during a multimodal viewing paradigm. If the visual information is less related to the verbal content, the viewer might process those narrative moments differently regardless of whether the subsequent word is concrete or abstract. I think the claims could be tailored to focus less generally on context and more specifically on how visually presented objects, which contribute to the ongoing context of a multimodal narrative, influence the subsequent processing of abstract and concrete concepts.

      The context measure, though admittedly a simplification, is designed to capture the local visual context preceding word presentation. By using high-confidence visual recognition models, we ensure that the visual information is reliably extracted and reflects objects that have a strong likelihood of influencing the processing of subsequent words. We acknowledge that this does not capture the full richness of narrative context; however, it provides a quantifiable and consistent measure of the immediate visual environment, which is an important aspect of context in naturalistic language comprehension.

      With regard to the effects observed in the occipital cortex, we posit that while some activation might be attributable to the visual features of the narrative, our findings also reflect the influence of these features on conceptual processing. This is especially so because our analysis only looks at the modulation of the HRF amplitude beyond the average response (so also beyond the average visual response) when contrasting between conditions of high and low visual-contextual association with important (audio-visual) control variables included in the model. 

      Lastly, we concur that both concrete and abstract words are processed within a multimodal narrative, which could influence their neural representation. We believe our approach captures a meaningful aspect of this processing, and we have refined our claims to specify the influence of visually presented objects on the processing of abstract and concrete concepts, rather than making broader assertions about multimodal context. We also highlight several other signals (e.g. auditory) that could influence processing. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The approach taken here requires a lot of manual variable selection and seems a bit roundabout. Why not build an encoding model that can predict the BOLD time course of each voxel in a participant from the feature-of-interest like valence etc. and then analyze if (1) certain features better predict activity in a specific region (2) the predicted responses/regression parameters are more positive (peaks) or more negative (valleys) for certain features in a specific brain region (3) maybe even use contextual features from a large language model and then per word (like "truth") analyze where the predicted responses diverge based on the associated context. This seems like a simpler approach than having multiple stages of analysis.

      It is not clear to us why an encoding model would be more suitable for answering the question at hand (especially given that we tried to clarify concerns about non-linear relationships between variables). On the contrary, fitting a regression model to each individual voxel has several drawbacks. First, encoding models are prone to over-estimate effect sizes (Naselaris et al., 2011). Second, encoding models are not good at explaining group-level effects due to high variability between individual participants (Turner et al., 2018). We would also like to point out that an encoding model using features of a text-based LLM would not address the visual context question - unless the LLM was multimodal. Multimodal LLMs are a very recent research development in Artificial Intelligence, however, and models like LLaMA (adapter), Google’s Gemini, etc. are not truly multimodal in the sense that would be useful for this study, because they are first trained on text and later injected with visual data. This relates to our concern that the reviewer may have misunderstood that we are interested in purely visual context of words (not linguistic context).

      (2) In multiple analyses, a subset of the selected words is sampled to create a balanced set between the abstract and concrete categories. Do the authors show standard deviation across these sets? 

      For the subset of words used in the context-based analyses, we give mean ratings of concreteness, log frequency and length and conduct a t-test to show that these variables are not significantly different between the sets. We also included the psycholinguistic control variables surprisal and semantic diversity, as well as the visual variables motion (optical flow), colour saturation and spatial frequency.  

      Reviewer #2 (Recommendations For The Authors):

      Figures S3-5 are central to the argument and should be in the main text (potentially combined).  

      These have been added to the main text

      S5 says the top 3 terms are DMN (and not semantic control), but the text suggests the r value is higher for 'semantic control' than 'DMN'? 

      Fixed this in the text, the caption now reads: 

      “This was confirmed by using the neurosynth decoder on the unthresholded brain image - top keywords were “Semantic Control” and “DMN”.”

      Fig. S7 is very hard to see due to the use of grey on grey. Not used for great effect in the final sentence, but should be used to help interpret areas in the results section (if useful). It has not been specified how the 'language network' has been identified/defined here. 

      We altered the contrast in the figure to make boundaries more visible and specified how the language network was identified in the figure caption. 

      In the Results 'This showed that concrete produced more modulation than abstract modulation in the frontal lobes,' should be parts of /some of the frontal lobes as this isn't true overall. 

      Fixed this in the text.  

      There are some grammatical errors and lack of clarity in the context comparison section of the results. 

      Fixed these in the text.

      Reviewer #3 (Recommendations For The Authors):

      •  The analysis code should be shared on the github page prior to peer review.  

      The code is now shared under: https://github.com/ViktorKewenig/Naturalistic_Encoding_Concepts

      •  At several points throughout the methods section, information was referred to that had not yet been described. Reordering the presentation of this information would greatly improve interpretability. A couple of examples of this are provided below. 

      Deconvolution Analysis: the use of amplitude modulation regression was introduced prior to a discussion of using the TENT function to estimate the shape of the HRF. This was then followed by a discussion of the general benefits of amplitude modulation. Only after these paragraphs are the modulators/model structure described. Moving this information to the beginning of the section would make the analysis clearer from the onset. 

      Fixed this in the text

      Peak and Valley Analysis: the hypotheses regarding the sensory-motor features and experiential features are provided prior to describing how these features were extracted from the data (e.g., using the Lancaster norms). 

      Fixed this in the text.

      •  The justification for and description of the IRF approach seems overdone considering the timing differences are not analyzed further or discussed. 

      We now present a further discussion of timing differences in the supplementary material.

      •  The need and suitability of the cluster simulation method as implemented were not clear. The resulting maps were thresholded at 9 different p values and then combined, and an arbitrary cluster threshold of 20 voxels was then applied. Why not use the standard approach of selecting the significance threshold and corresponding cluster size threshold from the ClustSim table? 

      We extracted the original clusters at 9 different p values with the corresponding cluster size from the ClustSim table, then only included clusters that were bigger than 20 voxels.  
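
The selection rule described in this response can be sketched as follows. This is a minimal illustration, not the authors' code: the table values, the number of thresholds (four here instead of nine), and the names `CLUSTSIM_MIN_SIZE` and `surviving_clusters` are hypothetical placeholders.

```python
# Hypothetical ClustSim-style table: per-voxel p threshold -> minimum cluster size (voxels).
# These numbers are placeholders, not values from the study.
CLUSTSIM_MIN_SIZE = {0.05: 400, 0.01: 120, 0.001: 30, 0.0001: 12}
EXTRA_MIN_VOXELS = 20  # additional floor mentioned in the response


def surviving_clusters(clusters_by_p):
    """Keep clusters that meet the table size for their p value and exceed the floor.

    clusters_by_p maps a p threshold to a list of (cluster_id, n_voxels) tuples.
    """
    kept = []
    for p, clusters in clusters_by_p.items():
        table_min = CLUSTSIM_MIN_SIZE[p]
        for cluster_id, n_voxels in clusters:
            if n_voxels >= table_min and n_voxels > EXTRA_MIN_VOXELS:
                kept.append((p, cluster_id, n_voxels))
    return kept


example = {
    0.01: [("a", 150), ("b", 90)],      # "b" is below the table size for p = 0.01
    0.0001: [("c", 15), ("d", 25)],     # "c" passes the table but not the 20-voxel floor
}
print(surviving_clusters(example))
# -> [(0.01, 'a', 150), (0.0001, 'd', 25)]
```

In this reading, the 20-voxel floor only matters at strict thresholds, where the table permits clusters smaller than 20 voxels.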

      •  Why was the center of mass used instead of the peak voxel? 

      Peak voxel analysis can be sensitive to noise and may not reliably represent the region's activation pattern, especially in naturalistic imaging data where signal fluctuations are more variable and outliers more frequent. The center of mass provides a more stable and representative measure of the underlying neural activity. Another reason for using the center of mass is that it better represents the anatomical distribution of the data, especially in large clusters with more than 100 voxels where peak voxels are often located at the periphery.
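
The difference between the two summaries can be illustrated with a toy cluster. This sketch assumes an intensity-weighted center of mass; the response does not say whether weighting was used, and the function names and data are hypothetical.

```python
def peak_voxel(cluster):
    """Return the coordinate of the voxel with the largest statistic."""
    return max(cluster, key=lambda item: item[1])[0]


def center_of_mass(cluster):
    """Return the statistic-weighted mean coordinate (weights assumed positive)."""
    total = sum(weight for _, weight in cluster)
    return tuple(
        sum(coord[axis] * weight for coord, weight in cluster) / total
        for axis in range(3)
    )


# Toy cluster of ((x, y, z), statistic) pairs: three voxels near the origin
# and one high-valued voxel at the periphery.
cluster = [((0, 0, 0), 1.0), ((1, 0, 0), 1.0), ((2, 0, 0), 1.0), ((10, 0, 0), 3.0)]

print(peak_voxel(cluster))       # the peripheral voxel at x = 10
print(center_of_mass(cluster))   # pulled toward, but not onto, the periphery
```

A single extreme voxel fully determines the peak, whereas it only shifts the center of mass in proportion to its weight.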

      • Figure 1 seems to reference a different Figure 1 that shows the abstract, concrete, and overlap clusters of activity (currently Figure 3). 

      Fixed this in the text.

      • Table S1 seems to have the "Touch" dimension repeated twice with different statistics reported. 

      Fixed this in the text, the second mention of the dimension “touch” was wrong.  

      • It appears from the supplemental files that the Peaks and Valley analysis produces different results at different lag times. This might be expected but it's not clear why the results presented in the main text were chosen over those in the supplemental materials. 

      The results in the main text were chosen over those in the supplementary material because the HRF is said to peak at 5s after stimulus onset. We added a specification of this rationale to the “2. Peak and Valley Analysis” subsection in the Methods section.

      References (in order of appearance) 

      (1) Neumann J, Lohmann G, Zysset S, von Cramon DY. Within-subject variability of BOLD response dynamics. Neuroimage. 2003 Jul;19(3):784-96. doi: 10.1016/s1053-8119(03)00177-0. PMID: 12880807.

      (2) Handwerker DA, Ollinger JM, D'Esposito M. Variation of BOLD hemodynamic responses across subjects and brain regions and their effects on statistical analyses. Neuroimage. 2004 Apr;21(4):1639-51. doi: 10.1016/j.neuroimage.2003.11.029. PMID: 15050587.

      (3) Binder JR, Westbury CF, McKiernan KA, Possing ET, Medler DA. Distinct brain systems for processing concrete and abstract concepts. J Cogn Neurosci. 2005 Jun;17(6):905-17. doi: 10.1162/0898929054021102. PMID: 16021798.

      (4) Bucur, M., Papagno, C. An ALE meta-analytical review of the neural correlates of abstract and concrete words. Sci Rep 11, 15727 (2021). https://doi.org/10.1038/s41598-021-94506-9

      (5) Hale, J. 2001. A probabilistic Earley parser as a psycholinguistic model. In Proceedings of the second meeting of the North American Chapter of the Association for Computational Linguistics on Language technologies (NAACL '01). Association for Computational Linguistics, USA, 1–8. https://doi.org/10.3115/1073336.1073357

      (6) Brysbaert, M., New, B. Moving beyond Kučera and Francis: A critical evaluation of current word frequency norms and the introduction of a new and improved word frequency measure for American English. Behavior Research Methods 41, 977–990 (2009). https://doi.org/10.3758/BRM.41.4.977

      (7) Hasson, U., Nir, Y., Levy, I., Fuhrmann, G., & Malach, R. (2004). Intersubject Synchronization of Cortical Activity During Natural Vision. Science, 303(5664), 6.

      (8) Naselaris T, Kay KN, Nishimoto S, Gallant JL. Encoding and decoding in fMRI. Neuroimage. 2011 May 15;56(2):400-10. doi: 10.1016/j.neuroimage.2010.07.073. Epub 2010 Aug 4. PMID: 20691790; PMCID: PMC3037423.

      (9) Turner BO, Paul EJ, Miller MB, Barbey AK. Small sample sizes reduce the replicability of task-based fMRI studies. Commun Biol. 2018 Jun 7;1:62. doi: 10.1038/s42003-018-0073-z. PMID: 30271944; PMCID: PMC6123695.

      (10) He, K., Zhang, Y., Ren, S., & Sun, J. (2015). Deep Residual Learning for Image Recognition. Bioarchive (Tech Report). https://doi.org/10.48550/arXiv.1512.03385

      (11) Hasson, U., & Egidi, G. (2015). What are naturalistic comprehension paradigms teaching us about language? In R. M. Willems (Ed.), Cognitive neuroscience of natural language use (pp. 228–255). Cambridge University Press. https://doi.org/10.1017/CBO9781107323667.011

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study made fundamental findings in investigations of the dynamic functional states during sleep. Twenty-one HMM states were revealed from the fMRI data, surpassing the number of EEG-defined sleep stages, which can define sub-states of N2 and REM. Importantly, these findings were reproducible over two nights, shedding new light on the dynamics of brain function during sleep.

      Strengths:

      The study provides the most compelling evidence on the sub-states of both REM and N2 sleep. Moreover, they showed these findings on dynamics states and their transitions were reproducible over two nights of sleep. These novel findings offered unique information in the field of sleep neuroimaging.

      Weaknesses:

      The only weakness of this study has been acknowledged by the authors: limited sample size.

      We thank the reviewer for the overall enthusiasm for this study.

      Reviewer #1 (Recommendations For The Authors):

      (1) Were there differences in the extent of head motion during sleep among sleep stages? How was the potential motion parameter differences handled during the statistical analyses?

      If there were large head motions that continued for a long time (e.g., longer than 1 minute), how did the authors deal with that scanning session? For an extremely long scanning session (3 hours), how was motion correction conducted? It would be great if the authors could provide more details.

      We found that the N3 sleep stage had the lowest head motion, followed by REM, N2, N1, and lastly Wake. In other words, participants have lower head motion during sleep than during wakefulness. We added this information to the Supplemental Results, copied below.

      We performed standardized motion correction during preprocessing using AFNI regardless of the duration of the scans. We did not include motion parameters in the HMM model. Time frames with excessive head motion (any of the 6 head motion parameters exceeding 0.3 mm or degree) were censored. Previous analysis of the same data indicated that motion during extended sleep scans is comparable to the motion observed in shorter resting-state scans (Moehlman et al., 2019).

      In Supplemental Results, “Motion parameters with sleep stages.

      Averaged motion across six motion parameters decreased from wake to light sleep to deep sleep at night 2. For example, mean (standard deviation) motion for each sleep stage is as follows, N1: 0.043 (0.37); N2: 0.039 (0.033); N3: 0.035 (0.031); REM: 0.035 (0.032); Wake: 0.057 (0.052).

      Similarly, the percentage of timepoints retained after censoring increased from wake to light sleep to deep sleep at night 2. N1: 91%; N2: 93%; N3: 96%; REM: 89%; Wake 90%.”

      In the method section, “Previous analysis of the same data indicated that motion during extended sleep scans is comparable to the motion observed in shorter resting-state scans (Moehlman et al., 2019). We also found that motion is lower during deep sleep compared to wake, see Supplemental Results.”
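
The censoring rule stated above (a time frame is dropped when any of the six motion parameters exceeds 0.3 mm or degree) can be sketched as below. The response does not specify whether the limit applies to raw parameters or to frame-to-frame changes, so the sketch simply thresholds whatever values it is given; the function names and example numbers are hypothetical.

```python
MOTION_LIMIT = 0.3  # mm for translations, degrees for rotations (from the response)


def censor_mask(motion):
    """Return one boolean per time frame: True means the frame is retained.

    motion is a list of frames, each holding six motion values
    (three translations, three rotations).
    """
    return [all(abs(value) <= MOTION_LIMIT for value in frame) for frame in motion]


def percent_retained(mask):
    """Percentage of time frames that survive censoring."""
    return 100.0 * sum(mask) / len(mask)


# Made-up example: the third frame has one rotation above the limit.
frames = [
    [0.01, 0.02, 0.00, 0.05, 0.01, 0.00],
    [0.10, 0.05, 0.02, 0.00, 0.03, 0.01],
    [0.02, 0.01, 0.00, 0.35, 0.02, 0.01],
    [0.00, 0.00, 0.01, 0.02, 0.00, 0.29],
]
mask = censor_mask(frames)
print(mask, percent_retained(mask))
```

Computing `percent_retained` separately per sleep stage would give figures of the kind quoted from the Supplemental Results.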

      (2) It is possible that the data input for the HMM analyses might vary among participants and between the two nights, how did the authors deal with this issue during statistical analyses?

      This is a great question. We standardized BOLD timecourses for each participant and each night to avoid differences among participants and between two nights. We revised the description in the method section to make this point clear.

      In the method section, “To prepare the data for analysis, we first standardized the participant-specific sets of 300 ROI timecourses (scaled to a mean of 0, and a standard deviation of 1), which were then concatenated across all participants. This standardization was performed separately for each night. ”
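
A minimal sketch of the standardization step quoted above, for one night: each participant's ROI timecourses are z-scored on their own and then concatenated along time. It assumes the population standard deviation and a tiny number of ROIs instead of 300; the function names are hypothetical, and a constant timecourse (zero standard deviation) is not handled.

```python
from statistics import fmean, pstdev


def zscore(series):
    """Scale one timecourse to mean 0 and standard deviation 1."""
    mu = fmean(series)
    sd = pstdev(series)  # assumption: population SD; the text does not say which
    return [(x - mu) / sd for x in series]


def standardize_and_concatenate(runs):
    """Z-score each participant's ROI timecourses, then concatenate over participants.

    runs holds the recordings of ONE night: one entry per participant,
    each a list of ROI timecourses of equal length within that participant.
    """
    n_rois = len(runs[0])
    concatenated = [[] for _ in range(n_rois)]
    for rois in runs:
        for index, series in enumerate(rois):
            concatenated[index].extend(zscore(series))
    return concatenated


# Two participants with very different signal scales and scan lengths.
night = [
    [[1.0, 2.0, 3.0], [10.0, 20.0, 30.0]],
    [[100.0, 200.0, 300.0, 400.0], [5.0, 5.5, 6.0, 6.5]],
]
data = standardize_and_concatenate(night)
print(len(data), len(data[0]))
```

Calling the function once per night keeps the two nights standardized separately, as the response describes.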

      (3) Figures 2 and 4, the top part seems to be missing, e.g., "Night 2" in Figure 2, and "N2-related" in Figure 4.

      Thank you for pointing out these errors. We fixed them.

      (4) Figure 3 seems to be more stretched vertically than horizontally.

      We revised the figure to ensure it appears balanced on both sides.

      Reviewer #2 (Public Review):

      Summary:

      Yang and colleagues used a Hidden Markov Model (HMM) on whole-night fMRI to isolate sleep and wake brain states in a data-driven fashion. They identify more brain states (21) than the five sleep/wake stages described in conventional PSG-based sleep staging, show that the identified brain states are stable across nights, and characterize the brain states in terms of which networks they primarily engage.

      Strengths:

      This work's primary strengths are its dataset of two nights of whole-night concurrent EEG-fMRI (including REM sleep), and its sound methodology.

      Weaknesses:

      The study's weaknesses are its small sample size and the limited attempts at relating the identified fMRI brain states back to EEG.

      We thank the reviewer for the positive feedback and helpful suggestions for this study.

      General appraisal:

      The paper's conclusions are generally well-supported, but some additional analyses and discussions could improve the work.

      The authors' main focus lies in identifying fMRI-based brain states, and they succeed at demonstrating both the presence and robustness of these states in terms of cross-night stability. Additional characterization of brain states in terms of which networks these brain states primarily engage adds additional insights.

      A somewhat missed opportunity is the absence of more analyses relating the HMM states back to EEG. It would be very helpful to the sleep field to see how EEG spectra of, say, different N2-related HMM states compare. Similarly, it is presently unclear whether anything noticeable happens within the EEG time course at the moment of an HMM class switch (particularly when the PSG stage remains stable). While the authors did look at slow wave density and various physiological signals in different HMM states, a characterization of the EEG itself in terms of spectral features is missing. Such analyses might have shown that fMRI-based brain states map onto familiar EEG substates, or reveal novel EEG changes that have so far gone unnoticed.

      We thank the reviewer for this great suggestion. We performed EEG spectral analysis on each HMM state. Results were added to the Supplementary Results and Supplementary Figures 10 and 11 (copied below). Specifically, we confirmed that N3-related states had the highest Delta power and that the Deep-N2 module showed different spectral profiles compared to the Light-N2 module.

      In Supplemental Results: “We conducted spectral analysis for each TR and calculated the average power spectrum for each common EEG brainwave—Delta (0.5-4 Hz), Theta (4-8 Hz), Alpha (8-13 Hz), Beta (13-30 Hz), and Gamma (30-100 Hz)—across the 21 HMM states. See Supplementary Figure 10 and 11 for night 2 and night 1 data, respectively. As expected, we found that N3-related states 8 and 10 had highest Delta power in both nights. In addition, the Deep-N2 module had higher power in Theta and Alpha bands compared to the Light-N2 module.”
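
The per-TR band-power averaging described above could look roughly like this. It is a sketch under assumptions: a plain periodogram from a naive DFT, half-open band edges, and a hypothetical sampling rate and segment length; the authors' actual spectral estimator is not stated here.

```python
import cmath
import math

# Band edges in Hz, as listed in the Supplemental Results.
BANDS = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 100)}


def band_power(segment, fs):
    """Sum periodogram power per band for one EEG segment (e.g. one TR)."""
    n = len(segment)
    power = {name: 0.0 for name in BANDS}
    for k in range(1, n // 2 + 1):
        freq = k * fs / n
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(segment))
        value = abs(coeff) ** 2 / n
        for name, (low, high) in BANDS.items():
            if low <= freq < high:  # assumption: half-open intervals
                power[name] += value
    return power


def mean_band_power_by_state(segments, states, fs):
    """Average the per-segment band power over all segments assigned to each state."""
    sums, counts = {}, {}
    for segment, state in zip(segments, states):
        power = band_power(segment, fs)
        bucket = sums.setdefault(state, {name: 0.0 for name in BANDS})
        for name in BANDS:
            bucket[name] += power[name]
        counts[state] = counts.get(state, 0) + 1
    return {state: {name: total / counts[state] for name, total in bucket.items()}
            for state, bucket in sums.items()}


FS = 250   # hypothetical sampling rate in Hz
N = 500    # hypothetical samples per segment (2 s)
slow = [math.sin(2 * math.pi * 2 * t / FS) for t in range(N)]    # 2 Hz, delta range
fast = [math.sin(2 * math.pi * 10 * t / FS) for t in range(N)]   # 10 Hz, alpha range
result = mean_band_power_by_state([slow, fast, slow], [8, 3, 8], FS)
for state, power in result.items():
    print(state, max(power, key=power.get))
```

The state labels 8 and 3 are arbitrary here; with real data, `states` would be the HMM state assigned to each TR.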

      It is unclear how the presently identified HMM brain states relate to the previously identified NREM and wake states by Stevner et al. (2019), who used a roughly similar approach. This is important, as similar brain states across studies would suggest reproducibility, whereas large discrepancies could indicate a large dependence on particular methods and/or the sample (also see later point regarding generalizability).

      This is a great question. There are some similarities and differences between the current study and Stevner et al. (2019). We discussed this in the Supplementary Discussion. Copied below.

      In the Supplementary Discussion: “Both studies demonstrated that HMM states can be effectively divided into meaningful modules solely based on transition probabilities. Furthermore, both studies indicated that pre-sleep wakefulness differs from post-sleep wakefulness.

      However, despite the similar approaches used, key differences in data acquisition and analysis make it challenging to directly compare HMM states between these two studies. Firstly, Stevner et al. (2019) collected only 1-hour-long sleep data from 18 participants, whereas our current study includes 8-hour-long sleep data from 12 participants for two consecutive nights. As discussed in the main text, full sleep cycling cannot be obtained from 1-hour long sleep due to the lack of REM stage and incomplete sleep cycles. Secondly, in Stevner et al. (2019) (Figure 4e), the four wake-NREM stages had roughly the same duration. In contrast, in our current study (Night 2, Figure 2A), the N2 stage comprises 43% of total sleep, which aligns with the natural N2 composition of nocturnal sleep stages. This discrepancy might explain the different number of N2-related states found in the two studies, with 3 out of 19 in Stevner et al. (2019) versus 13 out of 21 in our current study.”

      More justice could be done to previous EEG-based efforts moving beyond conventional AASM-defined sleep/wake states. Various EEG studies performed data-driven clustering of brain states, typically indicating more than 5 traditional brain states (e.g., Koch et al. 2014, Christensen et al. 2019, Decat et al. 2022). Beyond that, countless subdivisions of classical sleep stages have been proposed (e.g., phasic/tonic REM, N2 with/without spindles, N3 with global/local slow waves, cyclic alternating patterns, and many more). While these aren't incorporated into standard sleep stage classification, the current manuscript could be misinterpreted to suggest that improved/data-driven classifications cannot be achieved from EEG, which is incorrect.

      We agree with the reviewer that previous EEG-based efforts should be mentioned. We now added this in the manuscript. Copied below.

      In the Discussion section, “Third, we chose to not include EEG features in our data-driven model. However, the current method is not limited to fMRI data and can be applied to EEG data. Given that previous data-driven studies based on EEG data have suggested that there might be more than five traditional sleep stages (Christensen et al., 2019; Decat et al., 2022; Koch et al., 2014), as well as subdivisions within these traditional sleep stages (Brandenberger et al., 2005; Decat et al., 2022; Simor et al., 2020), future studies may apply data-driven models on both fMRI and EEG data. ”

      More discussion of the limitations of the current sample and generalizability would be helpful. A sample of N=12 is no doubt impressive for two nights of concurrent whole-night EEG-fMRI. Still, any data-driven approach can only capture the brain states that are present in the sample, and 12 individuals are unlikely to express all brain states present in the population of young healthy individuals. Add to that all the potentially different or altered brain states that come with healthy ageing, other demographic variables, and numerous clinical disorders. How do the authors expect their results to change with larger samples and/or varying these factors? Perhaps most importantly, I think it's important to mention that the particular number of identified brain states (here 21, and e.g. 19 in Stevner) is not set in stone and will likely vary as a function of many sample- and methods-related factors.

      We thank the reviewer for the great suggestions. We now include these points when discussing limitations in the Discussion section. We think that an HMM trained on a larger sample might produce more fine-grained results, but this remains to be investigated when a more extensive dataset becomes available.

      In the Discussion section, “Secondly, while our study involved a relatively small number of participants (12), it included a large amount of fMRI data (~16 hours scan) per participant. Although the HMM trained on data from 12 participants was robust, the generalizability of the current results to different populations—such as healthy aging individuals and clinical populations—needs to be demonstrated in future studies, particularly with larger sample sizes and more diverse populations.”

      “Fourth, while we selected 21 HMM brain sleep states based on model evaluation parameters in the current study, the exact number of sleep states is not fixed and likely depends on various sample- and methods-related factors, such as sample size and model setups.”

    1. Social workers treat each person in a caring and respectful fashion, mindful of individual differences and cultural and ethnic diversity. Social workers promote clients’ socially responsible self-determination. Social workers seek to enhance clients’ capacity and opportunity to change and to address their own needs. Social workers are cognizant of their dual responsibility to clients and to the broader society. They seek to resolve conflicts between clients’ interests and the broader society’s interests in a socially responsible manner consistent with the values, ethical principles, and ethical standards of the profession.

      Structural inequality/ power imbalances raise quite a few questions for me, especially when it comes to personal biases. How can we check those at the door, and acknowledge the way we are navigating our roles as social workers? I think it would be helpful if the code of ethics went into more detail about what these balances may mean, and subtle things they may look like.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review): 

      Summary: 

      This manuscript nicely outlines a conceptual problem with the bFAC model in A-motility, namely, how is the energy produced by the inner membrane AglRQS motor transduced through the cell wall into mechanical force on the cell surface to drive motility? To address this, the authors make a significant contribution by identifying and characterizing a lytic transglycosylase (LTG) called AgmT. This work thus provides clues and a framework for future work on mechanical force transmission between the cytoplasm and the cell surface.

      Strengths: 

      (1) Convincing evidence shows AgmT functions as an LTG and, surprisingly, that mltG from E. coli complements the swarming defect of an agmT mutant. 

      (2) Authors show agmT mutants develop morphological changes in response to treatment with a β-lactam antibiotic, mecillinam.

      (3) The use of single-molecule tracking to monitor the assembly and dynamics of bFACs in WT and mutant backgrounds. 

      (4) The authors understand the limitations of their work and do not overinterpret their data. 

      Weaknesses: 

      (1) A clear model of AgmT's role in gliding motility or interactions with other A-motility proteins is not provided. Instead, speculative roles for how AgmT enzymatic activity could facilitate bFAC function in A-motility are discussed. 

      We thank the reviewer for this comment. We have added a new figure, Fig. 6, and updated the Discussion to propose a mechanism, “rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strand length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      (2) Although agmT mutants do not swarm, in-depth phenotypic analysis is lacking. In particular, do individual agmT mutant cells move, as found with other swarming defective mutants, or are agmT mutants completely nonmotile, as are motor mutants? 

      We thank the reviewer for raising an important question. Prompted by it, we analyzed the gliding phenotype of the ΔagmT pilA mutant at the single-cell level. We found that the ΔagmT pilA cells are not completely static. Instead, they move for less than half a cell length before pausing or reversing. We then quantified the velocity and gliding persistency and found that the gliding phenotype of the ΔagmT pilA cells matches the prediction for bFACs that lose the connection between the inner subcomplexes and PG.

      We then imaged individual ∆agmT pilA- cells on 1.5% agar surface at 10-s intervals using bright-field microscopy. To our surprise, instead of being static, individual ∆agmT pilA- cells displayed slow movements, with frequent pauses and reversals (Video 1). To quantify the effects of AgmT, we measured the velocity and gliding persistency (the distances cells traveled before pauses and reversals) of individual cells. Compared to the pilA- cells that moved at 2.30 ± 1.33 μm/min (n = 46) with high persistency (Video 2 and Fig. 2C, D), ∆agmT pilA- cells moved significantly slower (0.88 ± 0.62 μm/min, n = 59) and less persistently (Video 1 and Fig. 2C, D). Such aberrant gliding motility is distinct from the “hyper reversal” phenotype. Although hyper-reversing cells constitutively switch their moving directions due to the polarity regulators, they usually maintain gliding velocity at the wild-type level27. Instead, the slow and “slippery” gliding of the ∆agmT pilA- cells matches the prediction that when the inner complexes of bFACs lose connection with PG, bFACs can only generate short and inefficient movements19. Our data indicate that AgmT is not an essential component of the bFACs. Thus, AgmT is likely to regulate the assembly and stability of bFACs, especially their connection with PG.
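
The two single-cell measures named above, velocity and gliding persistency, can be sketched from a track sampled at 10-s intervals. This is an illustration under strong assumptions, not the authors' pipeline: positions are reduced to one dimension along the direction of travel, a pause is any step smaller than a hypothetical threshold, a reversal is a sign change between consecutive steps, and the final run of a track is counted even though the track end truncates it.

```python
FRAME_INTERVAL_S = 10.0     # imaging interval stated in the text
PAUSE_THRESHOLD_UM = 0.05   # hypothetical: smaller steps are treated as a pause


def mean_speed_um_per_min(positions):
    """Total path length divided by elapsed time, in micrometers per minute."""
    steps = [abs(b - a) for a, b in zip(positions, positions[1:])]
    minutes = len(steps) * FRAME_INTERVAL_S / 60.0
    return sum(steps) / minutes


def run_lengths(positions):
    """Distances traveled between consecutive pauses or reversals."""
    runs, current, direction = [], 0.0, 0
    for a, b in zip(positions, positions[1:]):
        step = b - a
        if abs(step) < PAUSE_THRESHOLD_UM:      # pause ends the current run
            if current > 0:
                runs.append(current)
            current, direction = 0.0, 0
            continue
        sign = 1 if step > 0 else -1
        if direction != 0 and sign != direction:  # reversal ends the current run
            runs.append(current)
            current = 0.0
        direction = sign
        current += abs(step)
    if current > 0:
        runs.append(current)
    return runs


# Made-up track (micrometers): forward, pause, backward, then forward again.
track = [0.0, 0.4, 0.8, 0.8, 0.5, 0.2, 0.6]
print(mean_speed_um_per_min(track))
print(run_lengths(track))
```

Persistency for a strain would then be summarized from the pooled run lengths of many cells.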

      (3) The bioinformatic and comparative genomics analysis of agmT is incomplete. For example, the sequence relationships between AgmT, MltG, and the 13 other LTG proteins in M. xanthus are not clear. Is E. coli MltG the closest homolog to AgmT? Their relationships could be addressed with a phylogenetic tree and/or sequence alignments. Furthermore, are there other A-motility genes in proximity to agmT? Similarly, does agmT show specific co-occurrences with the other A-motility genes across genera/species?

      We answered the first question in the Discussion (it was in the first Results section in the previous version), “Both M. xanthus AgmT and E. coli MltG belong to the YceG/MltG family, which is the first identified LTG family that is conserved in both Gram-negative and positive bacteria25,41. About 70% of bacterial genomes, including firmicutes, proteobacteria, and actinobacteria, encode YceG/MltG domains25. The unique inner membrane localization of this family and the fact that AgmT is the only M. xanthus LTG that belongs to this family (Table S2) could partially explain why it is the only LTG that contributes to gliding motility”.

      For the second, we added one sentence in the Results, “No other motility-related genes are found in the vicinity of agmT”.

      For the third question, we do not believe a co-occurrence analysis is necessary. Because M. xanthus gliding is very unique but “about 70% of bacterial genomes, including firmicutes, proteobacteria, and actinobacteria, encode YceG/MltG domains25”, gliding should show no co-occurrence with the YceG/MltG family LTGs.

      (4) Related to (3), what about the functional relationship of the 13 endogenous LTG genes? Although knockout mutants were shown to be motile, presumably because AgmT is present, can overexpression of them, similar to E. coli MltG, complement an agmT mutant? In other words, why does MltG complement while the endogenous LTG proteins appear not to be relevant?

      We thank the reviewer for this question, which prompted us to think about the uniqueness of AgmT more carefully. AgmT is unique for its inner-membrane localization, rather than its activity. We answered this question in the Discussion, “LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTG that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands”. We then moved on to propose a possible mechanism, “E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strand length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      (5) Based on Figure 2B, overexpression of MltG enhances A-motility compared to the parent strain and the agmT-PAmCh complemented strain, is this actually true? Showing expanded swarming colony phenotypes would help address this question. 

      We appreciate the reviewer for bringing up an important question. Prompted by this question, we analyzed the effects of MltG expression at the single-cell level. We found that “Consistent with its LTG activity, the expression of MltGEc restored gliding motility of the ΔagmT pilA- cells on both the colony (Fig. 2B) and single-cell (Fig. 2C, D) levels. Interestingly, in the absence of sodium vanillate, the leakage expression of MltGEc using the vanillate-inducible promoter was sufficient to compensate the loss of AgmT. A plausible explanation of this observation is that as E. coli grows much faster (generation time 20 - 30 min) than M. xanthus (generation time ~4 h), MltGEc could possess significantly higher LTG activity than AgmT. Induced by 200 μM sodium vanillate, the expression of MltGEc further but non significantly increased the velocity and gliding persistency (Fig. 2B-D). Importantly, the expression of MltGEc failed to restore gliding motility in the agmTEAEA pilA cells, even in the presence of 200 μM sodium vanillate (Fig. 2B). Consistent with the mecillinam resistance assay (Fig. 3C), this result suggests that AgmTEAEA still binds to PG and that in the absence of its LTG activity, AgmT does not anchor bFACs to PG”. These results are shown in the new panels C and D in Figure 2. 

      (6) Cell flexibility is correlated with gliding motility function in M. xanthus. Since AgmT has LTG activity, are agmT mutants less flexible than WT cells and is this the cause of their motility defect? 

      We thank the reviewer for raising this important question. Under the microscope, we frequently saw cells that lack AgmT making S-turns and U-turns. We used a GRABS assay to quantify cell stiffness and found that neither the absence of AgmT nor the expression of MltGEc affects cell stiffness. We added this result to the manuscript: “The assembly of bFACs produces wave-like deformation on cell surface6,37, suggesting that their assembly may require a flexible PG layer2,6,11,12. As a major contributor to cell stiffness, PG flexibility affects the overall stiffness of cells38. To test the possibility that AgmT and MltGEc facilitate bFAC assembly by reducing PG stiffness, we adopted the GRABS assay38 to quantify if the lack of AgmT and the expression of MltGEc affects cell stiffness. To quantify changes in cell stiffness, we simultaneously measured the growth of the pilA-, ΔagmT pilA-, and ΔagmT Pvan-MltGEc pilA- (with 200 μM sodium vanillate) cells in a 1% agarose gel infused with CYE and liquid CYE and calculated the GRABS scores of the ΔagmT pilA-, and ΔagmT Pvan-MltGEc pilA- cells using the pilA- cells as the reference, where positive and negative GRABS scores indicate increased and decreased stiffness, respectively (see Materials and Methods and Ref38). The GRABS scores of the ΔagmT pilA-, and ΔagmT Pvan-MltGEc pilA- (with 200 μM sodium vanillate) cells were -0.06 ± 0.04 and -0.10 ± 0.07 (n = 4), respectively, indicating that neither AgmT nor MltGEc affects cell stiffness significantly. Whereas PG flexibility could still be essential for gliding, AgmT and MltGEc do not regulate bFAC assembly by modulating PG stiffness. Instead, these LTGs could connect bFACs to PG by generating structural features that are irrelevant to PG stiffness”.

      Reviewer #2 (Public Review): 

      The manuscript by Carbo et al. reports a novel role for the MltG homolog AgmT in gliding motility in M. xanthus. The authors conclusively show that AgmT is a cell wall lytic enzyme (likely a lytic transglycosylase), its lytic activity is required for gliding motility, and that its activity is required for proper binding of a component of the motility apparatus to the cell wall. The data are generally well-controlled. The marked strength of the manuscript includes the detailed characterization of AgmT as a cell wall lytic enzyme, and the careful dissection of its role in motility. Using multiple lines of evidence, the authors conclusively show that AgmT does not directly associate with the motility complexes, but that instead its absence (or the overexpression of its active site mutant) results in the failure of focal adhesion complexes to properly interact with the cell wall. 

      An interpretive weakness is the rather direct role attributed to AgmT in focal adhesion assembly. While their data clearly show that AgmT is important, it is unclear whether this is the direct consequence of AgmT somehow promoting bFAC binding to PG or just an indirect consequence of changed cell wall architecture without AgmT. In E. coli, an MltG mutant has increased PG strain length, suggesting that M. xanthus's PG architecture may likewise be compromised in a way that precludes AglR binding to the cell wall. However, this distinction would be very difficult to establish experimentally. MltG has been shown to associate with active cell wall synthesis in E. coli in the absence of protein-protein interactions, and one could envision a similar model in M. xanthus, where active cell wall synthesis is required for focal adhesion assembly, and MltG makes an important contribution to this process. 

      Based on the data that AgmT does not assemble into bFACs and that heterologous MltGEc substitutes for M. xanthus AgmT in gliding, we believe that AgmT facilitates the proper assembly of bFACs indirectly. At the end of the Introduction, we pointed out, “Hence, the LTG activity of AgmT anchors bFAC to PG, potentially by modifying PG structure”. Following the reviewer’s recommendation, we revised the Discussion to emphasize that AgmT facilitates proper bFAC assembly indirectly through its LTG activity. For the reviewer’s convenience, the revised paragraph is pasted here, with the changes highlighted in blue:

      “It is surprising that AgmT itself does not assemble into bFACs and that MltGEc substitutes AgmT in gliding. Thus, rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTGs that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strain length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      The last sentence of the Discussion implies that anchoring LTG (AgmT) in the inner membrane is important. I did not see this mentioned about AgmT. Does it contain an inner membrane anchoring domain? Along these lines, the AgmT and MltG proteins appear to be of different sizes (Figure 1A). Please clarify, perhaps including full-length sequence alignment and/or domain architecture for these proteins. 

      We revised the first paragraph in the Results and clarified, “Among these genes, agmT (ORF K1515_0491023) was predicted to encode an inner membrane protein with a single N-terminal transmembrane helix (residues 4 – 25) and a large “periplasmic solute-binding” domain22.”

      We appreciate the reviewer for spotting the mistake in Fig. 2A. The E. coli MltG sequence shown in the alignment starts from residue 158, instead of 88. We have corrected this mistake in the figure. M. xanthus AgmT and E. coli MltG are of similar sizes, with 239 and 240 amino acids, respectively. 

      In Figure 3 legend, define D3. 

      The definition of D3 was added to the figure legend.

      Figure 4A shows 100-frame composite micrographs, but no time interval between frames is given. 

      The imaging frequency, 10 Hz, was stated in the text. We also added this information into the figure legend.

      Line 98, the term "Especially" does not flow well, change to "This includes the characteristic..." or similar. 

      We deleted “especially” from the sentence.

      Line 179, "not" is not accurate, replace with "rarely." 

      Changed.

      Line 188, add a qualifier, "proper" before "bFACs assembly." 

      Added.

      Lines 196 and 202, provide the sizes of each protein in these fusion constructs. 

      We added these numbers to the figure legend.

      In Figure 5A add arrows to identify each band. State in legend whether this is a denaturing gel, if so, why are AgmT-PAmCherry homodimers present?

      Protein electrophoresis was done using SDS-PAGE. It is not unusual that some proteins, especially membrane proteins, are resistant to dissociation by SDS and appear as multimers in SDS-PAGE. We have seen this phenomenon repeatedly in both our experiments and the literature. Nevertheless, we clarified our experimental condition in the text, “Similar to many membrane proteins that resistant to dissociation by SDS34, immunoblot using an anti-mCherry antibody showed that AgmTPAmCherry accumulated in two bands in SDS-PAGE that corresponded to monomers and dimers of the full-length fusion protein, respectively (Fig. 5A)”.

      A few examples of membrane proteins that remain as oligomers in SDS-PAGE are listed below:

      Rath et al., 2009, PNAS 106: 1760-1765

      Sulistijo et al., 2003, J Biol Chem 278: 51950-51956

      Sukharev, 2002, Biophys J 83: 290-298

      Neumann et al., 1998, J Bacteriol 180: 3312-3316

      Blakey et al., 2002, Biochem J 364: 527-535

      Wegner and Jones, 1984, J Biol Chem 259: 1834-1841

      Jiang et al., 2002, Nature 417: 515-522

      Heginbotham and Miller, 1997, Biochem 36: 10335-10342

      Gentile et al., 2002, J Biol Chem 277: 44050-44060

      Line 207, "near evenly along cell bodies" does not seem consistent with Figure 5B as there looks to be an enrichment of AgmT at cell poles. 

      We have replaced panel 5B with more typical images. Due to the shape difference between cell poles and the cylindrical nonpolar regions, many surface-associated proteins could appear “enriched” at cell poles. This effect was very obvious in Fig. 5B, possibly due to the unevenness of the agar surface. We examined our data carefully and did not find significant polar enrichment. Compared to AglZ that significantly enriches at poles and forms evenly-spaced clusters along the cell body, the localization of AgmT is completely different.  

      Lines 252 and 260, change "Fig. 5B" to "Fig. 5C." 

      We apologize for these mistakes. They have been corrected.

      Line 266, insert "the" before "cell envelope." 

      Added.

      Line 278, insert "presumably" between "AgmT generates (small openings)" 

      Corrected.

      Reviewer #2 (Recommendations For The Authors): 

      - Major comment: I would rephrase conclusions regarding a direct role of AgmT in focal adhesion assembly since these data are indirect (AglR binding to the cell wall is reduced in the absence of AgmT - this could also be interpreted as the absence of AgmT causing altered cell wall architecture that precludes AglR binding). Example: I don't think the data support line 222 "AgmT connects bFACs to PG", perhaps rephrased to accommodate more agnostic explanations. Likewise, line 308 states that MltG has been "adopted" by the gliding motility machinery. This conclusion cannot be drawn from the data presented. 

      We agree with the reviewer that the conclusions should be stated precisely. At the end of Introduction, we pointed out, “Hence, the LTG activity of AgmT anchors bFAC to PG, potentially by modifying PG structure”. Following the reviewer’s recommendation, we revised the Discussion to emphasize that AgmT facilitates bFAC assembly indirectly through its LTG activity. For the reviewer’s convenience, the revised paragraph is pasted here, with the changes highlighted in blue: 

      “It is surprising that AgmT itself does not assemble into bFACs and that MltGEc substitutes AgmT in gliding. Thus, rather than interacting with bFAC components directly and specifically, AgmT facilitates proper bFAC assembly indirectly through its LTG activity. LTGs usually break glycan strands and produce unique anhydro caps on their ends40-44. However, because AgmT is the only LTGs that is required for gliding, it is not likely to facilitate bFAC assembly by generating such modification on glycan strands. E. coli MltG is a glycan terminase that controls the length of newly synthesized PG glycans25. Likewise, AgmT could generate short glycan strands and thus uniquely modify the overall structure of M. xanthus PG, such as producing small pores that retard and retain the inner subcomplexes of bFACs (Fig. 6). On the contrary, the M. xanthus mutants that lack active AgmT could produce PG with increased strain length, which blocks bFACs from binding to the cell wall and precludes stable bFAC assembly. However, it would be very difficult to demonstrate how glycan length affects the connection between bFACs and PG”.

      However, we believe that the conclusion that “AgmT connects bFACs to PG" still stands true. Although AgmT is not likely to interact with the gliding machinery directly, its activity does increase the binding between bFACs and PG. 

      We agree with the reviewer that “adopt” may not be the best word to describe AgmT’s function in gliding. In the revised manuscript, we changed the phrase to “contributes to gliding motility”. 

      - Line 35: define "bFAC" at first use. 

      Fixed.

      - Figure 2: Mention in the caption why the pilA mutation is significant. Also, make more clear what one is supposed to see. You could include an arrow showing motile cells extruding from the colony edge, and mark + label the edge of the colony. 

      Following the reviewer’s recommendations, we described the motility phenotypes in detail in the main text, “On a 1.5% agar surface, the pilA- cells moved away from colony edges both as individuals and in “flare-like” cell groups, indicating that they were still motile with gliding motility. In contrast, the ∆aglR pilA- cells that lack an essential component in the gliding motor, were unable to move outward from the colony edge and thus formed sharp colony edges. Similarly, the ∆agmT pilA- cells also formed sharp colony edges, indicating that they could not move efficiently with gliding (Fig. 2B)”. 

      We also added a schematic block into panel B and two sentences into the legend, “To eliminate S-motility, we further knocked out the pilA gene that encodes pilin for type IV pilus. Cells that move by gliding are able to move away from colony edges.” 

      - Figure 3 caption. Mecillinam concentration should presumably be µg/mL, not g/mL?

      Also, remove the ".van,." in the second to last line. 

      We apologize for these mistakes. We have corrected them in the figure legend. 

      - Line 212 - at this point in the manuscript, the fact that AgmT likely does not assemble into bFACs is quite well established, so I would start this paragraph with something like "As an additional test, we...". 

      Revised as the reviewer recommended.

      - Figure 5C - this assay needs a protein loading control. How about whole-cell AglR before pelleting PG? 

      We do have a whole-cell loading control, which we have added into the revised figure.

      - Figure 5A - how are the dimers visible? Is this a native gel? If so, please add to the Methods section (I would find information on Western Blot there, but not on gel electrophoresis). 

      Protein electrophoresis was done using SDS-PAGE. It is not unusual that some proteins, especially membrane proteins, are resistant to dissociation by SDS and appear as multimers in SDS-PAGE. We have seen this phenomenon repeatedly in both our experiments and the literature. Nevertheless, we clarified our experimental condition in the text, “Similar to many membrane proteins that resistant to dissociation by SDS34, immunoblot using an anti-mCherry antibody showed that AgmTPAmCherry accumulated in two bands in SDS-PAGE that corresponded to monomers and dimers of the full-length fusion protein, respectively (Fig. 5A)”.

      A few examples of membrane proteins that remain as oligomers in SDS-PAGE are listed below:

      Rath et al., 2009, PNAS 106: 1760-1765

      Sulistijo et al., 2003, J Biol Chem 278: 51950-51956

      Sukharev, 2002, Biophys J 83: 290-298

      Neumann et al., 1998, J Bacteriol 180: 3312-3316

      Blakey et al., 2002, Biochem J 364: 527-535

      Wegner and Jones, 1984, J Biol Chem 259: 1834-1841

      Jiang et al., 2002, Nature 417: 515-522

      Heginbotham and Miller, 1997, Biochem 36: 10335-10342

      Gentile et al., 2002, J Biol Chem 277: 44050-44060

    1. Reviewer #1 (Public review):

      Summary:

      Here, the authors propose that changes in m6A levels may be predictable via a simple model that is based exclusively on mRNA metabolic events. Under this model, m6A mRNAs are "passive" victims of RNA metabolic events with no "active" regulatory events needed to modulate their levels by m6A writers, readers, or erasers; looking at changes in RNA transcription, RNA export, and RNA degradation dynamics is enough to explain how m6A levels change over time.

      The relevance of this study is extremely high at this stage of the epitranscriptome field. This compelling paper is in line with a growing number of recent studies showing that m6A is a constitutive mark reflecting overall RNA redistribution events. At the same time, it reminds every reader to carefully evaluate changes in m6A levels if observed in their experimental setup. It highlights the importance of evaluating extensively how much of an observed m6A change could be explained by RNA metabolic events.

      Weaknesses:

      It is essential to notice that m6ADyn does not exactly recapitulate the observed m6A changes. First, this can be due to m6ADyn's limitations. The authors do a great job in the Discussion highlighting these limitations. Indeed, they mention how m6ADyn cannot interpret m6A's implications on nuclear degradation or splicing and cannot model more complex scenario predictions (i.e., a scenario in which m6A both impacts export and degradation) or the contribution of single sites within a gene.

      Secondly, since predictions do not exactly recapitulate the observed m6A changes, "active" regulatory events may still play a partial role in regulating m6A changes. The authors themselves highlight situations in which data do not support m6ADyn predictions. Active mechanisms to control m6A degradation levels or mRNA export levels could exist and may still play an essential role.

      (1) "We next sought to assess whether alternative models could readily predict the positive correlation between m6A and nuclear localization and the negative correlations between m6A and mRNA stability. We assessed how nuclear decay might impact these associations by introducing nuclear decay as an additional rate, δ. We found that both associations were robust to this additional rate (Supplementary Figure 2a-c)."

      Based on the data, I would say that model 2 (m6A-dep + nuclear degradation) is better than model 1. Addressing these findings in the Discussion could help clarify how to interpret this prediction. Is nuclear degradation playing a significant role, more than expected by previous studies?

      (2) The authors classify m6A levels as "low" or "high," and it is unclear how "low" differs from unmethylated mRNAs.

      (3) The authors explore whether m6A changes could be linked with differences in mRNA subcellular localization. They tested this hypothesis by looking at mRNA changes during heat stress, a complex scenario to predict with m6ADyn. According to the collected data, heat shock is not associated with dramatic changes in m6A levels. However, the authors observe a redistribution of m6A mRNAs during the treatment and recovery time, with highly methylated mRNAs being retained in the nucleus, being associated with a shorter half-life, and being transcriptionally induced by HSF1. Based on this observation, the authors use m6ADyn to predict the contribution of RNA export, RNA degradation, and RNA transcription to the observed m6A changes. However:

      (a) Do the authors have a comparison of m6ADyn predictions based on the assumption that RNA export and RNA transcription may change at the same time?

      (b) They arbitrarily set the global reduction of export to 10%, but I'm not sure we can completely rule out whether m6A mRNAs have an export rate during heat shock similar to the non-methylated mRNAs. What happens if the authors simulate that the block in export could be preferential for m6A mRNAs only?

      (c) The dramatic increase in the nuclear:cytoplasmic ratio of mRNA upon heat stress may not reflect the overall m6A mRNA distribution upon heat stress. It would be interesting to repeat the same experiment in METTL3 KO cells. Of note, m6A mRNA granules have been observed within 30 minutes of heat shock. Thus, some m6A mRNAs may still be preferentially enriched in these granules for storage rather than being directly degraded. Overall, it would be interesting to understand the authors' position relative to previous studies of m6A during heat stress.

      (d) Gene Ontology analysis based on the top 1000 PC1 genes shows an enrichment of GOs involved in post-translational protein modification more than GOs involved in cellular response to stress, which is highlighted by the authors and used as justification to study RNA transcriptional events upon heat shock. How do the authors think that GOs involved in post-translational protein modification may contribute to the observed data?

      (e) Additionally, the authors first mention that there is no dramatic change in m6A levels upon heat shock ("subtle quantitative differences were apparent"), but then mention a "systematic increase in m6A levels observed in heat stress". It is unclear which systematic increase they are referring to. Are the authors referring to previous studies? What exactly happens to m6A after heat stress remains unsettled in the field. For instance, some papers have proposed a preferential increase of 5'UTR m6A rather than a systematic and general increase.

    1. Finally, just as a note of caution, college codes of conduct regarding communication often apply to any interaction between members of the community, whether or not they occur on campus or in a campus online environment. Any inappropriate, offensive, or threatening comments or messages may have severe consequences. Our communication in college conveys how we feel about others and how we’d like to interact with them. Unless you know for certain they don’t like it, you should use professional or semi-formal communication when interacting with college faculty and staff. For example, if you need to send a message explaining something or making a request, the recipient will likely respond more favorably to it if you address them properly and use thoughtful, complete sentences.

      I think addressing someone properly and with respect is very important and necessary.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, Gonzalez Alam et al. report a series of functional MRI results about the neural processing from the visual cortex to high-order regions in the default-mode network (DMN), compiling evidence from task-based functional MRI, resting-state connectivity, and diffusion-weighted imaging. Their participants were first trained to learn the association between objects and rooms/buildings in a virtual reality experiment; after the training was completed, in the task-based MRI experiment, participants viewed the objects from the earlier training session and judged if the objects were in the semantic category (semantic task) or if they were previously shown in the same spatial context (spatial context task). Based on the task data, the authors utilised resting-state data from their previous studies, visual localiser data also from previous studies, as well as structural connectivity data from the Human Connectome Project, to perform various seed-based connectivity analyses. They found that the semantic task causes more activation of various regions involved in object perception while the spatial context task causes more activation in various regions for place perception, respectively. They further showed that those object perception regions are more connected with the frontotemporal subnetwork of the DMN while those place perception regions are more connected with the medial-temporal subnetwork of the DMN. Based on these results, the authors argue that there are two main pathways connecting the visual system to high-level regions in the DMN, one linking object perception regions (e.g., LOC) leading to semantic regions (e.g., IFG, pMTG), the other linking place perception regions (e.g., parahippocampal gyri) to the entorhinal cortex and hippocampus.

      Below I provide my takes on (1) the significance of the findings and the strength of evidence, (2) my guidance for readers regarding how to interpret the data, as well as several caveats that apply to their results, and finally (3) my suggestions for the authors.

      (1) Significance of the results and strength of the evidence

      I would like to praise the authors for, first of all, trying to associate visual processing with high-order regions in the DMN. While many vision scientists focus specifically on the macroscale organisation of the visual cortex, relatively few efforts are made to unravel how neural processing in the visual system goes on to engage representations in regions higher up in the hierarchy (a nice precedent study that looks at this issue is by Konkle and Caramazza, 2017). We all know that visual processing goes beyond the visual cortex, potentially further into the DMN, but there's no direct evidence. So, in this regard, the authors made a nice try to look at this issue.

      We thank the reviewer for their positive feedback and for their very thoughtful and thorough comments, which have helped us to improve the quality of the paper.

      Having said this, the authors' characterisation of the organisation of the visual cortex (object perception/semantics vs. place perception/spatial contexts) does not go beyond what has been known for many decades by vision neuroscience. Specifically, over the past two decades, numerous proposals have been put forward to explain the macroscale organisation of the visual system, particularly the ventrolateral occipitotemporal cortex. A lateral-medial division has been reliably found in numerous studies. For example, some researchers found that the visual cortex is organised along the separation of foveal vision (lateral) vs. peripheral vision (medial), while others found that it is structured according to faces (lateral) vs. places (medial). Such a bipartite division is also found in animate (lateral) vs. inanimate (medial), small objects (lateral) vs. big objects (medial), as well as various cytoarchitectonic and connectomic differences between the medial side and the lateral side of the visual cortex. Some more recent studies even demonstrate a tripartite division (small objects, animals, big objects; see Konkle and Caramazza, 2013). So, in terms of their characterisation of the visual cortex, I think Gonzalez Alam et al. do not add any novel evidence to what the community of neuroscience has already known.

      The aim of our study was not to provide novel evidence about visual organisation, but rather to understand how these well-known visual subdivisions are related to functional divisions in memory-related systems, like the DMN. We agree that our study confirms the pattern observed by numerous other studies in visual neuroscience.  

      However, the authors' effort to link visual processing with various regions of the DMN is certainly novel, and their attempt to gather converging evidence with different methodologies is commendable. The authors are able to show that, in an independent sample of resting-state data, object-related regions are more connected with semantic regions in the DMN while place-related regions are more connected with navigation-related regions in the DMN, respectively. Such patterns reveal a consistent spatial overlap with their Kanwisher-type face/house localiser data and also concur with the HCP white-matter tractography data. Overall, I think the two pathways explanation that the authors seek to argue is backed by converging evidence. The lack of a travelling-wave type of analysis to show the spatiotemporal dynamics across the cortex from the visual cortex to high-level regions is disappointing, though, because I was expecting this type of analysis to provide the most convincing evidence of a 'pathway' going from one point to another. Dynamic causal modelling or Granger causality may also buttress the authors' claim of a pathway, because many readers, like me, would feel that there is not enough evidence to convincingly prove the existence of a 'pathway'.

      By ‘pathway’ we are referring to a pattern of differential connectivity between subregions of visual cortex and subregions of DMN, suggesting there are at least two distinct routes between visual and heteromodal regions. However, these routes don’t have to reflect a continuous sequence of cortical areas that extend from visual cortex to DMN – and given our findings of structural connectivity differences that relate to the functional subdivisions we observe, this is unlikely to be the sole mechanism underpinning our findings. We have now clarified this in the discussion section of the manuscript. We agree it would be interesting to characterise the spatiotemporal dynamics of neural propagation along our pathways, and we have incorporated this proposal into the future directions section.

      “One important caveat is that we have not investigated the spatiotemporal dynamics of neural propagation along the pathways we identified between visual cortex and DMN. The dissociations we found in task responses, intrinsic functional connectivity and white matter connections all support the view that there are at least two distinct routes between visual and heteromodal DMN regions, yet this does not necessarily imply that there is a continuous sequence of cortical areas that extend from visual cortex to DMN – and given our findings of structural connectivity differences that relate to the functional subdivisions we observe, this is unlikely to be the sole mechanism underpinning our findings. It would be interesting in future work to characterise the spatiotemporal dynamics of neural propagation along visual-DMN pathways using methods optimised for studying the dynamics of information transmission, like Granger causality or travelling wave analysis.”

      We have also edited the wording of sentences in the introduction and discussion that we thought might imply directionality or transmission of information along these pathways, or to clarify the nature of the pathways (please see a couple of examples below):

      In the Introduction:

      “We identified dissociable pathways of connectivity between different parts of visual cortex and DMN subsystems”

      In the Discussion:

      “…pathways from visual cortex to DMN -> …pathways between visual cortex and DMN“.

      (2) Guidance to the readers about interpretation of the data

      The organisation of the visual cortex and the organisation of the DMN historically have been studied in parallel with little crosstalk between different communities of researchers. Thus, the work by Gonzalez Alam et al. has made a nice attempt to look at how visual processing goes beyond the realm of the visual cortex and continues into different subregions of the DMN.

      While the authors of this study have utilised multiple methods to obtain converging evidence, there are several important caveats in the interpretation of their results:

      (1) While the authors choose to use the term 'pathway' to call the inter-dependence between a set of visual regions and default-mode regions, their results have not convincingly demonstrated a definitive route of neural processing or travelling. Instead, the findings reveal a set of DMN regions are functionally more connected with object-related regions compared to place-related regions. The results are very much dependent on masking and thresholding, and the patterns can change drastically if different masks or thresholds are used.

      We would like to qualify that our findings do not only reveal a set of “DMN regions that are functionally more connected with object-related regions compared to place-related regions”. Instead, we show a double dissociation based on our functional task responses: DMN regions that were more responsive to semantic decisions about objects are more functionally and structurally connected to visual regions more activated by perceiving objects, while DMN regions that were more responsive to spatial decisions are more connected to visual regions activated by the contrast of scene over object perception.

      We do not believe that the thresholding or masking involved in generating seeds strongly affected our results. We are reassured of this by two facts:

      (1) We re-analysed the resting-state data using a stricter clustering threshold and this did not change the pattern of results (see response below).

      (2) In response to a point by reviewer #2, we re-analysed the data eroding the masks of the MT-DMN, and this also didn’t change the pattern of results (please see response to reviewer 2).

      In this way, our results are robust to variations in mask shape/size and thresholding.

      (2) Ideally, if the authors could demonstrate the dynamics between the visual cortex and DMN in the primary task data, it would be very convincing evidence for characterising the journey from the visual cortex to DMN. Instead, the current connectivity results are derived from a separate set of resting state data. While the advantage of the authors' approach is that they are able to verify certain visual regions are more connected with certain DMN regions even under a task-free situation, it falls short of explaining how these regions dynamically interact to convert vision into semantic/spatial decision.

      We agree that a valuable future direction would be to collect evidence of spatiotemporal dynamics of propagation of information along these pathways. This could be the focus of future studies designed to this aim, and we have suggested this in the manuscript based on the reviewer’s suggestion. Furthermore, as stated above, we have now qualified our use of the term ‘pathway’ in the manuscript to avoid confusion.

      “These pathways refer to regions that are coupled, functionally or structurally, together, providing the potential for communication between them.”

      (3) There are several results that are difficult to interpret, such as their psychophysiological interactions (PPI), representational similarity analysis, and gradient analysis. For example, typically for PPI analysis, researchers interrogate the whole brain to look for PPI connectivity. Their use of targeted ROI is unusual, and their use of spatially extensive clusters that encompass fairly large cortical zones in both occipital and temporal lobes as the PPI seeds is also an unusual approach. As for the gradient analysis, the argument that the semantic task is higher on Gradient 1 than the spatial task based on the statistics of p-value = 0.027 is not a very convincing claim (unhelpfully, the figure on the top just shows quite a few blue 'spatial dots' on the hetero-modal end which can make readers wonder if the spatial context task is really closer to the unimodal end or it is simply the authors' statistical luck that they get a p-value under 0.05). While it is statistically significant, it is weak evidence (and it is not pertinent to the main points the authors try to make).

      To streamline the manuscript, we have moved the PPI and RSA results to the Supplementary Materials. However, we believe the gradient analysis is highly pertinent to understanding the functional separation of these pathways. In the revision, we show that not only was the contrast between the Semantic and Spatial tasks significant but, in addition, the majority of participants exhibited a pattern consistent with the result we report. To show the results more clearly, we have added a supplementary figure (Figure S8) focussed on comparisons at the participant level.

      This figure shows the position in the gradient for each peak per participant per task. The peaks for each participant across tasks are linked with a line. Cases where the pattern was reversed are highlighted with dashed lines (7/27 participants in each gradient). This allows the reader and reviewer to verify in how many cases, at the individual level, the pattern of results reported in the text held (see “Supplementary Analysis: Individual Location of pathways in whole-brain gradients”).  

      (3) My suggestion for the authors

      There are several conceptual-level suggestions that I would like to offer to the authors:

      (1)  If the pathway explanation is the key argument that you wish to convey to the readers, an effective connectivity type of analysis, such as Granger causality or dynamic causal modelling, would be helpful in revealing there is a starting point and end point in the pathway as well as revealing the directionality of neural processing. While both of these methods have their issues (e.g., Granger causality is not suitable for haemodynamic data, DCM's selection of seeds is susceptible to bias, etc), they can help you get started to test if the path during task performance does exist. Alternatively, travelling wave type of analysis (such as the results by Raut et al. 2021 published in Science Advances) can also be useful to support your claims of the pathway.

      As we have stated above, we agree with the reviewer that, given the pattern of results obtained in our work, analyses that characterise the spatiotemporal dynamics of transmission of information along the pathways would be of interest. However, we are concerned that our data is not well-optimised for these analyses.

      (2)  I think the thresholding for resting state data needs to be explained - by the look of Figure 2E and 3E, it looks like whole-brain un-thresholded results, and then you went on to compute the conjunction between these un-thresholded maps with network templates of the visual system and DMN. This does not seem statistically acceptable, and I wonder if the conjunction that you found would disappear and reappear if you used different thresholds. Thus, for example, if the left IFG cluster (which you have shown to be connected with the visual object regions) would disappear when you apply a conventional threshold, this means that you need to seriously consider the robustness of the pathway that you seek to claim... it may be just a wild goose that you are chasing.

      We believe the reviewer might be confused regarding the procedure we followed to generate the ROIs used in the pathways connectivity analysis. As stated in the last paragraph of the “Probe phase” and “Decision phase” results subsections, the maps the reviewer is referring to (Fig. 3E, for example) were generated by seeding the intersection of our thresholded univariate analysis (Fig. 3A) with network templates. In the case of Fig 3E, these are the Semantic>Spatial decision results after thresholding, intersected with Yeo DMN (MT, FT and Core, combined). These seeds were then entered into a whole-brain seed-based spatial correlation analysis, which was thresholded and cluster-corrected using the defaults of CONN. The same is true for Fig. 2E, but using the thresholded Probe phase Semantic>Context regions. Thus, we do not believe the objections to statistical rigour the reviewer is raising apply to our results.

      The thresholding of the resting-state data itself was explained in the Methods (Spatial Maps and Seed-to-ROI Analysis). As stated above, we thresholded using the default of the CONN software package we used (cluster-forming threshold of p=.05, equivalent to T=1.65). For increased rigour, we reproduced the thresholded maps from Figs 2E and 3E, further increasing the threshold from p=.05, equivalent to T=1.65, to p=.001, equivalent to T=3.1. The resulting maps were very similar, showing minimal change with a spatial correlation of r > .99 between the strict and lax threshold versions of the maps for both the probe and decision seeds. This can be seen in Figure 2E and Figure 33E, which depict the maps produced with stricter thresholding. These maps can also be downloaded from the Neurovault collection, and the re-analysis is now reported in the Supplementary Materials (see section “Supplementary Analysis: Resting-state maps with stricter thresholding”).

      Probe phase (compare with Fig. 2E):
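
      As a rough illustration of the threshold equivalences quoted in this response (p=.05 with T=1.65, p=.001 with T=3.1) and of a spatial correlation between a lax and a strict map, the following sketch may help. It is not the authors' CONN pipeline: it assumes a one-tailed cutoff under a standard normal approximation (reasonable only for large degrees of freedom), the function names are hypothetical, and the map values are invented toy numbers rather than data from the study. How the reported r > .99 was actually computed is not specified in this excerpt.

```python
from math import sqrt
from statistics import NormalDist, mean


def one_tailed_cutoff(p):
    """Statistic value whose upper-tail probability under a standard normal is p."""
    return NormalDist().inv_cdf(1.0 - p)


def spatial_correlation(map_a, map_b):
    """Pearson correlation between two maps flattened to equal-length lists."""
    mean_a, mean_b = mean(map_a), mean(map_b)
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(map_a, map_b))
    var_a = sum((a - mean_a) ** 2 for a in map_a)
    var_b = sum((b - mean_b) ** 2 for b in map_b)
    return cov / sqrt(var_a * var_b)


# Assumption: T is approximated by z, which holds for large degrees of freedom.
lax_cutoff = one_tailed_cutoff(0.05)      # close to the T=1.65 quoted above
strict_cutoff = one_tailed_cutoff(0.001)  # close to the T=3.1 quoted above

# Toy voxel values (hypothetical, not data from the study).
toy_map = [0.2, 1.1, 1.8, 2.5, 3.3, 4.0, 4.6, 0.9]
lax_map = [v if v > lax_cutoff else 0.0 for v in toy_map]
strict_map = [v if v > strict_cutoff else 0.0 for v in toy_map]

# Similarity between the two thresholded versions of the same toy map.
similarity = spatial_correlation(lax_map, strict_map)
```

      In this toy case the two thresholded maps differ in several voxels, so the correlation is well below 1; a value above .99, as reported, would indicate that the stricter threshold removed very little.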

      (3) There are several analyses that are hard to interpret and you can consider only reporting them in the supplementary materials, such as the PPI results and representational similarity analysis, as none of these are convincing. These analyses do not seem to add much value to make your argument more convincing and may elicit more methodological critiques, such as statistical issues, the set-up of your representational theory matrix, and so on.

      We have moved the PPI and RSA results to the supplementary materials. We agree this will help us streamline the manuscript.  

      Reviewer #2 (Public Review):

      Summary:

      In this manuscript, Alam et al. sought to understand how memory interacts with incoming visual information to effectively guide human behavior by using a task that combines spatial contexts (houses) with objects of one or multiple semantic categories. Three additional datasets (all from separate participants) were also employed: one that functionally localized regions of interest (ROIs) based on subtractions of different visually presented category types (in this case, scenes, objects, and scrambled objects); another consisting of resting-state functional connectivity scans, and a section of the Human Connectome Project that employed DTI data for structural connectivity analysis. Across multiple analyses, the authors identify dissociations between regions preferentially activated during scene or object judgments, between the functional connectivity of regions demonstrating such preferences, and in the anatomical connectivity of these same regions. The authors conclude that the processing streams that take in visual information and support semantic or spatial processing are largely parallel and distinct.

      Strengths:

      (1) Recent work has reconceptualized the classic default mode network as two parallel and interdigitated systems (e.g., Braga & Buckner, 2017; DiNicola et al., 2021). The current manuscript is timely in that it attempts to describe how information is differentially processed by two streams that appear to begin in visual cortex and connect to different default subnetworks. Even at a group level where neuroanatomy is necessarily blurred across individuals, these results provide clear evidence of stimulus-based dissociation.

      (2) The manuscript contains a large number of analyses across multiple independent datasets. It is therefore unlikely that a single experimenter choice in any given analysis would spuriously produce the overall pattern of results reported in this work.

      We thank the reviewer for their remarks on the strengths of our manuscript.

      Weaknesses:

      (1) Throughout the manuscript, a strong distinction is drawn between semantic and spatial processing. However, given that only objects and spatial contexts were employed in the primary experiment, it is not clear that a broader conceptual distinction is warranted between "semantic" and "spatial" cognition. There are multiple grounds for concern regarding this basic premise of the manuscript.

      a. One can have conceptual knowledge of different types of scenes or spatial contexts. A city street will consistently differ from a beach in predictable ways, and a kitchen context provides different expectations than a living room. Such distinctions reflect semantic knowledge of scene-related concepts, but in the present work spatial and "all other" semantic information are considered and discussed as distinct and separate.

      The “building” contexts we created were arbitrary, containing beds, desks and an assortment of furniture that did not reflect usual room distributions, i.e., a kitchen next to a dining room. We have made this aspect of our stimuli clearer in the Materials section of the task. 

      “The learning phase employed videos showing a walk-through for twelve different buildings (one per video), shot from a first-person perspective. The videos and buildings were created using an interior design program (Sweet Home 3D). Each building consisted of two rooms: a bedroom and a living room/office, with an ajar door connecting the two rooms. The order of the rooms (1st and 2nd) was counterbalanced across participants. Each room was distinctive, with different wallpaper/wall colour and furniture arrangements. The building contexts created by these rooms were arbitrary, containing furniture that did not reflect usual room distributions (i.e., a kitchen next to a dining room), to avoid engaging further conceptual knowledge about frequently-encountered spatial contexts in the real world.”

      To help the reviewer and readers to verify this and come to their own conclusions, we have also added the videos watched by the participants to the OSF collection.

      “A full list of pictures of the object and location stimuli employed in this task, as well as the videos watched by the participants can be consulted in the OSF collection associated with this project under the components OSF>Tasks>Training.”

      We agree that scenes or spatial contexts have conceptual characteristics, and we actually manipulated conceptual information about the buildings in our task, in order to assess the neural underpinnings of this effect. In half of the buildings, the rooms/contexts were linked through the presence of items that shared a common semantic category (our “same category building” condition): this presented some conceptual scaffolding that enabled participants to link two rooms together. These buildings could then be contrasted with “mixed category buildings” where this conceptual link between rooms was not available. We found that right angular gyrus was important in the linking together of conceptual and spatial information, in the contrast of same versus mixed category buildings.

      b. As a related question, are scenes uniquely different from all other types of semantic/category information? If faces were used instead of scenes, could one expect to see different regions of the visual cortex coupling with task-defined face > object ROIs? The current data do not speak to this possibility, but as written the manuscript suggests that all (non-spatial) semantic knowledge should be processed by the FT-DMN.

      Thanks for raising this important point. Previous work suggests that the human visual system (and possibly the memory system, as suggested by Deen and Freiwald, 2021) is sensitive to perceptual categories important to human behaviour, including spatial, object and social information. Previous work (Silson et al., 2019; Steel et al., 2021) has shown domain-specific regions in visual regions (ventral temporal cortex; VTC) whose topological organisation is replicated in memory regions in medial parietal cortex (MPC) for faces and places. In these studies, adding objects to the analyses revealed regions sensitive to this category sandwiched between those responsive to people and places in VTC, but not in MPC. However, consistent with our work, the authors find regions sensitive to memory tasks for places and objects (as well as people) in the lateral surface of the brain. 

      Our study was not designed to probe every category in the human visual system, and therefore we cannot say what would happen if we contrasted social judgments about faces with semantic judgments about objects. We have added this point as a limitation and future direction for research:

      “Likewise, further research should be carried out on memory-visual interactions for alternative domains. Our study focused on spatial location and semantic object processing and therefore cannot address how other categories of stimuli, such as faces, are processed by the visual-to-memory pathways that we have identified. Previous work has suggested some overlap in the neurobiological mechanisms for semantic and social processing (Andrews-Hanna et al., 2014; Andrews-Hanna & Grilli, 2021; Chiou et al., 2020), suggesting that the FT-DMN pathway may be highlighted when contrasting both social faces and semantic objects with spatial scenes. On the other hand, some researchers have argued for a ‘third pathway’ for aspects of social visual cognition (Pitcher & Ungerleider, 2021; Pitcher, 2023). Future studies that probe other categories will be able to confirm the generality (or specificity) of the pathways we described.”

      c. Recent precision fMRI studies characterizing networks corresponding to the FT-DMN and MTL-DMN have associated the former with social cognition and the latter with scene construction/spatial processing (DiNicola et al., 2020; 2021; 2023). This is only briefly mentioned by the authors in the current manuscript (p. 28), and when discussed, the authors draw a distinction between semantic and social or emotional "codes" when noting that future work is necessary to support the generality of the current claims. However, if generality is a concern, then emphasizing the distinction between object-centric and spatial cognition, rather than semantic and spatial cognition, would represent a more conservative and better-supported theoretical point in the current manuscript.

      We appreciate this comment and we have spent quite a bit of time considering what the most appropriate terminology would be. The distinction between object and spatial cognition is largely appropriate to our probe phase, although we feel this label is still misleading for two reasons:

      First, we used a range of items from different semantic categories, not just “objects”, although we have used that term as a shorthand to refer to the picture stimuli we presented. The stimuli include both animals (land animals, marine animals and birds) and man-made objects (tools, musical instruments and sports equipment). This category information is now more prominent in the rationale (Introduction) and the Methods to avoid confusion.

      Interested readers can also review our “object” stimuli in the OSF collection associated with this manuscript:

      Introduction: “…participants learned about virtual environments (buildings) populated with objects belonging to different, heterogeneous, semantic categories, both man-made (tools, musical instruments, sports equipment) and natural (land animals, marine animals, birds).”

      Methods:

      “A full list of pictures of the object and location stimuli employed in this task can be consulted in the OSF collection associated with this project under the components OSF>Tasks>Training.”

      Secondly, we manipulated the task demands so that participants were making semantic judgments about whether two items were in the same category, or spatial judgments about whether two rooms had been presented in the same building. Our use of the terms “semantic” and “spatial” was largely guided by the tasks that participants were asked to perform.

      We have revised the terminology used in the discussion to reflect this more conservative term. However, since the task performed was semantic in nature (participants had to judge whether items belonged to semantic categories), we have modified the term proposed by the reviewer to “object-centric semantics”, which we hope will avoid confusion.  

      (2) Both the retrosplenial/parieto-occipital sulcus and parahippocampal regions are adjacent to the visual network as defined using the Yeo et al. atlas, and spatial smoothness of the data could be impacting connectivity metrics here in a way that qualitatively differs from the (non-adjacent) FT-DMN ROIs. Although this proximity is a basic property of network locations on the cortical surface, the authors have several tools at their disposal that could be employed to help rule out this possibility. They might, for instance, reduce the smoothing in their multi-echo data, as the current 5 mm kernel is larger than the kernel used in Experiment 2's single-echo resting-state data. Spatial smoothing is less necessary in multi-echo data, as thermal noise can be attenuated by averaging over time (echoes) instead of space (see Gonzalez-Castillo et al., 2016 for discussion). Some multi-echo users have eschewed explicit spatial smoothing entirely (e.g., Ramot et al., 2021), just as the authors of the current paper did for their RSA analysis. Less smoothing of E1 data, combined with a local erosion of either the MTL-DMN and VIS masks (or both) near their points of overlap in the RSFC data, would improve confidence that the current results are not driven, at least in part, by spatial mixing of otherwise distinct network signals.

      The proximity of visual peripheral and DMN-C networks is a property of these networks’ organisation (Silson et al., 2019; Steel et al., 2021), and we agree the potential for spatial mixing of the signal due to this adjacency is a valid concern. Altering the smoothing kernel of the multi-echo data would not address this issue though, since no connectivity analyses were performed in task data. The reviewer is right about the kernel size for task data (5mm), but not about the single-echo RS data, which was actually smoothed with a larger kernel (6mm).

      Since this objection is largely about the connectivity analysis, we re-analysed the RS data by shrinking the size of the visual probe and DMN decision ROIs for the context task using fslmaths. We eroded the masks until the smallest gap between them exceeded the size of our 6mm FWHM smoothing kernel, which eliminates the potential for spatial mixing of signals due to ROI adjacency. The eroded ROIs can be consulted in the OSF collection associated with this project (see component “ROI Analysis/Revision_ErodedMasks”). The results, presented in the supplementary materials as “Eroded masks replication analysis”, confirmed the pattern of findings reported in the manuscript (see SM analysis below). We did not erode the respective ROIs for the semantic task, given that adjacency is not an issue there.
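
      The logic of eroding two adjacent masks until their smallest gap exceeds the smoothing kernel can be sketched as follows. This is only a toy illustration, not the authors' fslmaths procedure: it works on a 2D grid rather than a 3D volume, erodes both masks symmetrically, and assumes a hypothetical 2mm voxel size; all function names and the example masks are invented for the sketch.

```python
from itertools import product
from math import dist


def erode(mask):
    """One binary erosion step: keep voxels whose 4 in-plane neighbours are all in the mask."""
    offsets = ((1, 0), (-1, 0), (0, 1), (0, -1))
    return {(x, y) for (x, y) in mask
            if all((x + dx, y + dy) in mask for dx, dy in offsets)}


def min_gap_mm(mask_a, mask_b, voxel_mm):
    """Smallest centre-to-centre distance between the two masks, in millimetres."""
    return min(dist(a, b) for a in mask_a for b in mask_b) * voxel_mm


def erode_until_separated(mask_a, mask_b, voxel_mm=2.0, fwhm_mm=6.0):
    """Erode both masks until their smallest gap exceeds the smoothing kernel width."""
    steps = 0
    while mask_a and mask_b and min_gap_mm(mask_a, mask_b, voxel_mm) <= fwhm_mm:
        mask_a, mask_b = erode(mask_a), erode(mask_b)
        steps += 1
    return mask_a, mask_b, steps


# Two hypothetical 10 x 10 voxel masks that touch along one edge.
visual_mask = set(product(range(0, 10), range(0, 10)))
dmn_mask = set(product(range(10, 20), range(0, 10)))

eroded_visual, eroded_dmn, n_steps = erode_until_separated(visual_mask, dmn_mask)
final_gap = min_gap_mm(eroded_visual, eroded_dmn, voxel_mm=2.0)
```

      The loop also stops if either mask is eroded away completely, so a caller would need to check that both returned masks are non-empty before using them as seeds.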

      “Eroded masks replication analysis:

      The Visual-to-DMN ANOVA showed main effects of seed (F(1,190)=22.82, p<.001), ROI (F(1,190)=9.48, p=.002) and a seed by ROI interaction (F(1,190)=67.02, p<.001). Post-hoc contrasts confirmed there was stronger connectivity between object probe regions and semantic versus spatial context decision regions (t(190)=3.38, p<.001), and between scene probe regions and spatial context versus semantic decision regions (t(190)=-7.66, p<.001).

      The DMN-to-Visual ANOVA confirmed this pattern: again, there was a main effect of ROI (F(1,190)=4.3, p=.039) and a seed by ROI interaction (F(1,190)=57.59, p<.001), with post-hoc contrasts confirming stronger intrinsic connectivity between DMN regions implicated in semantic decisions and object probe regions (t(190)=5.06, p<.001), and between DMN regions engaged by spatial context decisions and scene probe regions (t(190)=3.25, p=.001).”
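
      For readers less familiar with how a seed by ROI interaction captures a double dissociation, the sketch below computes the interaction as a one-sample t-test on each participant's difference of differences; in a fully within-participant 2 x 2 design, the interaction F(1, n-1) equals the square of this t. The sketch assumes such a design, which is consistent with the F(1,190) degrees of freedom reported above but is not confirmed in this excerpt. The connectivity values are invented toy numbers, not data from the study, and the names are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical per-participant connectivity values (not data from the study).
# Columns: object seed -> semantic ROI, object seed -> spatial ROI,
#          scene seed -> semantic ROI,  scene seed -> spatial ROI.
toy_connectivity = [
    (0.30, 0.10, 0.05, 0.25),
    (0.25, 0.12, 0.08, 0.20),
    (0.35, 0.15, 0.10, 0.30),
    (0.20, 0.18, 0.12, 0.22),
    (0.28, 0.09, 0.07, 0.27),
]


def interaction_t(rows):
    """One-sample t statistic on each participant's difference of differences."""
    diffs = [(obj_sem - obj_spa) - (scn_sem - scn_spa)
             for obj_sem, obj_spa, scn_sem, scn_spa in rows]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


t_value = interaction_t(toy_connectivity)
f_value = t_value ** 2                  # F(1, n - 1) for the seed by ROI interaction
df_error = len(toy_connectivity) - 1    # error degrees of freedom, n - 1
```

      A positive t here means the object seed favours the semantic ROI more than the scene seed does, which is the crossover pattern the post-hoc contrasts then unpack.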

      (3) The authors identify a region of the right angular gyrus as demonstrating a "potential role in integrating the visual-to-DMN pathways." This would seem to imply that lesion damage to right AG should produce difficulties in integrating "semantic" and "spatial" knowledge. Are the authors aware of such a literature? If so, this would be an important point to make in the manuscript as it would tie in yet another independent source of information relevant to the framework being presented. The closest of which I am aware involves deficits in cued recall performance when associates consisted of auditory-visual pairings (Ben-Zvi et al., 2015), but that form of multi-modal pairing is distinct from the "spatial-semantic" integration forwarded in the current manuscript.

      This is a very interesting observation. There is a body of literature pointing to AG (more often left than right) as an integrator of multimodal information: It has been shown to integrate semantic and episodic memory, contextual information and cross-modality content.

      The Contextual Integration Model (Ramanan et al., 2017) proposes that AG plays a crucial role in multimodal integration to build context. Within this model, information that is essential for the representation of rich, detailed recollection and construction (like who, when, and, crucially for our findings, what and where) is processed elsewhere, but integrated and represented in the AG. In line with this view, Bonnici et al (2016) found AG engagement during retrieval of multimodal episodic memories, and that multivariate classifiers could differentiate multimodal memories in AG, while unimodal memories were represented in their respective sensory areas only. Recent work examining semantic processing in temporally-extended narratives using multivariate approaches concurs with a key role of left AG in context integration (Branzi et al., 2020).

      In addition to context integration, other lines of work suggest a role of AG as an integrator across modalities, more specifically. Recent perspectives suggest a role of AG as a dynamic buffer that allows combining distinct forms of information into multimodal representations (Humphreys et al., 2021), which is consistent with the result in our study of a region that brings together semantic and spatial representations in line with task demands. Others have proposed a role of the AG as a central connector hub that links three semantic subsystems, including multimodal experiential representation (Xu et al., 2017). Causal evidence of the role of AG in integrating multimodal features has been provided by Yazar et al (2017), who studied participants performing memory judgements of visual objects embedded in scenes, where the name of the object was presented auditorily. TMS to AG impaired participants’ ability to retrieve context features across multiple modalities. However, these studies do not single out specifically right AG.

      Some recent proposals suggest a causal role of right AG as a key region in the early definition of a context for the purpose of sensemaking, for which integrating semantic information with many other modalities, including vision, may be a crucial part (Seghier, 2023). TMS studies suggest a causal role for the right AG in visual attention across space (Olk et al. 2015, Petitet et al. 2015), including visual search and the binding of stimulus- and response-characteristics that can optimise it (Bocca et al. 2015). TMS over the right AG disrupts the ability to search for a target defined by a conjunction of features (Muggleton et al. 2008) and affects decision-making when visuospatial attention is required (Studer et al. 2014). This suggests that the AG might contribute to perceptual decision-making by guiding attention to relevant information in the visual environment (Studer et al. 2014). These, taken together, suggest a causal role of right AG in controlling attention across space and integrating content across modalities in order to search for relevant information.

      Most of this body of research points to left, rather than right, AG as a key region for integration, but we found regions of right AG to be important when semantic and spatial information could be integrated. We might have observed involvement of the right AG in our study, as opposed to the more-often reported left, given that people have to integrate semantic information with spatial context, which relies heavily on visuospatial processes predominantly located in right hemisphere regions (cf. Sormaz et al., 2017), which might be more strongly connected to right than left AG. 

      Lastly, we are not aware of a literature on right AG lesions impairing the integration of semantic and spatial information but, in the face of our findings, this might be a promising new direction. We have added as a recommendation that patients with damage to right AG should be examined with specific tasks aimed at probing this type of integration. We have added the following to the discussion:

      “We found a region of the right AG that was potentially important for integrating semantic and spatial context information. Previous research has established a key role of the AG in context integration (Ramanan et al., 2017; Bonnici et al., 2016; Branzi et al., 2020) and specifically, in guiding multimodal decisions and behaviour (Humphreys et al., 2021; Xu et al., 2017; Yazar et al., 2017). Although some recent proposals suggest a causal role of right AG in the early establishment of meaningful contexts, allowing semantic integration across modalities (Seghier, 2023; Olk et al., 2015, Petitet et al., 2015; Bocca et al., 2015; Muggleton et al. 2008), the majority of this research points to left, rather than right, AG as a key region for integration. However, we might have observed involvement of the right AG in our study given that people were integrating semantic information with spatial context, and right-lateralised visuospatial processes (cf. Sormaz et al., 2017) might be more strongly connected to right than left AG. We are not aware of a literature on right AG lesions impairing the integration of semantic and spatial information but, in the face of our findings, this might be a promising new direction. Patients with damage to right AG should be examined with specific tasks aimed at probing this type of integration.”

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) I mentioned the numerous converging analyses reported in this manuscript as a strength. However, in practice, it also results in numerous dense figures (routinely hitting 7-8 sub-panels) and results paragraphs which, as currently presented, are internally coherent but are not assembled into a "bigger picture" until the discussion. Readers may have an easier time with the paper if introductions to the different analyses ("probe phase", "decision phase", etc.) also include a bigger-picture summary of how the specific analysis is contributing to the larger argument that is being constructed throughout the manuscript. This may also help readers to understand why so many different analysis approaches and decisions were employed throughout the manuscript, why so many different masks were used, etc.

      Thank you for this suggestion. We agree that the range of analyses and their presentation can make digesting them difficult. To address this, we have outlined our analyses rationale at the beginning of the results as a sort of “big picture” summary that links all analyses together, and added introductory paragraphs to each analysis that needed them (namely, the probe, decision, and pathway connectivity analyses, as the gradient and integration analyses already had introductory paragraphs describing their rationale, and the PPI/RSA analyses were moved to supplementary materials), linking them to the summary, which we reproduce below:

      “To probe the organisation of streams of information between visual cortex and DMN, our neuroimaging analysis strategy consisted of a combination of task-based and connectivity approaches. We first delineated the regions in visual cortex that are engaged by the viewing of probes during our task (Figure 2), as well as the DMN regions that respond when making decisions about those probes (Figure 3): we characterised both by comparing the activation maps with well-established DMN and object/scene perception regions, analysed the pattern of activation within them, their functional connectivity and task associations. Having characterised the two ends of the stream, we proceeded to ask whether they are differentially linked: are the regions activated by object probe perception more strongly linked to DMN regions that are activated when making semantic decisions about object probes, relative to other DMN regions? Is the same true for the spatial context probe and decision regions? We answered this question through a series of connectivity analyses (Figure 4) that examined: 1) if the functional connectivity of visual to DMN regions (and DMN to visual regions) showed a dissociation, suggesting there are object semantic and spatial cognition processing ‘pathways’; 2) if this pattern was replicated in structural connectivity; 3) if it was present at the level of individual participants, and, 4) we characterised the spatial layout, network composition (using influential RS networks) and cognitive decoding of these pathways. Having found dissociable pathways for semantic (object) and spatial context (scene) processing, we then examined their position in a high-dimensional connectivity space (Figure 5) that allowed us to document that the semantic pathway is less reliant on unimodal regions (i.e., more abstract) while the spatial context pathway is more allied to the visual system. Finally, we used uni- and multivariate approaches to examine how integration between these pathways takes place when semantic and spatial information is aligned (Figure 6).”

      (2) At various points, figures are arranged out of sequence (e.g., panel d is referenced after panel g in Figure 2) or are missing descriptions of what certain colors mean (e.g., what yellow represents in Figure 6d). This is a minor issue, but one that's important and easy to address in future revisions.

      We thank the reviewer for bringing this issue to our attention. We have added descriptions for the yellow colour to the figure legends of Figures 6 and 7 (now in supplementary materials, Figure S9).

      We have also edited the text to follow a logical sequence with respect to referencing the panels in Figures 2 and 3, where panel d is now referenced after panel c. Lastly, we reorganised the layout of Figure 4 to follow the description of the results in the text.

    1. I along with others think the Anthropocene is more a boundary event than an epoch, like the K-Pg boundary between the Cretaceous and the Paleogene.4 The Anthropocene marks severe discontinuities; what comes after will not be like what came before.

      What the Anthropocene represents, and how long it will last, is still debated. Nixon talks about how the Anthropocene started when humans began to affect the biophysical world along with the climate and atmosphere. But it is hard to pin down the exact time and space the Anthropocene represents or to measure how long it may last. Thinking about our effect on the Earth as humans, some may consider us invaders or weapons of destruction. Haraway calls us “refugees”. I would consider the refugees to be the people of exploited communities and countries where environmental destruction takes place at the fault of our imperialist core. Looking past the nihilistic perspective that we as humans are a poison to a once abundant and plentiful Earth, there are people who have always valued the planet and their relationship to it over everything else. And these are often the communities that are most exploited and unsupported.

    2. But, is there an inflection point of consequence that changes the name of the “game” of life on earth for everybody and everything?

      To me this question could not be easily answered, because I would say both yes and no. I feel that we as humans create the not-so-good changes to the earth, and we also adapt to a lot of things that may not always benefit us. Even though there may be an inflection point of consequence that slows us down and persuades us to create a change, I think that more than likely it will subside or soon be deemed "normal".

    1. NASA website, can you see how the other answers may have a vested interest in encouraging readers to believe a particular theory? The encyclopedia may not intentionally attempt to mislead readers; however, the write-up is not current. And Wikipedia, being an open-source site where anyone may upload information, is not reliable enough to lend full credence to the articles. A professional, government organization that does not sell items related to the topic and provides its ethics policy for review is worthy of more consideration and research. This level of critical thinking and examined consideration is the only way to ensure you have all the information you need to make decisions. You likely know how to find some sources when you conduct research. And remember—we think and research all the time, not just in school or on the job. If you’re out with friends and someone asks where to find the best Italian food, someone will probably consult a phone app to present choices. This quick phone search may suffice to provide an address, hours, and possibly even menu choices, but you’ll have to dig more deeply if you want to evaluate the restaurant by finding reviews, negative press, or personal testimonies. Why is it important to verify sources? The words we write (or speak) and the sources we use to back up our ideas need to be true and honest, or we would not have any basis for distinguishing facts from opinions that may be, at the least damaging level, only uninformed musings but, at the worst level, intentionally misleading and distorted versions of the truth. Maintaining a strict adherence to verifiable facts is a hallmark of a strong thinker. You probably see information presented as fact on social media daily, but as a critical thinker, you must practice validating facts, especially if something you see or read in a post conveniently fits your perception.

      Looking through things carefully is important because examining every detail leads to a better, more in-depth understanding.

    1. First: For those of us who are historians of "beyond the Americas and the modern," how have we had to renegotiate meanings of gender and sexuality as well as their analytic utility? And what can we bring back to the Americas and the modern from this conceptual travel?3

      I think this line underscores the importance of understanding that concepts like gender and sexuality are historically and culturally specific. In many societies outside of the Western framework, these categories may not exist in the same form or may intersect with other societal structures like religion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This preprint explores the involvement of cyclic di-GMP in genome stability and antibiotic persistence regulation in bacterial biofilms. The authors proposed a novel mechanism that, due to bacterial adhesion, increases c-di-GMP levels and influences persister formation through interaction with HipH. While the work may provide useful insights that could attract researchers in biofilm studies and persistence mechanisms, the main findings are inadequately supported and require further validation and refinement in experimental design.

      We sincerely thank eLife for the thorough assessment of our manuscript. We appreciate the constructive criticism and see it as an opportunity to strengthen our research. In response to the reviewers' comments and suggestions, we have made significant improvements to our study. We have refined our experimental design and conducted additional experiments to provide more robust evidence supporting our findings. We believe that with these additional experiments and refinements, our study provides robust evidence for this novel mechanism, contributing significantly to the fields of biofilm research and bacterial persistence.

      Public Reviews:

      Reviewer #1 (Public Review):

      The authors propose a UPEC TA system in which a metabolite, c-di-GMP, acts as the AT with the toxin HipH. The idea is novel, but several key ideas are missing in regard to the relevant literature, and the experimental design is flawed. Moreover, they are absolutely not studying persister cells as Figure 1b clearly shows they are merely studying dying cells since no plateau in killing (or anything close to a plateau) was reached. So in no way has persistence been linked to c-di-GMP. Moreover, I do not think the authors have shown how the c-di-GMP sensor works. Also, there is no evidence that c-di-GMP is an antitoxin as no binding to HipH has been shown. So at best, this is an indirect effect, not a new toxin/antitoxin system as for all 7 TAs, a direct link to the toxin has been demonstrated for antitoxins.

      Thank you for your constructive comments on our manuscript. Your insights have prompted us to revisit our data and experimental design, leading to significant improvements in our study.

      (1) Clarification on Persister Cell Detection: We sincerely appreciate your astute observation regarding the interpretation of our killing curve in Figure 1B. Upon careful re-examination, we concur that our initial methodology had limitations in revealing the characteristic biphasic pattern associated with persister cells. To address these limitations, we have implemented two key modifications: shortening the sampling interval and extending the antibiotic treatment duration. These adjustments have resulted in an updated killing curve that now exhibits a more pronounced biphasic pattern and a prominent plateau in the late stage of killing, as illustrated in Response Figure 1. This refined pattern aligns with established characteristics of persister cell behavior in antibiotic tolerance studies, providing a more accurate representation of the persister population dynamics in our experimental system. We believe these methodological enhancements significantly improve the reliability and interpretability of our results, offering a clearer insight into the persister cell phenomenon under investigation.

      (2) Validation of c-di-GMP Sensor: We appreciate your point about the c-di-GMP sensor. The c-di-GMP sensor, developed by Howard C. Berg's team, is specifically designed to detect relative intracellular concentrations of c-di-GMP in Escherichia coli cells. This capability is crucial for understanding the dynamic regulation of c-di-GMP during bacterial responses to environmental stimuli. We have expanded our explanation of the sensor's detection mechanism in lines 138-146 of the manuscript, detailing how it functions to reflect changes in c-di-GMP levels within the cells accurately. The mechanism operates through a series of signaling events that are initiated when c-di-GMP binds to the sensor, leading to measurable outputs that correlate with intracellular concentrations. Additionally, we have provided a schematic chart in Figure S1B to visually support our description regarding the sensor. This figure demonstrates the sensor's responsiveness and specificity in detecting fluctuations in c-di-GMP levels, effectively linking the signaling molecule to cellular behavior. We hope these additions clarify the role of the c-di-GMP sensor in our research and address your concerns regarding its functionality.

      (3) HipH and c-di-GMP Interaction: Our pull-down experiments presented in Figure 5A-E provide robust and compelling evidence for a direct physical interaction between HipH and c-di-GMP, and the effects of their interaction are reminiscent of toxin-antitoxin systems. Yet we acknowledge that c-di-GMP is not a traditional antitoxin since it is not genetically linked to HipH. We have revised our terminology to "TA-like system" to reflect this difference more accurately.

      Weaknesses:

      (1) L 53: biofilm persisters are no different than any other persisters (there is no credible evidence of any different persister cells) so this reviewer suggests changing 'biofilm persisters' to 'persisters' throughout the text.

      Thank you for your thoughtful consideration. Upon careful consideration of the current scientific literature, we agree that there is no substantial evidence supporting a distinct category of persister cells specific to biofilms. We have systematically replaced 'biofilm persisters' with 'persisters' throughout the manuscript.

      (2) L 51: persister cells do not mutate and, once resuscitated, mutate like any other growing cell so this sentence should be deleted as it promotes an unnecessary myth about persistence.

      We sincerely appreciate your astute observation regarding the inaccuracy in line 51. We have removed the sentence in question from line 51, and we have thoroughly reviewed the entire manuscript to ensure that no similar misconceptions are present elsewhere in the text.

      (3) L 69: please include the only metabolic model for persister cell formation and resuscitation here based on single cells (e.g., doi.org/10.1016/j.bbrc.2020.01.102, https://doi.org/10.1016/j.isci.2019.100792); otherwise, you write as if there are no molecular mechanisms for persistence/resuscitation.

      Thank you for your valuable suggestion. We appreciate the opportunity to enhance the scientific context of our manuscript. We have added a brief explanation of how ppGpp mediates ribosome dimerization, leading to persistence, and how its degradation triggers resuscitation [1-3] (lines 68-74). We have described the role of cAMP-CRP in regulating persistence through its effects on metabolism and stress responses [4, 5] (lines 74-78). We also explore potential interactions or synergies between our proposed mechanisms and these established metabolic models [6] (lines 383-409). We believe this revision significantly enhances our manuscript by providing a more accurate representation of the current state of knowledge in the field and demonstrating how our work builds upon and contributes to existing models of bacterial persistence.

      (4) The authors should cite in the Intro or Discussion that others have proposed similar novel TAs including a ppGpp metabolic toxin paired with an enzymatic antitoxin SpoT that hydrolyzes the toxin (http://dx.doi.org/10.1016/j.molcel.2013.04.002).

      We are grateful for your expertise in pointing out this crucial reference. We sincerely appreciate your suggestion to include the reference to previously proposed novel toxin-antitoxin (TA) systems, particularly the ppGpp-SpoT system [6]. In light of this reference, we have expanded our discussion to include: 1) A brief overview of the ppGpp-SpoT system as a novel TA-like mechanism. 2) Comparisons between the ppGpp-SpoT system and our findings on the HipH-c-di-GMP interaction. 3) Reflections on how these systems challenge and expand traditional definitions of TA systems (lines 383-409). We believe this addition significantly enhances the context and strengthens the rationale for considering the HipH-c-di-GMP interaction as a TA-like system. Thank you for your valuable input in helping us situate our research within the broader landscape of TA system biology.

      (5) Figure 1b: there are no results in this paper related to persister cells. Figure 1b simply shows dying cells were enumerated. Hence, the population of stressed cells increased, not 'persister cells' (Figure 1f), in the course of these experiments.

      We sincerely appreciate your astute observation regarding the interpretation of our killing curve in Figure 1B. Upon careful re-examination, we concur that our initial methodology had limitations in revealing the characteristic biphasic pattern associated with persister cells. To address these limitations, we have implemented two changes: 1) Shortened sampling interval: we have reduced the interval between measurements to one hour. 2) Extended sampling duration: the total duration of sampling has been increased to 6 hours (Response Figure 1). The updated killing curve now exhibits a more pronounced biphasic pattern and a prominent plateau in the late stage of killing: 1) Initial rapid decline: from 0-1 hours, we observe a steep decrease in bacterial survival (slope ≈ -3 to -1.8); 2) Slower decline phase: from 4.5-6 hours, the rate of decline is markedly reduced (slope ≈ -0.17 to -0.06). This pattern aligns more closely with established characteristics of persister cell behavior in antibiotic tolerance studies.
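
      For readers who wish to see how such a two-phase comparison can be computed, a minimal sketch follows. The survival fractions are hypothetical placeholders, not data from this study, the helper name log10_slope is purely illustrative, and the sketch assumes that slopes are expressed as the change in log10(survival fraction) per hour, which the text above does not state explicitly.

```python
import math


def log10_slope(t0: float, s0: float, t1: float, s1: float) -> float:
    """Change in log10(survival fraction) per hour between two time points."""
    return (math.log10(s1) - math.log10(s0)) / (t1 - t0)


# Hypothetical survival fractions sampled hourly (NOT data from the study).
times = [0, 1, 2, 3, 4, 5, 6]
survival = [1.0, 1e-2, 2e-3, 1e-3, 8e-4, 7e-4, 6e-4]

# Early phase: first hour of antibiotic exposure.
early = log10_slope(times[0], survival[0], times[1], survival[1])
# Late phase: last two hours of sampling.
late = log10_slope(times[4], survival[4], times[6], survival[6])

# A biphasic curve shows a late slope much shallower than the early slope.
is_biphasic = abs(late) < 0.1 * abs(early)
print(f"early slope: {early:.2f}, late slope: {late:.3f}, biphasic: {is_biphasic}")
```

      The factor of 0.1 used to call a curve biphasic is an arbitrary illustrative threshold, not a criterion taken from the manuscript.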

      (6) Figure S1: I see no evidence that the authors have shown this c-di-GMP detects different c-di-GMP levels since there appears to be no data related to varying c-di-GMP concentrations with a consistent decrease. Instead, there is a maximum. What are the concentration of c-di-GMP on the X-axis for panels C, D, and E? How were c-di-GMP levels varied such that you know the c-di-GMP concentration?

      We appreciate your point about the c-di-GMP sensor. To address this, we have included additional data on the sensor's mechanism and validation. The sensor, developed by Howard C. Berg's team, is designed for detecting intracellular c-di-GMP concentrations in E. coli [7].

      Sensor Design and Mechanism: The sensor developed for detecting c-di-GMP levels in Escherichia coli cells is based on a single fluorescent protein biosensor. The protein includes a Fluorescent Protein Base and a c-di-GMP Binding Domain. The fluorescent protein base is mVenusNB, which is the fastest-folding yellow fluorescent protein (YFP). The c-di-GMP binding domain is the MrkH protein, which is inserted between Y145 and N146 of mVenusNB. MrkH is a transcription factor with a high affinity for c-di-GMP. When MrkH binds to c-di-GMP, it undergoes a significant conformational change: the amino-terminal domain of MrkH rotates 138° relative to its carboxyl-terminal domain. This rotation disrupts the mVenusNB chromophore environment, resulting in reduced fluorescence. The sensor system co-expresses mScarletI, a bright, rapidly folding red fluorescent protein, which serves as a reference for ratiometric measurements. This design allows real-time ratiometric monitoring of c-di-GMP levels in individual cells while controlling for variations in protein expression levels between cells. This enables the observation of dynamic changes in c-di-GMP concentration, such as the increase seen after E. coli surface attachment.

      Functioning and Accuracy: The sensor is designed to detect c-di-GMP in the 100 to 700 nM range, which is the physiological range in E. coli. The use of a low copy plasmid for expression ensures detection at low concentrations. The ratio (R) of mVenusNB to mScarletI fluorescence emission is measured for individual cells. The sensor shows at least a twofold dynamic range between low and high c-di-GMP conditions. Cells with low c-di-GMP (expressing phosphodiesterase PdeH) show higher R values compared to cells with high c-di-GMP (expressing constitutively active diguanylate cyclase WspR:D70E). A mutant biosensor (Sensor*) with the R113A mutation in MrkH is used as a control. This mutation eliminates c-di-GMP binding ability, allowing differentiation between specific c-di-GMP effects and other cellular changes.

      This biosensor system provides a sophisticated tool for visualizing and quantifying c-di-GMP levels in individual bacterial cells with high sensitivity and temporal resolution. By combining a c-di-GMP-sensitive fluorescent protein with a reference fluorescent protein and utilizing ratiometric analysis, the system can accurately reflect changes in intracellular c-di-GMP levels while controlling for other cellular variables.
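
      The ratiometric readout described above can be sketched as follows. All intensity values are hypothetical (arbitrary units) and the function names are illustrative only; the sketch simply shows that R = mVenusNB / mScarletI falls as c-di-GMP rises, that its inverse rises, and that dividing by the reference channel cancels differences in expression level between cells.

```python
def ratio(venus: float, scarlet: float) -> float:
    """R: sensor (mVenusNB) emission divided by reference (mScarletI) emission."""
    return venus / scarlet


def inverse_ratio(venus: float, scarlet: float) -> float:
    """1/R: increases with c-di-GMP because binding quenches mVenusNB."""
    return scarlet / venus


# Hypothetical (mVenusNB, mScarletI) intensities per cell, arbitrary units.
low_cdg = (800.0, 400.0)      # low c-di-GMP: sensor channel is bright
high_cdg = (400.0, 400.0)     # high c-di-GMP: sensor channel is quenched
high_cdg_2x = (800.0, 800.0)  # same state as high_cdg, twice the expression

r_low = ratio(*low_cdg)
r_high = ratio(*high_cdg)

print("R (low c-di-GMP):", r_low)
print("R (high c-di-GMP):", r_high)
print("dynamic range:", r_low / r_high)
print("R at 2x expression:", ratio(*high_cdg_2x))
```

      In this toy example the two-fold difference in R mirrors the minimum dynamic range quoted above, and the cell with doubled expression yields the same R as its lower-expressing counterpart.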

      We have expanded our explanation of its detection mechanism in lines 138-146 and Figure S1B.

      (7) The viable portion of the VBNC population are persister cells so there is no reason to use VBNC as a separate term. Please see the reported errors often made with nucleic acid staining dyes in regard to VBNCs.

      We appreciate the opportunity to clarify the distinction between VBNC cells and persister cells in our manuscript. It is essential to recognize that VBNC cells and persister cells represent two fundamentally different states of bacterial dormancy. While both may exhibit viability under certain conditions, persister cells are characterized by their ability to resuscitate and grow when environmental conditions become favorable [8]. In contrast, VBNC cells are in a deep dormant state where they cannot be revived through normal culture conditions [9, 10]. This distinction is critical for accurately representing bacterial survival strategies and population dynamics, which is why we maintain the use of the term VBNC separately from persister cells. We have added related references in line 259.

      Regarding the reported errors associated with nucleic acid staining dyes for identifying VBNC cells, we acknowledge that these methods can exhibit limitations. Specifically, nucleic acid stains may fail to reliably differentiate between metabolically active and inactive cells, leading to inaccuracies in quantifying the true VBNC population [11]. In our study, we have opted to utilize propidium iodide (PI) staining to assess cell viability more accurately, as it effectively distinguishes dead cells from viable cells based on membrane integrity [12]. By employing this methodology, we ensure a more precise estimation of the VBNC proportion without conflating it with persister cell dynamics.

      Reviewer #2 (Public Review):

      Summary:

      Hebin et al. reported a fascinating story about antibiotic persistence in biofilms. First, they set up a model to identify the increased persisters in the biofilm status. They found that the adhesion of bacteria to the surface leads to increased c-di-GMP levels, which might lead to the formation of persisters. To figure out the molecular mechanism, they screened the E. coli Keio Knockout Collection and identified HipH. Finally, the authors used a lot of data to prove that c-di-GMP not only controls HipH over-expression but also inhibits HipH activity, though the inhibition might be weak.

      Thank you for your insightful summary of our research. We greatly appreciate your thoughtful consideration of our work.

      Strengths:

      They used a lot of state-of-the-art technologies, such as single-cell technologies as well as classical genetic and biochemistry approaches to prove the concept, which makes the conclusions very solid. Overall, it is a very interesting and solid story that might attract diverse readers working with c-di-GMP, persisters, and biofilm.

      Weaknesses:

      (1) Is HipH the only target identified by screening the E. coli Keio Knockout Collection?

      We appreciate your inquiry about our screening process and the identification of HipH. We did not screen the entire E. coli Keio Knockout Collection. Our approach was more targeted, focusing on mutants relevant to enzyme activity regulation. We selected specific mutants based on their potential involvement in c-di-GMP-mediated regulatory pathways. This focused approach allowed us to efficiently identify candidates likely to be involved in persister formation. Among the screened mutants, HipH emerged as a significant hit. Its identification was particularly noteworthy due to its known role in persister formation and its potential as a regulatory target of c-di-GMP. We acknowledge that our targeted approach may have overlooked other potential candidates. We are considering a more comprehensive screening approach in future studies to identify additional targets.

      (2) Since the story is complicated, a diagrammatic picture might be needed to illustrate the whole story. And the title does not accurately summarize the novelty of this study.

      Thank you for your valuable feedback. We fully agree with your assessment that a visual representation would greatly enhance the clarity of our complex findings. In response to your suggestion, we have added Response Figure 2 (Fig. 6 in revised manuscript, lines 976-981) to our manuscript. This new figure provides a comprehensive visual summary of the key processes and mechanisms uncovered in our study. This graphic summary provides a clear overview of the interconnected nature of surface adhesion, c-di-GMP signaling, and HipH regulation. It also highlights the complex role of c-di-GMP in persister formation and offers readers a visual aid to better understand the molecular mechanisms underlying our findings.

      We sincerely appreciate your thoughtful comment regarding the title and its reflection of the study's novelty. After careful consideration, we believe that our original title adequately captures the essence and significance of our research. We have strived to ensure that it accurately represents the scope and novelty of our work while maintaining clarity and conciseness. Nevertheless, we value your input and thank you for taking the time to provide this feedback, as it encourages us to critically evaluate our presentation.

      (3) The ratio of mVenusNB to mScarlet-I (R) negatively correlates with the concentration of c-di-GMP. Therefore, R⁻¹ demonstrates a positive correlation with the concentration of c-di-GMP. Is this method validated with other methods to quantify c-di-GMP, or used in other studies?

      We appreciate your point about the c-di-GMP sensor. To address this, we have included additional data on the sensor's mechanism and validation. The sensor, developed by Howard C. Berg's team, is designed for detecting intracellular c-di-GMP concentrations in E. coli [7].

      Sensor Design and Mechanism: The sensor developed for detecting c-di-GMP levels in Escherichia coli cells is based on a single fluorescent protein biosensor. The protein includes a Fluorescent Protein Base and a c-di-GMP Binding Domain. The fluorescent protein base is mVenusNB, which is the fastest-folding yellow fluorescent protein (YFP). The c-di-GMP binding domain is the MrkH protein, which is inserted between Y145 and N146 of mVenusNB. MrkH is a transcription factor with a high affinity for c-di-GMP. When MrkH binds to c-di-GMP, it undergoes a significant conformational change: the amino-terminal domain of MrkH rotates 138° relative to its carboxyl-terminal domain. This rotation disrupts the mVenusNB chromophore environment, resulting in reduced fluorescence. The sensor system co-expresses mScarletI, a bright, rapidly folding red fluorescent protein, which serves as a reference for ratiometric measurements. This design allows real-time ratiometric monitoring of c-di-GMP levels in individual cells while controlling for variations in protein expression levels between cells. This enables the observation of dynamic changes in c-di-GMP concentration, such as the increase seen after E. coli surface attachment.

      Functioning and Accuracy: The sensor is designed to detect c-di-GMP in the 100 to 700 nM range, which is the physiological range in E. coli. The use of a low copy plasmid for expression ensures detection at low concentrations. The ratio (R) of mVenusNB to mScarletI fluorescence emission is measured for individual cells. The sensor shows at least a twofold dynamic range between low and high c-di-GMP conditions. Cells with low c-di-GMP (expressing phosphodiesterase PdeH) show higher R values compared to cells with high c-di-GMP (expressing constitutively active diguanylate cyclase WspR:D70E). A mutant biosensor (Sensor*) with the R113A mutation in MrkH is used as a control. This mutation eliminates c-di-GMP binding ability, allowing differentiation between specific c-di-GMP effects and other cellular changes.

      This biosensor system provides a sophisticated tool for visualizing and quantifying c-di-GMP levels in individual bacterial cells with high sensitivity and temporal resolution. By combining a c-di-GMP-sensitive fluorescent protein with a reference fluorescent protein and utilizing ratiometric analysis, the system can accurately reflect changes in intracellular c-di-GMP levels while controlling for other cellular variables.

      We have expanded our explanation of its detection mechanism in lines 138-146 and Figure S1B.

      (4) References are missing throughout the manuscript. Please add enough references for every conclusion.

      We appreciate your feedback regarding the references in our manuscript. We acknowledge the importance of proper citation to support our conclusions and provide context for our work. In response to your comment, we have conducted a comprehensive review of our manuscript and have significantly enhanced our referencing throughout. We have added appropriate citations to support each key statement and conclusion presented in our study. These additional references provide a robust foundation for our findings and place our work within the broader context of the field. The complete list of all references, including the newly added ones, can be found at the end of this response letter as well as in the revised manuscript.

      (5) The novelty of this study should be clearly written and compared with previous references. For example, is it the first study to report the mechanism that the adhesion of bacteria to the surface leads to increased persister formation?

      We sincerely appreciate the opportunity to highlight and elaborate the novelty of our research. This study provides novel insights into the relationship between bacterial adhesion to surfaces and the subsequent increase in persister cell formation, which has not been explicitly detailed in previous literature. While existing research has established that biofilms typically harbor higher numbers of persister cells, this investigation not only corroborates that finding but also elucidates the mechanisms through which surface adhesion contributes to this phenomenon.

      Past studies have predominantly focused on the general characteristics of persister cells and their role in biofilm resilience and antibiotic tolerance without specifically addressing the mechanistic link between adhesion and persister formation [13, 14]. For instance, previous work has shown that surface attachment leads to changes in metabolic activity and signaling pathways within bacterial cells, which could promote persistence, but it has not definitively established a causal relationship between adhesion and increased persister formation. Our study highlights that the elevation of cyclic di-GMP levels after surface adhesion triggers a cascade of physiological changes that significantly enhance the formation of persister cells. In particular, we report that adhesion-induced signaling pathways promote dormancy and tolerance to antibiotics, marking an important advancement from the previous understanding that treated persister cells might arise from random phenotypic variation during biofilm development. We have expanded our discussion in lines 366-381.

      In summary, we believe this study stands as one of the first to clearly delineate the mechanism by which bacterial adhesion leads to increased persister formation, providing a valuable contribution to the current understanding of bacterial persistence and biofilm ecology. Thus, we can assert that our findings are not only novel but also essential for informing future research and therapeutic strategies aimed at managing bacterial infections.

      (6) in vitro DNA cleavage assay. Why not use bacterial genomic DNA to test the cleaving of HipH on the bacterial genome?

      Thank you for your feedback regarding our experimental approach. The decision not to use genomic DNA directly in our experiments was made after careful consideration of three issues: the high molecular weight of genomic DNA, which presents significant challenges in handling and analysis; the difficulty in extracting intact genomic DNA, which could potentially compromise the integrity of our results; and the challenges associated with electrophoretic separation of such large DNA molecules, which could limit our ability to accurately interpret the data.

      Instead, following established practices in molecular biology research and drawing from similar studies in the field [15-17], we opted to use plasmids as model DNA for our experiments. This approach offers several advantages: plasmids are smaller and more manageable, making them easier to manipulate in laboratory conditions; they can be more readily extracted in intact form, ensuring the quality of our experimental material; and plasmid DNA is more amenable to electrophoretic separation, allowing for clearer and more precise analysis. Despite their smaller size, plasmids retain many of the key characteristics of genomic DNA that are relevant to our study. We believe this approach provides a robust and reliable model for our research while overcoming the practical limitations associated with genomic DNA. It allows us to investigate the fundamental principles we're interested in, while maintaining experimental feasibility and data integrity. We have added related references in lines 314 and 599.

      (7) C-di-GMP -HipH is not a TA, it does not fit in the definition of the TA systems. You can say C-di-gmp is an antitoxin based on your study, but C-di-gmp -HipH is not a TA pair.

      We appreciate your insightful feedback regarding the classification of the c-di-GMP-HipH interaction. We acknowledge that while our study suggests c-di-GMP may function as an antitoxin to HipH, the c-di-GMP-HipH pair does not constitute a classical TA system due to the lack of genetic linkage. We have replaced the term "TA system" with "TA-like system" when referring to the c-di-GMP-HipH interaction. This more accurately reflects the nature of their relationship while acknowledging that it differs from traditional TA systems.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) Either indent or skip a line to indicate a new paragraph; there is no need to do both.

      Thank you for your feedback regarding the formatting of our manuscript. We have revised the formatting throughout the main text by using a single blank line to separate paragraphs, without indentation.

      (2) L 77: need to define 'c-di-GMP' without using another abbreviation; please write '3,5-cyclic diguanylic acid', etc.

      Thank you for your valuable feedback regarding the proper introduction of abbreviations in our manuscript. We have revised line 86 to introduce the full name of c-di-GMP as "3,5-cyclic diguanylic acid". Following this initial introduction, we consistently use the abbreviation "c-di-GMP" throughout the rest of the manuscript.

      Reviewer #2 (Recommendations For The Authors):

      This is a fascinating story, but the title and the manuscript need careful revision to make it more clear. The novelty and logic are not very easy to follow.

      (1) Figure 1B, " h" is missing

      We sincerely thank you for your attentive review and for pointing out the missing "h" in Figure 1B. We have carefully reviewed and revised the figure legend in Figure 1B. The unit of time has been corrected to include "h" (hours) where appropriate, ensuring consistency and accuracy throughout the figure.

      (2) Line 222, the in vivo mice model should be cited with the reference.

      Thank you for the reminder. We have cited the following reference for the mouse model (line 231).

      Pang Y, et al. (2022) Bladder epithelial cell phosphate transporter inhibition protects mice against uropathogenic Escherichia coli infection. Cell Reports 39: 110698.

      References

      (1) Wood, T.K. and S. Song, Forming and waking dormant cells: The ppGpp ribosome dimerization persister model. Biofilm, 2020. 2: p. 100018.

      (2) Song, S. and T.K. Wood, ppGpp ribosome dimerization model for bacterial persister formation and resuscitation. Biochem Biophys Res Commun, 2020. 523(2): p. 281-286.

      (3) Wood, T.K., S. Song, and R. Yamasaki, Ribosome dependence of persister cell formation and resuscitation. J Microbiol, 2019. 57(3): p. 213-219.

      (4) Niu, H., J. Gu, and Y. Zhang, Bacterial persisters: molecular mechanisms and therapeutic development. Signal Transduct Target Ther, 2024. 9(1): p. 174.

      (5) Mok, W.W., M.A. Orman, and M.P. Brynildsen, Impacts of global transcriptional regulators on persister metabolism. Antimicrob Agents Chemother, 2015. 59(5): p. 2713-9.

      (6) Amato, S.M., M.A. Orman, and M.P. Brynildsen, Metabolic control of persister formation in Escherichia coli. Mol Cell, 2013. 50(4): p. 475-87.

      (7) Vrabioiu, A.M. and H.C. Berg, Signaling events that occur when cells of Escherichia coli encounter a glass surface. Proc Natl Acad Sci U S A, 2022. 119(6).

      (8) Liu, J., et al., Viable but nonculturable (VBNC) state, an underestimated and controversial microbial survival strategy. Trends Microbiol, 2023. 31(10): p. 1013-1023.

      (9) Pan, H. and Q. Ren, Wake Up! Resuscitation of Viable but Nonculturable Bacteria: Mechanism and Potential Application. Foods, 2022. 12(1).

      (10) Ayrapetyan, M., T. Williams, and J.D. Oliver, Relationship between the Viable but Nonculturable State and Antibiotic Persister Cells. J Bacteriol, 2018. 200(20).

      (11) Zhao, S., et al., Absolute Quantification of Viable but Nonculturable Vibrio cholerae Using Droplet Digital PCR with Oil-Enveloped Bacterial Cells. Microbiol Spectr, 2022. 10(4): p. e0070422.

      (12) Zhao, S., et al., Enumeration of Viable Non-Culturable Vibrio cholerae Using Droplet Digital PCR Combined With Propidium Monoazide Treatment. Front Cell Infect Microbiol, 2021. 11: p. 753078.

      (13) Pan, X., et al., Recent Advances in Bacterial Persistence Mechanisms. Int J Mol Sci, 2023. 24(18).

      (14) Patel, H., H. Buchad, and D. Gajjar, Pseudomonas aeruginosa persister cell formation upon antibiotic exposure in planktonic and biofilm state. Sci Rep, 2022. 12(1): p. 16151.

      (15) Maki, S., et al., Partner switching mechanisms in inactivation and rejuvenation of Escherichia coli DNA gyrase by F plasmid proteins LetD (CcdB) and LetA (CcdA). J Mol Biol, 1996. 256(3): p. 473-82.

      (16) Hockings, S.C. and A. Maxwell, Identification of four GyrA residues involved in the DNA breakage-reunion reaction of DNA gyrase. J Mol Biol, 2002. 318(2): p. 351-9.

      (17) Chan, P.F., et al., Structural basis of DNA gyrase inhibition by antibacterial QPT-1, anticancer drug etoposide and moxifloxacin. Nat Commun, 2015. 6: p. 10048.

    1. Reviewer #2 (Public review):

      Summary

      This work investigates how multiple DNA elements combine to regulate gene expression. The authors use an episomal reporter assay which measures the transcriptional output of the reporter under the regulation of an enhancer-enhancer-promoter triple. The authors test all combinations of 8 promoters and 59 enhancers in this assay. There are two main findings: (1) enhancer pairs generally combine additively on reporter output (2) the extent to which enhancers increase reporter output over the promoter (individually and as enhancer-enhancer pairs) is inversely related to the intrinsic strength of the promoter. Both of these findings are interesting and are well supported by the data.

      This study extends previous results on enhancer-promoter combinations to enhancer-enhancer-promoter triples. For example the near equivalence of Fig. 5b and Fig. S7b is intriguing. This experimental design also provides the ability to investigate the notion of selectivity (also commonly referred to as compatibility) between enhancer-enhancer pairs and promoters.

      The authors note many limitations, including the selection of the elements and the size and spacing of the tested elements. Some of the enhancer-enhancer-promoter triples they test were also investigated by a different experimental design in Brosh et al 2023. Brosh et al observed non-additivity between these elements while this study did not. Ultimately we do not know which mechanisms produce the non-additivity that has been observed in native loci and which experimental designs would preserve such mechanisms.

      Overall this is a nice experimental design and a great dataset for probing how enhancers and promoters combine to regulate gene expression. I have no major concerns, but I will try to clarify some methodological points I found confusing.

      Methodology

      The following two comments are meant to help the reader understand the methodology/terminology used in this paper and how it relates to other similar studies.

      The interpretation that "promoters scale enhancer signals in a non-linear manner" is potentially confusing. I believe that the authors use "non-linear" to refer to the slopes (represented by the letter 'b' in Fig. 5b) being not equal to 1. Given how the boost index is defined, this implies the relationship

      Activity of EEP = (Activity of CCP) * (Average Linear Boost)^b

      One potential source of confusion is that the Average Linear Boost term itself depends on the set of promoters that are assayed. Averaging across (many) promoters may alleviate this concern, in which case Average Linear Boost may be considered some form of intrinsic enhancer strength. If so, there is a correspondence between this terminology and the terminology presented in Bergman et al 2022. If b not equal to 1 refers to a non-linear scaling, then the reader may think that b=1 refers to a linear scaling. But if b=1, and the Average Linear Boost term is interpreted as intrinsic enhancer strength, then the equation above implies that the activity of EEP is equal to an intrinsic promoter strength times an intrinsic enhancer strength. This is essentially the relationship that is considered in Bergman et al 2022 and which is referred to in that paper as 'multiplicative'. The purpose of this comment is not to argue for what is the relationship that best explains the data, it is just to clarify the terminology.
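
      The terminology point above can be made concrete with a small numerical sketch. This is an illustration only, not the authors' analysis code: the function names and all example values are hypothetical. It assumes the relationship Activity of EEP = (Activity of CCP) * (Average Linear Boost)^b, under which the log of the boost index is linear in the log of the Average Linear Boost with slope b, so that b = 1 recovers the 'multiplicative' case.

```python
import math


def eep_activity(ccp_activity, avg_linear_boost, b):
    """Activity of EEP = (Activity of CCP) * (Average Linear Boost)^b."""
    return ccp_activity * avg_linear_boost ** b


def fit_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical values chosen for illustration; none come from the paper.
ccp = 2.0                      # promoter-only (CCP) activity
b_true = 0.6                   # exponent; b < 1 means the boost is compressed
boosts = [1.5, 2.0, 4.0, 8.0]  # average linear boosts of four enhancer pairs

log_boost = [math.log2(x) for x in boosts]
log_boost_index = [
    math.log2(eep_activity(ccp, x, b_true) / ccp) for x in boosts
]

# On log-log axes the relationship is a straight line whose slope is b.
b_hat = fit_slope(log_boost, log_boost_index)
```

      With noise-free inputs the fitted slope equals the exponent used to generate the data, which is the sense in which a slope different from 1 is being called 'non-linear' in this discussion.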

      Enhancer-promoter selectivity: As a follow-up to a previous study (Martinez-Ara et al, Molecular Cell 2022) the authors mention that the data in this study also shows that enhancers show selectivity for certain promoters. I found the methodology hard to follow, so this section of the review is meant to guide the reader in understanding how the authors define 'selectivity'. The authors consider an enhancer to be not selective if its 'boost index' is the same across a set of promoters. 'Boost index' is defined to be the ratio of the reporter output with the enhancer and promoter divided by the reporter output with just the promoter. Conceptually, I think that considering the boost index is a reasonable way to quantify selectivity. The authors use a frequentist approach to classify each enhancer as selective or not selective. The null hypothesis is that the boost index of the enhancer is equal across a set of promoters. This can be visualized in Fig. 2C where the null hypothesis is that the mean of each vertical distribution is equal. Note that in Figure S4b of this paper (and in Figure 4B of their 2022 paper) the within-group variance is not plotted. Statistical significance is assessed using a Welch F-test.
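
      As a reading aid, the two quantities described above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the authors' pipeline: the function names and example numbers are hypothetical, the boost index is taken as the plain ratio defined above, and the test statistic is the textbook Welch one-way F statistic for groups with unequal variances. The p-value is omitted because it needs an F-distribution routine from a statistics library.

```python
from statistics import mean, variance


def boost_index(activity_with_enhancer, activity_promoter_only):
    """Reporter output with enhancer and promoter over promoter-only output."""
    return activity_with_enhancer / activity_promoter_only


def welch_f(groups):
    """Welch's one-way F statistic and its degrees of freedom.

    `groups` is a list of lists, one list of replicate boost indices per
    promoter. Each group needs at least two values and non-zero variance.
    """
    k = len(groups)
    sizes = [len(g) for g in groups]
    means = [mean(g) for g in groups]
    # Weight each group by n_i / s_i^2, so noisy groups count for less.
    weights = [n / variance(g) for n, g in zip(sizes, groups)]
    w_total = sum(weights)
    weighted_mean = sum(w * m for w, m in zip(weights, means)) / w_total
    between = sum(
        w * (m - weighted_mean) ** 2 for w, m in zip(weights, means)
    ) / (k - 1)
    lam = sum(
        (1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, sizes)
    )
    f_stat = between / (1 + 2 * (k - 2) * lam / (k ** 2 - 1))
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Hypothetical replicate boost indices of one enhancer on three promoters.
example_groups = [
    [1.1, 1.3, 0.9],   # strong promoter: small boost
    [2.0, 2.4, 1.8],   # intermediate promoter
    [4.1, 3.5, 4.6],   # weak promoter: large boost
]
f_stat, df1, df2 = welch_f(example_groups)
```

      Under the null hypothesis of equal mean boost index across promoters the statistic is close to zero; a large value relative to the F distribution with (df1, df2) degrees of freedom would classify the enhancer as selective.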

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      We thank the reviewer for the positive and constructive comments. We apologize for the very long delay in submitting this revised manuscript; due to personal circumstances we were not able to do this earlier.

      This manuscript by Martinez-Ara et al investigates how combinations of cis-regulatory elements combine to influence gene expression. Using a clever iteration on massively parallel reporter assays (MPRAs), the authors measure the combinatorial effects of pairs of enhancers on specific promoters. Specifically, they assayed the activity of 59x59 different enhancer-enhancer (E-E) combinations on 8 different promoters in mouse embryonic stem cells. The main claims of the paper are that E-E pairs combine nearly additively, and that supra-additive E-E pairs are rare and often promoter-dependent. The data in this study generally support these claims.

      This paper makes a good contribution to the ongoing discussions about the selectivity of gene regulatory elements. Recent works, such as those by Martinez-Ara et al. and Bergman et al., have indicated limited selectivity between E-P pairs on plasmid-based assays; this paper adds another layer to that by suggesting a similar lack of selectivity between E-E pairs.

      An interesting result in this manuscript is the observation that weak promoters allow more supra-additive E-E interactions than strong promoters (Figure 4b). This nonlinear promoter response to enhancers aligns with the model previously proposed in Hong et al. (from my own group), which posited that core promoter activities are nonlinearly scaled by the genomic environment, and that (similar to the trend observed in Figure 5b) the steepness of the scaling is negatively correlated with promoter strength.

      We now discuss the parallel with the Hong 2022 study (Discussion, lines 307-310).

      My only suggestion for the authors is that they include more plots showing how much the intrinsic strengths of the promoters and enhancers they are working with explain the trends in their data.

      Agreed, see below.

      Specific Suggestions

      Supplementary Figure 4 is presented as evidence for selectivity between single enhancers and promoters. Could the authors inspect the relationship between enhancer/promoter strength and this selectivity? Generating plots similar to Figure 4B and Figure 5B, but for single enhancers, should show if the ability of an enhancer to boost a promoter is inversely correlated to that promoter's intrinsic strength...

      Thank you for the suggestion, we have now repeated the analysis of Figure 5 for EP pairs instead of EEP triplets, and included it as new Supplementary Figure S7. Despite the lower statistical power, the trends are very similar. 

      ...Also, in Supplementary Figure 4, coloring each point by promoter type would clarify if certain promoters (the weak ones) consistently show higher boost indices across all enhancers. If they do not, the authors may want to speculate how single enhancers can show selectivity for promoters while the effect of adding a second enhancer to an existing E-P has little selectivity. An alternate explanation, based solely on the strength of the elements, would be that when the expression of a gene is low the addition of enhancer(s) has large effects, but when the expression of a gene is high (closer to saturation) the addition of enhancer(s) has small effects.

      We have now added colour coding for each of the promoters in Figure S4. We agree this clarifies the contribution of each promoter to the selectivity of each enhancer and it further confirms the responsiveness trends observed in Figure 5.

      Can anything more be said about the enhancers in E-E-P combinations that exhibit supra-additivity? Specifically, it would be interesting to know if certain enhancers, e.g. strong enhancers or enhancers with certain motifs, are more likely to show supra-additivity with a given promoter.

      Unfortunately, even with the number of enhancers that we tested, we lack statistical power to identify sequence motifs that may favour supra-additivity.

      Reviewer #2 (Public Review):

      We thank the reviewer for the supportive and constructive comments. We apologize for the very long delay in submitting this revised manuscript; due to personal circumstances we were not able to do this earlier.

      Summary

      This work investigates how multiple regulatory elements combine to regulate gene expression. The authors use an episomal reporter assay which measures the transcriptional output of the reporter under the regulation of an enhancer-enhancer-promoter triple. The authors test all combinations of 8 promoters and 59 enhancers in this assay. The main finding is that enhancer pairs generally combine additively on reporter output. The authors also find that the extent to which enhancers increase reporter output is inversely related to the intrinsic strength of the promoter.

      This manuscript presents a compact experiment that investigates an important open question in gene regulation. The results and data will be of interest to researchers studying enhancers. Given that my expertise is in modeling and computation, I will take the experimental results at face value and focus my review on the interpretation of the results and the computational methodology. I find the result of additivity between enhancers to be well supported. The findings on differential responsiveness between promoters are very interesting but the interpretation of such responses as 'non-linear' or 'following a power-law' may be misleading. More broadly, I think a more rigorous description of the mathematical methodology would increase the clarity and accessibility of this manuscript. A major unanswered question is whether the findings in this study apply to enhancers in their native genomic context. Regardless, investigating such questions in an episomal reporter assay is valuable.

      Main comments

      Applicability to native genomic context: The applicability of the results in this paper to enhancers in their native genomic context is unclear. As the authors state in the discussion section, the reporter gene is not integrated into the genome, the spacing between enhancers does not match their native context etc. It is thus unclear whether this experimental design is able to detect the non-additivity between enhancers which is known to be present in the genome. This could be investigated by testing the enhancer-enhancer-promoter tuples for which non-additivity has been observed in the genome (references are given in the introduction) in this assay.

      We appreciate the suggestion, but we chose not to go back to the lab to generate additional data to address this point. Of the cited previous studies, two are comparable to our study because they also used mESCs and included loci that we also studied: Thomas et al. (2021) and Brosh et al. (2023). We now discuss how the findings of these two studies relate to our observations in the Discussion, lines 336-345.

      Interpretation of promoter responses as non-linear and following a power-law: In Fig 5, the authors demonstrate that enhancer-enhancer pairs boost reporter output more for weak promoters as opposed to strong promoters. I agree the data supports this finding, but I find the interpretation of such data as promoters scaling enhancers according to a power-law (as stated in the abstract) to be misleading. As mentioned on line 297, it is not possible to define an intrinsic measure of enhancer strength, thus the authors assign the base of the power-law to be the average boost index of the enhancer-enhancer pair across the 8 promoters. But this measure incorporates some aspect of a promoter and is not solely a property of enhancers...

      We agree that the power-law conclusion in the abstract was too strong; we have rephrased it as "non-linear".

      ...It would also be useful to know whether the results in Fig 5 apply to only enhancer-enhancer-promoter triples or also to enhancer-promoter pairs.

      We have now added this EP analysis as new Supplemental Figure S7. Although the statistical power is much lower, this shows very similar trends as the EEP analysis. We briefly report this, lines 275-278.

      Enhancer-promoter selectivity: As a follow-up to a previous study (Martinez-Ara et al, Molecular Cell 2022) the authors mention that the data in this study also shows that enhancers show selectivity for certain promoters. The authors mention that both studies use the same statistical methodology and the data in this study is consistent with the data from the 2022 paper. However, I think the statistical methodology in both studies needs further exposition. This section of the review is thus meant to ensure that I understand the author's methodology, to guide the reader in understanding how the authors define 'selectivity', and to probe certain assumptions underlying the methodology.

      My understanding of the approach is as follows: The authors consider an enhancer to be not selective if its 'boost index' is the same across a set of promoters. 'Boost index' is defined to be the ratio of the reporter output with the enhancer and promoter divided by the reporter output with just the promoter. Conceptually, I think that considering the boost index is a reasonable way to quantify selectivity.

      The authors use a frequentist approach to classify each enhancer as selective or not selective. The null hypothesis is that the boost index of the enhancer is equal across a set of promoters. This can be visualized in Fig. 2C where the null hypothesis is that the mean of each vertical distribution is equal. Note that in Figure S4 of this paper (and in Figure 4B of their 2022 paper) the within-group variance is not plotted. Statistical significance is assessed using a Welch F-test. This is a parametric test that assumes that the observations within each vertical distribution in Fig 2C are normally distributed (this test does allow for heteroskedasticity - which means that the variance may differ within each vertical distribution). Does the normality assumption hold? This analysis should be reported. If this assumption does not hold, is the Welch test well calibrated?

      We have tested the normality of all of the single enhancer + promoter combinations that were tested using the Welch F-test. 94.1% of the 439 single enhancer + promoter combinations show normal distributions (at a 1% FDR). We have added this to the methods section of the revised manuscript. In addition, non-normality has little to no influence on the Welch F-test performance (https://rips-irsp.com/articles/10.5334/irsp.198). Therefore, the use of the Welch F-test to score enhancer selectivity on these data is valid. We do agree that a simple binary classification of selective vs non-selective is not descriptive enough for these kinds of data. We addressed this in our previous publication by exploring the relationship between selectivity and enhancer strength. However, the objective in this publication was solely to show that this new dataset follows similar selectivity patterns to our previous publication. Furthermore, our analysis of the non-linearity of promoter response is a more quantitative continuation of the analysis of selectivity, as this non-linearity is probably one of the major contributors to enhancer selectivity. This was probably present in our previous paper but could not be analyzed as there were fewer combinations per promoter.
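
      For readers unfamiliar with how a statement such as "normal at a 1% FDR" can be computed, a minimal sketch follows. The response does not say which normality test or which FDR procedure was used, so everything here is an assumption: the p-values are taken to come from some per-combination normality test, the correction is assumed to be Benjamini-Hochberg, and the function name and example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.01):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, True where the null hypothesis (here:
    "this combination is normally distributed") is rejected at the
    requested false discovery rate.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank whose p-value is under its threshold rank/m * fdr.
    max_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            max_rank = rank
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_rank:
            rejected[idx] = True
    return rejected


# Hypothetical normality-test p-values for five enhancer + promoter combinations.
p_vals = [0.0001, 0.2, 0.5, 0.003, 0.9]
non_normal = benjamini_hochberg(p_vals, fdr=0.01)

# Combinations whose normality null is not rejected are counted as "normal".
fraction_normal = non_normal.count(False) / len(non_normal)
```

      In this toy input two of the five combinations are flagged as non-normal, so the fraction reported as normal would be 3 out of 5.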

      For further clarity, we have now highlighted the individual promoters in Figure S4 by colors.

      Recommendations for the authors:

      Reviewer #2 (Recommendations for the authors):

      I found this to be an interesting manuscript and am glad this experiment was conducted. As I wrote in my public review, I think that clarifying the computational methods/ideas would really help. I also think it would be helpful to properly define the terms that are being used. For example, this manuscript uses the terminology cooperativity and synergy. Are these meant to be synonymous with supra-additivity?

      Thank you for this point. The revised manuscript no longer uses the word “cooperativity”. We now use “supra-additivity” when describing our data, and “synergy” as biological interpretation. In the Introduction we now clarify this distinction.

      Comments on enhancer selectivity:

      In the public review, I have given comments on the statistical methodology employed to assess enhancer selectivity. On a more subjective note, I'm not convinced that a frequentist approach to a binary classification of 'selective' vs 'not selective' is that useful here. I think it would be more useful to report an 'effect size' of the extent to which an enhancer is selective and to study the sources of this effect size. I think you've tried to do this in lines 329-339 of the discussion but I think the exposition could be clearer.

      Figure S4B may suggest how to do this. It appears that the distribution of boost indices for a given enhancer is trimodal (this is most obvious for the stronger enhancers on the top of the plot). Is it the case that each mode (for each enhancer) consists of the same set of promoters? I think what is implied by Figure 5B is that the stronger promoters are not boosted as much as the weaker promoters. So does the leftmost mode consist of Ap1m1, the middle mode consist of Klf2/Otx2/Nanog, and the rightmost mode of Sox2/Fgf5/Lefty1/Tbx3? If so, I would recommend emphasizing this in the text/figure and clarifying how this relates to selectivity. It seems that the chain of logic is as follows: (1) We define an enhancer to be selective if its boost indices across a set of promoters are not the same. (2) We generally observe that stronger promoters get boosted less than weaker promoters. (3) Thus selectivity arises due to differences in intrinsic strengths of the promoter. I think this is what is being implied in lines 329-339 of the discussion, but it took me multiple readings to understand this and I'm not convinced the power-law explanation is justified (see public review).

      We have modified this paragraph of the Discussion (now lines 350-359).

      Regarding the power-law: in the Results we state “roughly a power-law function”. We have removed the power-law claim from the abstract, that conclusion as phrased was indeed too firm.

      Reference to Zuin et al

      Lines 323 - 325: A reference is made to the data from Zuin et al "following approximately a power-law". What data in Zuin et al does this statement refer to? I do not believe the authors in Zuin et al claim that the relationship between GFP intensity and enhancer-promoter distance (Figure 1h,i from Zuin et al) follows a power law. It is certainly non-linear, but I have taken a look at this data myself and do not find it follows a power-law. Please either explain this further and rigorously justify the claim or adjust the wording accordingly.

      Good point; in the discussion of Zuin et al we have replaced “power law” with “non-linear decay function”.

    1. Author response:

      Reviewer #1:

      Summary:

      One enduring mystery involving the evolution of genomes is the remarkable variation they exhibit with respect to size. Much of that variation is due to differences in the number of transposable elements, which often (but not always) correlates with the overall quantity of DNA. Amplification of TEs is nearly always either selectively neutral or negative with respect to host fitness. Given that larger effective population sizes are more efficient at removing these mutations, it has been hypothesized that TE content, and thus overall genome size, may be a function of effective population size. The authors of this manuscript test this hypothesis by using a uniform approach to analysis of several hundred animal genomes, using the ratio of nonsynonymous to synonymous mutations in coding sequence (dN/dS) as a measure of the overall strength of purifying selection, which serves as a proxy for effective population size over time. The data convincingly demonstrate that it is unlikely that effective population size has a strong effect on TE content and, by extension, overall genome size (except for birds).

      Strengths:

      Although this ground has been covered before in many other papers, the strength of this analysis is that it is comprehensive and treats all the genomes with the same pipeline, making comparisons more convincing. Although this is a negative result, it is important because it is relatively comprehensive and indicates that there will be no simple, global hypothesis that can explain the observed variation.

      Weaknesses:

      In several places, I think the authors slip between assertions of correlation and assertions of cause-effect relationships not established in the results. 

      Several times in the text we use the expression “effect of dN/dS on…” which might indeed suggest a causal relationship. The phrasing refers to dN/dS being used in the regression as an independent variable that may predict the variation of the dependent variables genome size and TE content. We are going to rephrase these expressions so that correlation is not mistaken for causation.

      In other places, the arguments end up feeling circular, based, I think, on those inferred causal relationships. It was also puzzling why plants (which show vast differences in DNA content) were ignored altogether.

      The analysis focuses on metazoans for two reasons: one practical and one fundamental. The practical reason is computational. Our analysis included TE annotation, phylogenetic estimation and dN/dS estimation, which would have been very difficult with the hundreds, if not thousands, of plant genomes available. If we had included plants, it would have been natural to include fungi as well, to have a complete set of multicellular eukaryotic genomes, adding to the computational burden. The second, fundamental reason is that plants show important genome size differences due to more frequent whole genome duplications (polyploidization) than in animals. It is therefore possible that the effect of selection on genome size is different in these two groups, which would have led us to treat them separately, decreasing the interest of this comparison. For these reasons we chose to focus on animals, which still provide very wide ranges of genome size and population size well suited to test the impact of drift.

      Reviewer #2:

      Summary:

      The Mutational Hazard Hypothesis (MHH) is a very influential hypothesis in explaining the origins of genomic and other complexity that seem to entail the fixation of costly elements. Despite its influence, very few tests of the hypothesis have been offered, and most of these come with important caveats. This lack of empirical tests largely reflects the challenges of estimating crucial parameters.

      The authors test the central contention of the MHH, namely that genome size follows effective population size (Ne). They marshal a lot of genomic and comparative data, test the viability of their surrogates for Ne and genome size, and use correct methods (phylogenetically corrected correlation) to test the hypothesis. Strikingly, they not only find that Ne is not THE major determinant of genome size, as is argued by MHH, but that there is not even a marginally significant effect. This is remarkable, making this an important paper.

      Strengths:

      The hypothesis tested is of great importance.

      The negative finding is of great importance for reevaluating the predictive power of the tested hypothesis.

      The test is straightforward and clear.

      The analysis is a technical tour-de-force, convincingly circumventing a number of challenges of mounting a true test of the hypothesis.

      Weaknesses:

      I note no particular weaknesses, but I believe the paper could be further strengthened in three major ways.

      (1) The authors should note that the hypothesis that they are testing is larger than the MHH. The MHH hypothesis says that

      (i) low-Ne species have more junk in their genomes and

      (ii) this is because junk tends to be costly because of increased mutation rate to nulls, relative to competing non/less-junky alleles.

      The current results reject not just the compound (i+ii) MHH hypothesis, but in fact any hypothesis that relies on i. This is notably a (much) more important rejection. Indeed, whereas MHH relies on particular constructions of increased mutation rates of varying plausibility, the more general hypothesis i includes any imaginable or proposed cost to the extra sequence (replication costs, background transcription, costs of transposition, ectopic expression of neighboring genes, recombination between homologous elements, misaligning during meiosis, reduced organismal function from nuclear expansion, the list goes on and on). For those who find the MHH dubious on its merits, focusing this paper on the MHH reduces its impact - the larger hypothesis that the small costs of extra sequence dictate the fates of different organisms' genomes is, in my opinion, a much more important and plausible hypothesis, and thus the current rejection is more important than the authors let on.

      The MHH is arguably the most structured and influential theoretical framework proposed to date based on the null assumption (i), so framing the paper around the MHH is somewhat inevitable. Because of this, in the manuscript we mostly discuss the peculiarities of TE biology that can drive the genome away from the MHH expectations, focusing on the mutational aspect. We agree, however, that the hazard posed by extra DNA is not limited to the gain of function via the mutation process, but can be linked to many other molecular processes, as mentioned above. In a revised manuscript, we will make the concept of hazard more comprehensive and further stress that this applies not only to TEs but to any nearly-neutral mutation affecting non-coding DNA.

      (2) In addition to the authors' careful logical and mathematical description of their work, they should take more time to show the intuition that arises from their data. In particular, just by looking at Figure 1b one can see what is wrong with the non-phylogenetically-corrected correlations that MHH's supporters use. That figure shows that mammals, many of which have small Ne, have large genomes regardless of their Ne, which suggests that the coincidence of large genomes and frequently small Ne in this lineage is just that, a coincidence, not a causal relationship. Similarly, insects, many of which have large genomes, by and large have large Ne regardless of their genome size, again suggesting that the coincidence of generally large Ne and smaller genomes in this lineage is not causal. Given that these two lineages are abundant on earth in addition to being overrepresented among available genomes (and were even more overrepresented when the foundational MHH papers collected available genomes), it begins to emerge how one can easily end up with a spurious non-phylogenetically corrected correlation: grab a few insects, grab a few mammals, and you get a correlation. Notably, the same holds for lineages not included here but that are highly represented in our databases (and all the more so 20 years ago): yeasts related to S. cerevisiae (generally small genomes and large median Ne despite variation) and angiosperms (generally large genomes (compared to most eukaryotes) and small median Ne despite variation). Pointing out these clear patterns will help non-specialists to understand why the current analysis is not merely a they-said-them-said case, but offers an explanation for why the current authors' conclusions differ from the MHH's supporters and moreover explains what is wrong with the MHH's supporters' arguments.

      We agree that comparing the dispersion of the points in the non-phylogenetically corrected correlation with the results of the phylogenetic contrasts intuitively emphasizes the importance of accounting for species relatedness. The clade colors in Figure 2 alone make it immediately apparent that a simple regression hides phylogenetic structure. We will stress this in the discussion to make the point clear.
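
      The intuition discussed in this exchange can be illustrated with a toy calculation. The sketch below is not the phylogenetic independent contrasts analysis used in the manuscript, and all numbers are invented for illustration: two hypothetical clades differ in their average Ne proxy and genome size, while within each clade the two variables are uncorrelated by construction. Pooling the clades and ignoring relatedness nevertheless yields a strong negative correlation.

```python
import math

def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical clade A: low Ne proxy, large genomes (invented values).
ne_a = [1.0, 2.0, 3.0, 4.0]
size_a = [10.0, 12.0, 12.0, 10.0]
# Hypothetical clade B: high Ne proxy, small genomes (invented values).
ne_b = [7.0, 8.0, 9.0, 10.0]
size_b = [2.0, 4.0, 4.0, 2.0]

# Within each clade there is no linear relationship between the two variables.
r_within_a = pearson(ne_a, size_a)
r_within_b = pearson(ne_b, size_b)
# Pooling the clades mixes the between-clade difference into the correlation.
r_pooled = pearson(ne_a + ne_b, size_a + size_b)

print(f"within clade A: r = {r_within_a:.2f}")
print(f"within clade B: r = {r_within_b:.2f}")
print(f"pooled, ignoring clades: r = {r_pooled:.2f}")
```

      In this toy setting the pooled correlation is driven entirely by the difference between the two clade means, which is the kind of signal that phylogenetically corrected methods are designed to discount.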

      (3) A third way in which the paper is more important than the authors let on is in the striking degree of the failure of MHH here. MHH does not merely claim that Ne is one contributor to genome size among many; it claims that Ne is THE major contributor, which is a much, much stronger claim. That no evidence exists in the current data for even the small claim is a remarkable failure of the actual MHH hypothesis: the possibility is quite remote that Ne is THE major contributor but that one cannot even find a marginally significant correlation in a huge correlation analysis deriving from a lot of challenging bioinformatic work. Thus this is an extremely strong rejection of the MHH. The MHH is extremely influential and yet very challenging to test clearly. Frankly, the authors would be doing the field a disservice if they did not more strongly state the degree of importance of this finding.

      We respectfully disagree with the reviewer that there is currently no evidence for an effect of Ne on genome size evolution. While it is accurate that our large dataset allows us to reject the universality of Ne as the major contributor to genome size variation, this does not exclude the possibility of such an effect in certain contexts. Notably, several studies support a role for Ne in determining genome size variation and nearly-neutral TE dynamics under certain circumstances, e.g. particularly strongly contrasted Ne and moderate divergence times (Lefébure et al. 2017; Mérel et al. 2024; Tollis and Boissinot 2013; Ruggiero et al. 2017). The strength of such works is that they analyze the short-term dynamics of TEs in response to Ne within groups of species/populations, where the cost posed by extra DNA is likely to be similar. Indeed, the MHH predicts genome size to vary according to the combination of drift and mutation under the nearly-neutral theory of molecular evolution. Our work demonstrates that this does not hold universally but does not exclude that it could hold locally. Moreover, defense mechanisms against TE proliferation are often complex molecular machineries that might or might not evolve according to different constraints among clades. We have detailed these points in the discussion.

      Reviewer #3:

      Summary

      The Mutational Hazard Hypothesis (MHH) suggests that lineages with smaller effective population sizes should accumulate slightly deleterious transposable elements leading to larger genome sizes. Marino and colleagues tested the MHH using a set of 807 vertebrate, mollusc, and insect species. The authors mined repeats de novo and estimated dN/dS for each genome. Then, they used dN/dS and life history traits as reliable proxies for effective population size and tested for correlations between these proxies and repeat content while accounting for phylogenetic nonindependence. The results suggest that overall, lineages with lower effective population sizes do not exhibit increases in repeat content or genome size. This contrasts with expectations from the MHH. The authors speculate that changes in genome size may be driven by lineage-specific host-TE conflicts rather than effective population size.

      Strengths

      The general conclusions of this paper are supported by a powerful dataset of phylogenetically diverse species. The use of C-values rather than assembly size for many species (when available) helps mitigate the challenges associated with the underrepresentation of repetitive regions in short-read-based genome assemblies. As expected, genome size and repeat content are highly correlated across species. Nonetheless, the authors report divergent relationships between genome size and dN/dS and TE content and dN/dS in multiple clades: Insecta, Actinopteri, Aves, and Mammalia. These discrepancies are interesting but could reflect biases associated with the authors' methodology for repeat detection and quantification rather than the true biology.

      Weaknesses

      The authors used dnaPipeTE for repeat quantification. Although dnaPipeTE is a useful tool for estimating TE content when genome assemblies are not available, it exhibits several biases. One of these is that dnaPipeTE seems to consistently underestimate satellite content (compared to RepeatMasker on assembled genomes; see Goubert et al. 2015). Satellites comprise a significant portion of many animal genomes and are likely significant contributors to differences in genome size. This should have a stronger effect on results in species where satellites comprise a larger proportion of the genome relative to other repeats (e.g. Drosophila virilis, >40% of the genome (Flynn et al. 2020); Triatoma infestans, 25% of the genome (Pita et al. 2017) and many others). For example, the authors report that only 0.46% of the Triatoma infestans genome is "other repeats" (which include simple repeats and satellites). This contrasts with previous reports of ≥25% satellite content in Triatoma infestans (Pita et al. 2017). Similarly, this study's results for "other" repeat content appear to be consistently lower for Drosophila species relative to previous reports (e.g. de Lima & Ruiz-Ruano 2022). The most extreme case of this is for Drosophila albomicans where the authors report 0.06% "other" repeat content when previous reports have suggested that 18%->38% of the genome is composed of satellites (de Lima & Ruiz-Ruano 2022). It is conceivable that occasional drastic underestimates or overestimates for repeat content in some species could have a large effect on Coevol results, but a minimal effect on more general trends (e.g. the overall relationship between repeat content and genome size).

      There are indeed some discrepancies between our estimates of low complexity repeats and those from the literature due to the approach used. Hence, occasional underestimates or overestimates of repeat content are possible. As noted, the contribution of “Other” repeats to the overall repeat content is generally very low in our estimates, which points to an underestimation bias. We thank the reviewer for providing this interesting overview. We will emphasize it in the discussion of our revised manuscript.

      Not being able to correctly estimate the quantity of satellites might pose a problem for quantifying the total content of junk DNA. However, the overall repeat content mostly composed of TEs correlates very well with genome size, both in the overall dataset and within clades (with the notable exception of birds) so we are confident that this limitation is not the explanation of our negative results. Moreover, while satellite information might be missing, this is not problematic to test our a priori hypothesis since we focus our attention on TEs, whose proliferation mechanism is very different from that of tandem repeats.

      Finally, divergence from the consensus can be estimated only for TEs. Therefore, recently active elements do not include simple and tandem repeats: yet the results based on recent TE content are very similar to those based on the overall repeat content.

      Another bias of dnaPipeTE is that it does not detect ancient TEs as well as more recently active TEs (Goubert et al. 2015). Thus, the repeat content used for PIC and Coevol analyses here is inherently biased toward more recently inserted TEs. This bias could significantly impact the inference of long-term evolutionary trends.

      Indeed, dnaPipeTE is not good at detecting old TE copies due to the read-based approach, biasing the outcome towards new elements. We agree that TE content is underestimated, especially in those genomes that tend to accumulate TEs rather than getting rid of them. However, the sum of old TEs and recent TEs is extremely well correlated to genome size (Pearson’s correlation: r = 0.87, p-value < 2.2e-16; PIC: slope = 0.22, adj-R2 = 0.42, p-value < 2.2e-16). Our main result therefore does not rely on an accurate estimation of old TEs. In contrast, we hypothesized that recent TEs could be interesting if selection acted on TE insertion and dynamics rather than on non-coding DNA. Our results demonstrate that this is not the case. It should be noted that, in spite of its limits for old TEs, dnaPipeTE is especially well suited to this specific analysis, as it is not biased by very repetitive new TE families that are problematic to assemble. We will clearly emphasize the limitation of dnaPipeTE and discuss the consequences on our results in the discussion of the revised manuscript.

      Finally, in a preliminary analysis on the dipteran species, we show that the TE content estimated with dnaPipeTE is generally similar to that estimated from the assembly with EarlGrey (Baril et al. 2024) across a good range of genome sizes going from drosophilid-like to mosquito-like (Pearson’s correlation: r = 0.88, p-value = 3.22e-10; see also the corrected Supplementary Figure S2 below). While for these species TEs are probably dominated by recent to moderately recent TEs, Aedes albopictus is an outlier for its genome size and the estimations with the two methods are largely consistent. However, the computation time required to estimate TE content using EarlGrey was significantly longer, with a ~300% increase in computation time, making it a very costly option (a similar issue applies to other assembly-based annotation pipelines). Given the rationale presented above, we decided to use dnaPipeTE instead of EarlGrey.

    1. Reviewer #2 (Public review):

      Summary:

      Fei, Lu, Shi, et al. present a thorough evaluation of the immune cell landscape in pre-eclamptic human placentas by single-cell multi-omics methodologies compared to normal control placentas. Based on their findings of elevated frequencies of inflammatory macrophages and memory-like Th17 cells, they employ adoptive cell transfer mouse models to interrogate the coordination and function of these cell types in pre-eclampsia immunopathology. They demonstrate the putative role of the IGF1-IGF1R axis as the key pathway by which inflammatory macrophages in the placenta skew CD4+ T cells towards an inflammatory IL-17A-secreting phenotype that may drive tissue damage, vascular dysfunction, and elevated blood pressure in pre-eclampsia, leaving researchers with potential translational opportunities to pursue this pathway in this indication.

      They present a major advance to the field in their profiling of human placental immune cells from pre-eclampsia patients where most extant single-cell atlases focus on term versus preterm placenta, or largely examine trophoblast biology with a much rarer subset of immune cells. While the authors present vast amounts of data at both the protein and RNA transcript level, we, the reviewers, feel this manuscript is still in need of much more clarity in its main messaging, and more discretion in including only key data that supports this main message most effectively.

      Strengths:

      (1) This study combines human and mouse analyses and allows for some amount of mechanistic insight into the role of pro-inflammatory and anti-inflammatory macrophages in the pathogenesis of pre-eclampsia (PE), and their interaction with Th17 cells.

      (2) Importantly, they do this using matched cohorts across normal pregnancy and common PE comorbidities like gestational diabetes (GDM).

      (3) The authors have developed clear translational opportunities from these "big data" studies by moving to pursue potential IGF1-based interventions.

      Weaknesses:

      (1) Clearly the authors generated vast amounts of multi-omic data using CyTOF and single-cell RNA-seq (scRNA-seq), but their central message becomes muddled very quickly. The reader has to do a lot of work to follow the authors' multiple lines of inquiry rather than smoothly following along with their unified rationale. The title description tells fairly little about the substance of the study. The manuscript is very challenging to follow. The paper would benefit from substantial reorganizations and editing for grammatical and spelling errors. For example, RUPP is introduced in Figure 4 but in the text not defined or even talked about what it is until Figure 6. (The figure comparing pro- and anti-inflammatory macrophages does not add much to the manuscript as this is an expected finding).

      (2) The methods lack critical detail about how human placenta samples were processed. The maternal-fetal interface is a highly heterogeneous tissue environment and care must be taken to ensure proper focus on maternal or fetal cells of origin. Lacking this detail in the present manuscript, there are many unanswered questions about the nature of the immune cells analyzed. It is impossible to figure out which part of the placental unit is analyzed for the human or mouse data. Is this the decidua, the placental villi, or the fetal membranes? This is of key importance to the central findings of the manuscript as the immune makeup of these compartments is very different. Or is this analyzed as the entirety of the placenta, which would be a mix of these compartments and significantly less exciting?

      (3) Similarly, methods lack any detail about the analysis of the CyTOF and scRNAseq data, much more detail needs to be added here. How were these clustered, what was the QC for scRNAseq data, etc? The two small paragraphs lack any detail.

      (4) There is also insufficient detail presented about the quantities or proportions of various cell populations. For example, gdT cells represent very small proportions of the CyTOF plots shown in Figures 1B, 1C, & 1E, yet in Figures 2I, 2K, & 2K there are many gdT cells shown in subcluster analysis without a description of how many cells are actually represented, and where they came from. How were biological replicates normalized for fair statistical comparison between groups?

      (5) The figures themselves are very tricky to follow. The clusters are numbered rather than identified by what the authors think they are, the numbers are so small, that they are challenging to read. The paper would be significantly improved if the clusters were clearly labeled and identified. All the heatmaps and the abundance of clusters should be in separate supplementary figures.

      (6) The authors should take additional care when constructing figures that their biological replicates (and all replicates) are accurately represented. Figure 2H-2K shows N=10 data points for the normal pregnant (NP) samples when clearly their Table 1 and text denote they only studied N=9 normal subjects.

      (7) There is little to no evaluation of regulatory T cells (Tregs) which are well known to undergird maternal tolerance of the fetus, and which are well known to have overlapping developmental trajectory with RORgt+ Th17 cells. We recommend the authors evaluate whether the loss of Treg function, quantity, or quality leaves CD4+ effector T cells more unrestrained in their effect on PE phenotypes. References should include, accordingly: PMCID: PMC6448013 / DOI: 10.3389/fimmu.2019.00478; PMC4700932 / DOI: 10.1126/science.aaa9420.

      (8) In discussing gMDSCs in Figure 3, the authors have missed key opportunities to evaluate bona fide Neutrophils. We recommend they conduct FACS or CyTOF staining including CD66b if they have additional tissues or cells available. Please refer to this helpful review article that highlights key points of distinguishing human MDSC from neutrophils: https://doi.org/10.1038/s41577-024-01062-0. This will both help the evaluation of potentially regulatory myeloid cells that may suppress effector T cells as well as aid in understanding at the end of the study if IL-17 produced by CD4+ Th17 cells might recruit neutrophils to the placenta and cause ROS immunopathology and fetal resorption.

      (9) Depletion of macrophages using several different methodologies (PLX3397, or clodronate liposomes) should be accompanied by supplementary data showing the efficiency of depletion, especially within tissue compartments of interest (uterine horns, placenta). The clodronate piece is not at all discussed in the main text. Both should be addressed in much more detail.

      (10) There are many heatmaps and tSNE / UMAP plots with unhelpful labels and no statistical tests applied. Many of these plots (e.g. Figure 7) could be moved to supplemental figures or pared down and combined with existing main figures to help the authors streamline and unify their message.

      (11) There are claims that this study fills a gap that "only one report has provided an overall analysis of immune cells in the human placental villi in the presence and absence of spontaneous labor at term by scRNA-seq (Miller 2022)" (lines 362-364), yet this study itself does not exhaustively study all immune cell subsets...that's a monumental task, even with the two multi-omic methods used in this paper. There are several other datasets that have performed similar analyses and should be referenced.

      (12) Inappropriate statistical tests are used in many of the analyses. Figures 1-2 use the Shapiro-Wilk test, which is a test of "goodness of fit", to compare unpaired groups. A Kruskal-Wallis or other nonparametric t-test is much more appropriate. In other instances, there is no mention of statistical tests (Figures 6-7) at all. Appropriate tests should be added throughout.
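
      To make the distinction in point (12) concrete: a normality test asks whether a single sample looks Gaussian, whereas a rank-based test such as Kruskal-Wallis compares groups. The following is a minimal sketch of the latter for three unpaired groups, written with the Python standard library only. The group labels and values are hypothetical and are not the authors' data; the helper names are illustrative, no tie correction is applied, and the closed-form p-value holds only for three groups (2 degrees of freedom).

```python
import math

def average_ranks(values):
    """Rank values from 1..N, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def kruskal_wallis_three_groups(a, b, c):
    """Kruskal-Wallis H statistic and p-value for exactly three groups."""
    pooled = a + b + c
    n = len(pooled)
    ranks = average_ranks(pooled)
    weighted = 0.0
    start = 0
    for group in (a, b, c):
        rank_sum = sum(ranks[start:start + len(group)])
        weighted += rank_sum ** 2 / len(group)
        start += len(group)
    h = 12.0 / (n * (n + 1)) * weighted - 3 * (n + 1)
    # Chi-square survival function with 2 degrees of freedom is exp(-h / 2).
    p_value = math.exp(-h / 2)
    return h, p_value

# Hypothetical cluster frequencies (% of parent population), invented values.
np_group = [2.1, 3.4, 2.8, 3.0]
pe_group = [5.2, 6.1, 4.9, 5.7]
gdm_group = [3.6, 4.1, 3.9, 4.4]

h_stat, p_val = kruskal_wallis_three_groups(np_group, pe_group, gdm_group)
print(f"H = {h_stat:.3f}, p = {p_val:.4f}")
```

      With small groups the chi-square approximation is rough, so an exact or permutation-based p-value and a post hoc pairwise procedure would be preferable in a real analysis.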

    2. Author response:

      Reviewer #1:

      Strengths:

      Utilization of both human placental samples and multiple mouse models to explore the mechanisms linking inflammatory macrophages and T cells to preeclampsia (PE).

      Incorporation of advanced techniques such as CyTOF, scRNA-seq, bulk RNA-seq, and flow cytometry.

      Identification of specific immune cell populations and their roles in PE, including the IGF1-IGF1R ligand-receptor pair in macrophage-mediated Th17 cell differentiation.

      Demonstration of the adverse effects of pro-inflammatory macrophages and T cells on pregnancy outcomes through transfer experiments.

      Weaknesses:

      Comment 1. Inconsistent use of uterine and placental cells, which are distinct tissues with different macrophage populations, potentially confounding results.

      Response 1: We thank the reviewer for this comment. We have performed an experiment with green fluorescent protein (GFP) pregnant mice that was not shown in this manuscript. Wild-type (WT) female mice were mated with either transgenic male mice genetically modified to express GFP or WT male mice, to generate GFP-expressing pups (GFP-pups) or their genetically unmodified counterparts (WT-pups), respectively. Mice were euthanized on day 18.5 of gestation, and the uteri of the pregnant females and the placentas of the offspring were analyzed using flow cytometry. The majority of macrophages in the uterus and placenta were of maternal origin, defined as GFP-negative. In contrast, fetal-derived macrophages, distinguished by their expression of GFP, represented only a small fraction of the total macrophage population. We will add the GFP pregnant mice-related data in Figure 4-figure supplement 1 to explain the different macrophage populations in the uterine and placental cells.

      Comment 2. Missing observational data for the initial experiment transferring RUPP-derived macrophages to normal pregnant mice.

      Response 2: We thank the reviewer for this comment. In our experiments, PLX3397 or clodronate liposomes were used to deplete the macrophages of pregnant mice, and we then injected RUPP-derived pro-inflammatory and anti-inflammatory macrophages back into the PLX3397- or clodronate liposome-treated pregnant mice. We found that RUPP-derived F480+CD206- pro-inflammatory macrophages induced immune imbalance at the maternal-fetal interface and PE-like symptoms (Figure 4E-4H and Figure 4-figure supplement 1 A-C).

      Comment 3. Unclear mechanisms of anti-macrophage compounds and their effects on placental/fetal macrophages.

      Response 3: We thank the reviewer for this comment. PLX3397 is an inhibitor of CSF1R, which is needed for macrophage development (Nature. 2023, PMID: 36890231; Cell Mol Immunol. 2022, PMID: 36220994); we have stated this on lines 189-191. However, PLX3397 is a small-molecule compound that has the potential to cross the placental barrier and affect fetal macrophages. We will discuss the impact of this factor on the experiment in the discussion section.

      Comment 4. Difficulty in distinguishing donor cells from recipient cells in murine single-cell data complicates interpretation.

      Response 4: We thank the reviewer for this comment. Upon analysis, we observed a notable elevation in the frequency of total macrophages within the CD45+ cell population. We subsequently performed macrophage clustering and uncovered a marked increase in the frequency of Cluster 0, implying a potential correlation between Cluster 0 and donor-derived cells. RNA sequencing revealed that the F480+CD206- pro-inflammatory donor macrophages exhibited a Folr2+Ccl7+Ccl8+C1qa+C1qb+C1qc+ phenotype, which is consistent with the phenotype of cluster 0 in macrophages observed in single-cell RNA sequencing (Figure 4D and Figure 5E). Therefore, we believe that the donor cells correspond to cluster 0 of the macrophages.

      Comment 5. Limitation of using the LPS model in the final experiments, as it more closely resembles systemic inflammation seen in endotoxemia rather than the specific pathology of PE.

      Response 5: We thank the reviewer for this comment. First, our other animal experiments in this manuscript used the Reduction in Uterine Perfusion Pressure (RUPP) mouse model to simulate the pathology of PE. However, the RUPP model requires ligation of the uterine arteries in pregnant mice on day 12.5 of gestation, which hinders T cells reinfused through the tail vein from reaching the maternal-fetal interface. In addition, this experiment aims to show that CD4+ T cells differentiate into memory-like Th17 cells through IGF-1R receptor signalling and thereby affect pregnancy, by clearing CD4+ T cells in vivo with an anti-CD4 antibody and then injecting IGF-1R inhibitor-treated CD4+ T cells. We also showed that injection of RUPP-derived memory-like CD4+ T cells into pregnant rats induces PE-like symptoms (Figure 6). In summary, the application of the LPS model in Figure 8 does not affect the conclusions.

      Reviewer #2:

      Strengths:

      (1) This study combines human and mouse analyses and allows for some amount of mechanistic insight into the role of pro-inflammatory and anti-inflammatory macrophages in the pathogenesis of pre-eclampsia (PE), and their interaction with Th17 cells.

      (2) Importantly, they do this using matched cohorts across normal pregnancy and common PE comorbidities like gestational diabetes (GDM).

      (3) The authors have developed clear translational opportunities from these "big data" studies by moving to pursue potential IGF1-based interventions.

      Weaknesses:

      Comment 1. Clearly the authors generated vast amounts of multi-omic data using CyTOF and single-cell RNA-seq (scRNA-seq), but their central message becomes muddled very quickly. The reader has to do a lot of work to follow the authors' multiple lines of inquiry rather than smoothly following along with their unified rationale. The title description tells fairly little about the substance of the study. The manuscript is very challenging to follow. The paper would benefit from substantial reorganizations and editing for grammatical and spelling errors. For example, RUPP is introduced in Figure 4 but in the text not defined or even talked about what it is until Figure 6. (The figure comparing pro- and anti-inflammatory macrophages does not add much to the manuscript as this is an expected finding).

      Response 1: We thank the reviewer for these comments and will make the necessary revisions. First, we will modify the title of the article to be more specific. Second, we will introduce the RUPP mouse model when interpreting Figure 4. Third, we plan to simplify or consolidate Figures 5 to 7 to make them easier to follow. Finally, we will carefully correct the grammatical and spelling errors in the article. As for the figure comparing pro- and anti-inflammatory macrophages, the Editor requested a more comprehensive description of the macrophage phenotype during the initial submission. As a result, we performed transcriptome analysis of both uterine-derived pro-inflammatory and anti-inflammatory macrophages and conducted a detailed analysis of macrophages in the single-cell data.

      Comment 2. The methods lack critical detail about how human placenta samples were processed. The maternal-fetal interface is a highly heterogeneous tissue environment and care must be taken to ensure proper focus on maternal or fetal cells of origin. Lacking this detail in the present manuscript, there are many unanswered questions about the nature of the immune cells analyzed. It is impossible to figure out which part of the placental unit is analyzed for the human or mouse data. Is this the decidua, the placental villi, or the fetal membranes? This is of key importance to the central findings of the manuscript as the immune makeup of these compartments is very different. Or is this analyzed as the entirety of the placenta, which would be a mix of these compartments and significantly less exciting?

      Response 2: We thank the reviewer for this comment. Placental villi, rather than fetal membranes and decidua, were used for CyTOF in this study. This detail about how human placenta samples were processed will be added to the Materials and Methods section.

      Comment 3. Similarly, methods lack any detail about the analysis of the CyTOF and scRNAseq data, much more detail needs to be added here. How were these clustered, what was the QC for scRNAseq data, etc? The two small paragraphs lack any detail.

      Response 3: We thank the reviewer for this comment. Details of the analysis of the CyTOF and scRNA-seq data will be added to the Materials and Methods section.

      Comment 4. There is also insufficient detail presented about the quantities or proportions of various cell populations. For example, gdT cells represent very small proportions of the CyTOF plots shown in Figures 1B, 1C, & 1E, yet in Figures 2I, 2K, & 2K there are many gdT cells shown in subcluster analysis without a description of how many cells are actually represented, and where they came from. How were biological replicates normalized for fair statistical comparison between groups?

      Response 4: We thank the reviewer for this comment. In Figure 1, CD45+ immune cells were clustered into 10 subpopulations, which included gdT cells. Figure 2, in contrast, displays the further clustering analysis of CD4+ T, CD8+ T, and gdT cells, with gdT cells being further subdivided into 22 clusters (Figure 2-figure supplement 1C). The number of biological replicates (samples) is consistent with Figure 1.

      Comment 5. The figures themselves are very tricky to follow. The clusters are numbered rather than identified by what the authors think they are, the numbers are so small, that they are challenging to read. The paper would be significantly improved if the clusters were clearly labeled and identified. All the heatmaps and the abundance of clusters should be in separate supplementary figures.

      Response 5: We thank the reviewers' comments. The t-SNE distributions of the 15 clusters of CD4+ T cells, 18 clusters of CD8+ T cells, and 22 clusters of gdT cells are shown separately in Figure 2A, F, and I. The heatmaps displaying the expression levels of markers in these clusters of CD4+ T cells, CD8+ T cells, and gdT cells are presented in Figure 2-figure supplement 1A, B, and C, respectively. The t-SNE distributions of the 29 clusters of CD11b+ cells are shown in Figure 3A, and the heatmap displaying the expression levels of markers in these clusters is presented in Figure 3B. As for sc-RNA sequencing, the heatmap and UMAP distributions of the 15 clusters of macrophages are shown separately in Figure 5C and 5D. The UMAP distributions and heatmap of the 12 clusters of T/NK cells are shown in Figure 6A and 6B. The UMAP distributions and heatmap of the 9 clusters of T/NK cells are shown in Figure 7A and 7B.

      Comment 6. The authors should take additional care when constructing figures that their biological replicates (and all replicates) are accurately represented. Figure 2H-2K shows N=10 data points for the normal pregnant (NP) samples when clearly their Table 1 and text denote they only studied N=9 normal subjects.

      Response 6: We thank the reviewers for their careful checking. During our verification, we found that one sample in the NP group had pregnancy complications other than PE and GMD. The data in Figure 2H-2K were not updated in a timely manner. We will promptly update these data and reanalyze them.

      Comment 7. There is little to no evaluation of regulatory T cells (Tregs) which are well known to undergird maternal tolerance of the fetus, and which are well known to have overlapping developmental trajectory with RORgt+ Th17 cells. We recommend the authors evaluate whether the loss of Treg function, quantity, or quality leaves CD4+ effector T cells more unrestrained in their effect on PE phenotypes. References should include, accordingly: PMCID: PMC6448013 / DOI: 10.3389/fimmu.2019.00478; PMC4700932 / DOI: 10.1126/science.aaa9420.

      Response 7: We thank the reviewers for their comments. We have performed a Treg-related animal experiment that was not shown in this manuscript. We will add the Treg-related data to Figure 6. The injection of CD4+ T cells derived from RUPP mice, characterized by a reduced frequency of Tregs, could induce PE-like symptoms in pregnant mice. Additionally, we will add the necessary discussion of Tregs.

      Comment 8. In discussing gMDSCs in Figure 3, the authors have missed key opportunities to evaluate bona fide Neutrophils. We recommend they conduct FACS or CyTOF staining including CD66b if they have additional tissues or cells available. Please refer to this helpful review article that highlights key points of distinguishing human MDSC from neutrophils: https://doi.org/10.1038/s41577-024-01062-0. This will both help the evaluation of potentially regulatory myeloid cells that may suppress effector T cells as well as aid in understanding at the end of the study if IL-17 produced by CD4+ Th17 cells might recruit neutrophils to the placenta and cause ROS immunopathology and fetal resorption.

      Response 8: We thank the reviewers for their comments. Although we do not have additional tissues or cells available to conduct FACS or CyTOF staining, including for CD66b, we plan to use CD15 and CD66b antibodies for immunofluorescence staining of placental tissue. Suppressing effector T cells is a signature feature of MDSCs, and T cells may also influence the functions of MDSCs. We will refer to this review and address these points in the Discussion section of the article.

      Comment 9. Depletion of macrophages using several different methodologies (PLX3397, or clodronate liposomes) should be accompanied by supplementary data showing the efficiency of depletion, especially within tissue compartments of interest (uterine horns, placenta). The clodronate piece is not at all discussed in the main text. Both should be addressed in much more detail.

      Response 9: We thank the reviewers for their comments. We already have additional data on the efficiency of macrophage depletion with PLX3397 and clodronate liposomes, which were not presented in this manuscript, and we will add them. The clodronate piece is mentioned in the main text (Line 197-201), but only briefly, because the results we obtained using clodronate were similar to those using PLX3397.

      Comment 10. There are many heatmaps and tSNE / UMAP plots with unhelpful labels and no statistical tests applied. Many of these plots (e.g. Figure 7) could be moved to supplemental figures or pared down and combined with existing main figures to help the authors streamline and unify their message.

      Response 10: We thank the reviewers for their comments. We plan to simplify or consolidate the images in Figures 5 to 7 to make them easier to follow.

      Comment 11. There are claims that this study fills a gap that "only one report has provided an overall analysis of immune cells in the human placental villi in the presence and absence of spontaneous labor at term by scRNA-seq (Miller 2022)" (lines 362-364), yet this study itself does not exhaustively study all immune cell subsets...that's a monumental task, even with the two multi-omic methods used in this paper. There are several other datasets that have performed similar analyses and should be referenced.

      Response 11: We thank the reviewers for their comments. We will search the literature further and reference additional studies that have conducted similar analyses.

      Comment 12. Inappropriate statistical tests are used in many of the analyses. Figures 1-2 use the Shapiro-Wilk test, which is a test of "goodness of fit", to compare unpaired groups. A Kruskal-Wallis or other nonparametric t-test is much more appropriate. In other instances, there is no mention of statistical tests (Figures 6-7) at all. Appropriate tests should be added throughout.

      Response 12: We thank the reviewers for their comments. As stated in the Statistical Analysis section (lines 601-604), the Kruskal-Wallis test was used to compare the results of experiments with multiple groups. Comparisons between the two groups in Figures 6-7 were conducted using Student's t-test. These statistical methods will be stated in the figure legends.
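
      To illustrate the distinction raised in Comment 12, the following is a minimal sketch of a Kruskal-Wallis comparison of three independent groups. It is not the authors' analysis code: the function names, the group labels, and the numbers are made up for illustration, and the p-value shortcut assumes exactly three groups (two degrees of freedom), for which the chi-square tail probability has the closed form exp(-H/2).

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic with the standard tie correction."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)

    # Sum of (rank sum squared / group size) over the groups.
    weighted = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        weighted += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    # Tie correction: divide by 1 - sum(t^3 - t) / (N^3 - N).
    counts = {}
    for x in pooled:
        counts[x] = counts.get(x, 0) + 1
    correction = 1 - sum(t ** 3 - t for t in counts.values()) / (n_total ** 3 - n_total)
    return h / correction


def p_value_three_groups(h):
    """Chi-square survival function for 2 degrees of freedom only."""
    return math.exp(-h / 2)


# Hypothetical cluster frequencies (%) for three groups; illustrative only.
group_a = [1.0, 2.0, 3.0]
group_b = [4.0, 5.0, 6.0]
group_c = [7.0, 8.0, 9.0]

h_stat = kruskal_wallis_h(group_a, group_b, group_c)
p_val = p_value_three_groups(h_stat)
print(f"H = {h_stat:.3f}, p = {p_val:.4f}")
```

      The Shapiro-Wilk test, by contrast, only assesses whether a single sample is consistent with a normal distribution, so it can help choose between a parametric and a nonparametric test but does not itself compare groups.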

    1. Describing them will require great attention to detail: beneath every set of figures, we must seek not a meaning, but a precaution; we must situate them not only in the inextricability of a functioning, but in the coherence of a tactic.

      He makes a point here of using these examples as a warning for the future: we should be able to notice similarities, or a rapid change in general thought due to political influence (although this may not always be public knowledge). I think this is a good point, and the takeaway is to make sure we are asking the right questions.

    1. In all things purely social we can be as separate as the five fingers, and yet one as the hand in all things essential to mutual progress.

      Everyone is able to think their own thoughts and have their own opinions, which may divide them like "fingers", but things that will move their society forward and benefit everyone will make everyone stand together as "one hand".

    1. Author response:

      Reviewer #1 (Public Review):

      Weakness #1: The authors claim to have identified drivers that label single DANs in Figure 1, but their confocal images in Figure S1 suggest that many of those drivers label additional neurons in the larval brain. It is also not clear why only some of the 57 drivers are displayed in Figure S1.

      As introduced in the Results section, we screened 57 driver strains selected from previous studies: they were reported either to identify a single dopaminergic neuron (DAN), or a single pair, in larvae, or to identify only a few DANs in the adult brain, which suggested that they might identify single DANs in larvae. In Figure 1, TH-GAL4 was used to cover all neurons in the DL1 cluster, while R58E02 and R30G08 are well-known drivers for pPAM. The fly strains in Figure 1h, k, l, and m were reported as single-DAN strains in larvae4, while the strains in Figure 1e, f, and g were reported to identify only a few DANs in adult brains5,6. We examined these strains, and only some of them labeled single DANs in 3rd instar larval brains (Figure 1f, g, h, l, and m). Among them, only the strains in Figure 1f and h labeled a single DAN per brain hemisphere without labeling non-DANs. The other strains labeled non-DANs in addition to single DANs (Figure 1g, l, and m). Taking the ventral nerve cord (VNC) into consideration, the strain in Figure 1h also labeled neurons in the VNC (Figure S1e), while the strain in Figure 1f did not (Figure S1c).

      In summary, the strain in Figure 1f (R76F02AD;R55C10DBD, labeling DAN-c1) is a strain from our screen that labels only a single DAN in 3rd instar larval brains. We still describe the others (Figure 1g, h, l, and m) as strains labeling single DANs, although they also label one to several non-DANs. In Figure 1, we mainly show the strains labeling single DANs. The labeling patterns of the other screened driver strains are summarized in Table 1. Since brain images of all the remaining 47 strains are available, we will state in Figure S1 that additional brain images can be provided upon request.

      Weakness #2: Critically, R76F02-AD; R55C10-DBD labels more than one neuron per hemisphere in Figure S1c, and the authors cite Xie et al. (2018) to note that this driver labels two DANs in adult brains. Therefore, the authors cannot argue that the experiments throughout their paper using this driver exclusively target DAN-c1.

      Figure S1c shows a single DA neuron in each brain hemisphere. Additional GFP (+) signals were often observed, but they did not come from the cell bodies of DANs because they were not stained by a TH antibody. These additional GFP (+) signals were mainly neurites, including axonal terminals, but could also be false-positive signals or weakly stained non-neuronal cell bodies. This conclusion is based on analysis of a total of 22 larval brains. We will add this to the text or the Figure S1 caption. An enlarged inset of the GFP (+) signals will also be added to Figure S1c.

      Weakness #3: Missing from the screen of 57 drivers is the driver MB320C, which typically labels only PPL1-γ1pedc in the adult and should label DAN-c1 in the larva. If MB320C labels DAN-c1 exclusively in the larva, then the authors should repeat their key experiments with MB320C to provide more evidence for DAN-c1 involvement specifically.

      We thank the reviewer for the suggestion. MB320C mainly labels PPL1-γ1pedc in the adult brain, with one or two other weakly labeled cells. It will be interesting to investigate the pattern of this driver in 3rd instar larval brains. If it covers only DAN-c1, we can try to knock down D2R in this strain to check whether it reproduces our results. This will be an interesting fly strain to test, but we believe that it is not necessary for our current manuscript because the DAN-c1 driver is very specific (for details, refer to our response to Reviewer #3). However, this line will be very useful for future experiments.

      Weakness #4: The authors claim that the SS02160 driver used by Eschbach et al. (2020) labels other neurons in addition to DAN-c1. Could the authors use confocal imaging to show how many other neurons SS02160 labels? Given that both Eschbach et al. and Weber et al. (2023) found no evidence that DAN-c1 plays a role in larval aversive learning, it would be informative to see how SS02160 expression compares with the driver the authors use to label DAN-c1.

      We do not have our own images showing DANs in the brains of the SS02160 driver cross line. However, Extended Data Figure 1 in Eschbach et al. (2020) shows four strongly labeled neurons in each brain hemisphere9, indicating that this driver does not label only one neuron, DAN-c1.

      Weakness #5: The claim that DAN-c1 is both necessary and sufficient in larval aversive learning should be reworded. Such a claim would logically exclude any other neuron or even the training stimuli from being involved in aversive learning (see Yoshihara and Yoshihara (2018) for a detailed discussion of the logic), which is presumably not what the authors intended because they describe the possible roles of other DANs during aversive learning in the discussion.

      We agree that the words ‘necessary’ and ‘sufficient’ are too exclusive for other neurons. As mentioned in the Discussion part, we do think other dopaminergic neurons may also be involved in larval aversive learning. We are going to re-phrase these words by replacing them with more logically appropriate words, such as ‘important’, ‘essential’, or ‘mediating’.

      Weakness #6: Moreover, if DAN-c1 artificial activation conveyed an aversive teaching signal irrespective of the gustatory stimulus, then it should not impair aversive learning after quinine training (Figure 2k). While the authors interpret Figure 2k (and Figure 5) to indicate that artificial activation causes excessive DAN-c1 dopamine release, an alternative explanation is that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine.

      This is a great point! Yes, we cannot rule out the possibility that artificial activation compromises aversive learning by overriding DAN-c1 activity that could be evoked by quinine. The experimental results with TRPA1 could be caused by depletion of dopamine, or DA inactivation due to prolonged depolarization or adaptation. However, we still think that our hypothesis on the over-excitation of DAN-c1 is more consistent with our experimental results and other published data. Our justification is as follows:

      (1) Associative learning occurs only when the CS and US are paired. In wild type larvae, a specific odor (conditioned stimulus, CS, such as pentyl acetate) depolarizes a subset of Kenyon cells in the mushroom body, while the gustatory unconditioned stimulus (US, quinine) induces dopamine release from DAN-c1 to the lower peduncle (LP) compartment in the mushroom body (Figure 7a). Only when the CS and US are paired will the calcium influx caused by the CS and the Gαs activated by dopamine binding to D1R turn on rutabaga, a mushroom body specific adenylyl cyclase, which is the coincidence detector in associative learning (Figure 7d).

      (2) Rutabaga converts ATP into cAMP, activating the PKA signaling pathway and modifying the synaptic strength from mushroom body neurons (MBN, also called Kenyon cells) to the mushroom body output neurons (MBON, Figure 7d). This change in synaptic strength leads to learned responses when the same odor appears again.

      (3) In our work, we found that D2R is expressed in DAN-c1 and that knockdown of D2R in DAN-c1 impairs larval aversive learning. As D2R reduces the cAMP level and neuronal excitability3, we hypothesized that knockdown of D2R in DAN-c1 would remove the inhibition exerted by the D2R auto-receptor and lead to more dopamine (DA) release than in wild type larvae when the US (quinine) was delivered. The elevated DA release, along with the calcium influx caused by the CS, increases the cAMP level in MBN, which leads to the learning deficit (over-excitation, Figure 7b). dunce mutant larvae, which have excessive cAMP, showed aversive learning deficiency, supporting our hypothesis2.

      (4) Our TRPA1 results can be explained by this over-excitation hypothesis. When DAN-c1 was activated (34 °C) in the distilled water group, the artificial activation mimicked the gustatory activation by quinine. The larvae showed aversive learning responses towards the odor (Figure 2k DW group). When DAN-c1 was activated (34 °C) in the sucrose group, the artificial activation again mimicked the gustatory activation by quinine, so the larvae showed a learning response combining both appetitive and aversive learning (Figure 2k SUC group).

      (5) When DAN-c1 was activated (34 °C) in the quinine group, the artificial activation together with the gustatory activation by quinine led to elevated DA release from DAN-c1. During training, this elevated DA caused over-excitation of MBN, leading to failure of aversive learning (Figure 2k QUI group), a phenotype similar to that of larvae with D2R knockdown in DAN-c1.

      (6) Similarly, optogenetic activation of DAN-c1 during aversive training, leads to elevated DA release from DAN-c1 (both gustatory activation of quinine and artificial activation). This would also cause over-excitation of MBN, and lead to failure of aversive learning. Artificial activation in other stages (resting or testing) won’t cause elevated DA release during training, so the aversive learning was not affected (Figure 5b).

      (7) However, when optogenetic activation was applied during training, we did not observe aversive learning responses in the distilled water group, or a reduction in the sucrose group (Figure 5c, Figure 5d). Our explanation is that the optogenetic stimulus we applied was too strong, so DAN-c1 had already released elevated DA in both groups. Aversive learning in these groups was therefore already impaired, and the larvae showed only the corresponding learning responses to distilled water or sucrose.

      (8) We also applied this over-excitation to activate MBNs. As MBN takes over both appetitive and aversive learnings, over-excitation of MBNs led to deficit in both types of learning, which follows our hypothesis (Figure 6).

      In summary, we hypothesized that DAN-c1 restricts DA release via activation of D2R, which is important for larval aversive learning. D2R knockdown or artificial activation of DAN-c1 during training would induce elevated DA release, leading to over-excitation of MBNs and failure of aversive learning.

      Weakness #7: The authors should not necessarily expect that D2R enhancer driver strains would reflect D2R endogenous expression, since it is known that TH-GAL4 does not label p(PAM) dopaminergic neurons.

      Just like the example of TH-GAL4, it is possible that the D2R driver strains may partially reflect the expression pattern of endogenous D2R in larval brains. When we crossed the D2R driver strains with the GFP-tagged D2R strain, however, we observed co-localization in DM1 and DL2b dopaminergic neurons, as well as in mushroom body neurons (Figure S3 c to h). In addition, D2R knockdown with D2R-miR directly supported that the GFP-tagged D2R strain reflected the expression pattern of endogenous D2R (Figure 4b to d, signals were reduced in DM1). In summary, we think the D2R driver strains supported the expression pattern we observed from the GFP-tagged D2R strain, especially in DM1 DANs.

      Weakness #8: Their observations of GFP-tagged D2R expression could be strengthened with an anti-D2R antibody such as that used by Lam et al., (1999) or Love et al., (2023).

      Love et al. (2023) used the antibody from Draper et al.10. We tried the same antibody, but we were not able to observe clear signals after staining. The antibody may not be specific for neurons in the fly larval brain, or our staining protocol may not suit it.

      Unfortunately, we were not able to find the Lam et al. (1999) paper.

      Weakness #9: Finally, the authors could consider the possibility other DANs may also mediate aversive learning via D2R. Knockdown of D2R in DAN-g1 appears to cause a defect in aversive quinine learning compared with its genetic control (Figure S4e). It is unclear why the same genetic control has unexpectedly poor aversive quinine learning after training with propionic acid (Figure S5a). The authors could comment on why RNAi knockdown of D2R in DAN-g1 does not similarly impair aversive quinine learning (Figure S5b).

      We also think that other DANs may be involved in aversive learning. When we re-analyzed the learning assay data, D2R knockdown in DAN-g1 with miR appeared to partially affect aversive learning when larvae were trained with pentyl acetate (Figure S4e). We are going to build single statistic panels for DAN-g1 and DAN-d1. However, neither larvae with D2R knockdown in DAN-g1 using miR trained with propionic acid (Figure S5a) nor larvae with D2R knockdown in DAN-g1 using RNAi trained with pentyl acetate (Figure S5b) showed an aversive learning deficit. We will add paragraphs about this to both the Results and Discussion sections.

      Reviewer #2 (Public Review):

      Weakness#1: It is not completely clear how the system of DAN-c1, MB neurons, and behavioral performance works. We can be quite sure that DAN-c1;Shits1 reduced dopamine release and impaired aversive memory (Figure 2h). Similarly, DAN-c1;ChR2 increased dopamine release and also impaired aversive memory (Figure 5b). However, it is not clear what is happening with DAN-c1;TrpA1 (Figure 2K). In this case the thermo-induction appears to impair the behavioral performance in all three conditions (QUI, DW and SUC), and the behavior is quite distinct from that seen with the increase and decrease of dopamine tone (Figure 2h and 5b).

      The study successfully examined the role of D2R in DAN-c1 and MB neurons in olfactory conditioning. The conclusions are well supported by the data, with the exception of the claim that dopamine release from DAN-c1 is sufficient for aversive learning in the absence of unconditional stimulus (Figure 2K). Alternatively, the authors need to provide a better explanation of this point.

      Please refer to our response to Weakness #6 of Public Reviewer #1.

      Reviewer #3 (Public Review):

      Weakness #1: It is a strength of the paper that it analyses the function of dopamine neurons (DANs) at the level of single, identified neurons, and uses tools to address specific dopamine receptors (DopRs), exploiting the unique experimental possibilities available in larval Drosophila as a model system. Indeed, the result of their screening for transgenic drivers covering single or small groups of DANs and their histological characterization provides the community with a very valuable resource. In particular the transgenic driver to cover the DANc1 neuron might turn out useful. However, I wonder in which fraction of the preparations an expression pattern as in Figure 1f/ S1c is observed, and how many preparations the authors have analyzed. Also, given the function of DANs throughout the body, in addition to the expression pattern in the mushroom body region (Figure 1f) and in the central nervous system (Figure S1c) maybe attempts can be made to assess expression from this driver throughout the larval body (same for Dop2R distribution).

      We thank the reviewer for the positive comments and the suggestions. For the strain R76F02AD; R55C10DBD, we examined 22 third instar larval brains expressing GFP or Syt-GFP and Den-mCherry, and all of them clearly labeled DAN-c1. Half of them labeled only DAN-c1; the rest had 1 to 5 weakly labeled somata without neurites. Only rarely did 1 or 2 strongly labeled cells appear. These non-DAN-c1 neurons were seldom dopaminergic. In the VNC, 8 out of 12 preparations did not label any cells, and 3 had 2-4 strongly labeled cells. These data support that R76F02AD;R55C10DBD exclusively labeled DAN-c1 in 3rd instar larval brains.

      The expression pattern of R76F02AD; R55C10DBD and of D2R throughout the larval body is an interesting question. However, because our main focus was on the central nervous system and learning behaviors in fruit fly larvae, we may investigate this question in the future.

      Weakness #2: A first major weakness is that the main conclusion of the paper, which pertains to associative memory (last sentence of the abstract, and throughout the manuscript), is not justified by their evidence. Why so? Consider the paradigm in Figure 2g, and the data in Figure 2h (22 degrees, the control condition), where the assay and the experimental rationale used throughout the manuscript are introduced. Different groups of larvae are exposed, for 30min, to an odour paired with either i) quinine solution (red bar), ii) distilled water (yellow bar), or iii) sucrose solution (blue bar); in all cases this is followed by a choice test for the odour on one side and a distilled-water blank on the other side of a testing Petri dish. The authors observe that odour preference is low after odour-quinine pairing, intermediate after odour-water pairing and high after odour-sucrose pairing. The differences in odour preference relative to the odour-water case are interpreted as reflecting odour-quinine aversive associations and odour-sucrose appetitive associations, respectively. However, these differences could just as well reflect non-associative effects of the 30-min quinine or sucrose exposure per se (for a classical discussion of such types of issues see Rescorla 1988, Annu Rev Neurosci, or regarding Drosophila Tully 1988, Behav Genetics, or with some reference to the original paper by Honjo & Furukubo-Tokunaga 2005, J Neurosci that the authors reference, also Gerber & Stocker 2007, Chem Sens). As it stands, therefore, the current 3-group type of comparison does not allow conclusions about associative learning.

      We adopted this single odor larval learning paradigm from Honjo’s papers1,2, in which Honjo et al. first designed and performed this single odor paradigm for larval olfactory associative learning. To address the reviewer’s question about potential non-associative effects of the 30-min quinine or sucrose exposure, we would like to defend the paradigm primarily on the basis of results from Honjo et al. (2005 and 2009). When they presented the odorant to the larvae after training, only larvae that had received paired training with both the odor and the unconditioned stimulus (quinine or sucrose) showed learning responses. Larvae exposed for 30 min to only the odorant or only the unconditioned stimulus did not respond differently to the odor than the naïve group1,2. To validate that this paradigm induces associative learning responses, they also tested it from three aspects:

      (1) The odor responses are associative. Honjo et al. showed only when the odorant paired with unconditioned stimulus would induce corresponding attraction or repulsion of larvae to the odor. Neither odorant alone, unconditioned stimulus alone, nor temporal dissociation of odorant and unconditioned stimulus would induce learning responses.

      (2) The odor responses are odor specific. When a second odorant that was not used for training was applied, larvae showed learning responses only to the odor paired with the unconditioned stimulus. This result rules out the explanation of a general olfactory suppression and indicates that larvae can discriminate and specifically alter their responses to the odor paired with the unconditioned stimulus. Although two-odor reciprocal training is not used, these results show the association between the unconditioned stimulus and the corresponding paired odor.

      (3) Well known learning deficit mutants did not show learned responses in this learning paradigm. Honjo et al. tested mutants (e.g., rut and dnc) showing learning deficits in the adult stage with two odor reciprocal learning paradigm. These mutant larvae also failed to show learning responses tested with the single odor larval learning paradigm.

      (4) In our study, we used two distinct odorants (pentyl acetate and propionic acid), as well as two D2R knockdown strains (UAS-miR and UAS-RNAi for D2R). We obtained similar results for larvae with D2R knockdown in DAN-c1. In addition, our naïve olfactory, naïve gustatory, and locomotion data ruled out the possibility that the responses were caused by impaired sensory or motor functions. Comparison with the control group (odor paired with distilled water) ruled out potential effects of habituation, if any existed. All these results support the reliability of this single odor learning paradigm for assessing the learning abilities of Drosophila larvae. The failure of the R.I. to decrease when larvae with D2R knockdown in DAN-c1 were trained with quinine paired with the odorant is caused by a deficit in aversive learning ability. We will add a paragraph addressing this to the Discussion.

      Weakness #3: A second major weakness is apparent when considering the sketch in Figure 2g and the equation defining the response index (R.I.) (line 480). The point is that the larvae that are located in the middle zone are not included in the denominator. This can inflate scores and is not appropriate. That is, suppose from a group of 30 animals (line 471) only 1 chooses the odor side and 29, bedazzled after 30-min quinine or sucrose exposure or otherwise confused by a given opto- or thermogenetic treatment, stay in the middle zone... a P.I. of 1.0 would result.

      This is a good question. We allowed 5 min during the testing stage for the larvae to wander on the testing plate. Under most conditions, more than half of the larvae (>50%) explore the plate, and the rest may stay in the middle zone (and are not counted). We used 25-50 larvae in each learning assay, so around 10-30 larvae end up in the two semicircular areas. Indeed, based on our raw data, an R.I. of 1 seldom appears. Most R.I. values fall in the range from -0.2 to 0.8. We admit that the equation for the R.I. is not linear, so it changes steeply as it approaches -1 and 1. However, as most of the values fall in the range from -0.2 to 0.8, we think ‘border effects’ can be neglected if enough larvae are included in the calculation (10-30).
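
      The scoring issue discussed here can be made concrete with a minimal sketch. It assumes the response index takes the form (odor side - control side) / (odor side + control side), which matches the reviewer's description that middle-zone larvae are excluded from the denominator. The equation itself (manuscript line 480) is not reproduced in this response, so that form, the function name `response_index`, and the `count_middle` option are illustrative assumptions, and the larval counts are made up.

```python
def response_index(n_odor, n_control, n_middle=0, count_middle=False):
    """Response index under the assumed (odor - control) / denominator form.

    By default the denominator counts only larvae in the two semicircular
    areas. With count_middle=True, larvae left in the middle zone are added
    to the denominator as well (the alternative the reviewer alludes to).
    """
    denominator = n_odor + n_control
    if count_middle:
        denominator += n_middle
    if denominator == 0:
        raise ValueError("no larvae counted in the denominator")
    return (n_odor - n_control) / denominator


# Reviewer's extreme case: 1 larva on the odor side, 29 in the middle zone.
print(response_index(1, 0, n_middle=29))                      # 1.0
print(response_index(1, 0, n_middle=29, count_middle=True))   # 1/30

# A hypothetical, more typical plate: 12 odor, 8 control, 10 in the middle.
print(response_index(12, 8, n_middle=10))
print(response_index(12, 8, n_middle=10, count_middle=True))
```

      Under this assumed form, the two denominators diverge most when few larvae leave the middle zone, which is why the number of larvae scored in the two semicircular areas matters for interpreting the index.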

      Weakness #4: Unless experimentally demonstrated, claims that the thermogenetic effector shibire/ts reduces dopamine release from DANs are questionable. This is because firstly, there might be shibire/ts-insensitive ways of dopamine release, and secondly because shibire/ts may affect co-transmitter release from DANs.

      The Shibirets1 gene encodes a thermosensitive mutant of dynamin; expressing this mutant version in target neurons blocks neurotransmitter release at ambient temperatures above 30 °C, as it represses vesicle recycling1. It is a widely used tool to examine whether a target neuron is involved in a specific physiological function. We cannot rule out that Shibirets1-insensitive routes of dopamine release exist. However, blocking dopamine release from DAN-c1 with Shibirets1 already changed the learning responses (Figure 2h). This result indicates that dopamine release from DAN-c1 during training is important for larval aversive learning, which supports our hypothesis.

      The second question, about potential co-transmitter release, is a great one. Recently, Yamazaki et al. reported that co-neurotransmitters in the dopaminergic system modulate adult olfactory memories in Drosophila11, and we cannot rule out a role for co-released neurotransmitters/neuropeptides in larval learning. Ideally, observing real-time changes in dopamine release from DAN-c1 in wild type and TH knockdown larvae would answer this question. However, live imaging of dopamine release from a single dopaminergic neuron is not practical for us at this time. On the other hand, the roles of dopamine receptors in olfactory associative learning support that dopamine is important for Drosophila learning. The D1 receptor, dDA1, has been shown to be involved in both adult and larval appetitive and aversive learning12,13. In our work, D2R in the mushroom body showed important roles in both larval appetitive and aversive learning (Figure 6a). All this evidence reveals the importance of dopamine in Drosophila olfactory associative learning. In addition, too much remains unknown about the co-released neurotransmitters/neuropeptides, as well as their potentially complex interactions and crosstalk. We believe that investigating co-released neurotransmitters/neuropeptides is beyond the scope of this study at this time.

      Weakness #5: It is not clear whether the genetic controls when using the Gal4/ UAS system are the homozygous, parental strains (XY-Gal4/ XY-Gal4 and UAS-effector/ UAS-effector), or as is standard in the field the heterozygous driver (XY-Gal4/ wildtype) and effector controls (UAS-effector/ wildtype) (in some cases effector controls appear to be missing, e.g. Figure 4d, Figure S4e, Figure S5c).

      Almost all the controls we used were homozygous parental strains. They showed no abnormal behavior in learning, naïve sensory, or locomotion assays. The only exception is the control for DAN-c1: larvae from the homozygous R76F02AD; R55C10DBD strain showed much reduced locomotion speed (Figure S6). To prevent this reduced speed from affecting learning ability, we used heterozygous R76F02AD; R55C10DBD/wildtype larvae as the control, which showed normal learning, naïve sensory and locomotion abilities (Figure 4e to i).

      For Figure 4d, it is a column graph to quantify the efficiency of D2R knockdown with miR. Because we need to induce and quantify the knockdown effect in specific DANs (DM1), only TH-GAL4 can be used as the control group, rather than UAS-D2R-miR.

      For the missing control groups in Figure S4e and S5c, we have shown them in other Figures (Figure 4e). We will re-organize the figures to make them easier to understand.

      Weakness #6: As recently suggested by Yamada et al 2024, bioRxiv, high cAMP can lead to synaptic depression (sic). That would call into question the interpretation of low-Dop2R leading to high-cAMP, leading to high-dopamine release, and thus the authors' interpretation of the matching effects of low-Dop2R and driving DANs.

      We will read this paper and try to incorporate it as a possible explanation for the learning mechanism. As we noted in the Discussion section, the learning mechanism is quite complex, combining non-linear neuronal circuits and multiple signaling pathways in response to complex environmental learning contexts. We will try to develop a hypothesis that best reconciles our results with published data.

      References

      (1) Honjo, K. & Furukubo-Tokunaga, K. Induction of cAMP response element-binding protein-dependent medium-term memory by appetitive gustatory reinforcement in Drosophila larvae. J Neurosci 25, 7905-7913 (2005). https://doi.org/10.1523/JNEUROSCI.2135-05.2005

      (2) Honjo, K. & Furukubo-Tokunaga, K. Distinctive neuronal networks and biochemical pathways for appetitive and aversive memory in Drosophila larvae. J Neurosci 29, 852-862 (2009). https://doi.org/10.1523/JNEUROSCI.1315-08.2009

      (3) Neve, K. A., Seamans, J. K. & Trantham-Davidson, H. Dopamine receptor signaling. J Recept Signal Transduct Res 24, 165-205 (2004). https://doi.org/10.1081/rrs-200029981

      (4) Saumweber, T. et al. Functional architecture of reward learning in mushroom body extrinsic neurons of larval Drosophila. Nat Commun 9, 1104 (2018). https://doi.org/10.1038/s41467-018-03130-1

      (5) Aso, Y. & Rubin, G. M. Dopaminergic neurons write and update memories with cell-type-specific rules. Elife 5 (2016). https://doi.org/10.7554/eLife.16135

      (6) Xie, T. et al. A Genetic Toolkit for Dissecting Dopamine Circuit Function in Drosophila. Cell Rep 23, 652-665 (2018). https://doi.org/10.1016/j.celrep.2018.03.068

      (7) Hartenstein, V., Cruz, L., Lovick, J. K. & Guo, M. Developmental analysis of the dopamine-containing neurons of the Drosophila brain. J Comp Neurol 525, 363-379 (2017). https://doi.org/10.1002/cne.24069

      (8) Aso, Y. et al. The neuronal architecture of the mushroom body provides a logic for associative learning. Elife 3, e04577 (2014). https://doi.org/10.7554/eLife.04577

      (9) Eschbach, C. et al. Recurrent architecture for adaptive regulation of learning in the insect brain. Nat Neurosci 23, 544-555 (2020). https://doi.org/10.1038/s41593-020-0607-9

      (10) Draper, I., Kurshan, P. T., McBride, E., Jackson, F. R. & Kopin, A. S. Locomotor activity is regulated by D2-like receptors in Drosophila: an anatomic and functional analysis. Dev Neurobiol 67, 378-393 (2007). https://doi.org/10.1002/dneu.20355

      (11) Yamazaki, D., Maeyama, Y. & Tabata, T. Combinatory Actions of Co-transmitters in Dopaminergic Systems Modulate Drosophila Olfactory Memories. J Neurosci 43, 8294-8305 (2023). https://doi.org/10.1523/jneurosci.2152-22.2023

      (12) Selcho, M., Pauls, D., Han, K. A., Stocker, R. F. & Thum, A. S. The role of dopamine in Drosophila larval classical olfactory conditioning. PLoS One 4, e5897 (2009). https://doi.org/10.1371/journal.pone.0005897

      (13) Kim, Y. C., Lee, H. G. & Han, K. A. D1 dopamine receptor dDA1 is required in the mushroom body neurons for aversive and appetitive learning in Drosophila. J Neurosci 27, 7640-7647 (2007). https://doi.org/10.1523/JNEUROSCI.1167-07.2007

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Thank you very much for your editorial handling of our manuscript entitled 'A conserved fungal Knr4/Smi1 protein is vital for maintaining cell wall integrity and host plant pathogenesis'. We have taken on board the reviewers' comments and thank them for their diligence and time in improving our manuscript.

      Please find our responses to each of the comments below.

      Reviewer(s)' comments

      Reviewer #1


      Major comments:


      1.1. As a more critical comment, I find the presentation of the figures somewhat confusing, especially with the mixing of main figures, supplements to the main figures, and actual supplemental data. On top of that, the figures are not called up in the right order (e.g. Figure 4 follows 2D, while 3 comes after 4; Figure 6 comes before 5...), and some are never called up (I think) (e.g. Figure 1B, Figure 2B).


      Response: The figure order has been revised according to the reviewer's suggestion, while still following eLife's formatting guidelines for naming supplementals. Thank you.

      1.2. I agree that there should be more CWI-related genes in the wheat module linked to the FgKnr4 fungal module, or, vice-versa, CW-manipulating genes in the fungal module. It would at least be good if the authors could comment further on if they find such genes, and if not, how this fits their model.


      Response: Thank you for your insightful suggestion regarding the inclusion of more CWI-related genes in the wheat module linked to the FgKnr4 fungal module F16, or vice versa. We did observe a co-regulated response in wheat module W05, which is correlated with the FgKnr4 module F16. Namely, we observed an enrichment of oxidative stress genes, including respiratory burst oxidases and two catalases (lines 304 - 313), in the correlated wheat module (W05). Early expression of these oxidative stress-inducing genes likely induces the CWI pathway in the fungus, which is regulated by FgKnr4. Knr4 functions as both a regulatory protein in the CWI pathway and as a scaffolding protein across multiple pathways in S. cerevisiae (Martin-Yken et al., 2016, https://onlinelibrary.wiley.com/doi/10.1111/cmi.12618 ). Scaffolding protein-encoding genes are typically expressed earlier than the genes they regulate to enable pre-assembly with their interacting partners, ensuring that signaling pathways are ready to activate when needed. In this context, the CWI MAPKs Bck1 and Mkk1 are part of module F05, which includes two chitin synthases and a glucan synthase. This module is highly expressed during the late symptomless phase. The MAPK Mgv1, found in module F13, is expressed consistently throughout the infection process, which aligns with the expectation that MAPKs are mainly post-transcriptionally regulated. Thank you for bringing this to our attention; it is now included in the discussion (lines 427 - 443), along with eigengene expression plots of all modules added to the supplementary material (Figure 3 - figure supplement 1).

      To explore potential shared functions of FgKnr4 with other genes in its module, we re-analyzed the high module membership genes within module F16, which includes FgKnr4, using Knetminer (Hassani-Pak et al., 2021; https://onlinelibrary.wiley.com/doi/10.1111/pbi.13583 ). This analysis revealed that 8 out of 15 of these genes are associated with cell division and ATP binding. Four of the candidate genes are also part of a predicted protein-protein interaction subnetwork of genes within module F16, which relate to cell cycle and ATP binding. In S. cerevisiae, the absence of Knr4 results in cell division dysfunction (Martin-Yken et al., 2016, https://onlinelibrary.wiley.com/doi/10.1111/cmi.12618 ). Accordingly, we tested the sensitivity of ΔFgknr4 to the microtubule inhibitor benomyl (a compound commonly used to identify mutants with cell division defects; Hoyt et al., 1991 https://www.cell.com/cell/pdf/0092-8674(81)90014-3.pdf). We found that the ΔFgknr4 mutant was more susceptible to benomyl, both when grown on solid agar and in liquid culture. This data has now been added as Figure 7 and is referred to in lines 338-348.

      Specific issues:


      1.3. In the case of Figure 5, I generally find it hard to follow. In the text (line 262/263), the authors state that 5C shows "eye-shaped lesions" caused by ΔFgknr4 and ΔFgtri5, but I can see neither (5C appears to be a ΔFgknr4 complementation experiment). The figure legend also states nothing in this regard.

      Response: Thank you for your suggestion. We have amended the manuscript to include an additional panel that shows the dissected spikelet without its outer glumes, making the eye-shaped diseased regions more visible in Figure 5.

      1.4. Figure 5D supposedly shows 'visibly reduced fungal burden' in ΔFgknr4-infected plants, but I can't really see the fungal burden in this picture, but the infected section looks a lot thinner and more damaged than the control stem, so in a way more diseased.


      Response: Thank you for your insight. We have revised our conclusions based on this image to state that while ΔFgknr4 can colonise host tissue, it does so less effectively than the wild-type strain. We are unable to quantitatively evaluate fungal burden using image-colour thresholding because the colours of the fungal cells and wheat tissues overlap. Decreased host colonisation is evidenced by (i) reduced fungal hyphae proliferation, particularly in the thicker adaxial cell layer, (ii) collapsed air spaces in wheat cells, and (iii) increased polymer deposition at the wheat cell walls, indicating an enhanced defence response. Figure 5 has been amended to include these observations in the corresponding figure legend, and the resin images now include insets with detailed annotation.

      1.5. The authors then go on to state (lines 272-273) that they analyzed the amounts of DON mycotoxin in infected tissues, but don't seem to show any data for this experiment.

      Response: We have amended this to now include the data in Figure 5 - figure supplement 2B, thank you.

      Reviewer #2


      Major issues:


      2.1. If Knr4 is involved in the CWI pathway, what other genes involved in the CWI pathway are in this fungal module? One of the reasons for developing modules or sub-networks is to assign common function and identify new genes contributing to the function. Since FgKnr4 is noted to play a role in the CWI pathway, genes in that module should have similar functions. If WGCN does not do that, what is the purpose of this exercise?


      Response: Thank you for raising this point regarding the role of FgKnr4 in the CWI pathway and the expectations for genes of shared function within the FgKnr4 module F16. We did observe that the module containing FgKnr4 (F16) was also correlated with a wheat module (W05) that was significantly enriched for oxidative stress genes. This pathogen-host correlated pattern led us to study module F16, which otherwise lacks significant gene ontology term enrichment and unique gene set enrichments, and contains few characterised genes. This is now highlighted in lines 233-246. This underscores the strength of the WGCNA. By using high-resolution RNA-seq data to map modules to specific infection stages, we identified an important gene that would otherwise have been overlooked. This approach contrasts with other network analyses that often rely on the guilt-by-association principle to identify novel virulence-related genes within modules containing known virulence factors, potentially overlooking significant pathways outside the scope of prior studies. Our analysis has therefore already benefited from several advantages of WGCNA, including the identification of key genes with high module membership that may be critical for biological processes, as well as the generation of a high-resolution, stage-specific co-expression map of the F. graminearum infection process in wheat. This point is now emphasised in lines 233-252. As discussed in response to reviewer 1, Knr4 functions as both a regulatory protein in the CWI pathway and as a scaffolding protein across multiple pathways in S. cerevisiae (Martin-Yken et al., 2016, https://onlinelibrary.wiley.com/doi/10.1111/cmi.12618 ), which would explain why it clusters separately from the CWI pathway genes.

      The high module membership genes within module F16 containing FgKnr4 were re-analysed using Knetminer (Hassani-Pak et al., 2021; https://onlinelibrary.wiley.com/doi/10.1111/pbi.13583 ), which found that 8/15 of these genes were related to cell division and ATP binding. Four of the candidate genes are also part of a predicted protein-protein interaction subnetwork of genes within module F16, which relate to cell cycle and ATP binding. In S. cerevisiae, the absence of Knr4 leads to dysfunction in cell division. Accordingly, we tested the sensitivity of ΔFgknr4 to the microtubule inhibitor benomyl (a compound commonly used to identify mutants with cell division defects; Hoyt et al., 1991 https://www.cell.com/cell/pdf/0092-8674(81)90014-3.pdf). We found that the ΔFgknr4 mutant was more susceptible to benomyl, both when grown on solid agar and in liquid culture. This data has now been added as Figure 7 and referred to in lines 338-348.


      2.2. Due to developmental defects in the Fgknr4 mutant, I would not equate it to a virulence factor or an effector gene.


      Response: We are in complete agreement with the reviewer and are not suggesting that FgKnr4 is an effector or virulence factor. We have been careful with our wording to indicate that FgKnr4 is simply necessary for full virulence and that its disruption results in reduced virulence, and we have outlined how we believe FgKnr4 participates in a fungal signaling pathway required for infection of wheat.


      2.3. What new information is provided by the WGCN modules compared with other GCN networks in Fusarium (examples of GCNs in Fusarium are below)? https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5069591/ https://doi.org/10.1186/s12864-020-6596-y DOI: 10.1371/journal.pone.0013021. The GCN networks from Fusarium have already identified modules necessary for or involved in pathogenesis.

      Response: The 2016 New Phytologist gene regulatory network (GRN) by Guo et al. is large and comprehensive. However, only three of the eleven datasets are in planta, with just one dataset focusing on F. graminearum infection of wheat spikes. The other two in planta datasets involve barley infection and Fusarium crown rot. Because they combine numerous in planta and in vitro datasets, the previous GRNs lack the fine resolution needed to identify genetic relationships under specific conditions, such as the various stages of symptomatic and symptomless F. graminearum infection of mature flowering wheat plants. This limitation is highlighted in the 2016 paper itself. The network is expanded in the Guo et al., 2020 BMC Genomics paper, which includes one additional in planta and nine in vitro datasets. However, the in planta dataset involves juvenile wheat coleoptile infection, which serves as an artificial model for wheat infection but is not performed on mature flowering wheat plants, as in Fusarium Head Blight of cereals in the field. This model differs significantly in the mode of action of F. graminearum; notably, DON mycotoxin is not essential for virulence in this context (Armer et al. 2024, https://pubmed.ncbi.nlm.nih.gov/38877764/ ). The Guo et al., 2020 paper still faces the same issues of resolution and the inability to draw conclusions specific to the different stages of F. graminearum infection. Additionally, these GRNs use Affymetrix data, which miss over 400 genes (~ 3 % of the genome) from newer gene models. In contrast, our study addresses these limitations by analysing a meticulously sampled, stage- and tissue-specific in planta RNA-seq dataset using the latest reference annotation. Our approach provides higher resolution and insights into host transcriptomic responses during the infection process. The importance of our study in the context of these GRNs is now addressed in the introduction (lines 85-92).


      2.4. Ideally, the WGCN should have been used to identify plant targets of Fusarium pathogenicity genes. This would have provided credibility and usefulness of the WGCN. Many bioinformatic tools are available to identify virulence factors and the utility of WGCN in this regard is not viable. However, if the authors had overlapped the known virulence factors in a fungal module to a particular wheat module, the impact of the WGCN would be great. The module W12 has genes from numerous traits represented and WGCN could have been used to show novel links between Fg and wheat. For example, does the tri5 mutant affect genes in other traits?

      Response: Thank you for your suggestions. In this study we have shown the association between the main fungal virulence factor of F. graminearum, DON mycotoxin, and wheat detoxification responses. Through this we have identified a set of tri5-responsive genes and validated this correlation in two genes belonging to the phenylalanine pathway and one transmembrane detoxification gene. Although we could validate more genes in this tri5-responsive wheat module, our paper aimed to investigate previously unstudied aspects of the F. graminearum infection process and how the fungus responded to changing conditions within the host environment. We accomplished this by characterising a gene within a fungal module that had limited annotation enrichment and few characterised genes. Tri5, on the other hand, is the most extensively studied gene in F. graminearum, and while the network we generated may offer new insights into tri5-responsive genes, this is beyond the scope of our current study. In addition to the tri5 co-regulated response, we have also demonstrated the coordinated response between the fungal module F16, which contains FgKnr4 that is necessary for tolerance to oxidative stress, and the wheat module W05, which is enriched for oxidative stress genes.


      While our co-expression network approach can be used to explore and validate other early downstream signaling and defense components in wheat cells, several challenges must be considered: (a) the poor quality of wheat gene calls, (b) genetic redundancy due to both homoeologous genes and large gene families, and (c) the presence of DON, which can inhibit translation and prevent many transcriptional changes from being realised within the host responses. Additionally, most plant host receptors are not transcriptionally upregulated in response to pathogen infection (most R gene studies for the NBS-LRR and exLRR-kinase classes), making their discovery through a transcriptomics approach unlikely. These points will be included in our discussion (lines 408-413), thank you.

      Specific issues


      2.5. Since the tri5 mutant was used as a proof of concept to link wheat/Fg modules, it would have been useful to show that TRI14, which is not involved in DON biosynthesis but is involved in virulence ( https://doi.org/10.3390/applmicrobiol4020058 ), impacts the wheat module genes.


      Response: Our goal was to show that wheat genes respond to the whole TRI cluster, not just individual TRI genes. The tri5 mutant therefore serves as a solid proof of concept: TRI5 is essential for DON biosynthesis, the primary function of the TRI gene cluster, and so represents the function of the cluster as a whole. This is now clarified in lines 217-219. Additionally, the uncertainties surrounding other TRI mutants would complicate the question we were addressing, namely whether a wheat module enriched in detoxification genes is responding to DON mycotoxin, as implied by shared co-expression patterns with the TRI cluster. For instance, the referenced TRI14 paper indicates that DON is produced in the same amount in vitro in a single medium. Although the difference is not significant, the average DON produced is lower for the two Δtri14 transformants tested. Therefore, we cannot definitively rule out that TRI14 is involved in DON biosynthesis, nor can we extrapolate the in vitro result to DON production in planta. The suggestion is nonetheless interesting and would make a nice experiment, but we believe it does not contribute to the overall aim of this study.

      2.6. Moreover, prior RNAseq studies with tri5 mutant strain on wheat would have revealed the expression of PAL and other phenylpropanoid pathway genes?

      Response: We agree that this would be an interesting comparison to make, but unfortunately no dataset comparing in planta expression of the tri5 mutant within wheat spikes exists.

      2.7. Table S1 lists 15 candidate genes of the F16 module; however, Supplementary File 1 indicates 74 genes in the same module. The basis of exclusion should be explained. The authors have indicated that genes with high MM were used as representatives of the module. Did the 59 remaining genes of this module not meet this criterion? Give examples.


      Response: The 15 genes with the highest module membership were selected from the 74 genes within module F16 as initial candidates for further shortlisting. In WGCNA, genes with high module membership (MM), i.e. intramodular connectivity, are predicted to be central to the biological functions of the module (Langfelder and Horvath, 2008; https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559 ), and MM continues to be used as a metric for identifying biologically significant genes within WGCN analyses (Tominello-Ramirez et al., 2024, https://bmcplantbiol.biomedcentral.com/articles/10.1186/s12870-024-05366-0 ; Zheng et al., 2022, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9151341/ ; Panahi and Hejazi, 2021, https://www.nature.com/articles/s41598-020-80945-3 ). Following the methods of Mateus et al. (2019) (https://academic.oup.com/ismej/article/13/5/1226/7475138 ), key genes were defined as those exhibiting elevated MM within the module, which were also strongly correlated (|R| > 0.70) with modules of the partner organism (wheat). We have clarified this point in the manuscript (lines 253-263). Thank you for the suggestion.
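The selection rule described in this response can be illustrated with a minimal sketch. It is not the authors' pipeline (WGCNA is an R package); it assumes that module membership is the Pearson correlation between a gene's expression profile and a module eigengene that has already been computed, and that genes are ranked by the absolute value of that correlation. The function names, gene names, and toy values are hypothetical.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def module_membership(expression, eigengene):
    """Map each gene to its correlation with the module eigengene."""
    return {gene: pearson(profile, eigengene)
            for gene, profile in expression.items()}

def top_by_membership(mm, n=15):
    """Genes ranked by absolute module membership, highest first (assumed rule)."""
    return sorted(mm, key=lambda g: abs(mm[g]), reverse=True)[:n]

def modules_correlated(eigengene_a, eigengene_b, threshold=0.70):
    """True when two module eigengenes satisfy |R| > threshold."""
    return abs(pearson(eigengene_a, eigengene_b)) > threshold

# Toy data (hypothetical): one fungal module eigengene over six time points.
fungal_eigengene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
expression = {
    "geneA": [2.1, 4.0, 6.2, 7.9, 10.1, 12.0],  # tracks the eigengene closely
    "geneB": [5.0, 1.0, 4.0, 2.0, 6.0, 3.0],    # weakly related
    "geneC": [6.0, 5.1, 3.9, 3.0, 2.2, 0.9],    # strongly anti-correlated
}
wheat_eigengene = [0.9, 2.2, 2.8, 4.1, 5.3, 5.8]

mm = module_membership(expression, fungal_eigengene)
candidates = top_by_membership(mm, n=2)
partner_linked = modules_correlated(fungal_eigengene, wheat_eigengene)
```

In this toy case the two genes that follow the eigengene (in either direction) are shortlisted and the weakly related gene is not; whether anti-correlated genes should count as high-membership depends on whether a signed or unsigned network is used.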

      2.8. A list from every module that passes this criterion will be a useful resource for functional characterization studies.


      Response: A supplementary spreadsheet has been generated which includes full lists of the top 15 genes with the highest module membership within the five fungal modules correlated to wheat modules and a summary of shared attributes among them. Thank you for this suggestion.

      2.9. Figure 3 indicates TRI genes in module F12; your PHI-base list in Supp File S2 lists only TRI14. Why are other TRI genes such as TRI5 not present in this file?


      Response: For clarity, the TRI genes in module F12 are TRI3, TRI4, TRI11, TRI12, and TRI14, as stated in Table 1. TRI5 clusters with its neighboring regulatory gene TRI6 in module F11, which exhibits a similar but reduced expression pattern compared to module F12. To make this clearer, the TRI genes in module F12 are now also listed in the text (line 168) and added to Figure 4. The enrichment and correlated relationship of W12 to a cluster's expression still imply a correlated response of the wheat gene to the TRI cluster's biosynthetic product (DON), which is absent in the Δtri5 mutant.

      TRI14 and TRI12 are listed in PHI-base. TRI12 was mistakenly excluded because its UniProt ID was unmapped; unmapped IDs were added separately in the spreadsheet. We will recheck all unmapped ID lists to ensure all PHI-base entries are included in the final output. Thank you for pointing out this error.


      2.10. What is the purpose of listing the same gene multiple times? For example, osp24 (a single gene in Fg) is listed 13 times in the F01 module.


      Response: This is a consequence of each entry having a separate PHI ID, which represents a different interaction, including inoculations on different cultivars. Cultivar and various other experimental details were omitted from the spreadsheet to reduce information density; however, the multiple PHI-base IDs will be kept separate to make the data more user-friendly when working with the PHI-base database. An explanation is now provided in the file's explanatory worksheet, thank you.

      Reviewer #3:


      3.1. Why use only the high-confidence maize transcripts to map the reads, and not the full genome as for Fusarium graminearum? I have never analyzed a plant transcriptome.


      Response: For the wheat genome, only high-confidence gene calls are used by the global community (Choulet et al., 2023; https://link.springer.com/chapter/10.1007/978-3-031-38294-9_4 ) until a suitable and stable wheat pan-genome becomes available.

      3.2. The regular output of DESeq are TPMs, how did the authors obtain the FPKM used in the analysis?


      Response: FPKM was calculated using the GenomicFeatures package and made available on GitHub to improve the accessibility and interoperability of the data for future users. However, the input for WGCNA, and for this study as a whole, was normalised counts rather than FPKM. Details of the FPKM calculation are now included in the methods section of the revised manuscript (line 491).
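For readers unfamiliar with the unit, the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments) can be sketched as below. This is only an illustration of the formula, not the authors' R code; the function name, gene identifiers, counts, and lengths are hypothetical, and the library size is assumed to be the sum of the counts supplied.

```python
def fpkm(counts, lengths_bp):
    """FPKM per gene: count * 1e9 / (gene length in bp * total mapped fragments)."""
    total_fragments = sum(counts.values())  # assumed library size
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_fragments)
        for gene, count in counts.items()
    }

# Toy library (hypothetical values): 10,000 fragments over three genes.
counts = {"g1": 500, "g2": 1500, "g3": 8000}
lengths_bp = {"g1": 1000, "g2": 3000, "g3": 2000}

values = fpkm(counts, lengths_bp)
```

Here `g1` and `g2` receive the same FPKM despite a threefold difference in raw counts, because `g2` is three times longer; this length normalisation is what distinguishes FPKM from the normalised counts used as WGCNA input.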

      3.3. Do the authors have a Southern blot to prove the location of the insertion and number of insertions in Zymoseptoria tritici mutant and complemented strains?


      Response: No, but the phenotype is attributed to the presence or absence of ZtKnr4, as the mutant was successfully complemented in multiple phenotypic aspects. This satisfies Koch's postulates, the gold standard for reverse genetics experimentation (Falkow 1988; https://www.jstor.org/stable/4454582 ).

      3.4. Boxplots and bar graphs should have the same format. In Figures 5 B and F and supplementary figure 6.3 the authors showed the distribution of samples but it is lacking in figure 3 B and all bar graphs.


      Response: Graphs have been modified to display the distribution of all samples, thank you.

      3.5. Line 247 FGRAMPH1_0T23707 should be FGRAMPH1_01T23707


      Response: Thank you, this has now been amended.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper reports a number of somewhat disparate findings on a set of colorectal tumours and infiltrating T-cells. The main finding is a combined machine-learning tool which combines two previous state-of-the-art tools, MHC prediction, and T-cell binding prediction to predict immunogenicity. This is then applied to a small set of neoantigens and there is a small-scale validation of the prediction at the end.

      Strengths:

      The prediction of immunogenic neoepitopes is an important and unresolved question.

      Weaknesses:

      The paper contains a lot of extraneous material not relevant to the main claim. Conversely, it lacks important detail on the major claim.

      (1) The analysis of T cell repertoire in Figure 2 seems irrelevant to the rest of the paper. As far as I could ascertain, this data is not used further.

      We appreciate the reviewer for their valuable feedback. We concur with the reviewer's observation that the analysis of the TCR repertoire in Figure 2 should be moved to the supplementary section. We have moved Figures 2B to 2F to Supplementary Figure 2.

      However, the analysis of TCR profiles is still presented in Figure 2, as it plays a pivotal role in the process of neoantigen selection. This is because the TCR profiles of eight (out of 28) patients were used for neoantigen prediction. We have added the following sentences to the results section to explain the importance of TCR profiling: “Furthermore, characterizing T cell receptors (TCRs) can complement efforts to predict immunogenicity.” (Results, Lines 311-312, Page 11)

      (2) The key claim of the paper rests on the performance of the ML algorithm combining NETMHC and pmtNET. In turn, this depends on the selection of peptides for training. I am unclear about how the negative peptides were selected. Are they peptides from the same databases as immunogenic peptides but randomised for MHC? It seems as though there will be a lot of overlap between the peptides used for testing the combined algorithm, and the peptides used for training MHCNet and pmtMHC. If this is so, and depending on the choice of negative peptides, it is surely expected that the tools perform better on immunogenic than on non-immunogenic peptides in Figure 3. I don't fully understand panel G, but there seems very little difference between the TCR ranking and the combined. Why does including the TCR ranking have such a deleterious effect on sensitivity?

      We thank the reviewer for their valuable feedback. We believe that by 'MHCNet' and 'pmtMHC' the reviewer means the NetMHCpan and pMTnet tools, respectively. First, the negative peptides, which have been excluded from PRIME (1), were not randomized with MHC (HLA-I) but were randomized with TCR only. Second, the positive peptides for our combined algorithm were chosen from many databases, such as 10X Genomics, McPAS, VDJdb, IEDB, and TBAdb, whereas NetMHCpan uses peptides from the IEDB database and pMTnet was trained on a dataset entirely different from ours. Therefore, there is little overlap between our training data and the training datasets of NetMHCpan and pMTnet. Thus, the better performance of our tool is not due to overlapping training datasets or to the selection of negative peptides.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which shows the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: “To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).
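A 70/30 split with no shared records can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: records are assumed to be hashable (peptide, HLA, TCR) tuples, duplicates are removed before shuffling so that the same record cannot land in both subsets, and the function name, seed, and toy records are hypothetical.

```python
import random

def split_discovery_validation(records, discovery_fraction=0.7, seed=0):
    """Randomly split unique records into disjoint discovery and validation sets."""
    unique = sorted(set(records))           # drop duplicates, fix a stable order
    random.Random(seed).shuffle(unique)     # reproducible shuffle
    cut = round(len(unique) * discovery_fraction)
    return unique[:cut], unique[cut:]

# Toy records (hypothetical): ten unique entries plus one duplicate.
records = [(f"PEPTIDE{i}", "HLA-A*02:01", f"TCR{i}") for i in range(10)]
records.append(records[0])

discovery, validation = split_discovery_validation(records)
```

Splitting on whole records is the simplest form of mutual exclusivity; a stricter design would group by peptide so that the same peptide paired with different TCRs never appears on both sides.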

      Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively. The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.
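The effect of a basic "AND" rule on sensitivity and specificity can be shown with a small sketch. It assumes that lower values are better for both the HLA percentile and the TCR ranking; the cutoffs (2.0 and 0.02), field names, and toy peptides are hypothetical and are not taken from the manuscript.

```python
def passes_hla(peptide, cutoff=2.0):
    """Predicted HLA binder when the percentile rank is at or below the cutoff."""
    return peptide["hla_percentile"] <= cutoff

def passes_tcr(peptide, cutoff=0.02):
    """Predicted TCR binder when the ranking is at or below the cutoff."""
    return peptide["tcr_rank"] <= cutoff

def passes_both(peptide):
    """Basic AND rule: the peptide must satisfy both criteria."""
    return passes_hla(peptide) and passes_tcr(peptide)

def sensitivity_specificity(peptides, rule):
    """Return (sensitivity, specificity) of a rule on labelled peptides."""
    positives = [p for p in peptides if p["immunogenic"]]
    negatives = [p for p in peptides if not p["immunogenic"]]
    true_pos = sum(1 for p in positives if rule(p))
    true_neg = sum(1 for p in negatives if not rule(p))
    return true_pos / len(positives), true_neg / len(negatives)

def make(hla, tcr, label):
    return {"hla_percentile": hla, "tcr_rank": tcr, "immunogenic": label}

# Toy peptides (hypothetical): four immunogenic, four non-immunogenic.
peptides = [
    make(0.5, 0.010, True), make(1.0, 0.050, True),
    make(3.0, 0.010, True), make(0.8, 0.015, True),
    make(1.5, 0.500, False), make(5.0, 0.010, False),
    make(10.0, 0.900, False), make(1.2, 0.300, False),
]

hla_only = sensitivity_specificity(peptides, passes_hla)
tcr_only = sensitivity_specificity(peptides, passes_tcr)
combined = sensitivity_specificity(peptides, passes_both)
```

Because a peptide must clear both filters, the AND rule can never be more sensitive than either criterion alone, while its specificity can only stay the same or rise. In this toy set the combined rule reaches full specificity at the cost of missing half of the immunogenic peptides.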

      (3) The key validation of the model is Figure 5. In 4 patients, the authors report that 6 out of 21 neo-antigen peptides give interferon responses > 2 fold above background. Using NETMHC alone (I presume the tool was used to rank peptides according to binding to the respective HLAs in each individual, but this is not clear), identified 2; using the combined tool identified 4. I don't think this is significant by any measure. I don't understand the score shown in panel E but I don't think it alters the underlying statistic.

      Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      In conclusion, the paper demonstrates that combining MHCNET and pmtMHC results in a modest increase in the ability to discriminate 'immunogenic' from 'non-immunogenic' peptide; however, the strength of this claim is difficult to evaluate without more knowledge about the negative peptides. The experimental validation of this approach in the context of CRC is not convincing.

      Reviewer #2 (Public Review):

      Summary:

      This paper introduces a novel approach for improving personalized cancer immunotherapy by integrating TCR profiling with traditional pHLA binding predictions, addressing the need for more precise neoantigen selection for CRC patients. By analyzing TCR repertoires from tumor-infiltrating lymphocytes and applying machine learning algorithms, the authors developed a predictive model that outperforms conventional methods in specificity and sensitivity. The validation of the model through ELISpot assays confirmed its potential in identifying more effective neoantigens, highlighting the significance of combining TCR and pHLA data for advancing personalized immunotherapy strategies.

      Strengths:

      (1) Comprehensive Patient Data Collection: The study meticulously collected and analyzed clinical data from 27 CRC patients, ensuring a robust foundation for research findings. The detailed documentation of patient demographics, cancer stages, and pathology information enhances the study's credibility and potential applicability to broader patient populations.

      (2) The use of machine learning classifiers (RF, LR, XGB) and the combination of pHLA and pHLA-TCR binding predictions significantly enhance the model's accuracy in identifying immunogenic neoantigens, as evidenced by the high AUC values and improved sensitivity, NPV, and PPV.

      (3) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses. The calculation of ranking coverage scores and the comparative analysis between the combined model and the conventional NetMHCpan method demonstrate the superior performance of the combined approach in accurately ranking immunogenic neoantigens.

      (4) The use of experimental validation through ELISpot assays adds a practical dimension to the study, confirming the computational predictions with actual immune responses.

      Weaknesses:

      (1) While multiple advanced tools and algorithms are used, the study could benefit from a more detailed explanation of the rationale behind algorithm choice and parameter settings, ensuring reproducibility and transparency.

      We thank the reviewer for their comment. We have revised the explanation regarding the rationale behind algorithm choice and parameter settings as follows: “We examined three machine learning algorithms - Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGB) - for each feature type (pHLA binding, pHLA-TCR binding), as well as for combined features. Feature selection was tested using a k-fold cross-validation approach on the discovery dataset with 'k' set to 10-fold. This process splits the discovery dataset into 10 equal-sized folds, iteratively using 9 folds for training and 1 fold for validation. Model performance was evaluated using the ‘roc_auc’ (Receiver Operating Characteristic Area Under the Curve) metric, which measures the model's ability to distinguish between positive and negative peptides. The average of these scores provides a robust estimate of the model's performance and generalizability. The model with the highest ‘roc_auc’ average score, XGB, was chosen for all features.” (Method, lines 225-234, page 8).
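
      For illustration only, the selection procedure quoted above (10-fold cross-validation, mean ROC AUC per candidate, keep the best) can be sketched in plain Python. This is not the authors' code: the data are synthetic, the nearest-centroid scorer is a stand-in for LR, RF, and XGB, the candidates differ by feature set rather than by algorithm, stratified fold assignment is an assumption, and every function name is hypothetical.

```python
import random
from statistics import mean

def roc_auc(labels, scores):
    """ROC AUC: chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def stratified_folds(y, k, seed=0):
    """Deal the indices of each class round-robin into k folds (stratification is an assumption)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, v in enumerate(y) if v == label]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds

def make_centroid_fit(cols):
    """Hypothetical stand-in model: nearest class centroid on the chosen feature columns."""
    def fit(X_train, y_train):
        def centroid(label):
            rows = [x for x, lab in zip(X_train, y_train) if lab == label]
            return [mean(r[c] for r in rows) for c in cols]
        pos_c, neg_c = centroid(1), centroid(0)
        def predict(x):
            d_pos = sum((x[c] - m) ** 2 for c, m in zip(cols, pos_c))
            d_neg = sum((x[c] - m) ** 2 for c, m in zip(cols, neg_c))
            return d_neg - d_pos  # higher score = closer to the positive centroid
        return predict
    return fit

def cross_val_auc(fit, X, y, k=10, seed=0):
    """Train on k-1 folds, score the held-out fold, and return the mean AUC."""
    aucs = []
    for held_out in stratified_folds(y, k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in held_out],
                            [predict(X[i]) for i in held_out]))
    return mean(aucs)

# Synthetic peptides: column 0 mimics an HLA percentile, column 1 a TCR rank.
rng = random.Random(42)
y = [1 if i % 5 == 0 else 0 for i in range(200)]  # 20% positives
X = [(rng.gauss(1.0 if lab else 3.0, 1.5), rng.gauss(1.0 if lab else 3.0, 1.5))
     for lab in y]

candidates = {
    "pHLA only": make_centroid_fit([0]),
    "pHLA-TCR only": make_centroid_fit([1]),
    "combined": make_centroid_fit([0, 1]),
}
scores = {name: cross_val_auc(fit, X, y) for name, fit in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "->", best)
```

      The same loop applies unchanged if the candidates are different algorithms on a fixed feature set, which is closer to the comparison the response describes.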

      (2) While pHLA-TCR binding displayed higher specificity, its lower sensitivity compared to pHLA binding suggests a trade-off between the two measures. Optimizing the balance between sensitivity and specificity could be crucial for the practical application of these predictions in clinical settings.

      We appreciate the reviewer's suggestion. Due to the limited availability of patient blood samples and time constraints for validation, we have chosen to prioritize high specificity and positive predictive value to enhance the selection of neoantigens.

      (3) The experimental validation was performed on a limited number of patients (four), which might affect the generalizability of the findings. Increasing the number of patients for validation could provide a more comprehensive assessment of the model's performance.

      This has been addressed earlier. Here, we restate it as follows: Acknowledging the limitations of our study's sample size, we proceeded to further validate our findings with four additional patients to acquire more data. The final results revealed that our combined model identified seven peptides eliciting interferon responses greater than a two-fold increase, compared to only three peptides identified by NetMHCpan (Figure 5).

      Reviewer #3 (Public Review):

      Summary:

      This study presents a new approach of combining two measurements (pHLA binding and pHLA-TCR binding) in order to refine predictions of which patient mutations are likely presented to and recognized by the immune system. Improving such predictions would play an important role in making personalized anti-cancer vaccinations more effective.

      Strengths:

      The study combines data from pre-existing tools pVACseq and pMTNet and applies them to a CRC patient population, which the authors show may improve the chance of identifying immunogenic, cancer-derived neoepitopes. Making the datasets collected publicly available would expand beyond the current datasets that typically describe caucasian patients.

      Weaknesses:

      It is unclear whether the NetMHCpan and pMTNet tools used by the authors are entirely independent, as they appear to have been trained on overlapping datasets, which may explain their similar scores. The pHLA-TCR score seems to be driving the effects, but this is not discussed in detail.

      The HLA percentile from NetMHCpan and the TCR ranking from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. Additionally, we partitioned the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%), ensuring no overlap between the training and testing datasets.

      To enhance the clarity of the dataset construction, we have added Supplementary Figure 1, which demonstrates the workflow of peptide collection and the random splitting of data to generate the discovery and validation datasets. Additionally, we have revised the following sentence: “To objectively train and evaluate the model, we separated the dataset mentioned above into two subsets: a discovery dataset (70%) and a validation dataset (30%). These subsets are mutually exclusive and do not overlap.” (Methods, lines 221-223, page 8).

      Due to sample constraints, the authors were only able to do a limited amount of experimental validation to support their model; this raises questions as to how generalizable the presented results are. It would be desirable to use statistical thresholds to justify cutoffs in ELISPOT data.

      We chose a cutoff of 2 for ELISPOT, following the recommendation of the study by Moodie et al. (2). The study provides standardized cutoffs for defining positive responses in ELISPOT assays. It presents revised criteria based on a comprehensive analysis of data from multiple studies, aiming to improve the precision and consistency of immune response measurements across various applications.

      Some of the TCR repertoire metrics presented in Figure 2 are incorrectly described as independent variables and do not meaningfully contribute to the paper. The TCR repertoires may have benefitted from deeper sequencing coverage, as many TCRs appear to be supported only by a single read.

      We appreciate the reviewer’s feedback. We have moved Figures 2B through 2F to Supplementary Figure 2. We agree with the reviewer that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. The TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite the variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      (1) Please open source the raw and processed data, code, and software output (NetMHCpan, pMTnet), which are important to verify the results.

      NetMHCpan and pMTNet are publicly available software tools (3, 4). In our GitHub repository, we have included links to the GitHub repositories for NetMHCpan and pMTNet (https://github.com/QuynhPham1220/Combined-model).

      (2) Comparison with more state-of-the-art neoantigen prediction models could provide a more comprehensive view of the combined model's performance relative to the current field.

      To further evaluate our model, we gathered additional public data and assessed its effectiveness in comparison to other models. We utilized immunogenic peptides from databases such as NEPdb (5), NeoPeptide (6), dbPepneo (7), Tantigen (8), and TSNAdb (9), ensuring there was no overlap with the datasets used for training and validation. For non-immunogenic peptides, we used data from 10X Genomics Chromium Single Cell Immune Profiling (10-13). The findings indicate that the combined model from pMTNet and NetMHCpan outperforms the NetTCR tool (14). To address the reviewer's inquiry, we have incorporated these results in Supplementary Table 6.

      (3) While the combined model shows a positive overall rank coverage score, indicating improved ranking accuracy, the scores are relatively low. Further refinement of the model or the inclusion of additional predictive features might enhance the ranking accuracy.

      We appreciate the reviewer’s suggestion. The RankCoverageScore provides an objective evaluation of the rank results derived from the final peptide list generated by the two tools. The combined model achieved a higher RankCoverageScore than pMTNet, indicating its superior ability to identify immunogenic peptides compared to existing in silico tools. In order to provide a more comprehensive assessment, we included an additional four validated samples to recalculate the rank coverage score. The results demonstrate a notable difference between NetMHCpan and the Combined model (-0.37 and 0.04, respectively). We have incorporated these findings into Supplementary Figure 6 to address the reviewer's question. Additionally, we have modified Figure 5E to present a simplified demonstration of the superior performance of the combined model compared to NetMHCpan.

      (4) Collect more public data and fine-tune the model. Then you will get a SOTA model for neoantigen selection. I strongly recommend you write Python scripts and open source.

      We thank the reviewer for their feedback. We have made the raw and processed data, as well as the model, available on GitHub. Additionally, we have gathered more public data and conducted evaluations to assess its efficiency compared to other methods. You can find the repository here: https://github.com/QuynhPham1220/Combined-model.

      Reviewer #3 (Recommendations For The Authors):

      The Methods section seems good, though HLA calling is more accurate using arcasHLA than OptiType. This would be difficult to correct as OptiType is integrated into pVACtools.

      We chose OptiType for its exceptional accuracy, surpassing 99%, in identifying HLA-I alleles from RNA-Seq data. This decision was informed by a recent extensive benchmarking study that evaluated its performance against "gold-standard" HLA genotyping data, as described in the study by Li et al. (15). Furthermore, we have tested the two tools using the same RNA-Seq data from FFPE samples. The allele calling accuracy of OptiType was found to be superior to that of arcasHLA. To address the reviewer's question, we have included these results in Supplementary Table 2, along with the reference to this decision (Method, line 200, page 7).

      I am not sufficiently expert in machine learning to assess this part of the methods.

      TCR beta repertoire analysis of biopsy is highly variable; though my expertise lies largely in sequencing using the 10X genomics platform, typically one sees multiple RNAs per cell. Seeing the majority of TCRs supported by only a single read suggests either problems with RNA capture (particularly in this case where the recovered RNA was split to allow both RNAseq and targeted TCR seq) or that the TCR library was not sequenced deeply enough. I'd like to have seen rarefaction plots of TCR repertoire diversity vs the number of reads to ensure that sufficiently deep sequencing was performed.

      We appreciate the suggestions provided by the reviewer. We agree that deeper sequencing coverage could potentially benefit the repertoires. However, based on our current sequencing depth, we have observed that many of our samples (14 out of 28) have reached sufficient saturation, as indicated by Figure 2C. In addition, the TCR clones selected in our studies are unique molecular identifier (UMI)-collapsed reads, each representing at least three raw reads sharing the same UMI. This approach ensures that the data is robust despite variability. It is important to note that Tumor-Infiltrating Lymphocytes (TILs) differ across samples, resulting in non-uniform sequencing coverage among them. We have already added the rarefaction plots of TCR repertoire diversity versus the number of reads in Figure 2C. These have been added to the main text (lines 329-335).
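
      As a hedged illustration of what a rarefaction analysis computes, the sketch below counts how many distinct clonotypes have been seen after the first d reads of one random read ordering. It is not the authors' pipeline: the clonotype labels and counts are invented, a single ordering is used where a real analysis would usually average over many subsamples, and the function name is hypothetical. A curve that flattens before the full depth suggests the sample is close to saturation.

```python
import random

def rarefaction_curve(clone_counts, depths, seed=0):
    """Distinct clonotypes seen within the first d reads of one shuffled read list."""
    # Expand {clonotype: read count} into one entry per read.
    reads = [clone for clone, n in clone_counts.items() for _ in range(n)]
    random.Random(seed).shuffle(reads)
    targets = set(depths)
    seen, curve = set(), {}
    for i, clone in enumerate(reads, start=1):
        seen.add(clone)
        if i in targets:
            curve[i] = len(seen)  # depths beyond the total read count are skipped
    return curve

# Hypothetical repertoire: 100 reads spread over 5 clonotypes.
counts = {"clone_A": 50, "clone_B": 30, "clone_C": 15, "clone_D": 4, "clone_E": 1}
curve = rarefaction_curve(counts, depths=[10, 25, 50, 100])
for depth in sorted(curve):
    print(depth, curve[depth])
```

      Because every depth is read off the same ordering, the curve is non-decreasing by construction and ends at the total number of clonotypes.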

      In order to support the authors' conclusions that MSI-H tumors have fewer TCR clonotypes than MSS tumors (Figure S2a) I would have liked to see Figure 2a annotated so that it was easy to distinguish which patient was in which group, as well as the rarefaction plots suggested above, to be sure that the difference represented a real difference between samples and not technical variance (which might occur due to only 4 samples being in the MSI-H group).

      We thank the reviewer for their recommendation. Indeed, the MSI-H group contains fewer tumors than the MSS group, which is consistent with the proportion of MSI-H cases observed in colorectal cancer, typically around 15%, as reported in previous studies (16, 17). To provide further clarification on this point, we have included rarefaction plots illustrating TCR repertoire diversity versus the number of reads in Supplementary Figure 3 (line 339). Additionally, MSI-H and MSS samples have been appropriately labeled for clarity.

      The authors write: "in accordance with prior investigations, we identified an inverse relationship between TCR clonality and the Shannon index (Supplementary Figure S1)" >> Shannon index is measure of TCR clonality, not an independent variable. The authors may have meant TCR repertoire richness (the absolute number of TCRs), and the Shannon index (a measure of how many unique TCRs are present in the index).

      We thank the reviewer for their comment regarding the correlation between the number of TCRs and the Shannon index. We have revised the figure to illustrate the relationship between the number of TCRs and the Shannon index, and we have relocated it to Figure 2B.
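
      To make the reviewer's point concrete, the sketch below computes the Shannon index from clone counts and derives clonality from it using one common definition (one minus the normalized Shannon entropy). Whether the manuscript used exactly this definition is an assumption, and the counts are invented. Under this definition clonality is a function of the Shannon index, so an inverse relationship between the two is expected by construction, whereas richness (the number of distinct clones) is a separate quantity.

```python
import math

def shannon_index(counts):
    """Shannon entropy H = -sum(p_i * ln p_i) over clone frequencies."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def richness(counts):
    """Number of distinct clones with at least one read."""
    return sum(1 for c in counts if c > 0)

def clonality(counts):
    """1 - H / ln(richness): 0 for a perfectly even repertoire, near 1 when one clone dominates."""
    n = richness(counts)
    if n <= 1:
        return 1.0  # a single clone is treated as fully clonal (a convention assumed here)
    return 1.0 - shannon_index(counts) / math.log(n)

even = [10, 10, 10, 10]   # same richness, evenly distributed
skewed = [97, 1, 1, 1]    # same richness, one dominant clone
for name, counts in (("even", even), ("skewed", skewed)):
    print(name, richness(counts), round(shannon_index(counts), 3), round(clonality(counts), 3))
```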

      The authors continue: "As anticipated, we identified only 58 distinct V (Figure 2C) and 13 distinct J segments (Figure 2D), that collectively generated 184,396 clones across the 27 tumor tissue samples, underscoring the conservation of these segments (Figure 2C & D)" >> it is not clear to me what point the authors are making: it is well known that TCR V and J genes are largely shared between Caucasian populations (https://pubmed.ncbi.nlm.nih.gov/10810226/), and though IMGT lists additional forms of these genes, many are quite rare and are typically not included in the reference sequences used by repertoire analysis software. I would clarify the language in this section to avoid the impression that patient repertoires are only using a restricted set of J genes.

      We thank the reviewer for their feedback. We have revised the sentence as follows: “As anticipated, we identified 59 distinct V segments (Supplementary Figure 2C) and 13 distinct J segments (Supplementary Figure 2D), collectively sharing 185,627 clones across the 28 tumor tissue samples. This underscores the conservation of these segments (Supplementary Figure 2C & D)” (Result, lines 354-356, page 12).

      As a result I would suggest moving Figure 2 with the exception of 2A into the supplementals - I would have been more interested in a plot showing the distribution of TCRs by frequency, i.e. how what proportion of clones are hyperexpanded, moderately expanded etc. This would be a better measure of the likely immune responses.

      We thank the reviewer for their comment. With the exception of Figure 2A, we have relocated Figures 2B through 2F to Supplementary Figure 2.

      The authors write "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic peptides (Supplementary Table 3)" >> The authors mean to refer to Table S4.

      We appreciate the reviewer's feedback. Here's the revised sentence: "To accomplish this, we gathered HLA and TCRβ sequences from established datasets containing immunogenic and non-immunogenic pHLA-TCR complexes (Supplementary Table 5)” (lines 368-370).

      The authors write "As anticipated, our analysis revealed a significantly higher prevalence of peptides with robust HLA binding (percentile rank < 2%) among immunogenic peptides in contrast to their non-immunogenic counterparts (Figure 3A & B, p< 0.00001)" >> this is not surprising, as tools such as NetMHCpan are trained on databases of immunogenic peptides, and thus it is likely that these aren't independent measures (in https://academic.oup.com/nar/article/48/W1/W449/5837056 the authors state that "The training data have been vastly extended by accumulating MHC BA and EL data from the public domain. In particular, EL data were extended to include MA data"). In the pMTNet paper it is stated that pMTNet encoded pMHC information using "the exact data that were used to train the netMHCpan model" >> While I am not sufficiently expert to review details on machine learning training models, it would seem that the pHLA scores from NetMHCpan and pMTNet may not be independent, which would explain the concordance in scores that the authors describe in Figures 3B and 3D. I would invite the authors to comment on this.

      The HLA percentiles from NetMHCpan and TCR rankings from pMTNet are independent. NetMHCpan predicts the interaction between peptides and MHC class I, while pMTNet predicts the TCR binding specificity of class I MHCs and peptides. NetMHCpan is trained to predict peptide-MHC class I interactions by integrating binding affinity and MS eluted ligand data, using a second output neuron in the NNAlign approach. This setup produces scores for both binding affinity and ligand elution. In contrast, pMTNet predicts TCR binding specificity of class I pMHCs through three steps:

      (1) Training a numeric embedding of pMHCs (class I only) to numerically represent protein sequences of antigens and MHCs.

      (2) Training an embedding of TCR sequences using stacked auto-encoders to numerically encode TCR sequence text strings.

      (3) Creating a deep neural network combining these two embeddings to integrate knowledge from TCRs, antigenic peptide sequences, and MHC alleles. Fine-tuning is employed to finalize the prediction model for TCR-pMHC pairing.

      Therefore, pHLA scores from NetMHCpan and pMTNet are independent. Furthermore, Figures 3B and 3D do not show concordance in scores, as there was no equivalence in the percentage of immunogenic and non-immunogenic peptides in the two groups (≥2 HLA percentile and ≥2 TCR percentile).

      Many of the authors of this paper were also authors of the epiTCR paper, would this not have been a better choice of tool for assessing pHLA-TCR binding than pMTNet?

      When we started this project, EpiTCR had not been completed. Therefore, we chose pMTNet, which had demonstrated good performance and high accuracy at that time. The validated performance of EpiTCR is an ongoing project that will implement immunogenic assays (ELISpot and single-cell sequencing) to assess the prediction and ranking of neoantigens. This study is also mentioned in the discussion: "Moreover, to improve the accuracy and effectiveness of the machine learning model in predicting and ranking neoantigens, we have developed an in-house tool called EpiTCR. This tool will utilize immunogenic assays, such as ELISpot and single-cell sequencing, for validation." (lines 532-535).

      In Figure 3G it would appear that the pHLA-TCR score is driving the interaction, could the authors comment on this?

      We sincerely thank the reviewer for their valuable feedback. Initially, the "combine" label in Figure 3G was confusing and potentially misleading when compared to our subsequent approach using a combined machine learning model. In Figure 3G, the "combine" approach simply aggregates the pHLA and pHLA-TCR criteria, whereas our combined machine learning model employs a more sophisticated algorithm to integrate these criteria effectively.

      The combined analysis in Figure 3G utilizes a basic "AND" algorithm between pHLA and pHLA-TCR criteria, aiming for high sensitivity in HLA binding and high specificity. However, this approach demonstrated lower efficacy in practice, underscoring the necessity for a more refined integration method through machine learning. This was the key point we intended to convey with Figure 3G. To address this issue, we have revised Figure 3G to replace "combined" with "HLA percentile & TCR ranking" to clarify its purpose and minimize confusion.
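
      The basic "AND" rule described above can be sketched as follows, purely as an illustration. The cutoff of 2 for both the HLA percentile and the TCR ranking is an assumption, the peptide values are invented toy data, and the function names are hypothetical. The sketch shows the general property behind the response: requiring both criteria can only keep or lower sensitivity and keep or raise specificity relative to either criterion alone.

```python
def hla_rule(hla_percentile, tcr_rank, hla_cutoff=2.0):
    """Call a peptide immunogenic on the HLA percentile alone (cutoff is an assumption)."""
    return hla_percentile < hla_cutoff

def and_rule(hla_percentile, tcr_rank, hla_cutoff=2.0, tcr_cutoff=2.0):
    """Call a peptide immunogenic only if it passes BOTH criteria."""
    return hla_percentile < hla_cutoff and tcr_rank < tcr_cutoff

def sensitivity_specificity(peptides, rule):
    """peptides: iterable of (hla_percentile, tcr_rank, is_immunogenic)."""
    tp = fn = tn = fp = 0
    for hla, tcr, truth in peptides:
        call = rule(hla, tcr)
        if truth and call:
            tp += 1
        elif truth:
            fn += 1
        elif call:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Invented toy peptides: (HLA percentile, TCR rank, experimentally immunogenic?)
toy = [
    (0.5, 1.0, True), (1.5, 3.0, True), (0.8, 0.4, True),
    (1.2, 5.0, False), (4.0, 1.0, False), (6.0, 7.0, False),
]
sens_hla, spec_hla = sensitivity_specificity(toy, hla_rule)
sens_and, spec_and = sensitivity_specificity(toy, and_rule)
print("HLA only:", sens_hla, spec_hla)
print("AND rule:", sens_and, spec_and)
```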

      In Figure 4A I would invite the authors to comment on how they chose the sample sizes they did for the discovery and validation datasets: the numbers seem rather random. I would question whether a training dataset in which 20% of the peptides are immunogenic accurately represents the case in patients, where I believe immunogenic peptides are less frequent (as in Figure 5).

      We aimed to maximize the number of experimentally validated immunogenic peptides, including those from viruses, with only a small percentage from tumors available for training. This limitation is inherent in the field. However, our ultimate objective is to develop a tool capable of accurately predicting peptide immunogenicity irrespective of their source. Therefore, the current percentage of immunogenic peptides may not accurately reflect real-world patient cases, but this is not crucial to our development goals.

      For Figure 5C I would invite the authors to consider adding a statistical test to justify the cutoff at 2fold enrichments.

      Thank you for your feedback. Instead of conducting a statistical test, we have implemented standardized cutoffs as defined in the cited study (2). This research introduces refined criteria for identifying positive responses in ELISPOT assays through a comprehensive analysis of data from multiple studies. These criteria aim to improve the accuracy and consistency of immune response measurements across various applications. The reference to this study has been properly incorporated into the manuscript (Method, line 281, page 10).
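
      For readers unfamiliar with fold-change cutoffs, the sketch below shows the arithmetic of a two-fold rule on spot counts. It is only an illustration and does not reproduce the criteria of the cited study: the strict "greater than 2" comparison follows the wording used in this response, the floor of one spot for an empty background well is an assumption added to avoid division by zero, and the function names are hypothetical.

```python
def fold_change(stimulated_spots, background_spots, floor=1.0):
    """Ratio of peptide-stimulated to background spot counts.

    The floor on the background is an assumption that avoids division by zero.
    """
    return stimulated_spots / max(background_spots, floor)

def is_positive(stimulated_spots, background_spots, cutoff=2.0):
    """Positive response if the fold change strictly exceeds the cutoff."""
    return fold_change(stimulated_spots, background_spots) > cutoff

# Invented example wells: (stimulated spots, background spots)
for stim, bg in [(30, 10), (20, 10), (20, 0)]:
    print(stim, bg, fold_change(stim, bg), is_positive(stim, bg))
```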

      Minor points:

      "paired white blood cells" >> use "paired Peripheral Blood Mononuclear Cells".

      We thank the reviewer for the feedback and agree with this observation. The sentence has been revised as follows: "Initially, DNA sequencing of tumor tissues and paired Peripheral Blood Mononuclear Cells identifies cancer-associated genomic mutations. RNA sequencing then determines the patient's HLA-I allele profile and the gene expression levels of mutated genes." (Introduction, lines 55-58, page 2).

      "while RNA sequencing determines the patient's HLA-I allele profile and gene expression levels of mutated genes." >> RNA sequencing covers both the mutant and reference form of the gene, allowing assessment of variant allele frequency.

      "the current approach's impact on patient outcomes remains limited due to the scarcity of effective immunogenic neoantigens identified for each patient" >> Some clearer language here would have been preferred as different tumor types have different mutational loads

      We thank the reviewer for their valuable feedback. We agree with the reviewer's observation. The passage has been revised accordingly: “The current approach's impact on patient outcomes remains limited due to the scarcity of mutations in cancer patients that lead to effective immunogenic neoantigens.” (Introduction, lines 62-64, page 3).

      References

      (1) J. Schmidt et al., Prediction of neo-epitope immunogenicity reveals TCR recognition determinants and provides insight into immunoediting. Cell Rep Med 2, 100194 (2021).

      (2) Z. Moodie et al., Response definition criteria for ELISPOT assays revisited. Cancer Immunol Immunother 59, 1489-1501 (2010).

      (3) V. Jurtz et al., NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol 199, 3360-3368 (2017).

      (4) T. Lu et al., Deep learning-based prediction of the T cell receptor-antigen binding specificity. Nat Mach Intell 3, 864-875 (2021).

      (5) J. Xia et al., NEPdb: A Database of T-Cell Experimentally-Validated Neoantigens and Pan-Cancer Predicted Neoepitopes for Cancer Immunotherapy. Front Immunol 12, 644637 (2021).

      (6) W. J. Zhou et al., NeoPeptide: an immunoinformatic database of T-cell-defined neoantigens. Database (Oxford) 2019 (2019).

      (7) X. Tan et al., dbPepNeo: a manually curated database for human tumor neoantigen peptides. Database (Oxford) 2020 (2020).

      (8) G. Zhang, L. Chitkushev, L. R. Olsen, D. B. Keskin, V. Brusic, TANTIGEN 2.0: a knowledge base of tumor T cell antigens and epitopes. BMC Bioinformatics 22, 40 (2021).

      (9) J. Wu et al., TSNAdb: A Database for Tumor-specific Neoantigens from Immunogenomics Data Analysis. Genomics Proteomics Bioinformatics 16, 276-282 (2018).

      (10) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-1-1-standard-3-0-2.

      (11) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-2-1-standard-3-0-2.

      (12) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-3-1-standard-3-0-2.

      (13) https://www.10xgenomics.com/resources/datasets/cd-8-plus-t-cells-of-healthy-donor-4-1-standard-3-0-2.

      (14) A. Montemurro et al., NetTCR-2.0 enables accurate prediction of TCR-peptide binding by using paired TCRalpha and beta sequence data. Commun Biol 4, 1060 (2021).

      (15) G. Li et al., Splicing neoantigen discovery with SNAF reveals shared targets for cancer immunotherapy. Sci Transl Med 16, eade2886 (2024).

      (16) Z. Gatalica, S. Vranic, J. Xiu, J. Swensen, S. Reddy, High microsatellite instability (MSI-H) colorectal carcinoma: a brief review of predictive biomarkers in the era of personalized medicine. Fam Cancer 15, 405-412 (2016).

      (17) N. Mulet-Margalef et al., Challenges and Therapeutic Opportunities in the dMMR/MSI-H Colorectal Cancer Landscape. Cancers (Basel) 15 (2023).

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: 

      This paper applies methods for segmentation, annotation, and visualization of acoustic analysis to zebra finch song. The paper shows that these methods can be used to predict the stage of song development and to quantify acoustic similarity. The methods are solid and are likely to provide a useful tool for scientists aiming to label large datasets of zebra finch vocalizations. The paper has two main parts: 1) establishing a pipeline/ package for analyzing zebra finch birdsong and 2) a method for measuring song imitation. 

      Strengths: 

      It is useful to see existing methods for syllable segmentation compared to new datasets. 

      It is useful, but not surprising, that these methods can be used to predict developmental stage, which is strongly associated with syllable temporal structure. 

      It is useful to confirm that these methods can identify abnormalities in deafened and isolated songs. 

      Weaknesses: 

      For the first part, the implementation seems to be a wrapper on existing techniques. For instance, the first section talks about syllable segmentation; they made a comparison between WhisperSeg (Gu et al., 2024), TweetyNet (Cohen et al., 2022), and amplitude thresholding. They found that WhisperSeg performed the best, and they included it in the pipeline. They then used WhisperSeg to analyze syllable duration distributions and rhythm of birds of different ages and confirmed past findings on this developmental process (e.g. Aronov et al., 2011). Next, based on the segmentation, they assign labels by performing UMAP and HDBSCAN on the spectrogram (nothing new; that's what people have been doing). Then, based on the labels, they claimed they developed a 'new' visualization - syntax raster (line 180). That was done by Sainburg et al. 2020 in Figure 12E and also in Cohen et al., 2020 - so the claim to have developed 'a new song syntax visualization' is confusing. The rest of the paper is about analyzing the finch data based on AVN features (which are essentially acoustic features already in the classic literature).

      First, we would like to thank this reviewer for their kind comments and feedback on this manuscript. It is true that many of the components of this song analysis pipeline are not entirely novel in isolation. Our real contribution here is bringing them together in a way that allows other researchers to seamlessly apply automated syllable segmentation, clustering, and downstream analyses to their data. That said, our approach to training TweetyNet for syllable segmentation is novel. We trained TweetyNet to recognize vocalizations vs. silence across multiple birds, such that it can generalize to new individual birds, whereas TweetyNet had previously only been used to annotate song syllables from birds included in its training set. Our validation of TweetyNet and WhisperSeg in combination with UMAP and HDBSCAN clustering is also novel, providing valuable information about how these systems interact, and how reliable the completely automatically generated labels are for downstream analysis.

      Our syntax raster visualization does resemble Figure 12E in Sainburg et al. 2020; however, it differs in a few important ways, which we believe warrant its consideration as a novel visualization method. First, Sainburg et al. represent the labels across bouts in real time; their position along the x axis reflects the time at which each syllable is produced relative to the start of the bout. By contrast, our visualization considers only the index of syllables within a bout (i.e., first syllable vs. second syllable, etc.) without consideration of the true durations of each syllable or the silent gaps between them. This makes it much easier to detect syntax patterns across bouts, as the added variability of syllable timing is removed. Considering only the sequence of syllables rather than their timing also allows us to more easily align bouts according to the first syllable of a motif, further emphasizing the presence or absence of repeating syllable sequences without interference from the more variable introductory notes at the start of a motif. Finally, instead of plotting all bouts in the order in which they were produced, our visualization orders bouts such that bouts with the same sequence of syllables will be plotted together, which again serves to emphasize the most common syllable sequences that the bird produces. These additional processing steps mean that our syntax raster plot has much starker contrast between birds with stereotyped syntax and birds with more variable syntax, as compared to the more minimally processed visualization in Sainburg et al. 2020. There do not appear to be any similar visualizations in Cohen et al. 2020.
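
      The three processing steps described above (index-based sequences, alignment to a motif's first syllable, and grouping of identical bouts) can be sketched in a few lines of Python. This is only an illustration and not AVN's implementation: the function name `build_syntax_raster`, the `align_to` parameter, the choice to trim rather than shift bouts, and the frequency-based sort order are all assumptions made for the example.

```python
from collections import Counter


def build_syntax_raster(bouts, align_to=None):
    """Turn labeled bouts into rows of a syntax raster (hypothetical helper).

    bouts:    non-empty list of bouts, each a list of syllable labels in sung order.
    align_to: optional label; each bout containing it is trimmed to start at its
              first occurrence (an assumed, simplified form of alignment).
    """
    rows = []
    for bout in bouts:
        if align_to is not None and align_to in bout:
            bout = bout[bout.index(align_to):]
        # Only the order of syllables is kept; durations and gaps are ignored.
        rows.append(tuple(bout))

    # Plot identical sequences together: most frequent sequence first,
    # ties broken by the sequence itself (an assumed ordering).
    counts = Counter(rows)
    rows.sort(key=lambda row: (-counts[row], row))

    # Pad shorter bouts with None so every row has the same width.
    width = max(len(row) for row in rows)
    return [list(row) + [None] * (width - len(row)) for row in rows]


bouts = [
    ["i", "i", "a", "b", "c"],
    ["i", "a", "b", "c"],
    ["i", "a", "b", "d"],
    ["a", "b", "c"],
]
raster = build_syntax_raster(bouts, align_to="a")
```

      Each row of `raster` could then be drawn as one line of colored cells, one color per label, so that stereotyped birds produce large uniform blocks.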

      The second part may be something new, but there are opportunities to improve the benchmarking. It is about the pupil-tutor imitation analysis. They introduce a convolutional neural network that takes triplets as an input. Each triplet is essentially 3 images stacked together as (anchor, positive, negative): the anchor is a reference spectrogram from, say, finch A; the positive is a different spectrogram with the same label as the anchor from finch A; and the negative is a spectrogram not related to A or with a different syllable label from A. The network is then trained to produce a low-dimensional embedding by ensuring the embedding distance between anchor and positive is less than that between anchor and negative by a certain margin. Based on the embedding, they then made use of the earth mover's distance to quantify the similarity in the syllable distribution among finches. They then compared their approach's performance with that of Sound Analysis Pro (SAP) and a variant of SAP. A more natural comparison, which they didn't include, is with the VAE approach by Goffinet et al. In this paper (https://doi.org/10.7554/eLife.67855, Fig 7), they also attempted to perform an analysis on the tutor pupil song.

      We thank the reviewer for this suggestion, and plan to include a comparison of the triplet loss embedding space to the VAE space for song similarity comparisons in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary: 

      In this work, the authors present a new Python software package, Avian Vocalization Network (AVN) aimed at facilitating the analysis of birdsong, especially the song of the zebra finch, the most common songbird model in neuroscience. The package handles some of the most common (and some more advanced) song analyses, including segmentation, syllable classification, featurization of song, calculation of tutor-pupil similarity, and age prediction, with a view toward making the entire process friendlier to experimentalists working in the field. 

      For many years, Sound Analysis Pro has served as a standard in the songbird field, the first package to extensively automate songbird analysis and facilitate the computation of acoustic features that have helped define the field. More recently, the increasing popularity of Python as a language, along with the emergence of new machine learning methods, has resulted in a number of new software tools, including the vocalpy ecosystem for audio processing, TweetyNet (for segmentation), t-SNE and UMAP (for visualization), and autoencoder-based approaches for embedding. 

      Strengths: 

      The AVN package overlaps several of these earlier efforts, albeit with a focus on more traditional featurization that many experimentalists may find more interpretable than deep learning-based approaches. Among the strengths of the paper are its clarity in explaining the several analyses it facilitates, along with high-quality experiments across multiple public datasets collected from different research groups. As a software package, it is open source, installable via the pip Python package manager, and features high-quality documentation, as well as tutorials. For experimentalists who wish to replicate any of the analyses from the paper, the package is likely to be a useful time saver. 

      Weaknesses: 

      I think the potential limitations of the work are predominantly on the software end, with one or two quibbles about the methods. 

      First, the software: it's important to note that the package is trying to do many things, of which it is likely to do several well and few comprehensively. Rather than a package that presents a number of new analyses or a new analysis framework, it is more a codification of recipes, some of which are reimplementations of existing work (SAP features), some of which are essentially wrappers around other work (interfacing with WhisperSeg segmentations), and some of which are new (similarity scoring). All of this has value, but in my estimation, it has less value as part of a standalone package and potentially much more as part of an ecosystem like vocalpy that is undergoing continuous development and has long-term support. 

      We appreciate this reviewer’s comments and concerns about the structure of the AVN package and its long-term maintenance. We have considered incorporating AVN into the VocalPy ecosystem but have chosen not to for a few key reasons. (1) AVN was designed with ease of use for experimenters with limited coding experience top of mind. VocalPy provides excellent resources for researchers with some familiarity with object-oriented programming to manage and analyze their datasets; however, we believe it may be challenging for users without such experience to adopt VocalPy quickly. AVN’s ‘recipe’ approach, as you put it, is very easily accessible to new users, and allows users with intermediate coding experience to easily navigate the source code to gain a deeper understanding of the methodology. AVN also consistently outputs processed data in familiar formats (tables in .csv files which can be opened in Excel), in an effort to make it more accessible to new users, something which would be challenging to reconcile with VocalPy’s emphasis on their `dataset` classes. (2) AVN and VocalPy differ in their underlying goals and philosophies when it comes to flexibility vs. standardization of analysis pipelines. VocalPy is designed to facilitate mixing-and-matching of different spectrogram generation, segmentation, annotation etc. approaches, so that researchers can design and implement their own custom analysis pipelines. This flexibility is useful in many cases. For instance, it could allow researchers who have very different noise filtering and annotation needs, like those working with field recordings versus acoustic chamber recordings, to analyze their data using this platform. However, when it comes to comparisons across zebra finch research labs, this flexibility comes at the expense of direct comparison and integration of song features across research groups. This is the context in which AVN is most useful. It presents a single approach to song segmentation, labeling, and featurization that has been shown to generalize well across research groups, and which allows direct comparisons of the resulting features. AVN’s single, extensively validated, standard pipeline approach is fundamentally incompatible with VocalPy’s emphasis on flexibility. We are excited to see how VocalPy continues to evolve in the future and recognize the value that both AVN and VocalPy bring to the songbird research community, each with their own distinct strengths, weaknesses, and ideal use cases.

      While the code is well-documented, including web-based documentation for both the core package and the GUI, the latter is available only on Windows, which might limit the scope of adoption. 

      We thank the reviewer for their kind words about AVN’s documentation. We recognize that the GUI’s exclusive availability on Windows is a limitation, and we would be happy to collaborate with other researchers and developers in the future to build a Mac-compatible version, should the demand present itself. That said, the Python package works on all operating systems, so non-Windows users can still use AVN that way.

      That is to say, whether AVN is adopted by the field in the medium term will have much more to do with the quality of its maintenance and responsiveness to users than any particular feature, but I believe that many of the analysis recipes that the authors have carefully worked out may find their way into other code and workflows. 

      Second, two notes about new analysis approaches: 

      (1) The authors propose a new means of measuring tutor-pupil similarity based on first learning a latent space of syllables via a self-supervised learning (SSL) scheme and then using the earth mover's distance (EMD) to calculate transport costs between the distributions of tutors' and pupils' syllables. While to my knowledge this exact method has not previously been proposed in birdsong, I suspect it is unlikely to differ substantially from the approach of autoencoding followed by MMD used in the Goffinet et al. paper. That is, SSL, like the autoencoder, is a latent space learning approach, and EMD, like MMD, is an integral probability metric that measures discrepancies between two distributions.

      (Indeed, the two are very closely related: https://stats.stackexchange.com/questions/400180/earth-movers-distance-andmaximum-mean-discrepency.) Without further experiments, it is hard to tell whether these two approaches differ meaningfully. Likewise, while the authors have trained on a large corpus of syllables to define their latent space in a way that generalizes to new birds, it is unclear why such an approach would not work with other latent space learning methods. 

      We recognize the similarities between these approaches, and plan to include in the revised manuscript a comparison of triplet loss embeddings scored with MMD against VAE embeddings scored with both MMD and EMD. Thank you for this suggestion.
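
      To make the EMD/MMD comparison discussed above concrete, the sketch below computes both quantities for two small one-dimensional samples. It is an illustration under simplifying assumptions, not the method used in AVN or in Goffinet et al.: the syllable embeddings in question are multi-dimensional, where EMD requires solving an optimal transport problem, and the Gaussian kernel and its `bandwidth` are arbitrary choices made for the example.

```python
import math


def emd_1d(xs, ys):
    """Earth mover's distance between two equal-sized 1-D samples.

    In one dimension the optimal transport plan simply pairs sorted values.
    """
    if len(xs) != len(ys):
        raise ValueError("this sketch assumes equal sample sizes")
    return sum(abs(a - b) for a, b in zip(sorted(xs), sorted(ys))) / len(xs)


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD using a Gaussian kernel (assumed choice)."""

    def kernel(a, b):
        return math.exp(-((a - b) ** 2) / (2 * bandwidth ** 2))

    def mean_kernel(us, vs):
        return sum(kernel(u, v) for u in us for v in vs) / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)


tutor = [0.0, 1.0, 2.0]
pupil = [1.0, 2.0, 3.0]
print(emd_1d(tutor, pupil))       # average distance each point must move
print(mmd_squared(tutor, pupil))  # kernel-based discrepancy, 0 for identical samples
```

      Both functions return zero for identical samples and grow as the two distributions separate, which is the sense in which the reviewer describes them as closely related.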

      (2) The authors propose a new method for maturity scoring by training a model (a generalized additive model) to predict the age of the bird based on a selected subset of acoustic features. This is distinct from the "predicted age" approach of Brudner, Pearson, and Mooney, which predicts based on a latent representation rather than specific features, and the GAM nicely segregates the contribution of each. As such, this approach may be preferred by many users who appreciate its interpretability. 

      In summary, my view is that this is a nice paper detailing a well-executed piece of software whose future impact will be determined by the degree of support and maintenance it receives from others over the near and medium term. 

      Reviewer #3 (Public Review):

      Summary: 

      The authors invent song and syllable discrimination tasks they use to train deep networks. These networks they then use as a basis for routine song analysis and song evaluation tasks. For the analysis, they consider both data from their own colony and from another colony the network has not seen during training. They validate the analysis scores of the network against expert human annotators, achieving a correlation of 80-90%. 

      Strengths: 

      (1) Robust Validation and Generalizability: The authors demonstrate a good performance of the AVN across various datasets, including individuals exhibiting deviant behavior. This extensive validation underscores the system's usefulness and broad applicability to zebra finch song analysis, establishing it as a potentially valuable tool for researchers in the field. 

      (2) Comprehensive and Standardized Feature Analysis: AVN integrates a comprehensive set of interpretable features commonly used in the study of bird songs. By standardizing the feature extraction method, the AVN facilitates comparative research, allowing for consistent interpretation and comparison of vocal behavior across studies. 

      (3) Automation and Ease of Use: By being fully automated, the method is straightforward to apply and should pose barely any adoption threshold for other labs.

      (4) Human experts were recruited to perform extensive annotations (of vocal segments and of song similarity scores). These annotations released as public datasets are potentially very valuable. 

      Weaknesses: 

      (1) Poorly motivated tasks. The approach is poorly motivated and many assumptions come across as arbitrary. For example, the authors implicitly assume that the task of birdsong comparison is best achieved by a system that optimally discriminates between typical, deaf, and isolated songs. Similarly, the authors assume that song development is best tracked using a system that optimally estimates the age of a bird given its song. My issue is that these are fake tasks since clearly, researchers will know whether a bird is an isolated or a deaf bird, and they will also know the age of a bird, so no machine learning is needed to solve these tasks. Yet, the authors imagine that solving these placeholder tasks will somehow help with measuring important aspects of vocal behavior. 

      We appreciate this reviewer’s concerns and apologize for not providing sufficiently clear rationale for the inclusion of our phenotype classifier and age regression models in the original manuscript. These tasks are not intended to be taken as a final, ultimate culmination of the AVN pipeline. Rather, we consider the carefully engineered set of 55 interpretable features to be AVN’s final output, and these analyses serve merely as examples of how that feature set can be applied. That said, each of these models does have valid experimental use cases that we believe are important and would like to bring to the attention of the reviewer.

      For one, we showed how the LDA model that can discriminate between typical, deaf, and isolate birds’ songs not only allows us to evaluate which features are most important for discriminating between these groups, but also allows comparison of the FoxP1 knock-down (FP1 KD) birds to each of these phenotypes. Based on previous work (Garcia-Oscos et al. 2021), we hypothesized that FP1 KD in these birds specifically impaired tutor song memory formation while sparing a bird’s ability to refine their own vocalizations through auditory feedback. Thus, we would expect their songs to resemble those of isolate birds, who lack a tutor song memory, but not to resemble deaf birds who lack a tutor song memory and auditory feedback of their own vocalizations to guide learning. The LDA model allowed us to make this comparison quantitatively for the first time and confirm our hypothesis that FP1 KD birds’ songs are indeed most like isolates’. In the future, as more research groups publish their birds’ AVN feature sets, we hope to be able to make even more fine-grained comparisons between different groups of birds, either using LDA or other similar interpretable classifiers. 

      The age prediction model also has valid real-world use cases. For instance, one might imagine an experimental manipulation that is hypothesized to accelerate or slow song maturation in juvenile birds. This age prediction model could be applied to the AVN feature sets of birds having undergone such a manipulation to determine whether their predicted ages systematically lead or lag their true biological ages, and which song features are most responsible for this difference. We didn’t have access to data for any such birds for inclusion in this paper, but we hope that others in the future will be able to take inspiration from our methodology and use this or a similar age regression model with AVN features in their research. We will revise the original manuscript to make this clearer. 

      Along similar lines, authors assume that a good measure of similarity is one that optimally performs repeated syllable detection (i.e. to discriminate same syllable pairs from different pairs). The authors need to explain why they think these placeholder tasks are good and why no better task can be defined that more closely captures what researchers want to measure. Note: the standard tasks for self-supervised learning are next word or masked word prediction, why are these not used here? 

      There appears to be some misunderstanding regarding our similarity scoring embedding model and our rationale for using it. We will explain it in more depth here and provide some additional explanation in the manuscript. First, we are not training a model to discriminate between same and different syllable pairs. The triplet loss network is trained to embed syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. The loss function is related to the relative distance between embeddings of syllables with the same or different labels, not the classification of syllables as same or different. This approach was chosen because it has repeatedly been shown to be a useful data compression step (Schroff et al. 2015, Thakur et al. 2019) before further downstream tasks are applied on its output, particularly in contexts where there is little data per class (syllable label). For example, Schroff et al. 2015 trained a deep convolutional neural network with triplet loss to embed images of human faces from the same individual closer together than images of different individuals in a 128-dimensional space. They then used this model to compute 128-dimensional representations of additional face images, not included in training, which were used for individual facial recognition (this is a same vs. different category classifier), and facial clustering, achieving better performance than the previous state of the art. The triplet loss function results in a model that can generate useful embeddings of previously unseen categories, like new individuals’ faces, or new zebra finches’ syllables, which can then be used in downstream analyses. This meaningful, lower dimensional space allows comparisons of distributions of syllables across birds, as in Brainard and Mets 2008, and Goffinet et al. 2021.
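
      The distinction drawn above, between a loss on relative distances and a same/different classifier, can be seen in a minimal sketch of a triplet margin loss for a single triplet. The margin value, the use of plain (not squared) Euclidean distance, and the 2-dimensional toy embeddings are assumptions made for illustration; they are not taken from AVN's training code, which embeds syllables in 8 dimensions.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Triplet margin loss for one (anchor, positive, negative) triplet.

    The loss is zero once the negative is at least `margin` farther from the
    anchor than the positive is; otherwise it grows with the shortfall.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = (0.0, 0.0)    # embedding of a syllable with label 'a'
positive = (0.0, 1.0)  # another syllable with label 'a'
far_negative = (3.0, 4.0)   # different label, already well separated
near_negative = (0.0, 1.5)  # different label, still too close

print(triplet_loss(anchor, positive, far_negative))   # no penalty
print(triplet_loss(anchor, positive, near_negative))  # penalized
```

      No class prediction appears anywhere in the loss; only the geometry of the embedding is constrained, which is why the trained network can embed syllables from birds it has never seen.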

      Next word and masked word prediction are indeed common self-supervised learning tasks for models working with text data, or other data with meaningful sequential organization. That is not the case for our zebra finch syllables, where every bird’s syllable sequence depends only on its tutor’s sequence, and there is no evidence for strong universal syllable sequencing rules (James et al. 2020). Rather, our embedding model is an example of a computer vision task, as it deals with sets of two-dimensional images (spectrograms), not sequences of categorical variables (like text). It is also not, strictly speaking, a self-supervised learning task, as it does require syllable labels to generate the triplets. A common self-supervised approach for dimensionality reduction in a computer vision task such as this one would be to train an autoencoder to compress images to a lower dimensional space, then faithfully reconstruct them from the compressed representation. This has been done using a variational autoencoder trained on zebra finch syllables in Goffinet et al. 2021. In keeping with the suggestions from reviewers #1 and #2, we plan to include a comparison of our triplet loss model with the Goffinet et al. VAE approach in the revised manuscript.

      (2) The machine learning methodology lacks rigor. The aims of the machine learning pipeline are extremely vague and keep changing like a moving target. Mainly, the deep networks are trained on some tasks but then authors evaluate their performance on different, disconnected tasks. For example, they train both the birdsong comparison method (L263+) and the song similarity method (L318+) on classification tasks. However, they evaluate the former method (LDA) on classification accuracy, but the latter (8-dim embeddings) using a contrast index. In machine learning, usually, a useful task is first defined, then the system is trained on it and then tested on a held-out dataset. If the sensitivity index is important, why does it not serve as a cost function for training?

      Again, there appears to be some misunderstanding of our similarity scoring methodology. Our similarity scoring model is not trained on a classification task, but rather on an embedding task. It learns to embed spectrograms of syllables in an 8-dimensional space such that syllables with the same label are closer together than syllables with different labels. We could report the loss values for this embedding task on our training and validation datasets, but these wouldn’t have any clear relevance to the downstream task of syllable distribution comparison where we are using the model’s embeddings. We report the contrast index as this has direct relevance to the actual application of the model and allows comparisons to other similarity scoring methods, something that the triplet loss values wouldn’t allow.

      The triplet loss method was chosen because it has been shown to yield useful low-dimensional representations of data, even in cases where there is limited labeled training data (Thakur et al. 2019). While we have one of the largest manually annotated datasets of zebra finch songs, it is still quite small by industry deep learning standards, which is why we chose a method that would perform well given the size of our dataset. Training a model on a contrast index directly would be extremely computationally intensive and require many more pairs of birds with known relationships than we currently have access to. It could be an interesting approach to take in the future, but one that would be unlikely to perform well with a dataset size typical of songbird research.

      Also, usually, in solid machine learning work, diverse methods are compared against each other to identify their relative strengths. The paper contains almost none of this, e.g. authors examined only one clustering method (HDBSCAN). 

      We did compare multiple methods for syllable segmentation (WhisperSeg, TweetyNet, and amplitude thresholding) as this hadn’t been done previously. We chose not to perform extensive comparison of different clustering methods as Sainburg et al. 2020 already did so and we felt no need to duplicate this effort. We encourage this reviewer to refer to Sainburg et al.’s excellent work for comparisons of multiple clustering methods applied to zebra finch song syllables.

      (3) Performance issues. The authors want to 'simplify large-scale behavioral analysis' but it seems they want to do that at a high cost. (Gu et al 2023) achieved syllable scores above 0.99 for adults, which is much larger than the average score of 0.88 achieved here (L121). Similarly, the syllable scores in (Cohen et al 2022) are above 94% (their error rates are below 6%, albeit in Bengalese finches, not zebra finches), which is also better than here. Why is the performance of AVN so low? The low scores of AVN argue in favor of some human labeling and training on each bird. 

      Firstly, the syllable error rate scores reported in Cohen et al. 2022 are calculated very differently than the F1 scores we report here and are based on a model trained with data from the same bird as was used in testing, unlike our more general segmentation approach where the model was tested on different birds than were used in training. Thus, the scores reported in Cohen et al. and the F1 scores that we report cannot be compared.

      The discrepancy between the F1seg scores reported in Gu et al. 2023 and the segmentation F1 scores that we report is likely due to differences in the underlying datasets. Our UTSW recordings tend to have higher levels of both stationary and nonstationary background noise, which make segmentation more challenging. The recordings from Rockefeller were less contaminated by background noise, and they resulted in slightly higher F1 scores. That said, we believe that the primary factor accounting for this difference in scores with Gu et al. 2023 is the granularity of our ‘ground truth’ syllable segments. In our case, if there was ever any ambiguity as to whether vocal elements should be segmented into two short syllables with a very short gap between them or merged into a single longer syllable, we chose to split them. WhisperSeg had a strong tendency to merge the vocal elements in ambiguous cases such as these. This results in a higher rate of false negative syllable onset detections, reflected in the low recall scores achieved by WhisperSeg (see supplemental figure 2b), but still very high precision scores (supplemental figure 2a). While WhisperSeg did frequently merge these syllables in a way that differed from our ground truth segmentation, it did so consistently, meaning it had little impact on downstream measures of syntax entropy (Fig 3c) or syllable duration entropy (supplemental figure 7a). It is for that reason that, despite a lower F1 score, we still consider AVN’s automatically generated annotations to be sufficiently accurate for downstream analyses.
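
      The effect of merging on recall and precision can be reproduced with a toy onset-matching calculation. The sketch below is not the scoring code used in the manuscript: the function name `onset_scores`, the greedy one-to-one matching, and the 10 ms tolerance are assumptions chosen only to illustrate why a merged syllable lowers recall while leaving precision untouched.

```python
def onset_scores(true_onsets, predicted_onsets, tolerance=0.01):
    """Precision, recall, and F1 for syllable onset detection (times in seconds).

    A predicted onset counts as a true positive if it lies within `tolerance`
    of a ground-truth onset; each prediction can match at most one true onset.
    """
    unmatched = sorted(predicted_onsets)
    true_positives = 0
    for onset in sorted(true_onsets):
        match = next((p for p in unmatched if abs(p - onset) <= tolerance), None)
        if match is not None:
            unmatched.remove(match)
            true_positives += 1

    precision = true_positives / len(predicted_onsets) if predicted_onsets else 0.0
    recall = true_positives / len(true_onsets) if true_onsets else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Ground truth splits two short elements (onsets at 0.30 s and 0.34 s);
# the automated segmentation merges them into one syllable starting at 0.30 s.
ground_truth = [0.10, 0.30, 0.34, 0.60]
merged_prediction = [0.10, 0.30, 0.60]
precision, recall, f1 = onset_scores(ground_truth, merged_prediction)
```

      In this toy case every predicted onset is correct, so precision stays perfect, while the missing onset at 0.34 s is a false negative that pulls recall and F1 down.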

      Should researchers require a higher degree of accuracy and precision with their annotations (for example, to detect very subtle changes in song before and after an acute manipulation) and be willing to dedicate the time and resources to manually labeling a subset of recordings from each of their birds, we suggest they turn toward one of the existing tools for supervised song annotation, such as TweetyNet.  

      (4) Texas bias. It is true that comparability across datasets is enhanced when everyone uses the same code. However, the authors' proposal essentially is to replace the bias between labs with a bias towards birds in Texas. The comparison with Rockefeller birds is nice, but it amounts to merely N=1. If birds in Japanese or European labs have evolved different song repertoires, the AVN might not capture the associated song features in these labs well. 

      We appreciate the reviewer’s concern about a bias toward birds from the UTSW colony. However, this paper shows that despite training (for the similarity scoring) and hyperparameter fitting (for the HDBSCAN clustering) on the UTSW birds, AVN performs as well if not better on birds from Rockefeller than from UTSW. To our knowledge, there are no publicly available datasets of annotated zebra finch songs from labs in Europe or in Asia, but we would be happy to validate AVN on such datasets, should they become available. Furthermore, there is no evidence to suggest that there is dramatic drift in zebra finch vocal repertoire between continents which would necessitate such additional validation. While we didn’t have manual annotations for this dataset (which would allow validation of our segmentation and labeling methods), we did apply AVN to recordings shared with us by the Wada lab in Japan, where visual inspection of the resulting annotations suggested comparable accuracy to the UTSW and Rockefeller datasets.

      (5) The paper lacks an analysis of the balance between labor requirement, generalizability, and optimal performance. For tasks such as segmentation and labeling, fine-tuning for each new dataset could potentially enhance the model's accuracy and performance without compromising comparability. E.g. How many hours does it take to annotate a hundred song motifs? How much would the performance of AVN increase if the network were to be retrained on these? The paper should be written in more neutral terms, letting researchers reach their own conclusions about how much manual labor they want to put into their data.

      With standardization and ease of use in mind, we designed AVN specifically to perform fully automated syllable annotation and downstream feature calculations. We believe that we have demonstrated in this manuscript that our fully automated approach is sufficiently reliable for downstream analyses across multiple zebra finch colonies. That said, if researchers require an even higher degree of annotation precision and accuracy, they can turn toward one of the existing methods for supervised song annotation, such as TweetyNet. Incorporating human annotations for each bird processed by AVN is likely to improve its performance, but this would require significant changes to AVN’s methodology and is outside the scope of our current efforts.  

      (6) Full automation may not be everyone's wish. For example, given the highly stereotyped zebra finch songs, it is conceivable that some syllables are consistently mis-segmented or misclassified. Researchers may want to be able to correct such errors, which essentially amounts to fine-tuning AVN. Conceivably, researchers may want to retrain a network like the AVN on their own birds, to obtain a more fine-grained discriminative method. 

      Other methods exist for supervised or human-in-the-loop annotation of zebra finch songs, such as TweetyNet and DAN (Alam et al. 2023). We invite researchers who require a higher degree of accuracy than AVN can provide to explore these alternative approaches for song annotation. Incorporating human annotations for each individual bird being analyzed using AVN was never the goal of our pipeline, would require significant changes to AVN’s design, and is outside the scope of this manuscript.  

      (7) The analysis is restricted to song syllables and fails to include calls. No rationale is given for the omission of calls. Also, it is not clear how the analysis deals with repeated syllables in a motif, whether they are treated as two-syllable types or one. 

      It is true that we don’t currently have any dedicated features to describe calls. This could be a useful addition to AVN in the future. 

      What a human expert inspecting a spectrogram would typically call ‘repeated syllables’ in a bout are almost always assigned the same syllable label by the UMAP+HDBSCAN clustering. The syntax analysis module includes features examining the rate of syllable repetitions across syllable types. See https://avn.readthedocs.io/en/latest/syntax_analysis_demo.html#SyllableRepetitions

      (8) It seems not all human annotations have been released and the instruction sets given to experts (how to segment syllables and score songs) are not disclosed. It may well be that the differences in performance between (Gu et al 2023) and (Cohen et al 2022) are due to differences in segmentation tasks, which is why these tasks given to experts need to be clearly spelled out. Also, the downloadable files contain merely labels but no identifier of the expert. The data should be released in such a way that lets other labs adopt their labeling method and cross-check their own labeling accuracy. 

      All human annotations used in this manuscript have indeed been released as part of the accompanying dataset. Syllable annotations are not provided for all pupils and tutors used to validate the similarity scoring, as annotations are not necessary for similarity comparisons. We will expand our description of our annotation guidelines in the methods section of the revised manuscript. All the annotations were generated by one of two annotators. The second annotator always consulted with the first annotator in cases of ambiguous syllable segmentation or labeling, to ensure that they had consistent annotation styles. Unfortunately, we haven’t retained records about which birds were annotated by which of the two annotators, so we cannot share this information along with the dataset. The data is currently available in a format that should allow other research groups to use our annotations either to train their own annotation systems or check the performance of their existing systems on our annotations.  

      (9) The failure modes are not described. What segmentation errors did they encounter, and what syllable classification errors? It is important to describe the errors to be expected when using the method. 

      As we discussed in our response to this reviewer’s point (3), WhisperSeg has a tendency to merge syllables when the gap between them is very short, which explains its lower recall score compared to its precision on our dataset (supplementary figure 2). In rare cases, WhisperSeg also fails to recognize syllables entirely, again impacting its recall score. TweetyNet hardly ever completely ignores syllables, but it does tend to occasionally merge syllables together or over-segment them. Whereas WhisperSeg does this very consistently for the same syllable types within the same bird, TweetyNet merges or splits syllables more inconsistently. This inconsistent merging and splitting has a larger effect on syllable labeling, as manifested in the lower clustering v-measure scores we obtain with TweetyNet compared to WhisperSeg segmentations. TweetyNet also has much lower precision than WhisperSeg, largely because TweetyNet often recognizes background noises (like wing flaps or hopping) as syllables whereas WhisperSeg hardly ever segments nonvocal sounds. 

      Many errors in syllable labeling stem from differences in syllable segmentation. For example, if two syllables with labels ‘a’ and ‘b’ in the manual annotation are sometimes segmented as two syllables, but sometimes merged into a single syllable, the clustering is likely to find 3 different syllable types: one corresponding to ‘a’, one corresponding to ‘b’, and one corresponding to ‘ab’ merged. Because of how we align syllables across segmentation schemes for the v-measure calculation, this will look like syllable ‘b’ always has a consistent cluster label, but syllable ‘a’ can carry two different cluster labels, depending on the segmentation. In certain cases, even in the absence of segmentation errors, a group of syllables bearing the same manual annotation label may be split into 2 or 3 clusters (it is extremely rare for a single manual annotation group to be split into more than 3 clusters). In these cases, it is difficult to conclusively say whether the clustering represents an error, or if it actually captured some meaningful systematic difference between syllables that was missed by the annotator. Finally, sometimes rare syllable types with their own distinct labels in the manual annotation are merged into a single cluster. Most labeling errors can be explained by this kind of merging or splitting of groups relative to the manual annotation, rather than by occasional misclassifications of one manual label type as another. 

      For examples of these types of errors, we encourage this reviewer and readers to refer to the example confusion matrices in figure 2f and supplemental figure 4b&e. We will also expand our discussion of these different types of errors in the revised manuscript. 
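
      To make the splitting and merging errors described above concrete, the following is a minimal sketch of how a v-measure responds to each case. It assumes the manual and cluster labels have already been aligned syllable by syllable (the alignment across segmentation schemes used in the manuscript is not reproduced here), and the toy label sequences and function names are illustrative assumptions rather than part of AVN.

```python
from collections import Counter
from math import log


def entropy(labels):
    """Shannon entropy (in nats) of a label sequence."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def conditional_entropy(target, given):
    """H(target | given) for two aligned label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))
    given_counts = Counter(given)
    return -sum((c / n) * log(c / given_counts[g]) for (g, _), c in joint.items())


def v_measure(manual, clusters):
    """Return (homogeneity, completeness, v_measure) for aligned labels."""
    h_manual, h_clusters = entropy(manual), entropy(clusters)
    homogeneity = 1.0 if h_manual == 0 else 1.0 - conditional_entropy(manual, clusters) / h_manual
    completeness = 1.0 if h_clusters == 0 else 1.0 - conditional_entropy(clusters, manual) / h_clusters
    total = homogeneity + completeness
    v = 0.0 if total == 0 else 2.0 * homogeneity * completeness / total
    return homogeneity, completeness, v


# Toy case 1: manual type 'a' is split across two clusters (0 and 1).
manual_split = list("aaaabbbb")
clusters_split = [0, 0, 1, 1, 2, 2, 2, 2]
split_scores = v_measure(manual_split, clusters_split)

# Toy case 2: rare manual types 'c' and 'd' are merged into one cluster (2).
manual_merge = list("aaaabbcd")
clusters_merge = [0, 0, 0, 0, 1, 1, 2, 2]
merge_scores = v_measure(manual_merge, clusters_merge)

print("split :", split_scores)
print("merge :", merge_scores)
```

      In this toy setting, splitting a manual group leaves homogeneity at 1 and lowers completeness, whereas merging rare manual groups leaves completeness at 1 and lowers homogeneity.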

      (10) Usage of Different Dimensionality Reduction Methods: The pipeline uses two different dimensionality reduction techniques for labeling and similarity comparison - both based on the understanding of the distribution of data in lower-dimensional spaces. However, the reasons for choosing different methods for different tasks are not articulated, nor is there a comparison of their efficacy. 

      We apologize for not making this distinction sufficiently clear in the manuscript and will add additional explanation to the main text to make the reasoning more apparent. We chose to use UMAP for syllable labeling because it is a common embedding methodology to precede hierarchical clustering and has been shown to result in reliable syllable labels for birdsong in the past (Sainburg et al. 2020). However, it is not appropriate for similarity scoring, because comparing EMD scores between birds requires that all the birds’ syllable distributions exist within the same shared embedding space. This can be achieved by using the same triplet loss-trained neural network model to embed syllables from all birds. This cannot be achieved with UMAP because all birds whose scores are being compared would need to be embedded in the same UMAP space, as distances between points cannot be compared across UMAPs. In practice, this would mean that every time a new tutor-pupil pair needs to be scored, their syllables would need to be added to a matrix with all previously compared birds’ syllables, a new UMAP would need to be computed, and new EMD scores between all bird pairs would need to be calculated using their new UMAP embeddings. This is very computationally expensive and quickly becomes unfeasible without dedicated high power computing infrastructure. It also means that similarity scores couldn’t be compared across papers without recomputing everything each time, whereas EMD scores obtained with triplet loss embeddings can be compared, provided they use the same trained model (which we provide as part of AVN) to embed their syllables in a common latent space.  
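
      The role of the shared embedding space can be illustrated with a deliberately simplified, one-dimensional sketch. The `embed` function below is a hypothetical stand-in for a fixed, pre-trained embedding model (it is not the triplet loss network provided with AVN), and the toy EMD assumes equal-size samples with uniform weights. Because the same fixed function embeds every bird, scores for different tutor-pupil pairs are directly comparable and nothing needs to be recomputed when a new pair is added.

```python
def emd_1d(sample_a, sample_b):
    """Earth mover's distance between two equal-size 1-D samples (uniform weights).

    In one dimension the optimal transport plan matches sorted values in order.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("this toy version needs equal sample sizes")
    pairs = zip(sorted(sample_a), sorted(sample_b))
    return sum(abs(a - b) for a, b in pairs) / len(sample_a)


def embed(syllable_duration_ms):
    """Hypothetical fixed embedding shared by all birds (assumption for illustration)."""
    return syllable_duration_ms / 100.0


# Toy syllable features (durations in ms); values are made up for illustration.
tutor = [embed(d) for d in (50, 60, 120, 130)]
good_pupil = [embed(d) for d in (52, 61, 118, 133)]
poor_pupil = [embed(d) for d in (90, 95, 100, 105)]

emd_good = emd_1d(tutor, good_pupil)
emd_poor = emd_1d(tutor, poor_pupil)
print(emd_good, emd_poor)  # the smaller EMD indicates the more similar song
```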

      (11) Reproducibility: are the measurements reproducible? Systems like UMAP always find a new embedding given some fixed input, so the output tends to fluctuate. 

      There is indeed a stochastic element to UMAP embeddings which will result in different embeddings and therefore different syllable labels across repeated runs with the same input. Anecdotally, we observed that v-measure scores were quite consistent within birds across repeated runs of the UMAP, but we will add an additional supplementary figure to the revised manuscript showing this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      General Response

      We are grateful for the constructive comments from reviewers and the editor.

      The main point converged on a potential alternative interpretation that top-down modulation to the visual cortex may be contributing to the NC connectivity we observed. For this revision, we address that point with new analysis in Fig. S8 and Fig. 6. These results indicate that top-down modulation does not account for the observed NC connectivity.

      We performed the following analyses.

      (1) In a subset of experiments, we recorded pupil dynamics while the mice were engaged in a passive visual stimulation experiment (Fig. S8A). We found that pupil dynamics, which indicate the arousal state of the animal, explained only 3% of the variance of neural dynamics. This is significantly smaller than the contribution of sensory stimuli and the activity of the surrounding neuronal population (Fig. S8B). In particular, the visual stimulus itself typically accounted for 10-fold more variance than pupil dynamics (Fig. S8C). This suggests that the population neural activity is highly stimulus-driven and that a large portion of functional connectivity is independent of top-down modulation. In addition, after subtracting the neural activity from the pupil-modulated portion, the cross-stimulus stability of the NC was preserved (Fig. S8D).

      We note that the contribution from pupil dynamics to neural activity in this study is smaller than what was observed in an earlier study (Stringer et al. 2019 Science). This may be because mice were in quiet wakefulness in the current study, whereas mice were in spontaneous locomotion in the earlier study. We discuss this discrepancy in the main text, in the subsection “Functional connectivity is not explained by the arousal state”.

      (2) We performed network simulations with top-down input (Fig. 6F-H). With multidimensional top-down input comparable to the experimental data, recurrent connections within the network are necessary to generate cross-stimulus stable NC connectivity (Fig. 6G). Only when the contribution from the top-down input was increased to more than 1/3 of the contribution from the stimulus could the cross-stimulus NC connectivity be generated by the top-down modulation (Fig. 6H). Thus, this analysis provides further evidence that top-down modulation was not playing a major role in the NC connectivity we observed.

      These new results support our original conclusion that network connectivity is the principal mechanism underlying the stability of functional networks.
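
      The kind of comparison reported in analysis (1) above can be sketched with a simple variance-explained calculation. This is a toy illustration under stated assumptions, not the model used in the manuscript: the synthetic `stimulus`, `pupil`, and `activity` traces and their coefficients are invented, and each regressor is fit separately with ordinary least squares.

```python
import random


def variance_explained(predictor, response):
    """R^2 of a one-regressor least-squares fit with an intercept."""
    n = len(response)
    mean_x = sum(predictor) / n
    mean_y = sum(response) / n
    sxx = sum((x - mean_x) ** 2 for x in predictor)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(predictor, response))
    syy = sum((y - mean_y) ** 2 for y in response)
    slope = sxy / sxx
    residual = sum(
        (y - mean_y - slope * (x - mean_x)) ** 2 for x, y in zip(predictor, response)
    )
    return 1.0 - residual / syy


# Synthetic traces (assumption): activity is driven strongly by the stimulus
# and only weakly by pupil size, plus private noise.
rng = random.Random(0)
n_samples = 5000
stimulus = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
pupil = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
activity = [1.0 * s + 0.3 * p + rng.gauss(0.0, 1.0) for s, p in zip(stimulus, pupil)]

r2_stimulus = variance_explained(stimulus, activity)
r2_pupil = variance_explained(pupil, activity)
print(f"stimulus R^2 = {r2_stimulus:.3f}, pupil R^2 = {r2_pupil:.3f}")
```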

      Public Reviews:

      Reviewer #1 (Public Review):

      Using multi-region two-photon calcium imaging, the manuscript meticulously explores the structure of noise correlations (NCs) across the mouse visual cortex and uses this information to make inferences about the organization of communication channels between primary visual cortex (V1) and higher visual areas (HVAs). Using visual responses to grating stimuli, the manuscript identifies 6 tuning groups of visual cortex neurons and finds that NCs are highest among neurons belonging to the same tuning group whether or not they are found in the same cortical area. The NCs depend on the similarity of tuning of the neurons (their signal correlations) but are preserved across different stimulus sets - noise correlations recorded using drifting gratings are highly correlated with those measured using naturalistic videos. Based on these findings, the manuscript concludes that populations of neurons with high NCs constitute discrete communication channels that convey visual signals within and across cortical areas.

      Experiments and analyses are conducted to a high standard and the robustness of noise correlation measurements is carefully validated. However, the interpretation of noise correlation measurements as a proxy for network connectivity is fraught with challenges. While the data clearly indicates the existence of distributed functional ensembles, the notion of communication channels implies the existence of direct anatomical connections between them, which noise correlations cannot measure.

      The traditional view of noise correlations is that they reflect direct connectivity or shared inputs between neurons. While it is valid in a broad sense, noise correlations may reflect shared top-down input as well as local or feedforward connectivity. This is particularly important since mouse cortical neurons are strongly modulated by spontaneous behavior (e.g. Stringer et al, Science, 2019). Therefore, noise correlation between a pair of neurons may reflect whether they are similarly modulated by behavioral state and overt spontaneous behaviors. Consequently, noise correlation alone cannot determine whether neurons belong to discrete communication channels.

      Behavioral modulation can influence the gain of sensory-evoked responses (Niell and Stryker, Neuron, 2010). This can explain why signal correlation is one of the best predictors of noise correlations as reported in the manuscript. A pair of neurons that are similarly gain-modulated by spontaneous behavior (e.g. both active during whisking or locomotion) will have higher noise correlations if they respond to similar stimuli. Top-down modulation by the behavioral state is also consistent with the stability of noise correlations across stimuli. Therefore, it is important to determine to what extent noise correlations can be explained by shared behavioral modulation.

      We thank the reviewer for the constructive and positive feedback on our study.

      The reviewer acknowledged the quality of our experiments and analysis and stated a concern that the noise correlation can be explained by top-down modulation. We have addressed this concern carefully in the revision, please see the General Response above.

      Reviewer #2 (Public Review):

      Summary:

      This groundbreaking study characterizes the structure of activity correlations over a millimeter scale in the mouse cortex with the goal of identifying visual channels, specialized conduits of visual information that show preferential connectivity. Examining the statistical structure of the visual activity of L2/3 neurons, the study finds pairs of neurons located near each other or across distances of hundreds of micrometers with significantly correlated activity in response to visual stimulation. These highly correlated pairs have closely related visual tuning sharing orientation and/or spatial and/or temporal preference as would be expected from dedicated visual channels with specific connectivity.

      Strengths:

      The study presents best-in-class mesoscopic-scale 2-photon recordings from neuronal populations in pairs of visual areas (V1-LM, V1-PM, V1-AL, V1-LI). The study employs diverse visual stimuli that capture some of the specialization and heterogeneity of neuronal tuning in mouse visual areas. The rigorous data quantification takes into consideration functional cell groups as well as other variables that influence trial-to-trial correlations (similarity of tuning, neuronal distance, receptive field overlap). The paper convincingly demonstrates the robustness of the clustering analysis and of the activity correlation measurements. The calcium imaging results convincingly show that noise correlations are correlated across visual stimuli and are strongest within cell classes which could reflect distributed visual channels. A simple simulation is provided that suggests that recurrent connectivity is required for the stimulus invariance of the results. The paper is well-written and conceptually clear. The figures are beautiful and clear. The arguments are well laid out and the claims appear in large part supported by the data and analysis results (but see weaknesses).

      Weaknesses:

      An inherent limitation of the approach is that it cannot reveal which anatomical connectivity patterns are responsible for observed network structure. The modeling results presented, however, suggest interestingly that a simple feedforward architecture may not account for fundamental characteristics of the data. A limitation of the study is the lack of a behavioral task. The paper shows nicely that the correlation structure generalizes across visual stimuli. However, the correlation structure could differ widely when animals are actively responding to visual stimuli. I do think that, because of the complexity involved, a characterization of correlations during a visual task is beyond the scope of the current study.

      An important question that does not seem to be addressed (though it may be addressed indirectly; I could be mistaken) is the extent to which it is possible to obtain reliable measurements of noise correlation from cell pairs that have widely distinct tuning. L2/3 activity in the visual cortex is quite sparse. The cell groups laid out in Figure S2 have very sharp tuning. Cells whose tuning does not overlap may not yield significant trial-to-trial correlations because they do not show significant responses to the same set of stimuli, if at all any time. Could this bias the noise correlation measurements or explain some of the dependence of the observed noise correlations on signal correlations/similarity of tuning? Could the variable overlap in the responses to visual stimuli explain the dependence of correlations on cell classes and groups?

      With electrophysiology, this issue is less of a problem because many if not most neurons will show some activity in response to suboptimal stimuli. For the present study which uses calcium imaging together with deconvolution, some of the activity may not be visible to the experimenters. The correlation measure is shown to be robust to changes in firing rates due to missing spikes. However, the degree of overlap of responses between cell pairs and their consequences for measures of noise correlations are not explored.

      Beyond that comment, the remaining issues are relatively minor issues related to manuscript text, figures, and statistical analyses. There are typos left in the manuscript. Some of the methodological details and results of statistical testing also seem to be missing. Some of the visuals and analyses chosen to examine the data (e.g., box plots) may not be the most effective in highlighting differences across groups. If addressed, this would make a very strong paper.

      We thank the reviewer for acknowledging the contributions of our study.

      We agree with the reviewer that future studies on behaviorally engaged animals are necessary. Although we also agree with the reviewer that behavior studies are beyond the scope of the current manuscript, we have included additional analysis and discussion on whether and how top-down input would affect the NC connectivity in the revision. Please see the General Response above.

      Reviewer #3 (Public Review):

      Summary:

      Yu et al harness the capabilities of mesoscopic 2P imaging to record simultaneously from populations of neurons in several visual cortical areas and measure their correlated variability. They first divide neurons into 65 classes depending on their tuning to moving gratings. They find that pairs of neurons of the same tuning class show higher noise correlations (NCs) both within and across cortical areas. Based on these observations and a model they conclude that visual information is broadcast across areas through multiple, discrete channels with little mixing across them.

      NCs can reflect indirect or direct connectivity, or shared afferents between pairs of neurons, potentially providing insight on network organization. While NCs have been comprehensively studied in neuron pairs of the same area, the structure of these correlations across areas is much less known. Thus, the manuscript presents novel insights into the correlation structure of visual responses across multiple areas.

      Strengths:

      The study uses state-of-the-art mesoscopic two-photon imaging.

      The measurements of shared variability across multiple areas are novel.

      The results are mostly well presented and many thorough controls for some metrics are included.

      Weaknesses:

      I have concerns that the observed large intra-class/group NCs might not reflect connectivity but shared behaviorally driven multiplicative gain modulations of sensory-evoked responses. In this case, the NC structure might not be due to the presence of discrete, multiple channels broadcasting visual information as concluded. I also find that the claim of multiple discrete broadcasting channels needs more support before discarding the alternative hypothesis that a continuum of tuning similarity explains the large NCs observed in groups of neurons.

      Specifically:

      Major concerns:

      (1) Multiplicative gain modulation underlying correlated noise between similarly tuned neurons

      (1a) The conclusion that visual information is broadcast in discrete channels across visual areas relies on interpreting NCs as reflecting direct or indirect connectivity between pairs, or common inputs. However, a large fraction of the activity in the mouse visual system is known to reflect spontaneous and instructed movements, including locomotion and face movements, among others. Running activity and face movements are some of the largest contributors to visual cortex activity and exert a multiplicative gain on sensory-evoked responses (Niell et al, Stringer et al, among others). Thus, trial-by-trial fluctuations of behavioral state would result in gain modulations that, due to their multiplicative nature, would result in more shared variability in cotuned neurons, as multiplication affects neurons that are responding to the stimulus more than those that are not responding (see Lin et al, Neuron 2015 for a similar point).

      As behavioral modulations are not considered, this confound affects most of the conclusions of the manuscript, as it would result in larger NCs the more similar the tuning of the neurons is, independently of any connectivity feature. It seems that this alternative hypothesis can explain most of the results without the need for discrete broadcasting channels or any particular network architecture and should be addressed to support its main claims.

      (1b) In Figure 5 the observations are interpreted as evidence for NCs reflecting features of the network architecture, as NCs measured using gratings predicted NC to naturalistic videos. However, it seems from Figure 5 A that signal correlations (SCs) from gratings had non-zero correlations with SCs during naturalistic videos (is this the case?). Thus, neurons that are cotuned to gratings might also tend to be coactivated during the presentation of videos. In this case, they are also expected to be susceptible to shared behaviorally driven fluctuations, independently of any circuit architecture as explained before. This alternative interpretation should be addressed before concluding that these measurements reflect connectivity features.
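
      The gain-modulation argument raised in (1a) and (1b) can be illustrated with a toy simulation. This is a minimal sketch under strong assumptions, not an analysis of the recorded data: three hypothetical neurons share a single multiplicative gain that fluctuates from trial to trial, have independent additive noise, and are not connected to each other. The tuning values, gain variability, and noise level are invented for illustration.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def simulate_residuals(n_trials=4000, seed=0):
    """Trial-to-trial residuals for three unconnected neurons with a shared gain."""
    rng = random.Random(seed)
    # Mean responses to two stimuli: n1 and n2 are cotuned, n3 prefers the other stimulus.
    tuning = {
        "n1": {"A": 10.0, "B": 1.0},
        "n2": {"A": 10.0, "B": 1.0},
        "n3": {"A": 1.0, "B": 10.0},
    }
    residuals = {name: [] for name in tuning}
    for trial in range(n_trials):
        stim = "A" if trial % 2 == 0 else "B"
        gain = 1.0 + rng.gauss(0.0, 0.2)  # shared, behavior-like gain on this trial
        for name, curve in tuning.items():
            response = gain * curve[stim] + rng.gauss(0.0, 1.0)  # private noise
            # Subtract the stimulus-conditioned mean (known here because E[gain] = 1).
            residuals[name].append(response - curve[stim])
    return residuals


residuals = simulate_residuals()
nc_cotuned = pearson(residuals["n1"], residuals["n2"])
nc_different = pearson(residuals["n1"], residuals["n3"])
print(f"NC cotuned pair: {nc_cotuned:.2f}, NC differently tuned pair: {nc_different:.2f}")
```

      In this sketch the cotuned pair shows the larger noise correlation even though no neuron is connected to any other, which is the confound the reviewer describes.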

      We thank the reviewer for acknowledging the contributions of our study.

      The reviewer suggested that gain modulation might be interfering with the interpretation of the NC connectivity. We have addressed this issue in the General Response above.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      (2) Discrete vs continuous communication channels

      (2a) One of the authors' main claims is that the mouse cortical network consists of discrete communication channels. This discreteness is based on an unbiased clustering approach to the tuning of neurons, followed by a manual grouping into six categories in relation to the stimulus space. I believe there are several problems with this claim. First, this clustering approach is inherently trying to group neurons and discretise neural populations. To make the claim that there are 'discrete communication channels' the null hypothesis should be a continuous model. An explicit test in favor of a discrete model is lacking, i.e. are the results better explained using discrete groups vs. when considering only tuning similarity? Second, the fact that 65 classes are recovered (out of 72 conditions) and that manual clustering is necessary to arrive at the six categories is far from convincing that we need to think about categorically different subsets of neurons. That we should think of discrete communication channels is especially surprising in this context as the relevant stimulus parameter axes seem inherently continuous: spatial and temporal frequency. It is hard to motivate the biological need for a discretely organized cortical network to process these continuous input spaces.

      (2b) Consequently, I feel the support for discrete vs continuous selective communication is rather inconclusive. It seems that, following the authors' claims, it would be important to establish whether belonging to the same group, rather than tuning similarity, is the defining feature for showing large NCs.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      Finally, as stated in point 1, the larger NCs observed within groups than across groups might be due to the multiplicative gain of state modulations, due to the larger tuning similarity of the neurons within a class or group.

      We have addressed this issue in the General Response above and the response to comment (1).

      Recommendations for the authors:

      Reviewing Editor (Recommendations For The Authors):

      A general recommendation discussed with the reviewers is to make use of behavioural recording to assess whether shared behaviourally driven modulations can explain the observed relation between SC and NC, independently of the network architecture. Alternatively, a simulation or model might also address this point as well as the possibility that the relation of SC and NC might be also independent of network architecture given the sparseness of the sensory responses in L2/3.

      We have addressed this in the General Response above.

      Broadly speaking, inferring network architecture based on NCs is extremely challenging. Consequently, the study could also be substantially improved by reframing the results in terms of distributed co-active ensembles without insinuation of direct anatomical connectivity between them.

      We agree that inferring network architecture based on NCs is challenging. The current study has revealed some principles of functional networks measured by NCs, and we showed that cross-stimulus NC connectivity provides effective constraints to network modeling. We are explicit about the nature of NCs in the manuscript. For example, in the Abstract, we write “to measure correlated variability (i.e., noise correlations, NCs)”, and in the Introduction, we write “NCs are due to connectivity (direct or indirect connectivity between the neurons, and/or shared input)”. We are following conventions in the field (e.g., Sporns 2016; Cohen and Kohn 2011).

      Notice also that the abstract or title should make clear that the study was made in mice.

      Sorry for the confusion, we now clearly state the study was carried out in mice in the Abstract and Introduction.

      Reviewer #1 (Recommendations For The Authors):

      The manuscript presents a meticulous characterization of noise correlations in the visual cortical network. However, as I outline in the public review, I think the use of noise correlations to infer communication channels is problematic and I urge the authors to carefully consider this terminology. Language such as "strength of connections" (Figure 4D) should be avoided.

      We now state in the figure legend that the plot in Fig. 4D shows the average NC value.

      My general suggestion to the authors, which primarily concerns the interpretation of analyses in Figures 4-6, is to consider the possible impact of shared top-down modulation on noise correlations. If behavioral data was recorded simultaneously (e.g. using cameras to record face and body movements), behavioral modulation should be considered alongside signal correlation as a possible factor influencing NCs.

      We have addressed this issue in the General Response above.

      I may be misunderstanding the analysis in Figure 4C but it appears circular. If the fraction of neurons belonging to a particular tuning group is larger, then the number of in-group high NC pairs will be higher for that group even if high NC pairs are distributed randomly. Can you please clarify? I frankly do not understand the analysis in Figure 4D and it is unclear to me how the analyses in Figure 4C-D address the hypotheses depicted in the cartoons.

      Sorry for the confusion, we have clarified this in the Fig. 4 legend.

      Each HVA has a SFTF bias (Fig. 1E,F; Marshel et al., 2011; Andermann et al., 2011; Vries et al., 2020). Each red marker on the graph in Fig. 4C is a single V1-HVA pair (blue markers are within an area) for a particular SFTF group (Fig. 1). The x-axis indicates the number of high NC pairs in the SFTF group in the V1-HVA pair divided by the total number of high NC pairs per that V1-HVA pair (summed over all SFTF groups). The trend is that for HVAs with a bias towards a particular SFTF group, there are also more high NC pairs in that SFTF group, and thus it is consistent with the model on the right side. This is not circular because it is possible to have a SFTF bias in an HVA and have uniformly low NCs. The reviewer is correct that a random distribution of high NCs could give a similar effect, which is still consistent with the model: that the number of high NC pairs (and not their specific magnitudes) can account for SFTF biases in HVAs.

      To contrast with that model, we tested whether the average NC value for each tuning group varies. That is, can a small number of very high NCs account for SFTF biases in HVAs? That is what is examined in Fig. 4D. We found that the average NC value does not account for the SFTF biases. Thus, the SFTF biases were not related to the modulation in NC (i.e., functional connection strength). 

      I found the discussion section quite odd and did not understand the relevance of the discussion of the coefficient of variation of various quantities to the present manuscript. It would be more useful to discuss the limitations and possible interpretations of noise correlation measurements in more detail.

      We have revised the discussion section to focus on interpreting the results of the current study and comparing them with those of previous studies.

      Figure 3B: please indicate what the different colors mean - I assume it is the same as Figure 3A but it is unclear.

      We added text to the legend for clarification.

      Typos: Page 7: "direct/indirection wiring", Page 11: "pooled over all texted areas"

      We have fixed the typos.

      Reviewer #2 (Recommendations For The Authors):

      The significance of the results feels like it could be articulated better. The main conclusion is that V1 to HVA connections avoid mixing channels and send distinctly tuned information along distinct channels - a more explicit description of what this functional network understanding adds would be useful to the reader.

      Thanks for the suggestion. We have edited the introduction section and the discussion section to make the take-home message more clear.

      Previous studies with anatomical data already indicate distinctly tuned channels - several of which the authors cite - although inconsistently:

      • Kim et al 2018 https://doi.org/10.1016/j.neuron.2018.10.023

      • Glickfeld et al., 2013 (cited)

      • Han et al., 2022 (cited)

      • Han and Bonin 2023 (cited)

      Thanks for the suggestion, we now cite the Kim et al. 2018 paper.

      I think the information you provide is valuable - but the value should be more clearly spelled out. This section from the end of the discussion, for example, feels like it abdicates that responsibility:

      "In summary, mesoscale two-photon imaging techniques open up the window of cellular-resolution functional connectivity at the system level. How to make use of the knowledge of functional connectivity remains unclear, given that functional connectivity provides important constraints on population neuron behavior."

      A discussion of how the results relate to previous studies and a section on the limitations of the study seems warranted.

      Thanks for the suggestion, we have extensively edited the discussion section to make the take-home message clear and discuss prior studies and limitations of the present study.

      Details:

      Analyses or simulations showing that the dependency of correlations on similarity of tuning is not an artifact of how the data were acquired are, in my mind, missing; if that is the case, it is crucial that this be addressed.

      At each step of data analysis, we performed control analysis to assess the fidelity of the conclusion. For example, on the spike train inference (Fig. S4), GMM clustering (Fig. S1), and noise correlation analysis (Figs. 2, S5).

      None of the statistical testing seems to use animals as experimental units (instead of neurons). This could over-inflate the significance of the results. Wherever applicable and possible, I would recommend using hierarchical bootstrap for testing or showing that the differences observed are reproducible across animals.

      We analyzed the tuning selectivity of HVAs (Fig. 1F) using experimental units, rather than neurons. It is very difficult to observe all tuning classes in each experiment, so pooling neurons across animals is necessary for much of the analysis. We do take care to avoid overstating statistical results, and we show the data points in most figures to give the reader an impression of the distributions.
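
      For readers unfamiliar with the hierarchical bootstrap that the reviewer recommends, the idea can be sketched as follows. This is an illustrative sketch only, not the analysis used in the manuscript: the function names, the dictionary layout (an animal ID mapped to per-neuron values), the toy numbers, and the choice of the grand mean as the statistic are all assumptions made for the example.

```python
import random
import statistics


def hierarchical_bootstrap_mean(data_by_animal, n_boot=1000, seed=0):
    """Bootstrap the grand mean by resampling animals, then neurons within animals.

    data_by_animal: dict mapping a (hypothetical) animal ID to a list of
    per-neuron values, e.g. noise correlations. Returns n_boot estimates.
    """
    rng = random.Random(seed)
    animals = list(data_by_animal)
    estimates = []
    for _ in range(n_boot):
        # Level 1: resample animals with replacement.
        sampled_animals = rng.choices(animals, k=len(animals))
        values = []
        for animal in sampled_animals:
            neurons = data_by_animal[animal]
            # Level 2: resample neurons with replacement within that animal.
            values.extend(rng.choices(neurons, k=len(neurons)))
        estimates.append(statistics.fmean(values))
    return estimates


def percentile_interval(estimates, alpha=0.05):
    """Percentile confidence interval from a list of bootstrap estimates."""
    ordered = sorted(estimates)
    lower = ordered[int((alpha / 2) * len(ordered))]
    upper = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lower, upper


if __name__ == "__main__":
    # Toy data: three hypothetical animals with a few values each.
    toy = {"m1": [0.1, 0.2, 0.3], "m2": [0.2, 0.25], "m3": [0.05, 0.15, 0.1, 0.2]}
    print(percentile_interval(hierarchical_bootstrap_mean(toy, n_boot=500)))
```

      Because animals are resampled first, between-animal variability widens the resulting interval, which is what treating animals rather than neurons as the experimental unit is meant to capture.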

      Page 2. "The number of neurons belonged to the six tuning groups combined: V1, 5373; LM, 1316; AL, 656; PM, 491; LI, 334." Yet the total recorded number of neurons is 17,990. How neurons were excluded is mentioned in Methods but it should be stated more explicitly in Results.

      We have added text in the Fig. 1 legend to direct the audience to the Methods section for information on the exclusion / inclusion criteria.

      Figure 1C, left. I don't understand how correlation is the best way to quantify the consistency of class center with a subset of data. Why not use, for example, the mean squared error? The logic underlying this analysis is not explained in Methods.

      Sorry for the confusion, we have clarified this in the Methods section.

      We measured the consistency of the centers of the Gaussian clusters, which are 45-dimensional vectors in the PC dimensions. We measured the Pearson correlation of Gaussian center vectors independently defined by GMM clustering on random subsets of neurons. We found the center of the Gaussian profile of each class was consistent (Fig. 1C). The same class of different GMMs was identified by matching the center of the class.
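
      The center-matching step described above could look roughly like the sketch below. It assumes that each GMM fit yields a list of center vectors (45-dimensional in the manuscript, 4-dimensional in the toy data here) and that a class in one fit is matched to the best-correlated center of the other fit. The matching rule and all names are assumptions for illustration, since the response does not spell them out.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def match_centers(centers_a, centers_b):
    """For each center in fit A, return (index of best match in fit B, correlation).

    Note: this greedy rule does not force a one-to-one assignment; it simply
    picks the most correlated center in the other fit for each class.
    """
    matches = []
    for center_a in centers_a:
        corrs = [pearson(center_a, center_b) for center_b in centers_b]
        best = max(range(len(corrs)), key=corrs.__getitem__)
        matches.append((best, corrs[best]))
    return matches


if __name__ == "__main__":
    # Toy centers from two hypothetical fits on different random subsets.
    fit_a = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0]]
    fit_b = [[4.1, 2.9, 2.2, 0.8], [0.9, 2.1, 3.2, 3.9]]
    for class_index, (match, r) in enumerate(match_centers(fit_a, fit_b)):
        print(class_index, match, round(r, 3))
```

      A high correlation for every matched pair would indicate that the class centers are reproducible across subsets. Unlike the mean squared error, the correlation is insensitive to an overall offset or scaling of the center vector.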

      Figure 1E. There are statements in the text about cell groups being more represented in certain visual areas. These differences are not well represented in the box plots. Can't the individual data points be plotted? I have also not found the description and results of statistical testing for these data.

      We have replotted the figure (now Fig. 1F) with dot scatters which show all of the individual experiments.

      Figure 2A, right, since these are paired data, I am not quite sure why only marginal distributions are shown. It would be interesting to know the distributions of correlations that are significant.

      This is only for illustration showing that NCs are measurable and significantly different from zero or shuffled controls. The distribution of NCs is broad and has both positive and negative values. We are not using this for downstream analysis.

      Figure 4A, I wonder if it would not be better to concentrate on significant correlations.

      We focused on large correlation values rather than significant values because we wanted to examine the structure of “strongly connected” neuron pairs. Negative and small correlation values can be significant as well. Focusing on large values would allow us to generate a clear interpretation.  

      Figure 4B, 'Mean strength of connections', which I presume means correlations, is not defined anywhere that I can see.

      I believe the reviewer means Fig. 4D. It means the average NC value. We have edited the figure legend to add clarity.

      Figure 4F, a few words explaining how to understand the correlation matrix in text or captions would be helpful.

      Sorry for the confusion, we have clarified this part in figure legend for Fig. 4F.

      Page 5, right column: Incomplete sentence: "To determine whether it is the number of high NC pairs or the magnitude of the NCs,".

      We have edited this sentence.

      Page 5, right column: "Prior findings from studies of axonal projections from V1 to HVAs indicated that the number of SF-TF-specific boutons -rather than the strength of boutons- contribute to the SF-TF biases among HVAs (Glickfeld et al., 2013)." Glickfeld et al. also reported that boutons with tuning matched to the target area showed stronger peak dF/F responses.

      Thank you. We have revised this part accordingly.

      Page 9, the Discussion and Figure 7 which situates the study results in a broader context is welcome and interesting, but I have the feeling that more words should be spent explaining the figure and conceptual framework to a non-expert audience. I am a bit at a loss about how to read the information in the figure.

      Sorry for the confusion, we have added an explanation about this section (page 10, right column).

      As far as I can see, data availability is not addressed in the manuscript. The data, code to analyze the data and generate the figures, and simulation code should be made available in a permanent public repository. This includes data for visual area mapping, calcium imaging data, and any data accessory to the experiments.

      We have stated in the manuscript that code and data are available upon request. We regularly share data with no conditions (e.g., no entitlement to authorship), and we often do so even prior to publication.

      The sex of the mice should be indicated in Figure T1.

      The sex of the mice was mixed. This is stated in the Methods section.

      Methods:

      Section on statistical testing, computation of explained variance missing, etc. I feel many analyses are not thoroughly described.

      Sorry for the confusion, we have improved our method section.

      Signal correlation (similarity between two neurons' average responses to stimuli) and its relation to noise correlation is not formally defined.

      We have included the definition of signal correlation in the Methods.
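
      As a rough illustration of the two quantities, the sketch below uses one common convention: signal correlation is the Pearson correlation of trial-averaged responses across stimuli, and noise correlation is the Pearson correlation of trial-to-trial residuals after subtracting each stimulus mean. The data layout (one list of trials per stimulus) and the function names are assumptions for the example; the Methods section of the manuscript remains the authoritative definition.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences with non-zero variance."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def signal_correlation(resp_a, resp_b):
    """Correlate the trial-averaged responses of two neurons across stimuli.

    resp_a, resp_b: lists indexed by stimulus, each holding a list of
    single-trial responses recorded simultaneously from the two neurons.
    """
    mean_a = [sum(trials) / len(trials) for trials in resp_a]
    mean_b = [sum(trials) / len(trials) for trials in resp_b]
    return pearson(mean_a, mean_b)


def noise_correlation(resp_a, resp_b):
    """Correlate trial-to-trial residuals pooled over stimuli."""
    resid_a, resid_b = [], []
    for trials_a, trials_b in zip(resp_a, resp_b):
        mean_a = sum(trials_a) / len(trials_a)
        mean_b = sum(trials_b) / len(trials_b)
        # Remove the stimulus-driven mean so only shared variability remains.
        resid_a.extend(v - mean_a for v in trials_a)
        resid_b.extend(v - mean_b for v in trials_b)
    return pearson(resid_a, resid_b)


if __name__ == "__main__":
    # Toy data: 3 stimuli x 2 trials for two hypothetical neurons.
    neuron_a = [[1.0, 3.0], [5.0, 7.0], [2.0, 4.0]]
    neuron_b = [[2.0, 4.0], [6.0, 8.0], [1.0, 3.0]]
    print(signal_correlation(neuron_a, neuron_b))
    print(noise_correlation(neuron_a, neuron_b))
```

      Separating the two in this way makes explicit that a pair can share tuning (high signal correlation) without sharing trial-to-trial variability, and vice versa.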

      Number of visual stimulation trials is not stated in Methods. It is only stated in a figure caption.

      The number of visual stimulus trials is provided in the last paragraph of the Methods section (Visual Stimuli).

      Fix typos: incorrect spelling, punctuation, and missing symbols (e.g. closing parentheses).

      We have carefully examined the spelling, punctuation, and grammar. We have corrected errors and we hope that none remain.

      Why use intrinsic imaging to locate retinotopic boundaries in mice already expressing GCaMP6s?

      We agree with the reviewer that calcium imaging of visual cortex can be used to identify the visual cortex.

      It is true that areas can be mapped using the GCaMP signals. That is not our preferred approach. Using intrinsic imaging to define the boundary between V1 and HVAs has been a well-refined routine in our lab for over a decade. It is part of our standard protocol. One advantage is that the data (from intrinsic signals) are of the same nature every time. This enables us to use the same mapping procedure no matter which reporters the mice might be expressing (and in what pattern, e.g., patchy or restricted to certain cell types).

      Reviewer #3 (Recommendations For The Authors):

      The possibility that larger intra-group NCs observed simply reflect a multiplicative gain on cotuned neurons could be addressed using pupil and/or face recordings: Does pupil size or facial motion predict NCs and if factored out, does signal correlation still predict NCs?

      Perhaps a variant of the network model presented in Figure 6 with multiplicative gain could also be tested to investigate these issues.

      We have addressed this issue in general response.

      Here, we will elaborate on one additional analysis we performed, in case it might be of interest. We carried out multiplicative gain modeling by implementing an established method (Goris et al. 2014 Nat Neurosci) on our dataset. We were able to perform the modeling work successfully. However, we found that it is not a suitable model for explaining the current dataset because the multiplicative gain induced a negative correlation. This seemed odd but can be explained. First, top-down input is not purely multiplicative but rather both additive and multiplicative. Second, the top-down modulation is high dimensional. Third, the firing rate of layer 2/3 mouse visual cortex neurons is lower than the firing rates for non-human primate recordings used in the development of the method (Goris et al. 2014 Nat Neurosci). Thus, we did not pursue the model further. We just mention it here in case the outcome might be of interest to fellow researchers.

      Similarly further analyses can be done to strengthen support for the claims that the observed NCs reflect discrete communication channels. A direct test of continuous vs categorical channels would strengthen the conclusions. One possible analysis would be to compare pairs with similar tuning (same SC) belonging to the same or different groups.

      Thanks for pointing this out so that we can clarify.

      We did not mean to argue that the tuning of neurons is discrete. Our conclusions are not dependent on asserting a particular degree of discreteness. We performed GMM clustering to label neurons with an identity so that we could analyze the NC connectivity structure with a degree of granularity supported by the data. Our analysis suggested that communication happens within a class, rather than through mixed classes. We realized that using the term “discrete” may be confusing. In the revised text we used the term “unmixed” or “non-mixing” instead to emphasize that the communication happens between neurons belonging to the same tuning cluster, or class. 

      However, we do see how the question of discreteness among classes might be interesting to readers. To provide further information, we have included a new Fig. S2 to visualize the GMM classes using t-SNE embedding.

      I also found many places where the manuscript needs clarification and/or more methodological details:

      • How many times was each of the stimulus conditions repeated? And how many times for the two naturalistic videos? What was the total duration of the experiments?

      The number of visual stimulus trials is provided in the last paragraph of the Methods section entitled Visual Stimuli. About 15 trials were recorded for each drifting grating stimulus, and about 20 trials were recorded for each naturalistic video.

      • Typo: Suit2p should be Suite2p (section Calcium image processing - Methods).

      We have fixed the typo.

      • What do the error bars in Figure 1E represent? Differences in group representation across areas from Figure 1E are mentioned in the text without any statistical testing.

      We have revised the Figure 1E (current Fig. 1F), and we now show all data points.

      • The manuscript would benefit from a comparison of the observed area-specific tuning biases across areas (Figure 1E and others) with the previous literature.

      We have included additional discussion on this in the last paragraph of the section entitled Visual cortical neurons form six tuning groups.

      • Why are inferred spike trains used to calculate NCs? Why can't dF/F be used? Do the results differ when using dF/F to calculate NC? Please clarify in the text.

      We believe inferred spike trains provide better resolution and make it easier to compare with quantitative values from electrical recordings. Notice that NC values computed using dF/F can be much larger than those computed by inferred spike trains. For example, see Smith & Hausser 2010 Nat Neurosci. Supplementary Figure S8.

      • The sentence seems incomplete or unclear: "That is, there are more high NC pairs that are in-group." Explicit vs what?

      We have revised this sentence.

      • Figure 1E is unclear to me. What is being plotted? Please add a color bar with the metric and the units for the matrix (left) and in the tuning curves (right panels). If the Y and X axes represent the different classes from the GMM, why are there more than 65 rows? Why is the matrix not full?

      We have revised this figure. Fig. 1D is the full 65 x 65 matrix. Fig. 1F has small 3x3 matrices mapping the responses to different TF and SF of gratings. We hope the new version is clearer.

      • How are receptive fields defined? How are their long and short axes calculated? How are their limits defined when calculating RF overlap?

      We have added further details in the Methods section entitled “Receptive field analysis”.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 Public:

      - The authors should carefully address the potential confounding of not counterbalancing the conditions of the first trial in both interoceptive tasks for the 9-month and 18-month age groups. The results of these groups could indeed be driven by having seen the synchronous trial first. 

      Upon addressing this comment, we noticed an error in our presentation scripts that resulted in a fixed-experimental design for most of the infants. Therefore, it is crucial to investigate the impact of the fixed-experimental design on our results. We have conducted extensive additional analyses comparing data from infants with the inadvertent fixed design to data from infants for whom the randomization was achieved as intended, which can be found in Supplementary Materials A. In summary, we do not find that the fixed-order design had a strong impact on the findings: looking behavior did not differ systematically between randomization orders, and looking patterns across ages and tasks indicate that we were able to adequately capture variance associated with these features. Further, we have adapted the interpretation of the results across the manuscript to acknowledge the experimental error and its implications for the interpretation of the results.

      For instance, on pages 30 and 31 we have added the following paragraphs:

      “The data presented in this study holds several limitations. First, due to an error in our experimental scripts we unintentionally used a fixed-order design, in which almost all infants saw the same fixed order of condition (always starting with a synchronous trial), image assigned to condition, and location of the image (left/right) instead of a semi-randomized design. Such a fixed-order design holds several important limitations as visual preferences might be influenced by the experimental design, i.e., the first trial always being synchronous might have influenced a mean group preference. Further, we cannot rule out that mean group preferences were influenced by the stimuli used (as in most cases the same stimuli were used for synchronous/asynchronous trials) or by the location of the image in a given trial (left/right). Still, there is no strong theoretical argument as to why image used or location should have an impact on infants’ preferences. The stimuli were selected to be similar to each other, in order not to evoke a priori preferences. To further illustrate the impact of the fixed order design we have conducted several additional analyses, which can be found in Supplementary Materials A, which do not indicate that there was a strong impact of the fixed-order design. Specifically, we find no evidence for systematic differences between infants tested with the fixed design and infants tested with a randomized design.

      Despite these limitations, fixed-order designs also hold advantages, as they are more suitable to investigate individual differences (Dang et al., 2020; Hedge et al., 2018). When each participant is exposed to the same procedure, individual differences are less likely to be attributed to effects of randomization but are more likely to reflect real differences between participants. Also, when considering the impact of the randomization, one must consider our results in relation to earlier studies (Maister et al. 2017, Weijs et al. 2022, Imafuku et al. 2023), some of which used the exact same stimuli as we did (Maister et al., 2017), with fully randomized designs. Results of these studies indicate no looking-time differences depending on the stimulus assigned to each condition or systematic preferences for one of the stimuli.”

      - The conclusion that cardiac interoception remains stable across infancy is not fully warranted by the data. Given the small sample size of 18-month-old toddlers included in the final analyses, it might be misleading to state this without including the caveat that the study may be underpowered. In other words, the small sample size could explain the direction of the results for this age group. 

      We agree with the reviewer and now explicitly acknowledge this issue in the discussion (p. 23):

      “However, due to the small sample size at 18 months the results regarding changes and stability of interoceptive sensitivity in the second year of life must be considered speculative and need to be validated in further research.”

      Reviewer #1 (Recommendations For The Authors): 

      Below are some comments that the authors may wish to take into account: 

      - Why did the authors choose to apply different statistical analyses across the dataset (i.e. Bayesian t-test is used with the 3-month-old sample, whereas a paired t-test is used for the 9 and 18-month-olds)? 

      The use of different statistical analyses was driven by the timeline of the project, as we had to update our initial plans. Due to challenges related to the Covid-19 pandemic, it was not possible to recruit 3-month-old babies for our study at the time we started the data collection. Thus, we first collected the 9- and 18-month-olds, and the 3-month-olds later. For the 9- and 18-month-old samples we aimed at directly replicating the approach by Maister et al. (2017). However, for the 3-month-olds we wanted to focus more on classification of the strength of evidence in favor/against an effect, taking the results of the equivalence tests for the 9- and 18-month-olds into account.

      The following parts have been added to the manuscript to clarify our approach:

      Sample (p. 33): “The 3-month-old sample was tested after completion of the 9- and 18-month-old samples. Initially, we had planned to start data collection with the 3-month-old sample. However, due to the Covid-19 pandemic this was not possible.”

      Statistical analysis (p. 41): “At 3 months we used a Bayesian paired t-test as the data collection was done after having collected the 9- and 18-month-old samples. Our intention in the analysis of the 3-month-old sample was to focus more strongly on strength of evidence in favor of/against an effect instead of a binary classification for/against an effect.”

      - I found the way in which sample sizes are reported a little unclear. This may be due to having the Results section before the Methods section (in line with journal requirements), but it would be helpful if the authors could clarify their sample size from the outset. For example, sample size for the 3-month-olds first says N = 80 (page 9), but then it becomes apparent that N = 53 completed the iBEAT and N = 40 completed the iBREATH. I think for the purpose of explaining the results, it might be more helpful to the reader to only know the final sample size and then specify recruited participants and dropout in the Methods. 

      We have adapted the description of sample sizes in the Results section. We now only refer to the number of infants included in a given analysis when reporting the results of the analysis. In addition, we have added the following clarification for the MEGA analysis (p. 11): “This approach allowed us to include 135 observations for the iBEATs from 125 infants, and 120 observations for the iBREATH from 107 infants. The sample size differs slightly from our preregistered approach given that we used the same preprocessing approach for the MEGA-analysis for all samples.”

      In addition, we now refer to the sample of the MEGA-analysis in the abstract, to make the understanding of our approach more intuitive.

      - I think the sentence "Interestingly, we find evidence for a positive relationship between cardiac and respiratory perception in our 18-month-old sample" at page 25 could be deleted given that the small sample size of 18-month-olds suggests this result should be interpreted with caution. The authors already explained this in the earlier paragraph (page 24) and simply re-stating this (weak) effect without further elaborating may not be necessary. 

      We have removed the sentence.

      - In multiple places in the manuscript, the authors hint at the association between interoception and certain social and self-related abilities (e.g. joint attention, mirror self-recognition), however, these are not fully elaborated on. Could the authors elaborate on the relation between mirror self-recognition and respiratory interoception (page 30)? Why would the ability to recognise the self-face be associated with the individual's ability to perceive their breathing pattern? How these two processes may be linked is not immediately obvious. 

      We have rephrased the sentence on page 30 to highlight that the increase in respiratory perception found in our results happens at a similar age as increases in other domains that might be related to interoception. “A hypothesis to be tested in future research is that developmental improvement in respiratory perception might be related to increases in other domains that show links to interoception. For instance, self-perception matures towards the end of the second year of life and has been conceptually related to interoception (Fotopoulou & Tsakiris, 2017; Musculus et al., 2021). Further, gross motor development may be considered in future research, which drastically matures in the first two years of life (WHO Multicentre Growth Reference Study Group, 2006) and has been shown to be related to respiratory function in children with cerebral palsy (Kwon & Lee, 2014).”

      - Aren't the 18-month-old infants effectively 19-month-olds? The mean age is 576.65 days, and the age window of recruitment was between 18 and 20 months. 

      We have added a sentence clarifying how we refer to the infants' age ranges: “To stay coherent, we refer to each age group throughout the manuscript with regard to the lower end of the age range in which we included infants (e.g., we tested infants between 9 and 10 months, but refer to them as the 9-month-old group).”

      Reviewer #2 Public:

      Weaknesses: 

      (1) My primary concern is that this study did not counterbalance the conditions of the first trial in both iBEAT and iBREATH tests for the 9-month and 18-month age groups. In these tests, the first trial invariably involved a synchronous stimulus. I believe that the order of trials can significantly influence an infant's looking duration, and this oversight could potentially impact the results, especially where a marked preference for synchronous stimuli was observed among infants. 

      Upon conducting further analyses to address this comment, we noticed an error in our presentation scripts that resulted in the inadvertent use of a fixed-experimental design for most infants. Therefore, we have conducted extensive additional analysis which can be found in Supplementary Materials A. Specifically, we compared data from infants who were tested with the inadvertent fixed design to data from infants for whom the randomization was achieved as intended. Further, we have adapted the interpretation of the results across the manuscript to acknowledge the experimental error and its potential implications for the interpretation of the results.

      (2) The analysis indicated that the study's sample size was too small to effectively assess the effects within each age group. This limitation fundamentally undermines the reliability of the findings. 

      We have added a statement addressing this issue to the limitation section: “The reduced sample size might have impacted the statistical power to detect mean preferences for some age groups. Still, it must be noted that even the smaller sample sizes included were of similar size as used in previous studies on infant interoceptive sensitivity (Imafuku et al., 2023; Maister et al., 2017; Weijs et al., 2023).”

      (3) The authors attribute the infants' preferential-looking behavior solely to the effects of familiarity and novelty. However, the meaning of "familiarity" in relation to external stimuli moving in sync with an infant's heartbeat or breathing is not clearly defined. A deeper exploration of the underlying mechanisms driving this behavior, such as from the perspectives of attention and perception, is necessary. 

      We have adapted the respective paragraph in the discussion to clarify the term familiarity, and to also address that other aspects of attention and perception, might be relevant (p. 25): 

      “In this context familiarity might refer to the infant’s perception of congruence between internal signal and external stimuli which might drive the infant’s attention. Specifically, the synchronous condition should be easier to process due to the intersensory redundancy and predictability between interoceptive and external signals.”

      “However, it is important to consider that other cognitive and attentional mechanisms could also influence these responses.”

      Reviewer #2 (Recommendations For The Authors):  

      Introduction: 

      (1) The relevance of respiration to self-regulation and social interaction was not clearly described. 

      We have rephrased the relevant section to highlight that the increase in respiratory perception found in our results happens at a similar age as increases in other domains that might be related to interoception. “A hypothesis to be tested in future research is that developmental improvement in respiratory perception might be related to increases in other domains that show links to interoception. For instance, self-perception matures towards the end of the second year of life and has been conceptually related to interoception (Fotopoulou & Tsakiris, 2017; Musculus et al., 2021). Further, gross motor development may be considered in future research, which drastically matures in the first two years of life (WHO Multicentre Growth Reference Study Group, 2006) and has been shown to be related to respiratory function in children with cerebral palsy (Kwon & Lee, 2014).”

      (2) In the last line of page 5, it might be more appropriate to use the term "meta-cognitive awareness" instead of "meta-perception," as the latter can refer to a different concept. 

      We have changed the word as recommended. 

      (3) The authors predicted a positive correlation in sensitivity between the cardiac and respiratory domains, despite studies in adults suggesting these are not related. How did the authors arrive at this prediction, and how do they interpret the results showing a correlation only in 18-month-olds, the age group closest to adults in this study? 

      We have elaborated on our reasoning for our prediction (p. 7): “Adult cardiac and respiratory interoception paradigms typically use two conceptually different paradigms. Thus, null results in the adult literature might be due to the unique characteristics of those paradigms.”

      Further, we have expanded on this result in the discussion (p. 24): “Still, we find a relationship between cardiac and respiratory signals in the oldest sample tested here, the 18-month-olds, which is closest to adults. Although this effect needs to be interpreted with caution due to the small sample size, this might indicate that using conceptually similar experimental paradigms might be a promising avenue to investigate relationships between different interoceptive modalities in adults.”

      Results: 

      (4) Please provide the descriptive statistics (means and standard deviations of looking time) for each independent condition, especially for the 18-month and 3-month age groups where this information is missing and only differences in looking times between conditions were mentioned. Furthermore, since the asynchronous condition includes both fast and slow stimuli, descriptive statistics for each should be included to help readers determine whether effects are due to synchronicity or stimulus speed. 

      We have added the means and standard deviations of looking times for synchronous and asynchronous trials to the Results section. Mean looking times for both types of asynchronous trials (fast and slow) can be found in Supplementary Materials C, where we have also added the standard deviations.

      (5) Regarding the MEGA analysis for iBEATs, where a main effect of condition was found (OR = 1.13, t(1769) = 2.541, p = .011), are these t-value and p-value based on the GLMM analysis, or did the authors conduct a separate t-test? This query arises because the p-value of the main effect differs from that in Table 2. Also, is it conventional to present GLMM results in the manner of Table 2, comparing specific level combinations (i.e., synchronous condition and 3-month age group), instead of listing main effects and interactions? 

      Thank you very much for pointing out that the results of the GLMM were not reported as precisely as possible, which might lead to confusion over the presented p-values. The main effect of condition refers to a post-hoc comparison using estimated marginal means from the GLMM across all age groups, while Table 2 refers to the main effect of condition for the 3-month age group. 

      To make the results more accessible we have restructured parts of the manuscript following your suggestions: In the main manuscript we now focus on the interaction effects for condition and age, as well as the post hoc comparison, while we now report null-full model comparison, and tables for all age groups in the supplements. 

      We have added the following clarifying sentences to the manuscript, p. 12:

      “In reporting these results we focus on whether we found evidence for interactions between age groups, and whether we found evidence for a general effect across age groups. In-depth results and tables can be found in Supplementary Materials C. 

      […]

      Next, we computed post hoc comparisons using estimated marginal means from the MEGA-analysis across all age groups to investigate whether we find indications for a similar effect across ages.”

      (6) I am confused about the results indicating a significant effect of condition for the iBREATH dataset excluding 18-month-olds (Table 5, OR = 1.15, t(1050) = 2.397, p = .017), as the description in Table 5 suggests no statistical significance (p = .070). The decision to exclude the 18-month group seems arbitrary, particularly since the age-by-condition interaction was not significant in the GLMM across all three age groups. 

      Thank you very much for the comment. We have removed the analysis excluding the 18-month-old group.

      (7) Regarding the relationship between cardiac and respiratory interoceptive sensitivity, the statement "However, we found a significant interaction between iBEATs scores and age at the 18-month level" (p16) seems unclear. Clarification is needed, as mentioning age interaction at a specific age stage is unusual. A pairwise comparison between 3 and 9 months should also be included. 

      Thank you for pointing out that the results could be presented more clearly! Similar to the other MEGA analyses we have put detailed tables of the results of the beta regression in the supplements and have kept a single table with the most important results in the main manuscript. Further, we have clarified the text passage as follows: “However, we found a significant interaction between the iBEATs scores and age, specifically comparing the 3- and 18-month-old groups (β = 3.13, SE = 1.41, p = .027). This interaction indicates that the relationship between iBEATs and iBREATH scores changes between 3 and 18 months of age.”  Also, we have now included a pairwise comparison between 3- and 9-month-olds. 

      Discussion: 

      (8) In pages 27-28, the authors discuss the results of the specification curve analysis, but there is no explanation for the 7th entry (statistical analysis) in Table 9. This entry seems particularly important. 

      We did not include an explanation for the 7th entry, as the impact of the statistical test used was comparatively less pronounced. However, to acknowledge this result we have added the following sentence to the discussion: “Moreover, the statistical test used (paired t-test vs linear mixed model, Table 9, 7th entry) had a rather small impact on the results. However, given the large number of analyses conducted, this might be related to not being able to precisely formulate the model to fit the complexity of the data for each specification.”

      Methods: 

      (9) What were the colors of the stimuli? 

      We have added the colors of the stimuli to the methods section. Further, the stimuli can be found in the OSF project associated with the manuscript.

      (10) The percentage of trials excluded during preprocessing should be stated. Additionally, the number of trials included in the statistical analyses for each condition (including synchronous, fast, and slow) should be detailed separately. 

      We have added information on numbers of trials completed and included in Table 7.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Amason et al. investigated the formation of granulomas in response to Chromobacterium violaceum infection, aiming to uncover the cellular mechanisms governing the granuloma response. They identify spatiotemporal gene expression of chemokines and receptors associated with the formation and clearance of granulomas, with a specific focus on those involved in immune trafficking. By analyzing the presence or absence of chemokine/receptor RNA expression, they infer the importance of immune cells in resolving infection. Despite observing increased expression of neutrophil-recruiting chemokines, treatment with reparixin (an inhibitor of CXCR1 and CXCR2) did not inhibit neutrophil recruitment during infection. Focusing on monocyte trafficking, they found that CCR2 knockout mice infected with C. violaceum were unable to form granulomas, ultimately succumbing to infection.

      The spatial transcriptomics data presented in the figures could be considered a valuable resource if shared, with the potential for improved and clarified analyses. The primary conclusion of the paper, that C. violaceum infection in the liver cannot be contained without macrophages, would benefit from clarification.

      We thank the reviewer for their time and effort in evaluating our manuscript.

      While the spatial transcriptomic data generated in the figures are interesting and valuable, they could benefit from additional information. The manual selection of regions of granulomas for analysis could use additional context - was the rest of the liver not sequenced, or excluded for other reasons? Including a healthy liver in the analysis could serve as a control for any lasting effects at the final time point of 21 days.

      We revised the text in the methods section to include additional information about manual selection of regions. The entire tissue section was sequenced, but using H&E as a guide, we manually selected each representative lesion and a surrounding layer of healthy hepatocytes at each timepoint. We agree that an uninfected control could be useful, however we did not include an uninfected mouse in the experiment because we were most interested in the cells that make up the granuloma, not hepatocytes outside the lesion. Additionally, we find that in the 21 DPI timepoint the surrounding hepatocytes appear to have returned to a homeostatic transcriptional state; at 21 DPI the majority of mice have undetectable CFU burdens.

      Providing more context for the scalebars throughout the spatial analyses, such as whether the data are raw counts or normalized based on the number of reads per spatial spot, would be helpful for interpretation, as changes in expression could signal changes in the numbers of cells or changes in the gene expression of cells.

      The scalebars for the SpatialFeaturePlots display the normalized gene expression values. The data are normalized based on the number of reads per spatial spot, using the sctransform method published in (Hafemeister & Satija, 2019). We agree that the changes in expression could result from changes in cell numbers and/or changes in gene expression on a per cell basis. However, the sctransform method is designed to preserve biological variation while minimizing technical effects observed in transcriptomics platforms. Regardless of the heterogeneity of sequencing depth, it is clear from these plots that gene expression changes dynamically over time and space, which was the focus of our analysis. We have updated the figure legends to clarify scalebar units, and revised the methods section. 
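
      To illustrate why per-spot normalization matters for interpreting these plots, the following is a minimal sketch of a simple library-size normalization. It is not the sctransform method used in the manuscript (which fits a regularized negative binomial regression); the function name, the scale factor, and the example counts (including the hepatocyte marker Alb) are assumptions chosen for illustration only.

```python
import math


def log_normalize(counts_per_spot, scale=10_000):
    """Scale each spot's counts to a common depth, then apply log1p.

    Simplified stand-in for depth normalization; NOT sctransform.
    counts_per_spot: list of {gene: raw count} dicts, one per spatial spot.
    """
    normalized = []
    for counts in counts_per_spot:
        depth = sum(counts.values())  # total reads captured in this spot
        normalized.append({
            gene: math.log1p(n / depth * scale) if depth else 0.0
            for gene, n in counts.items()
        })
    return normalized


# Two hypothetical spots with identical composition but a 10-fold depth difference.
spots = [
    {"Ccl2": 5, "Cxcl1": 15, "Alb": 80},
    {"Ccl2": 50, "Cxcl1": 150, "Alb": 800},
]
norm = log_normalize(spots)
```

      In this sketch the two spots receive the same normalized value for every gene even though their raw counts differ tenfold, which is the sense in which normalized scalebars reflect relative expression rather than sequencing depth.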

      In Figure 4, qualitative measurements are valuable, but having an idea of the raw data for a few of the pursued chemokines/receptors would aid interpretation.

      All of the SpatialFeaturePlots utilized to generate Figure 4 have been included in the manuscript, either in the main figures or in the supplemental figures. For example, the SpatialFeaturePlots of Cxcl4, Cxcl9, and Cxcl10 are all in Figure 4 – figure supplement 1.

      In Figure 4 it would also be beneficial to clarify whether the reported values are across all clusters and consider focusing on clusters with the greatest change in expression.

      Figure 4 summarizes the expression of each gene at each timepoint for the entire selected area, independently of cluster identity. Different clusters do show variability in the relative change in expression. To better show these data, we have included an additional graphic that summarizes the top twenty upregulated genes for each cluster, many of which include chemokines (new Table 4). The average log2FC values for each of these genes can be found in Table 4 – source data 1.   

      Figures 5E and F would benefit from clarification regarding the x-axis units and whether the expression levels are summed across all clusters for each time point.

      Figures 5E and 5F display the normalized gene expression values for all spots (independent of cluster identity) at each timepoint. We have updated the figure legend to reflect this clarification.

      Additionally, information on the sequencing depth of the samples would be helpful, particularly as shallow sequencing of RNA can result in poor capture of low-expression transcripts.

      We agree with the reviewer that sequencing depth is an additional factor to take into consideration. We have included an additional supplemental figure (Figure 1 – figure supplement 1A-B) to display raw counts spatially at the various timepoints, and within each cluster.

      Regarding the conclusion of the essentiality of macrophages in granuloma formation, it may be prudent to further investigate the role of macrophages versus CCR2. Consideration of experiments deleting macrophages directly, instead of CCR2, could provide more definitive evidence of the necessity of macrophage migration in containing infections.

      While CCR2 is expressed on a number of other cells besides monocytes, it is well-documented that loss of CCR2 results in accumulation of monocytes in the bone marrow and a significant reduction in the blood-monocyte population. As a result, monocytes are not recruited to the site of infection in numerous prior publications in the field; we confirm this as shown by flow cytometry and IHC. Nonetheless, future studies will aim to rescue Ccr2–/– mice via adoptive transfer of monocytes to further show that monocyte-derived macrophages are essential for defense against infection. We also intend to perform clodronate depletion experiments at various timepoints, however, clodronate will also deplete Kupffer cells and has off-target effects on neutrophils. Overall, the established importance of CCR2 for monocyte egress from the bone marrow and our observation that the macrophage ring fails to form give us sufficient confidence to conclude that monocyte-derived macrophages are essential for this innate granuloma.

      Analyzing total cell counts in the liver after infection could provide insight into whether the decrease in the fraction of macrophages is due to decreased numbers or infiltration of other cell types...

      Our flow data suggest that the decrease in macrophages in Ccr2–/– mice is due to both a decrease in macrophage number and an increase in the infiltration of other cell types (namely neutrophils). To better illustrate this, we now include an additional quantification of the total cell counts in the liver and spleen (new Figure 6 – figure supplement 1), which supports our conclusion that Ccr2–/– mice have a defect in granuloma macrophage numbers. We have also repeated the experiment to reach sufficient numbers to perform statistical analysis (revised Figure 6F–K).

      Reviewer #2 (Public Review):

      Summary:

      In this study, Amason et al employ spatial transcriptomics and intervention studies to probe the spatial and temporal dynamics of chemokines and their receptors and their influence on cellular dynamics in C. violaceum granulomas. As a result of their spatial transcriptomic analysis, the authors narrow in on the contribution of neutrophil- and monocyte-recruiting pathways to host response. This results in the observation that monocyte recruitment is critical for granuloma formation and infection control, while neutrophil recruitment via CXCR2 may be dispensable.

      We thank the reviewer for their thoughtful comments and suggestions.

      Strengths:

      Since C. violaceum is a self-limiting granulomatous infection, it makes an excellent case study for 'successful' granulomatous inflammation. This stands in contrast to chronic, unproductive granulomas that can occur during M. tuberculosis infection, sarcoidosis, and other granulomatous conditions, infectious or otherwise. Given the short duration of C. violaceum infection, this study specifically highlights the importance of innate immune responses in granulomas.

      Another strength of this study is the temporal analysis. This proves to be important when considering the spatial distribution and timing of cellular recruitment. For example, the authors observe that the intensity and distribution of neutrophil- and monocyte-recruiting chemokines vary substantially across infection time and correlate well with their previous study of cellular dynamics in C. violaceum granulomas.

      The intervention studies done in the last part of the paper bolster the relevance of the authors' focus on chemokines. The authors provide important negative data demonstrating the null effect of CXCR1/2 inhibition on neutrophil recruitment during C. violaceum infection. That said, the authors' difficulty with solubilizing reparixin in PBS is an important technical consideration given the negative result...

      We agree with the reviewer, and the limited solubility of reparixin and other chemokine-receptor inhibitors is a major caveat of this study and others in the field. In future studies, there are several other inhibitors that could be used to further assess the role of CXCR1/2.

      On the other hand, monocyte recruitment via CCR2 proves to be indispensable for granuloma formation and infection control. I would hesitate to agree with the authors' interpretation that their data proves macrophages are serving as a physical barrier from the uninvolved liver. It is possible and likely that they are contributing to bacterial control through direct immunological activity and not simply as a structural barrier.

      We agree that macrophages do not form a physical or structural barrier, a word that implies epithelial-like function. Instead, we agree that macrophages mostly act immunologically. We revised the text to remove the term barrier.

      Weaknesses:

      There are several shortcomings that limit the impact of this study. The first is that the cohort size is very limited. While the transcriptomic data is rich, the authors analyze just one tissue from one animal per time point. This assumes that the selected individual will have a representative lesion and prevents any analysis of inter-individual variability.

      Granulomas in other infectious diseases, such as schistosomiasis and tuberculosis, are very heterogeneous, both between and within individuals. It will be difficult to assert how broadly generalizable the transcriptomic features are to other C. violaceum granulomas...

      We thank the reviewers for highlighting this key difference between granulomas in other infectious diseases, and granulomas induced by C. violaceum. Based on many prior experiments, we observe that C. violaceum-induced granulomas are very reproducible between and within individuals (highlighted in our previous publication). As this is a major advantage of this model system, we chose specific timepoints based on key events that consistently occur in the majority of lesions assessed at each timepoint, allowing us to be confident in the selection of representative granulomas. However, it is worth noting that granulomas within an individual mouse are seeded and resolved somewhat asynchronously. This did indeed affect our spatial transcriptomic data, as the 7 DPI timepoint was not histologically representative of a typical 7 DPI granuloma. Therefore, we excluded the 7 DPI timepoint from our analyses.

      Furthermore, this undermines any opportunity for statistical testing of features between time points, limiting the potential value of the temporal data.

      We agree with the reviewer that there is much more characterization and quantification that can be done. As demonstrated by the abundance of spatial and temporal data for the chemokine family alone, the spatial transcriptomics dataset is rich and will likely supply us with many years of analyses and investigations. Our current approach is to use the spatial transcriptomics dataset as a hypothesis-generating tool, followed by in vivo studies that seek to uncover physiological relevance for our observations. In the current paper, the strength of the spatial transcriptomic data for CCL2, CCL7 and their receptor CCR2 prompted us to study Ccr2–/– mice. These mice then prove the relevance of the spatial transcriptomic data. In regard to conclusions about temporal changes in chemokine expression, in this manuscript we do not make conclusions that CCL2 is important at one timepoint but not another. We are characterizing the broad temporal trends of expression in order to cast a broad net to inform future in vivo studies. There is much work for us to do to explore all the induced chemokines and their receptors.

      Another caveat to these data is the limited or incompletely informative data analysis. The authors use Visium in a more targeted manner to interrogate certain chemokines and cytokines. While this is a great biological avenue, it would be beneficial to see more general analyses considering Visium captures the entire transcriptome. Some important questions that are left unanswered from this study are:

      What major genes defined each spatial cluster?...

      The initial characterization of each spatial cluster was performed in Harvest et al., 2023. In brief, we used a mixture of published single-cell sequencing data, histological-based parameters, and ImmGen to define each cluster. We have not re-stated those methods in the current manuscript, but instead reference our prior paper.

      What were the top differentially expressed genes across time points of infection?...

      Though the top differentially expressed genes for each cluster can be informative in some situations, we chose a more targeted approach because of the obvious importance of chemokines. Nonetheless, we have included an additional graphic that summarizes the top twenty upregulated genes for each cluster (new Table 4). The average log2FC values for each of these genes can be found in Table 4 – source data 1.  

      Did the authors choose to focus on chemokines/receptors purely from a hypothesis perspective or did chemokines represent a major signature in the transcriptomic differences across time points?

      We chose to focus on chemokines because of their obvious importance for recruitment of immune cells. They were also among the highest induced genes in the spatial transcriptome (new Table 4).

      In addition to the absence of deep characterization of the spatial transcriptomic data, the study lacks sufficient quantitative analysis to back up the authors' qualitative assessments...

      See above comment regarding statistical comparisons.

      Furthermore, the authors are underutilizing the spatial information provided by Visium with no spatial analysis conducted to quantify the patterning of expression patterns or spatial correlation between factors.

      Several factors make quantification challenging. Lesions grow considerably in size in the first few days of infection, and then shrink in size in the latter days. This makes quantification challenging between timepoints. Radial quantification is also challenging due to the irregular shapes of each granuloma (see comment below for further discussion). Most importantly, the key next experiments are to validate the importance of each chemokine and receptor in vivo. Once we know which ones are the most important, this will justify putting more effort into spatial quantitative analysis and patterning of expression for those chemokines. 

      Impact:

      The authors' analysis helps highlight the chemokine profiles of host-protective granulomas. As the authors comment on in their discussion, these findings have important similarities and differences with other notable granulomatous conditions, such as tuberculosis. Beyond the relevance to C. violaceum infection, these data can help inform studies of other types of granulomas and hone candidate strategies for host-directed therapy.

      Recommendations for the authors:

      Reviewer #2 (Recommendations For The Authors):

      The Visium analysis would be strengthened by

      (1) Showing several histology examples of granulomas at each timepoint to help aid the reader in seeing how 'representative' each Visium sample is...

      These histological analyses are performed in our previous manuscript, and indeed were a crucial aspect of the initial characterization of the spatial transcriptomics dataset, which was performed in Harvest et al., 2023. Full liver sections are shown in that paper at each timepoint, and readers can see that the architecture is highly reproducible.

      (2) Validating their results in other tissues, either with Visium or with more targeted assays for their study's key molecules, such as immunohistochemistry or in situ hybridization.

      We agree on the importance of validation studies and have plans to perform single-cell RNA sequencing experiments to further enhance resolution. With key genes in mind, we then plan to perform more in vivo studies to assess physiological relevance of upregulated genes in specific cell types.

      At the very least it would be important to validate the expression of CXCL1 and CXCL2 in other tissues and at the protein level, given the importance of those findings.

      We think that the reviewer is asking us to validate that CXCL1 and CXCL2 are actually expressed given the negative reparixin data. However, if we do prove that they are expressed, this will not resolve whether they have critical roles in neutrophil recruitment. To prove this, we would need either a better CXCR2 inhibitor or Cxcr2 knockout mice. Therefore, we are saving further exploration for the future. Regarding validating other chemokines, we establish that CCR2 is critical, and we now show by immunofluorescence and ELISA (new Figure 7 – figure supplement 4) that CCL2 is highly expressed in WT mice, and Ccr2–/– mice actually have strongly elevated CCL2 expression at 3 DPI compared to WT mice.

      In Figure 1B, the UMAP here is largely uninformative. To display the clusters, the authors should instead show a heatmap or equivalent visualization of which genes defined each cluster. It would be helpful for the authors to also write out the full name of each cluster before using the abbreviations shown.

      Please see our previous comment about the initial characterization of clusters performed in Harvest et al., 2023, which details the characteristic genes for each cluster. We have written the full names of each cluster in the legend of Figure 1.

      In Figure 1C, the authors use a binary representation of whether a cluster is present or not at a particular time point. However, the spot size is arbitrary, and the colors of the dots are the same as the cluster color code. It is not clear what threshold the authors (or SpatialDimPlots) use to declare a given cluster is present at a given time point. Therefore, this chart does not give any sense of the extent of each cluster's presence at each time. The authors should revisualize these data to display the abundance of each cluster at each timepoint. This could simply be done by adjusting the size of the circle or using a more traditional heatmap.

      We have now updated this graphic to display the extent of a cluster’s presence, with the size of each dot corresponding to the abundance of each cluster.
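
      One way the abundance behind such dot sizes could be computed is sketched below: the fraction of spots assigned to each cluster at each timepoint. This is an illustrative assumption rather than the procedure used for the revised figure; the function name, cluster names, and spot labels are hypothetical.

```python
from collections import Counter


def cluster_fractions(spot_labels):
    """Return {timepoint: {cluster: fraction of spots in that cluster}}.

    spot_labels: {timepoint: list of cluster labels, one per spatial spot}.
    """
    fractions = {}
    for timepoint, labels in spot_labels.items():
        counts = Counter(labels)
        total = sum(counts.values())
        fractions[timepoint] = {c: n / total for c, n in counts.items()}
    return fractions


# Hypothetical cluster assignments for two timepoints (not real data).
spot_labels = {
    "1 DPI": ["hepatocyte"] * 8 + ["necrotic core"] * 2,
    "5 DPI": ["hepatocyte"] * 4 + ["necrotic core"] * 3 + ["macrophage zone"] * 3,
}
fractions = cluster_fractions(spot_labels)
```

      A cluster that is absent at a timepoint simply has no entry, so a plotting routine could scale each dot's area by the fraction and omit dots for missing clusters.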

      In Figures 2 and 3 the authors describe the kinetics of each chemokine by cluster. While the dynamic expression is evident in the images, it is challenging to determine which clusters are driving expression in the absence of cluster annotation in those figures. The authors should support their visual findings with quantification of each factor in each cluster across time points.

      In Figure 5, violin plots are shown for Cxcl1 and Ccl2 that depict gene expression by each cluster. However, because each capture area is approximately 50 µm in diameter, the data do not achieve single-cell resolution and are not as informative as one would hope. Therefore, violin plots for each chemokine were not shown, though we have generated these graphics. We did not add these graphics to the revision because we did not think readers would generally want to see several pages of violin plots in the supplement. As mentioned, we plan to do single-cell RNA sequencing to further assess chemokine expression by each cell type present within the granulomas at key timepoints.

      With respect to the lack of spatial analysis, the authors describe certain transcript signals (i.e., peripheral region versus central region of the granuloma) across each lesion. To back up these qualitative assertions, the authors could use line profiles from the center of each granuloma to the outside to plot the variation in expression of each transcript over radial space. This would provide a more direct way to determine the spatial coordination between various transcripts.

      We considered using line profiles to quantify spatial variation within each lesion at each timepoint. However, this was exceptionally challenging due to the asymmetrical nature of some lesions, and the size discrepancy at different timepoints as the granulomas grow (during infection) and shrink (during resolution). When attempting to decide where to draw the line profiles, we determined that this approach did not enhance our analyses beyond using the cluster overlay and H&E to identify and interrogate different clusters.
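
      For readers unfamiliar with the radial approach discussed in this exchange, a minimal sketch follows. It averages expression over concentric distance bins around a chosen center, and therefore assumes a roughly circular lesion with a well-defined center, which is the assumption that asymmetric granulomas violate. The function name, coordinates, and expression values are hypothetical.

```python
import math
from collections import defaultdict


def radial_profile(spots, center, bin_width):
    """Mean expression per distance bin from the lesion center.

    spots: iterable of (x, y, expression) tuples, one per spatial spot.
    Returns {bin index: mean expression}; bin k covers distances
    [k * bin_width, (k + 1) * bin_width).
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    cx, cy = center
    for x, y, expr in spots:
        bin_index = int(math.hypot(x - cx, y - cy) // bin_width)
        sums[bin_index] += expr
        counts[bin_index] += 1
    return {b: sums[b] / counts[b] for b in sorted(sums)}


# Hypothetical spots: three near the center, two near the periphery.
spots = [(0, 0, 1.0), (1, 0, 3.0), (0, 1, 5.0), (3, 0, 2.0), (0, -3, 6.0)]
profile = radial_profile(spots, center=(0, 0), bin_width=2)
```

      Because the result depends on where the center is placed and on the bin width, profiles from lesions of different sizes or shapes are not directly comparable without further rescaling.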

      The data visualization in Figure 4 seems unnecessarily confusing. The authors put the transcriptomic signal into categories of 'absent', 'low', 'medium', and 'high.' Why not simply use a continuous scale? The data would also benefit from hierarchical clustering of the heatmap rows to highlight chemokines and their receptors with similar expression patterns across time.

      We considered using a continuous scale as suggested by the reviewer. However, we chose not to create a continuous scale because quantitation is challenging due to the size changes in the lesions over time, such that larger lesions have greater inclusion of surrounding hepatocytes as well as necrotic cores, which would dilute the signal if averaged with the active immunologic granuloma zones. Figure 4 was intended to simplify the entirety of the SpatialFeaturePlots in an easy-to-digest manner, to aid in hypothesis generation as we consider the potential function of each chemokine and receptor in this model. We chose to organize each chemokine ligand based on family, maintaining a numerical order to allow Figure 4 to serve as a quick reference for anyone who is interested in a particular chemokine ligand or receptor.

      Do the authors feel confident in the transcriptomic signal coming from regions of necrosis? Given that many of their bright signals are coming from within clusters annotated as necrosis or necrosis-adjacent this raises an important technical consideration. Can the authors use the H&E image to estimate the cellular density (based on nuclear counts) in each region annotated by Visium? Are there any studies supporting the accurate performance of spatial transcriptomic methods in necrosis? Necrosis can be a source of non-specific binding during in situ hybridization assays.

      The reviewer raises a good point. A defining characteristic of the areas of necrosis is the lack of defined cell borders, with faded or absent nuclei. In these regions, it is impossible to estimate cellular density. Given these concerns, we have included an additional figure (new Figure 1 – figure supplement 1A-B) to display raw counts in each cluster across all timepoints. Though regions of necrosis do display lower read quantity compared to other areas, we are still confident in the positive transcriptomic signal coming from adjacent regions because there are plenty of negative examples in which expression is not detected. In other words, temporal and spatial upregulation of key genes is still observed in the tissues, and future experiments will aim to interrogate the physiological relevance of each gene, while validating the spatial transcriptomics data with other methodologies.

      The methods should include a much more detailed description of the tissue preparation and collection for the Visium experiment. The section on the computational analysis of the Visium data is also extremely limited. At a minimum, the authors should include details on how they performed clustering of the Visium regions.

      The detailed description of tissue preparation, computational analysis, and clustering is in our previous manuscript, from which this dataset originates. We can add a direct quote of the methodology if the reviewer requests.

      The cluster labels in Figure 5A-B are very difficult to see. Furthermore, it would help if the authors displayed the annotated cluster names (i.e., those shown in 5C) instead of their numerical coding for a more direct interpretation of the data.

      We agree and have updated this figure with annotated cluster names.

      The scale bars in Figure 7 are very difficult to see.

      The scale bars in histology images were kept small intentionally so as not to occlude data, and eLife is an online-only, digital media platform which allows readers to sufficiently zoom on high-resolution histology images. We have increased the DPI resolution for histology images to further aid in visualization.

      The information presented in Tables 2 and 3 is greatly appreciated and will really help guide the reader through the analyses.

      We assembled this information for our own learning about chemokines and hope that it is useful for the reader.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      It is suggested that for each limb the RG (rhythm generator) can operate in three different regimes: a non-oscillating state-machine regime, a flexor-driven regime, and a classical half-center oscillatory regime. This means that the field can move away from the old concept that there is only room for the classic half-center organization.

      Strengths:

      A major benefit of the present paper is that a bridge was made between various CPG concepts ("a potential contradiction between the classical half-center and flexor-driven concepts of spinal RG operation"). Another important step forward is the proposal about the neural control of slow gait ("at slow speeds (≤ 0.35 m/s), the spinal network operates in a state regime and requires external inputs for phase transitions, which can come from limb sensory feedback and/or volitional inputs (e.g. from the motor cortex)").

      Weaknesses:

      Some references are missing.

      We thank the Reviewer for the thoughtful and constructive comments. We have added text to address the Reviewer’s specific recommendations, along with several references suggested by the Reviewer.

      Reviewer #2 (Public Review):

      Summary:

      The biologically realistic model of the locomotor circuits developed by this group continues to define the state of the art for understanding spinal genesis of locomotion. Here the authors have achieved a new level of analysis of this model to generate surprising and potentially transformative new insights. They show that these circuits can operate in three very distinct states and that, in the intact cord, these states come into successive operation as the speed of locomotion increases. Equally important, they show that in spinal injury the model is "stuck" in the low speed "state machine" behavior.

      Strengths:

      There are many strengths for the simulation results presented here. The model itself has been closely tuned to match a huge range of experimental data and thus has a high degree of plausibility. The novel insight presented here, with the three different states, constitutes a truly major advance in the understanding of neural genesis of locomotion in spinal circuits. The authors systematically consider how the states of the model relate to presently available data from animal studies. Equally important, they provide a number of intriguing and testable predictions. It is likely that these insights are the most important achieved in the past 10 years. It is highly likely that the proposed multi-state behavior will have a transformative effect on this field.

      Weaknesses:

      I have no major weaknesses. A moderate concern is that the authors should consider some basic sensitivity analyses to determine if the 3 state behavior is especially sensitive to any of the major circuit parameters - e.g. connection strengths in the oscillators or?

      We thank the Reviewer for the thoughtful and constructive comments. The sensitivity analysis has been included as a Supplemental file.

      Reviewer #3 (Public Review):

      Summary:

      This work probes the control of walking in cats at different speeds and different states (split-belt and regular treadmill walking). Since the time of Sherrington there has been ongoing debate on this issue. The authors provide modeling data showing that they could reproduce data from cats walking on a specialized treadmill allowing for regular and split-belt walking. The data suggest that a non-oscillating state-machine regime best explains slow walking - where phase transitions are handled by external inputs into the spinal network. They then show at higher speeds a flexor-driven and then a classical half-center regime dominates. In spinal animals, it appears that a non-oscillating state-machine regime best explains the experimental data. The model is adapted from their previous work, and raises interesting questions regarding the operation of spinal networks, that, at low speeds, challenge assumptions regarding central pattern generator function. This is an interesting study. I have a few issues with the general validity of the treadmill data at low speeds, which I suspect can be clarified by the authors.

      Strengths:

      The study has several strengths. Firstly the detailed model has been well established by the authors and provides details that relate to experimental data such as commissural interneurons (V0c and V0d), along with V3 and V2a interneuron data. Sensory input along with descending drive is also modelled and moreover the model reproduces many experimental data findings. Moreover, the idea that sensory feedback is more crucial at lower speeds, also is confirmed by presynaptic inhibition increasing with descending drive. The inclusion of experimental data from split-belt treadmills, and the ability of the model to reproduce findings here is a definite plus.

      Weaknesses:

      Conceptually, this is a very useful study which provides interesting modeling data regarding the idea that the network can operate in different regimes, especially at lower speeds. The modelling data speaks for itself, but on the other hand, sensory feedback also provides generalized excitation of neurons which in turn project to the CPG. That is they are not considered part of the CPG proper. In these scenarios, it is possible that an appropriate excitatory drive could be provided to the network itself to move it beyond the state-machine state - into an oscillatory state. Did the authors consider that possibility? This is important since work using L-DOPA, for example, in cats or pharmacological activation of isolated spinal cord circuits, shows the CPG capable of producing locomotion without sensory or descending input.

      We thank the Reviewer for the thoughtful and constructive comments. We have added text and references and discussed the issues raised by the Reviewer. In particular, in the section “Model limitations and future directions” we now acknowledge that afferent feedback can provide some constant-level excitation to the RG circuits after spinal transection, which can partly compensate for the lack of supraspinal drive and hence affect (shift) the timing of transitions between the considered regimes. We note that this is one of the limitations of the present model. The potential effects of neuroactive drugs, such as L-DOPA, on CPG circuits after spinal transection were left out because they are outside the scope of the present modeling study.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      specific feedback to the authors:

      Nevertheless, there are some minor points, worth considering.

      Link to HUMAN DATA

      Here the authors may be interested to know that human data supports their proposal. This is relevant since there is ample evidence for the operation of spinal CPG's in humans (Duysens and van de Crommert, 1998). The present model predicts that the basic output of the CPG remains even at very slow speeds, thus leading to similarity in EMG output. This prediction fits the experimental data (den Otter AR, Geurts AC, Mulder T, Duysens J. Speed related changes in muscle activity from normal to very slow walking speeds. Gait Posture. 2004 Jun;19(3):270-8). To investigate whether the basic CPG output remains basically the same even at very slow speeds (as also predicted by the current model), humans walked slowly on a treadmill (speeds as slow as 0.28 m s−1). Results showed that the phasing of muscle activity remained relatively stable over walking speeds despite substantial changes in its amplitude. Some minor additions were seen, consistent with the increased demands of postural stability. Similar results were obtained in another study: Hof AL, Elzinga H, Grimmius W, Halbertsma JP. Speed dependence of averaged EMG profiles in walking. Gait Posture. 2002 Aug;16(1):78-86. doi: 10.1016/s0966-6362(01)00206-5. PMID: 12127190.

      These authors wrote: "The finding that the EMG profiles of many muscles at a wide range of speeds can be represented by addition of few basic patterns is consistent with the notion of a central pattern generator (CPG) for human walking". The basic idea is that the same CPG can provide the motor program at slow and fast speeds but that the drive to the CPG differs. This difference is accentuated under some conditions in pathology, such as in Parkinson's Kinesia Paradoxa. It was argued that the paradox is not really a paradox but is explained as the CPGs are driven by different systems at slow and at fast speeds (Duysens J, Nonnekes J. Parkinson's Kinesia Paradoxa Is Not a Paradox. Mov Disord. 2021 May;36(5):1115-1118. doi: 10.1002/mds.28550. Epub 2021 Mar 3. PMID: 33656203.)

      These ideas are well in line with the current proposal ("Based on our predictions, slow (conditionally exploratory) locomotion is not "automatic", but requires volitional (e.g. cortical) signals to trigger step-by-step phase transitions because the spinal network operates in a state-machine regime. In contrast, locomotion at moderate to high speeds (conditionally escape locomotion) occurs automatically under the control of spinal rhythm-generating circuits receiving supraspinal drives that define locomotor speed, unless voluntary modifications or precise stepping are required to navigate complex terrain").

      As mentioned in the present paper, other examples exist from pathology ("...Another important implication of our results relates to the recovery of walking in movement disorders, where the recovered pattern is generally very slow. For example, in people with spinal cord injury, the recovered walking pattern is generally less than 0.1 m/s and completely lacks automaticity 77-79. Based on our predictions, because the spinal locomotor network operates in a state-machine regime at these slow speeds, subjects need volition, additional external drive (e.g., epidural spinal cord stimulation) or to make use of limb sensory feedback by changing their posture to perform phase transitions"). As mentioned above, another example is provided by Parkinson's disease. The authors may also be interested in work on flexible generators in SCI: Danner SM, Hofstoetter US, Freundl B, Binder H, Mayr W, Rattay F, Minassian K. Human spinal locomotor control is based on flexibly organized burst generators. Brain. 2015 Mar;138(Pt 3):577-88. doi: 10.1093/brain/awu372. Epub 2015 Jan 12. PMID: 25582580; PMCID: PMC4408427.

      We thank the reviewer for these additional and interesting insights. We added a new paragraph in the Discussion to bolster the link with human data that includes references suggested by the Reviewer.

      CHAIN OF REFLEXES

      It reads: "... in opposition to the previously prevailing viewpoint of Charles Sherrington 21,22 that locomotion is generated through a chain of reflexes, i.e., critically depends on limb sensory feedback (reviewed in 23)." This is correct but incomplete. The reference cited (23: Stuart, D.G. and Hultborn, H, "Thomas Graham Brown (1882-1965), Anders Lundberg (1920-), and the neural control of stepping," Brain Res. Rev. 59(1), 74-95 (2008)) actually reads: "Despite the above findings, the doctrinaire position in the early 1900s was that the rhythm and pattern of hind limb stepping movements was attributable to sequential hind limb reflexes. According to Graham Brown (1911c) this viewpoint was largely due to the arguments of Sherrington and a Belgian physiologist, Maurice Philippson (1877-1938). Philippson studied stepping movements in chronically maintained spinal dogs, using techniques he had acquired in the Strasbourg laboratory of the distinguished German physiologist, Friedrich Goltz (1834-1902). He also analyzed kinematically moving pictures of dog locomotion, which had been sent to him by the renowned French physiologist, Etienne-Jules Marey (1830-1904). Philippson (1905) certainly presented arguments explaining his perception of how sequential spinal reflexes contributed to the four phases of the step cycle (see Fig. 1 in Clarac, 2008). In retrospect, it is likely that Graham Brown was correct in attributing to Philippson and Sherrington the then-prevailing viewpoint that reflexes controlled spinal stepping. It is puzzling, nonetheless, that far less was said then and even now about Philippson's belief that the spinal control was due to a combination of central and reflex mechanisms (Clarac, 2008),4,5 4 We are indebted to François Clarac for drawing to our attention Philippson's statement on p. 37 of his 1905 article that "Nos expériences prouvent d'une part que la moelle lombaire séparée du reste de l'axe cérébro-spinal est capable de produire les mouvements coordonnés dans les deux types de locomotion, trot et gallop. [Our experiments prove that one side of the spinal cord separated from the cerebro-spinal axis is able to produce coordinated movements in two types of locomotion, trot and gallop]." Then, on p. 39 Philippson (1905) states that "Nous voyons donc, en résumé que la coordination locomotrice est une fonction exclusivement médullaire, soutenue d'une part par des enchainements de réflexes directs et croisés, dont l'excitant est tantot le contact avec le sol, tantot le mouvement même du membre. [In summary, we see that locomotor coordination is an exclusive function of the spinal cord supported by a sequencing of direct and crossed reflexes, which are activated sometimes by contact with the ground and sometimes even by leg movement]. A coté de cette coordination basée sur des excitations périphériques, il y a une coordination centrale provenant des voies d'association intra-médullaires. [In conjunction with this peripherally excited coordination, there is a central coordination arising from intraspinal pathways]." (The English translations have also been kindly supplied by François Clarac.) Clearly, Philippson believed in both a central spinal and a reflex control of stepping! 5 In part 1 of his 1913/1916 review Graham Brown discussed Philippson's 1905 article in much detail (pp. 345-350 in Graham Brown, 1913b). He concludes with the statement that "... Philippson die wesentlichen Factoren des Fortbewegungsaktes in das exterozeptive Nervensystem verlegt. Er nimmt an, dass die zyklischen Bewegungen automatisch durch äussere Reize erhalten werden, welche in sich selbst rhythmisch als Folge der Reflexakte welche sie selbst erzeugen, wiederholt werden. [Philippson assigns the important factors of the act of locomotion to the exteroceptive nervous system. He assumes that the cyclic movements are automatically maintained by external stimuli which, by themselves, are rhythmically repeated as a consequence of the reflexive actions that they generate themselves]." (English translation kindly supplied by Wulfila Gronenberg). This interpretation clearly ignores Philippson's emphasis on a central spinal component in the control of stepping....). "

      Hence it is a simplification to give all the credit to Sherrington and to ignore the role of Philippson in the chain-of-reflexes idea.

      We again thank the Reviewer for these additional and interesting insights. We added the Philippson (1905) and Clarac (2008) references. The important contribution of Philippson is now indicated.

      GTO Ib feedback

      It reads: "This effect and the role of Ib feedback from extensor afferents has been demonstrated and described in many studies in cats during real and fictive locomotion 2,57-59."

      These citations are appropriate but it is surprising to see that the Hultborn contribution is limited to the Gossard reference while the even more important earlier reference to Conway et al is missing (Conway BA, Hultborn H, Kiehn O. Proprioceptive input resets central locomotor rhythm in the spinal cat. Exp Brain Res. 1987;68(3):643-56. doi: 10.1007/BF00249807. PMID: 3691733).

      Yes, the Conway et al. reference has been added.

      Other species

      The authors may also look at other species. The flexible arrangement of the CPGs, as described in this article, is fully in line with work on other species, showing CPG networks capable of supporting gait, but also scratching, swimming, etc. (Berkowitz A, Hao ZZ. Partly shared spinal cord networks for locomotion and scratching. Integr Comp Biol. 2011 Dec;51(6):890-902. doi: 10.1093/icb/icr041. Epub 2011 Jun 22. PMID: 21700568. Berkowitz A, Roberts A, Soffe SR. Roles for multifunctional and specialized spinal interneurons during motor pattern generation in tadpoles, zebrafish larvae, and turtles. Front Behav Neurosci. 2010 Jun 28;4:36. doi: 10.3389/fnbeh.2010.00036. PMID: 20631847; PMCID: PMC2903196.)

      Similar ideas about flexible coupling can also be found in: Juvin L, Simmers J, Morin D. Locomotor rhythmogenesis in the isolated rat spinal cord: a phase-coupled set of symmetrical flexion extension oscillators. J Physiol. 2007 Aug 15;583(Pt 1):115-28. doi: 10.1113/jphysiol.2007.133413. Epub 2007 Jun 14. PMID: 17569737; PMCID: PMC2277226. Or zebrafish: Harris-Warrick RM. Neuromodulation and flexibility in Central Pattern Generator networks. Curr Opin Neurobiol. 2011 Oct;21(5):685-92. doi: 10.1016/j.conb.2011.05.011. Epub 2011 Jun 7. PMID: 21646013; PMCID: PMC3171584.

      We added a sentence in the Discussion along with supporting references.

      Standing

      In the view of the present reviewer, the model could even be extended to standing in humans. It reads: "at slow speeds (≤ 0.35 m/s), the spinal network operates in a state regime and requires external inputs"; similarly (personal experience) when going from sit to stand: as soon as weight is over support, extension is initiated and the body raises, as one would expect when the extensor center is activated by reinforcing load feedback, replacing GTO inhibition (Faist M, Hoefer C, Hodapp M, Dietz V, Berger W, Duysens J. In humans Ib facilitation depends on locomotion while suppression of Ib inhibition requires loading. Brain Res. 2006 Mar 3;1076(1):87-92. doi:

      Yes, we agree that the model could be extended to standing and the transition from standing to walking is particularly interesting. However, for this paper, we will keep the focus on locomotion over a range of speeds.

      Reviewer #2 (Recommendations For The Authors):

      The presentation is exceedingly well done and very clear.

      A moderate concern is that the authors do not make use of the capacity of computer simulations for sensitivity analyses. Perhaps these have been previously published? In any case, the question here is whether the 3 state behavior is especially sensitive to excitability of one of the main classes of neurons or a crucial set of connections.

      The sensitivity analysis has been performed and included as a Supplemental file.

      Minor point. I have but two minor points. A bit more explanation should be provided for the use of the term "state machine" to describe the lowest speed state. Perhaps this is a term from control theory? In any case, it is not clear why this term is appropriate for a state in which the oscillator circuits are "stuck" in a constant output form and need to be "pushed" by sensory input.

      Yes, we now provide a definition in the Introduction.

      Minor point: it is of course likely that neuromodulation of multiple types of spinal neurons occurs via inputs that activate G protein coupled receptors. These types of inputs are absent from the model, which is fine, but some sort of brief discussion should be included. One possibility is to note that the circuit achieves transitions between different states without the need for neuromodulatory inputs. This appears to me to be a very interesting and surprising insight.

      In the section “Model limitations and future directions” in the Discussion, we now mention that the term “supraspinal drive” in our model is used to represent supraspinal inputs providing both electrical and neuromodulatory effects on spinal neurons that increase their excitability and that disappear after spinal transection. We think that it is still too early to simulate the exact effects of descending neuromodulation, since there are almost no data on the effects of different modulators on specific types of spinal interneurons.

      Reviewer #3 (Recommendations For The Authors):

      Minor Comments  

      Page numbers would be useful.

      Abstract

      Following spinal transection, the network can only operate in a state-machine regime. This is a bit strong since it applies to computational data. Clarify this statement.

      We agree. Sentence has been changed to: “Following spinal transection, the model predicts that the spinal network can only operate in the state-machine regime.”

      Introduction

      Intro - "This is somewhat surprising...". It gives the impression that spinal cats are autonomously stable on the belt. They are stabilized by the experimenter.

      The text has been changed to: “This is somewhat surprising because intact and spinal cats rely on different control mechanisms. Intact cats walking freely on a treadmill engage vision for orientation in space and their supraspinal structures process visual information and send inputs to the spinal cord to control locomotion on a treadmill that maintains a fixed position of the animal relative to the external space. Spinal cats, whose position on the treadmill relative to the external space is fixed by an experimenter, can only use sensory feedback from the hindlimbs to adjust locomotion to the treadmill speed.”

      "Cannot consistently perform treadmill locomotion" - likely a context-dependent result. Certainly, cats can do this easily off a treadmill - stalking, for example. Perhaps somewhere, mention that treadmill locomotion is not entirely similar to overground locomotion.

      We completely agree. Stalking is an excellent example showing that during overground locomotion slow movements (and related phase transitions) can be controlled by additional voluntary commands from supraspinal structures, which differs from simple treadmill locomotion performed outside of specific goal- or task-dependent contexts. Based on this, we suggest a difference between relatively slow (exploratory-type, including stalking) and relatively fast (escape-type) overground locomotion. We added the following sentence to the Introduction: “This is evidently context dependent and specific for the treadmill locomotion as cats, humans and other animals can voluntarily decide to perform consistent overground locomotion at slow speeds.”

      The authors introduce the concept of the state machine regime. In my opinion, this could use some more explanation and citations to the literature. Was it a term coined by the authors, or is there literature reinforcing this point?

      This is a computer science and automata theory term that has already been used in descriptions of locomotion (see our references in the 2nd paragraph of Discussion). We added a definition and corresponding references in the Introduction.

      In terms of sensory feedback, particularly group II input, it would be interesting to calculate if the conduction delay to the spinal cord at higher speeds would have a certain cutoff point at which it would no longer be timed effectively for phase transitions. This could reinforce your point.

      This is an interesting proposition but it is unlikely to be a factor over the range of speeds that we investigated (0.1 to 1.0 m/s). Assuming that group II afferents transmit their signals to spinal circuits at a latency of 10-20 ms, this is more than enough time to affect phase transitions, even at the highest speed considered. This might be a factor at very high speeds (e.g. galloping) or in small animals with high stepping frequencies.
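      The latency argument above can be made concrete with a small calculation. The Python sketch below expresses the 10-20 ms group II latency quoted in the response as a fraction of one step cycle. The cycle durations are illustrative placeholders chosen for this example, not values taken from the manuscript, and `latency_fraction` is a hypothetical helper name.

```python
def latency_fraction(latency_ms: float, cycle_s: float) -> float:
    """Return an afferent latency as a fraction of one step cycle."""
    if cycle_s <= 0:
        raise ValueError("cycle duration must be positive")
    return (latency_ms / 1000.0) / cycle_s


# ASSUMED, illustrative step-cycle durations in seconds (not data from the paper).
assumed_cycles_s = {"slow walk": 1.5, "fast walk": 0.5, "very fast gait": 0.2}

for label, cycle_s in assumed_cycles_s.items():
    for latency_ms in (10.0, 20.0):  # latency range quoted in the response
        frac = latency_fraction(latency_ms, cycle_s)
        print(f"{label}: {latency_ms:.0f} ms is {100 * frac:.1f}% of a {cycle_s} s cycle")
```

      Under these assumed durations the latency remains a few percent of the cycle during walking and becomes a larger fraction only when the cycle is very short, which is the qualitative point made in the response.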

      Results.

      The assertion that intact cats are inconsistent in terms of walking at slow speeds needs to be bolstered. For example, if a raised platform were built for a tray of food, would the intact cat consistently walk at slower speeds and eat? I suspect so. By the same token, would they walk slowly during bipedal walking? It is pretty easy to check this. Also, reports from the literature show differential effects of runway versus treadmill gait analysis, specifically when afferent input is removed.

      The Reviewer is correct that raising a platform for a food tray or even having intact cats walk with their hindlimbs only (with forelimbs on a stationary platform) may allow for consistent stepping at slow speeds (0.1 – 0.3 m/s). However, this effectively removes voluntary control of locomotion and makes the pattern more automatic (spinal + limb sensory feedback). These examples provide additional specific contexts, and we have already mentioned (see above) that slow locomotion of intact cats is context dependent.

      "We believe that intact animals walking on a treadmill..." Citations for this? Certainly, this is not a new point.

      No, this is not new. We changed the sentence to “Intact animals walking on a treadmill use visual cues and supraspinal signals to adjust their speed and maintain a fixed position relative to the external space” and added a reference to Salinas et al. (Salinas, M.M., Wilken, J.M., and Dingwell, J.B., "How humans use visual optic flow to regulate stepping during walking," Gait Posture 57, 15-20, 2017).

      The presentation of the results is somewhat disjointed. The intact data is presented for tied and split-belt results, but this is not addressed explicitly until figure 4. Would it not be better to create a figure incorporating both intact and modelling data and present the intact data where appropriate?

      We tried to do this initially, but this way required changing the style of the whole paper and we decided against this idea. Therefore, we prefer to keep the presentation of results as it is now. 

      Regarding the role of sensory feedback being especially important at low speeds, it is interesting that egr3+ mice (lacking spindle input) show an inability to walk at high speeds >40 cm/s but can walk at lower speeds (up to 7 cm/s) (Takeoka et al 2014). Similar findings were found with a lesion affecting Group I afferents in general (Takeoka and Arber 2019). Also, Grillner and colleagues show that cats can produce fictive locomotion in the absence of sensory input.

      In the Takeoka experiments it is difficult to assess the effect of removing somatosensory feedback because animals can simply decide to not step at higher speeds to avoid injury. Their mice deprived of somatosensory feedback can walk at slow speeds, likely thanks to voluntary commands, and cannot do so at higher speeds because (1) maybe somatosensory feedback is indeed necessary and/or (2) because they feel threatened because of impaired posture and poor control in general. In other words, they choose to not walk at faster speeds to avoid injury.

      Fictive locomotion by definition is without phasic somatosensory feedback, as the animals are curarized or studies are performed in isolated spinal cord preparations. Depending on the preparation, pharmacology or brainstem stimulation is required to evoke fictive locomotion. If animals are deafferented, pharmacology or brainstem stimulation is required to induce fictive locomotion to offset the loss of spinal neuronal excitability provided by primary afferents. At the same time, our preliminary analysis of old fictive locomotion data from the University of Manitoba Spinal Cord center (Drs. Markin and Rybak had official access to this database during our collaboration with Dr. David McCrea) has shown that the frequency of stable fictive locomotion in cats usually exceeded 0.6 - 0.7 Hz, which approximately corresponds to speeds above 0.3 - 0.4 m/s. These data and estimates are only approximate; they have not been statistically analyzed or published and hence have not been included in our paper.

      Discussion. The statement that sensory feedback is required for animals to locomote may need to be qualified. Animals need some sensory feedback to locomote is perhaps better. For example, lesion studies by Rossignol in the early 2000s showed that cutaneous feedback from the paw was seemingly quite critical (in spinal cats). Also, see previous comments above.

      We changed this to: “… requires some sensory feedback to locomote, …”

      Figures

      Figure 1C. This figure is somewhat confusing. If intact cats do not walk (arrow), how are the data for swing and stance computed? Also, raw traces would be useful to indicate that there is variability. Also, while duration is useful, would you not want to illustrate the coefficient of variation as another way to show that the stepping pattern was inconsistent?

      This is probably a misunderstanding. The left panel of Fig. 1C superimposes data of intact cats from panel A (with speed range from 0.4 m/s to 1.0 m/s) and data from spinal cats from panel B (with speed range from 0.1 m/s to 1.0 m/s). Therefore, the left part of this left panel of Fig. 1C (with speed range from 0.1 m/s to 0.4 m/s, pointed out by the arrow) corresponds only to spinal cats (not to intact cats). The standard deviations of all measurements are shown. All these figures were reproduced from the previous publications. We did not apply new statistical analysis to these previously published data/figures.

      Figure 4. 'All supraspinal drives (and their suppression of sensory feedback) are eliminated from the schematic shown in A. ' However, it is labelled 'brainstem drives,' which is confusing. Moreover, many of the abbreviations are confusing. Do you need l-SF-E1 in the figure, or could you call it 'Feedback 1' and then refer to l-SF-E1 in the legend? The same goes for βr, etc. Can they move to the legend?

      In the intact model (Fig. 4A), we have supraspinal drives (𝛼𝐿 and 𝛼𝑅, and 𝛾𝐿 and 𝛾𝑅), some of which provide presynaptic inhibition of sensory feedback (SF-E1 and SF-E2), as shown in Fig. 4A. In the spinal-transected model (Fig. 4B), the above brainstem drives and their effects (presynaptic inhibition) on both feedback types are eliminated (therefore, there is no label “Brainstem drives” in Fig. 4B). Also, we do not see a strong reason to change the feedback names, since they are explained in the text.

      I appreciate the detail of these figures, but they are difficult to conceptualize. They are useful in the context of 3C. Perhaps move this figure to supplementary and then show the proposed schematics for the system operating at slow, medium, and fast speeds in a replacement figure?

      We apologize for the resistance, but we would like to keep the current presentation.

      There is a lack of raw data (models or experimental) data reinforcing the figures. I would add these to all figures, which would nicely complement the graphs.

      These raw data can be found in the cited manuscripts. It would be the same figures.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In their paper, Zhan et al. have used Pf genetic data from simulations and Ghanaian field samples to elucidate a relationship between multiplicity of infection (MOI) (the number of distinct parasite clones in a single host infection) and force of infection (FOI). Specifically, they use sequencing data from the var genes of Pf along with Bayesian modeling to estimate the MOI of individual infections, and use these values along with methods from queueing theory that rely on various assumptions to estimate FOI. They compare these estimates to known FOIs in a simulated scenario and describe the relationship between these estimated FOI values and another commonly used metric of transmission, EIR (entomological inoculation rate).

      This approach does fill an important gap in malaria epidemiology, namely estimating the force of infection, which is currently complicated by several factors including superinfection, unknown duration of infection, and highly genetically diverse parasite populations. The authors use a new approach borrowing from other fields of statistics and modeling and make extensive efforts to evaluate their approach under a range of realistic sampling scenarios. However, the write-up would greatly benefit from added clarity both in the description of methods and in the presentation of the results. Without these clarifications, rigorously evaluating whether the authors' proposed method of estimating FOI is sound remains difficult. Additionally, there are several limitations that call into question the stated generalizability of this method that should at minimum be further discussed by the authors and in some cases require a more thorough evaluation.

      Major comments:

      (1) Description and evaluation of FOI estimation procedure.

      a. The methods section describing the two-moment approximation and the accompanying appendix lack several important details. The equations on lines 891 and 892 are only a small part of the equations in Choi et al. and do not adequately describe the procedure. Notably, several quantities in those equations are never defined, and some of them are important for understanding the method (e.g. A and S as the main random variables for inter-arrival times and service times, and aR and bR, which are the known time-average quantities; these also rely on the squared coefficient of variation of the random variable, which is also never introduced in the paper). Without going back to the Choi paper to understand these quantities and the assumptions of this method, it was not possible to follow how this works in the paper. At a minimum, all variables used in the equations should be clearly defined.

      We thank the reviewer for this useful comment. We plan to clarify the method, including all the relevant variables in our revised manuscript. The reviewer is correct in pointing out that there are more sections and equations in Choi et al., including the derivation of an exact expression for the steady-state queue-length distribution and the two-moment approximation for the queue-length distribution. Since only the latter was directly utilized in our work, we included in the first version of our manuscript only material on this section and not the other. We agree with the reviewer on readers benefiting from additional information on the derivation of the exact expression for the steady-state queue-length distribution. Therefore, we will summarize the derivation of this expression in our revised manuscript. Regarding the assumptions of the method we applied, especially those for going from the exact expression to the two-moment approximation, we did describe these in the Materials and Methods of our manuscript. We recognize from this comment that the writing and organization of this information may not have been sufficiently clear. We had separated the information on this method into two parts, with the descriptive summary placed in the Materials and Methods and the equations or mathematical formula placed in the Appendix. This can make it difficult for readers to connect the two parts and remember what was introduced earlier in the Materials and Methods when reading the equations and mathematical details in the Appendix. For our revised manuscript, we plan to cover both parts in the Materials and Methods, and to provide more of the technical details in one place, which will be easier to understand and follow.

      b. Additionally, the description in the main text of how the queueing procedure can be used to describe malaria infections would benefit from a diagram; as currently written, it is very difficult to follow.

      We thank the reviewer for this suggestion. We will add a diagram illustrating the connection between the queueing procedure and malaria transmission.

      c. Just observing the box plots of mean and 95% CI on a plot with the FOI estimate (Figures 1, 2, and 10-14) is not sufficient to adequately assess the performance of this estimator. First, it is not clear whether the authors are displaying the bootstrapped 95% CIs or whether they are just showing the distribution of the mean FOI taken over multiple simulations, and then it seems that they are also estimating mean FOI per host on an annual basis. Showing a distribution of those per-host estimates would also be helpful. Second, a more quantitative assessment of the ability of the estimator to recover the truth across simulations (e.g. proportion of simulations where the truth is captured in the 95% CI or something like this) is important. In many cases, it seems that the estimator is always underestimating the true FOI and may not even contain the true value in the FOI distribution (e.g. Figure 10, Figure 1 under the mid-IRS panel). But it's not possible to conclude one way or the other based on this visualization. This is a major issue since it calls into question whether there is in fact data to support that these methods give good and consistent FOI estimates.

      There appears to be some confusion on what we display in some key figures. We will clarify this further both here and in the revised text. In Figures 1, 2, and 10-14, we displayed the bootstrapped distributions including the 95% CIs. These figures do not show the distribution of the mean FOI taken over multiple simulations. We estimated mean FOI on an annual basis per host in the following sense. Both of our proposed methods require either a steady-state queue length distribution, or moments of this distribution for FOI inference. However, we only have one realization or observation for each individual host, and we do not have access to either the time-series observation of a single individual’s MOI or many realizations of a single individual’s MOI at the same sampling time. This is typically the case for empirical data, although numerical simulations could circumvent this limitation and generate such output. Nonetheless, we do have a queue length distribution at the population level for both the simulation output and the empirical data, which can be obtained by simply aggregating MOI estimates across all sampled individuals. We use this population-level queue length distribution to represent and approximate the steady-state queue length distribution at the individual level. Such representation or approximation does not consider explicitly any individual heterogeneity due to biology or transmission. The estimated FOI is per host in the sense of representing the FOI experienced by an individual host whose queue length distribution is approximated from the collection of all sampled individuals. The true FOI per host per year in the simulation output is obtained from dividing the total FOI of all hosts per year by the total number of all hosts. Therefore, our estimator, combined with the demographic information on population size, is for the total number of Plasmodium falciparum infections acquired by all individual hosts in the population of interest per year.
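      The aggregation step described above can be sketched in a few lines. This is an illustration only, not the authors' code: the function names and the sample values are made up, and the sketch simply pools one MOI observation per host into a population-level distribution and divides a total number of infections by the number of hosts.

```python
from collections import Counter


def empirical_moi_pmf(moi_values):
    """Pool one MOI observation per host into a population-level distribution."""
    counts = Counter(moi_values)
    n = len(moi_values)
    return {k: counts[k] / n for k in sorted(counts)}


def foi_per_host_per_year(total_infections_per_year, n_hosts):
    """True per-host FOI in a simulation: total infections acquired / number of hosts."""
    return total_infections_per_year / n_hosts


# Made-up example: one MOI value per sampled host.
moi_sample = [1, 2, 2, 4, 5, 5, 5, 8]
pmf = empirical_moi_pmf(moi_sample)
```

      The pooled distribution stands in for the steady-state queue length distribution of a single host, which is why individual heterogeneity is not represented explicitly.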

      We evaluated the impact of individual heterogeneity on FOI inference by introducing individual heterogeneity into the simulations. With a considerable amount of transmission heterogeneity across individuals (namely 2/3 of the population receiving more than 90% of all bites whereas the remaining 1/3 receives the rest of the bites), our two methods exhibit a performance similar to that in the homogeneous transmission scenarios.

      Concerning the second point, we will add a quantitative assessment of the ability of the estimator to recover the truth across simulations and include this information in the legend of each figure. In particular, we will provide the proportion of simulations where the truth is captured by the entire bootstrap distribution, in addition to some measure of relative deviation, such as the relative difference between the true FOI value and the median of the bootstrap distribution for the estimate. This assessment will be a valuable addition, but please note that the comparisons we have provided in a graphical way do illustrate the ability of the methods to estimate “sensible” values, close to the truth despite multiple sources of errors. “Close” is here relative to the scale of variation of FOI in the field and to the kind of precision that would be useful in an empirical context. From a practical perspective based on the potential range of variation of FOI, the graphical results already illustrate that the estimated distributions would be informative.
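      The two summary measures mentioned above could be computed along the following lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, "captured by the entire bootstrap distribution" is read as the truth lying between the smallest and largest bootstrap estimate, and the sign convention of the relative deviation (estimate minus truth, divided by truth) is a choice made here for illustration.

```python
import statistics


def covers_truth(bootstrap_estimates, true_value):
    """True if the truth lies within the full range of the bootstrap distribution."""
    return min(bootstrap_estimates) <= true_value <= max(bootstrap_estimates)


def relative_deviation(bootstrap_estimates, true_value):
    """(median of bootstrap estimates - truth) / truth; negative means underestimation."""
    return (statistics.median(bootstrap_estimates) - true_value) / true_value


def coverage_proportion(simulations):
    """Fraction of simulations whose bootstrap distribution contains the truth.

    `simulations` is a list of (bootstrap_estimates, true_value) pairs.
    """
    hits = sum(covers_truth(boot, truth) for boot, truth in simulations)
    return hits / len(simulations)


# Made-up numbers for two simulated scenarios.
example = [([4.0, 5.0, 6.0], 5.5), ([1.0, 2.0, 3.0], 3.5)]
coverage = coverage_proportion(example)
```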

      d. Furthermore, the authors state in the methods that the choice of mean and variance (and thus second moment) parameters for inter-arrival times is varied widely; however, it is not clear what those ranges are. There needs to be a clear table or figure caption showing what combinations of values were tested and which results are produced from them. This is an essential component of the method, and it is impossible to fully evaluate its performance without this information. This relates to the issue of selecting the mean and variance values that maximize the likelihood of observing a given distribution of MOI estimates. This is very unclear, since no likelihoods have been written down in the methods section of the main text. Which likelihood are the authors referring to? Is this the probability distribution of the steady-state queue length distribution? At other places the authors refer to these quantities as Maximum Likelihood estimators; how do they know they have found the MLE? There are no derivations in the manuscript to support this. The authors should specify the likelihood and include in an appendix an explanation of why their estimation procedure is in fact maximizing this likelihood, preferably with evidence of the shape of the likelihood, and how fine the grid of values they tested is for their mean and variance, since this could influence the overall quality of the estimation procedure.

      We thank the reviewer for pointing out these aspects of the work that can be further clarified. We will specify the ranges for the choice of mean and variance parameters for inter-arrival times as well as the grid of values tested in the corresponding figure caption or in a separate supplementary table. We maximized the likelihood of observing the set of individual MOI estimates in a sampled population given steady-state queue length distributions (with these distributions based on the two-moment approximation method for different combinations of the mean and variance of inter-arrival times). We will add a section to either the Materials and Methods or the Appendix in our revised manuscript including an explicit formulation of the likelihood.

      We will add example figures on the shape of the likelihood to the Appendix. We will also test how choices of the grid of values influence the overall quality of the estimation procedure. Specifically, we will further refine the grid of values to include more points and examine whether the results of FOI inference are consistent and robust against each other.
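      The general shape of such a grid-based likelihood maximization can be sketched as follows. This is not the authors' procedure. As a stand-in for the two-moment approximation of Choi et al., which is not reproduced in this response, the sketch uses the classical loss-queue result for Poisson arrivals, where the steady-state queue length is a Poisson distribution truncated at the capacity c. Consequently the grid here runs over the mean inter-arrival time only, not over mean and variance. All names and numbers, including `mean_duration_days`, are hypothetical.

```python
import math


def truncated_poisson_pmf(load, c):
    """Steady-state queue length pmf on 0..c for Poisson arrivals (stand-in model).

    `load` is mean infection duration divided by mean inter-arrival time.
    Weights load**n / n! are computed in log-space for numerical stability.
    """
    log_w = [n * math.log(load) - math.lgamma(n + 1) for n in range(c + 1)]
    peak = max(log_w)
    w = [math.exp(x - peak) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def log_likelihood(moi_values, pmf):
    """Log-likelihood of independent MOI observations under a queue length pmf."""
    return sum(math.log(pmf[k]) for k in moi_values)


def grid_mle(moi_values, interarrival_grid_days, mean_duration_days, c):
    """Return (best mean inter-arrival time, its log-likelihood) over the grid."""
    best = None
    for mean_ia in interarrival_grid_days:
        pmf = truncated_poisson_pmf(mean_duration_days / mean_ia, c)
        ll = log_likelihood(moi_values, pmf)
        if best is None or ll > best[1]:
            best = (mean_ia, ll)
    return best


# Made-up MOI sample and grid; 100 days is an arbitrary illustrative duration.
moi_sample = [4, 5, 5, 6, 4, 5, 3, 6, 5, 7]
best_ia, best_ll = grid_mle(moi_sample, [10, 20, 25, 40, 50], 100.0, 30)
foi_per_year = 365.0 / best_ia
```

      Refining the grid, as proposed above, amounts to passing a denser list of candidate values and checking that the maximizer does not move appreciably.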

      (2) Limitation of FOI estimation procedure.

      a. The authors discuss the importance of the duration of infection to this problem. While I agree that empirically estimating this is not possible, there are other options besides assuming that all 1-5-year-olds have the same duration of infection distribution as naïve adults co-infected with syphilis. E.g. it would be useful to test a wide range of assumed infection duration and assess their impact on the estimation procedure. Furthermore, if the authors are going to stick to the described method for duration of infection, the potentially limited generalizability of this method needs to be further highlighted in both the introduction, and the discussion. In particular, for an estimated mean FOI of about 5 per host per year in the pre-IRS season as estimated in Ghana (Figure 3) it seems that this would not translate to 4-year-old being immune naïve, and certainly this would not necessarily generalize well to a school-aged child population or an adult population. 

      The reviewer is indeed correct about the difficulty of empirically measuring the duration of infection for 1-5-year-olds, and that of further testing whether these 1-5-year-olds exhibit the same distribution for duration of infection as naïve adults co-infected with syphilis. We will nevertheless continue to use the described method for duration of infection, while better acknowledging and discussing the limitations this aspect of the method introduces. We note that the infection duration from the historical clinical data we have relied on is used in the malaria modeling community as one of the credible sources for this parameter of untreated natural infections in malaria-naïve individuals in malaria-endemic settings of Africa (e.g. in the agent-based model OpenMalaria, see 1).

      It is important to emphasize that the proposed methods apply to the MOI estimates for naïve or close to naïve patients. They are not suitable for FOI inference for the school-aged children and the adult populations of high-transmission endemic regions, since individuals in these age classes have been infected many times and their duration of infection is significantly shortened by their immunity. To reduce the degree of misspecification in infection duration and take full advantage of our proposed methods, we will emphasize in the revision the need to prioritize in future data collection and sampling efforts the subpopulation class who has received either no infection or a minimum number of infections in the past, and whose immune profile is close to that of naïve adults, for example, infants. This emphasis is aligned with the top priority of all intervention efforts in the short term, which is to monitor and protect the most vulnerable individuals from severe clinical symptoms and death.

      Also, force of infection for naïve hosts is a key basic parameter for epidemiological models of a complex infectious disease such as falciparum malaria, whether for agent-based formulations or equation-based ones. This is because force of infection for non-naïve hosts is typically a function of their immune status and the force of infection of naïve hosts. Thus, knowing the force of infection of naïve hosts can help parameterize and validate these models by reducing degrees of freedom.

      b. The evaluation of the capacity parameter c seems to be quite important and is set at 30, however, the authors only describe trying values of 25 and 30, and claim that this does not impact FOI inference, however it is not clear that this is the case. What happens if the carrying capacity is increased substantially? Alternatively, this would be more convincing if the authors provided a mathematical explanation of why the carrying capacity increase will not influence the FOI inference, but absent that, this should be mentioned and discussed as a limitation. 

      Thank you for this question. We will investigate more values of the parameter c systematically, including substantially higher ones. We note however that this quantity is the carrying capacity of the queuing system, or the maximum number of blood-stage strains that an individual human host can be co-infected with. We do have empirical evidence for the value of the latter being around 20 (2). This observed value provides a lower bound for parameter c. To account for potential under-sampling of strains, we thus tried values of 25 and 30 in the first version of our manuscript.

      In general, this parameter influences the steady-state queue length distribution based on the two-moment approximation, more specifically, the tail of this distribution when the flow of customers/infections is high. Smaller values of parameter c put a lower cap on the maximum value possible for the queue length distribution. The system is more easily “overflowed”, in which case customers (or infections) often find that there is no space available in the queuing system/individual host upon their arrival. These customers (or infections) will not increment the queue length. The parameter c has therefore a small impact for the part of the grid resulting in low flows of customers/infection, for which the system is unlikely to be overflowed. The empirical MOI distribution centers around 4 or 5 with most values well below 10, and only a small fraction of higher values between 15-20 (2). When one increases the value of c, the part of the grid generating very high flows of customers/infections results in queue length distributions with a heavy tail around large MOI values that are not supported by the empirical distribution. We therefore do not expect that substantially higher values for parameter c would change either the relative shape of the likelihood or the MLE.
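      The qualitative argument about the capacity c can be checked numerically in a simplified setting. The sketch below is an assumption-laden illustration, not the two-moment approximation used in the manuscript: it takes the classical loss queue with Poisson arrivals, whose queue length is a Poisson distribution truncated at c, and compares c = 25 with c = 30 at a low and a high flow. The function names and the load values are hypothetical.

```python
import math


def loss_queue_pmf(load, c):
    """Queue length pmf on 0..c for a loss queue with Poisson arrivals (stand-in)."""
    log_w = [n * math.log(load) - math.lgamma(n + 1) for n in range(c + 1)]
    peak = max(log_w)
    w = [math.exp(x - peak) for x in log_w]
    total = sum(w)
    return [x / total for x in w]


def blocking_probability(load, c):
    """Probability that an arriving infection finds the host at capacity."""
    return loss_queue_pmf(load, c)[c]


def max_pmf_gap(load, c_small, c_large):
    """Largest pointwise difference between the two pmfs on their shared support."""
    small = loss_queue_pmf(load, c_small)
    large = loss_queue_pmf(load, c_large)
    return max(abs(a - b) for a, b in zip(small, large))


low_flow_gap = max_pmf_gap(5.0, 25, 30)          # load near the empirical MOI center
low_flow_block = blocking_probability(5.0, 25)   # overflow is essentially never seen
high_flow_block_25 = blocking_probability(40.0, 25)
high_flow_block_30 = blocking_probability(40.0, 30)
```

      In this simplified model, the choice of c is immaterial at low flow and matters only where the system overflows, which is the part of the grid that places mass on MOI values larger than those observed.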

      Reviewer #2 (Public Review):

      Summary:

      The authors combine a clever use of historical clinical data on infection duration in immunologically naive individuals and queuing theory to infer the force of infection (FOI) from measured multiplicity of infection (MOI) in a sparsely sampled setting. They conduct extensive simulations using agent-based modeling to recapitulate realistic population dynamics and successfully apply their method to recover FOI from measured MOI. They then go on to apply their method to real-world data from Ghana before and after an indoor residual spraying campaign.

      Strengths:

      (1) The use of historical clinical data is very clever in this context. 

      (2) The simulations are very sophisticated with respect to trying to capture realistic population dynamics. 

      (3) The mathematical approach is simple and elegant, and thus easy to understand. 

      Weaknesses: 

      (1) The assumptions of the approach are quite strong and should be made more clear. While the historical clinical data is a unique resource, it would be useful to see how misspecification of the duration of infection distribution would impact the estimates. 

      We thank the reviewer for bringing up the limitation of our proposed methods due to their reliance on a known and fixed duration of infection from historical clinical data. Please see our response to reviewer 1 comment 2a.

      (2) Seeing as how the assumption of the duration of infection distribution is drawn from historical data and not informed by the data on hand, it does not substantially expand beyond MOI. The authors could address this by suggesting avenues for more refined estimates of infection duration. 

      We thank the reviewer for pointing out a potential improvement to the work. We acknowledge that FOI is inferred from MOI, and thus is dependent on the information contained in MOI. FOI reflects risk of infection, is associated with risk of clinical episodes, and can relate local variation in malaria burden to transmission better than other proxy parameters for transmission intensity. It is possible that MOI can be as informative as FOI when one regresses the risk of clinical episodes and local variation in malaria burden with MOI. But MOI by definition is a number and not a rate parameter. FOI for naïve hosts is a key basic parameter for epidemiological models. This is because FOI of non-naïve hosts is typically a function of their immune status and the FOI of naïve hosts. Thus, knowing the FOI of naïve hosts can help parameterize and validate these models by reducing degrees of freedom. In this sense, we believe the transformation from MOI to FOI provides a useful step.

      Given the difficulty of measuring infection duration, estimating infection duration and FOI simultaneously appears to be an attractive alternative, as the referee pointed out. This will require however either cohort studies or more densely sampled cross-sectional surveys due to the heterogeneity in infection duration across a multiplicity of factors. These kinds of studies have not been, and will not be, widely available across geographical locations and time. This work aims to utilize more readily available data, in the form of sparsely sampled single-time-point cross-sectional surveys.

      (3) It is unclear in the example how their bootstrap imputation approach is accounting for measurement error due to antimalarial treatment. They supply two approaches. First, there is no effect on measurement, so the measured MOI is unaffected, which is likely false and I think the authors are in agreement. The second approach instead discards the measurement for malaria-treated individuals and imputes their MOI by drawing from the remaining distribution. This is an extremely strong assumption that the distribution of MOI of the treated is the same as the untreated, which seems unlikely simply out of treatment-seeking behavior. By imputing in this way, the authors will also deflate the variability of their estimates. 

      We thank the reviewer for pointing out aspects of the work that can be further clarified. It is difficult to disentangle the effect of drug treatment on measurement, including infection status, MOI, and duration of infection. Thus, we did not attempt to address this matter explicitly in the original version of our manuscript. Instead, we considered two extreme scenarios which bound reality, well summarized by the reviewer. First, if drug treatment has had no impact on measurement, the MOI of the drug-treated 1-5-year-olds would reflect their true underlying MOI. We can then use their MOI directly for FOI inference. Second, if the drug treatment had a significant impact on measurement, i.e., if it completely changed the infection status, MOI, and duration of infection of drug-treated 1-5-year-olds, we would need to either exclude those individuals’ MOI or impute their true underlying MOI. We chose to do the latter in the original version of the manuscript. The underlying assumption is that, had those 1-5-year-olds not received drug treatment, they would have had MOI values similar to those of the non-treated 1-5-year-olds. We can then impute their MOI by sampling from the MOI estimates of non-treated 1-5-year-olds.

      The reviewer is correct in pointing out that this imputation does not add additional information and can potentially deflate the variability of MOI distributions, compared to simply excluding those drug-treated 1-5-year-olds from the analysis. Thus, we can include in our revision FOI estimates with the drug-treated 1-5-year-olds excluded from the estimation.
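      The imputation scenario and the exclusion alternative can be contrasted with a small sketch. It is illustrative only: the function name, the seed, and the MOI values are made up, and resampling with replacement from the untreated group is an assumption about how "sampling from the MOI estimates" might be implemented.

```python
import random


def impute_from_untreated(untreated_moi, n_treated, rng):
    """Draw one MOI value per treated host, with replacement, from untreated hosts."""
    return [rng.choice(untreated_moi) for _ in range(n_treated)]


rng = random.Random(0)                 # fixed seed so the sketch is reproducible
untreated = [1, 2, 2, 3, 5, 5, 7]      # made-up MOI estimates, untreated hosts
n_treated = 4                          # made-up number of drug-treated hosts

imputed = impute_from_untreated(untreated, n_treated, rng)
moi_with_imputation = untreated + imputed   # scenario: treated MOI imputed
moi_with_exclusion = list(untreated)        # alternative: treated hosts dropped
```

      Every imputed value is by construction one already seen among untreated hosts, which is why imputation enlarges the sample without adding information.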

      - For similar reasons, their imputation of microscopy-negative individuals is also questionable, as it also assumes the same distributions of MOI for microscopy-positive and negative individuals. 

      We imputed the MOI values of microscopy-negative but PCR-positive 1-5-year-olds by sampling from the microscopy-positive 1-5-year-olds, effectively assuming that both have the same, or similar, MOI distributions. We did so because there is a weak relationship in our Ghana data between the parasitemia level of individual hosts and their MOI (or detected number of var genes, on the basis of which the MOI values themselves were estimated). Parasitemia levels underlie the difference in detection sensitivity of PCR and microscopy.

      We will elaborate on this matter in our revised manuscript and include information from our previous and on-going work on the weak relationship between MOI/the number of var genes detected within an individual host and their parasitemia levels. We will also discuss potential reasons or hypotheses for this pattern.

      Reviewer #3 (Public Review):

      Summary: 

      It has been proposed that the FOI is a method of using parasite genetics to determine changes in transmission in areas with high asymptomatic infection. The manuscript attempts to use queuing theory to convert multiplicity of infection estimates (MOI) into estimates of the force of infection (FOI), which they define as the number of genetically distinct blood-stage strains. They look to validate the method by applying it to simulated results from a previously published agent-based model. They then apply these queuing theory methods to previously published and analysed genetic data from Ghana. They then compare their results to previous estimates of FOI. 

      Strengths: 

      It would be great to be able to infer FOI from cross-sectional surveys which are easier and cheaper than current FOI estimates which require longitudinal studies. This work proposes a method to convert MOI to FOI for cross-sectional studies. They attempt to validate this process using a previously published agent-based model which helps us understand the complexity of parasite population genetics. 

      Weaknesses: 

      (1) I fear that the work could be easily over-interpreted as no true validation was done, as no field estimates of FOI (I think considered true validation) were measured. The authors have developed a method of estimating FOI from MOI which makes a number of biological and structural assumptions. I would not call being able to recreate model results that were generated using a model that makes its own (probably similar) defined set of biological and structural assumptions a validation of what is going on in the field. The authors claim this at times (for example, Line 153 ) and I feel it would be appropriate to differentiate this in the discussion. 

      We thank the reviewer for this comment, although we think there is a misunderstanding on what can and cannot be practically validated in the sense of a “true” measure of FOI that would be free from assumptions for a complex disease such as malaria. We would not want the results to be over-interpreted and will extend the discussion of what we have done to test the methods. We note that for the performance evaluation of statistical methods, the use of simulation output is quite common and often a necessary and important step. In some cases, the simulation output is generated by dynamical models, whereas in others, by purely descriptive ones. All these models make their own assumptions which are necessarily a simplification of reality. The stochastic agent-based model (ABM) of malaria transmission utilized in this work has been shown to reproduce several important patterns observed in empirical data from high-transmission regions, including aspects of strain diversity which are not represented in simpler models.

      In what sense this ABM makes a set of biological and structural assumptions which are “probably similar” to those of the queuing methods we present, is not clear to us. We agree that relying on models whose structural assumptions differ from those of a given method or model to be tested, is the best approach. Our proposed methods for FOI inference based on queuing theory rely on the duration of infection distribution and the MOI distribution among sampled individuals, both of which can be direct outputs from the ABM. But these methods are agnostic on the specific mechanisms or biology underlying the regulation of duration and MOI.

      Another important point raised by this comment is what would be the “true” FOI value against which to validate our methods. Empirical MOI-FOI pairs for FOI measured directly by tracking cohort studies are still lacking. There are potential measurement errors for both MOI and FOI because the polymorphic markers typically used in different cohort studies cannot differentiate hyper-diverse antigenic strains fully and well (5). Also, these cohort studies usually start with drug treatment. Alternative approaches do not provide a measure of true FOI, in the sense of the estimation being free from assumptions. For example, one approach would be to fit epidemiological models to densely sampled/repeated cross-sectional surveys for FOI inference. In this case, no FOI is measured directly and further benchmarked against fitted FOI values. The evaluation of these models is typically based on how well they can capture other epidemiological quantities which are more easily sampled or measured, including prevalence or incidence. This is similar to what is done in this work. We selected the FOI values that maximize the likelihood of observing the given distribution of MOI estimates. Furthermore, we paired our estimated FOI value for the empirical data from Ghana with another independently measured quantity EIR (Entomological Inoculation Rate), typically used in the field as a measure of transmission intensity. We check whether the resulting FOI-EIR point is consistent with the existing set of FOI-EIR pairs and the relationship between these two quantities from previous studies. We acknowledge that as for model fitting approaches for FOI inference, our validation is also indirect for the field data.

      Prompted by the reviewer’s comment, we will discuss this matter in more detail in our revised manuscript, including clarifying further certain basic assumptions of our agent-based model, emphasizing the indirect nature of the validation with the field data and the existing constraints for such validation.

      (2) Another aspect of the paper is adding greater realism to the previous agent-based model, by including assumptions on missing data and under-sampling. This takes prominence in the figures and results section, but I would imagine is generally not as interesting to the less specialised reader. The apparent lack of impact of drug treatment on MOI is interesting and counterintuitive, though it is not really mentioned in the results or discussion sufficiently to allay my confusion. I would have been interested in understanding the relationship between MOI and FOI as generated by your queuing theory method and the model. It isn't clear to me why these more standard results are not presented, as I would imagine they are outputs of the model (though happy to stand corrected - it isn't entirely clear to me what the model is doing in this manuscript alone). 

      We thank the reviewer for this comment. We will add supplementary figures for the MOI distributions generated by the queuing theory method (i.e., the two-moment approximation method) and our agent-based model in our revised manuscript.

      In the first version of our manuscript, we considered two extreme scenarios which bound the reality, instead of simply assuming that drug treatment does not impact the infection status, MOI, and duration of infection. See our response to reviewer 2 point (3). The resulting FOI estimates differ but not substantially across the two extreme scenarios, partially because drug-treated individuals’ MOI distribution is similar to that of non-treated individuals (i.e., the apparent lack of impact of drug treatment on MOI, as pointed out by the referee). We will consider adding a formal test to quantify the difference between the two MOI distributions and how significant the difference is. We will discuss which of the two extreme scenarios reality is closer to, given the result of the formal test. We will also discuss in our revision possible reasons/hypotheses underlying the impact of drug treatment on MOI from the perspective of the nature, efficiency, and duration of the drugs administered.

      Regarding the last point of the reviewer, on understanding the relationship between MOI and FOI, we are not fully clear about what was meant. We are also confused about the statement on what the “model is doing in this manuscript alone”. We interpret the overall comment as the reviewer suggesting a better understanding of the relationship between MOI and FOI, either between their distributions, or the moments of their distributions, perhaps by fitting models including simple linear regression models. This approach is in principle possible, but it is not the focus of this work. It will be equally difficult to evaluate the performance of this alternative approach given the lack of MOI-FOI pairs from empirical settings with directly measured FOI values (from large cohort studies). Moreover, the qualitative relationship between the two quantities is intuitive. Higher FOI values should correspond to higher MOI values. Less variable FOI values should correspond to more narrow or concentrated MOI distributions, whereas more variable FOI values should correspond to more spread-out ones. We will discuss this matter in our revised manuscript.

      (3) I would suggest that outside of malaria geneticists, the force of infection is considered to be the entomological inoculation rate, not the number of genetically distinct blood-stage strains. I appreciate that FOI has been used to explain the latter before by others, though the authors could avoid confusion by stating this clearly throughout the manuscript. For example, the abstract says FOI is "the number of new infections acquired by an individual host over a given time interval" which suggests the former, please consider clarifying. 

      We thank the reviewer for this helpful comment as it is fundamental that there is no confusion on the basic definitions. EIR, the entomological inoculation rate, is closely related to the force of infection but is not equal to it. EIR focuses on the rate of arrival of infectious bites and is measured as such by focusing on the mosquito vectors that are infectious and arrive to bite a given host. Not all these bites result in actual infection of the human host. Epidemiological models of malaria transmission clearly make this distinction, as FOI is defined as the rate at which a host acquires infection. This definition comes from more general models for the population dynamics of infectious diseases. (For diseases simpler than malaria, with no super-infection, the typical SIR models define the force of infection as the rate at which a susceptible individual becomes infected.) For malaria, force of infection refers to the number of new blood-stage infections acquired by an individual host over a given time interval. This distinction between EIR and FOI is the reason why studies have investigated their relationship, with the nonlinearity of this relationship reflecting the complexity of the underlying biology and how host immunity influences the outcome of an infectious bite.

      We agree however with the referee that there could be some confusion in our definition resulting from the approach we use to estimate the MOI distribution (which provides the basis for estimating FOI). In particular, we rely on the non-existent to very low overlap of var repertoires among individuals with MOI=1, an empirical pattern we have documented extensively in previous work (see 2, 3, and 4). The method of varcoding and its Bayesian formulation rely on the assumption of negligible overlap. We note that other approaches for estimating MOI (and FOI) based on other polymorphic markers also make this assumption (reviewed in 5). Ultimately, the FOI we seek to estimate is the one defined as specified above and in both the abstract and introduction, consistent with the epidemiological literature. We will include clarification in the introduction and discussion of this point in the revision.

      (4) Line 319 says "Nevertheless, overall, our paired EIR (directly measured by the entomological team in Ghana (Tiedje et al., 2022)) and FOI values are reasonably consistent with the data points from previous studies, suggesting the robustness of our proposed methods". I would agree that the results are consistent, given that there is huge variation in Figure 4 despite the transformed scales, but I would not say this suggests a robustness of the method. 

      We will modify the relevant sentences to use “consistent” instead of “robust”.

      (5) The text is a little difficult to follow at times and sometimes requires multiple reads to understand. Greater precision is needed with the language in a few situations and some of the assumptions made in the modelling process are not referenced, making it unclear whether it is a true representation of the biology. 

      We thank the reviewer for this comment. As also mentioned in the response to reviewer 1’s comments, we will reorganize and rewrite parts of the text in our revision to improve clarity.

      References and Notes

      (1) Maire, N. et al. A model for natural immunity to asexual blood stages of Plasmodium falciparum malaria in endemic areas. Am J Trop Med Hyg., 75(2 Suppl):19-31 (2006).

      (2) Tiedje, K. E. et al. Measuring changes in Plasmodium falciparum census population size in response to sequential malaria control interventions. eLife, 12 (2023).

      (3) Day, K. P. et al. Evidence of strain structure in Plasmodium falciparum var gene repertoires in children from Gabon, West Africa. Proc. Natl. Acad. Sci. U.S.A., 114(20), 4103-4111 (2017).

      (4) Ruybal-Pesántez, S. et al. Population genomics of virulence genes of Plasmodium falciparum in clinical isolates from Uganda. Sci. Rep., 7(11810) (2017).

      (5) Labbé, F. et al. Neutral vs. non-neutral genetic footprints of Plasmodium falciparum multiclonal infections. PLoS Comput Biol 19(1) (2023).

    1. Author Response:

      Reviewer #1 (Public Review):

      [...] The conclusions of the in vitro experiments using cultured hippocampal slices were well supported by the data, but aspects of the in vivo experiments and proteomic studies need additional clarification.

      (1) In contrast to the in vitro experiments in which a γ-secretase inhibitor was used to exclude possible effects of Aβ, this possibility was not examined in in-vivo experiments assessing synapse loss and function (Figure 3) and cognitive function (Figure 4). The absence of plaque formation (Figure 4B) is not sufficient to exclude the possibility that Aβ is involved. The potential involvement of Aβ is an important consideration given the 4-month duration of protein expression in the in vivo studies.

      Response: We thank the reviewer for raising this question. While our current data did not exclude the potential involvement of Aβ-induced toxicity in the synaptic and cognitive dysfunction observed in mice overexpressing β-CTF, addressing this directly remains challenging. Treatment with γ-secretase inhibitors could potentially shed light on this issue. However, such treatment is known to cause brain dysfunction by itself, likely because it blocks the γ-cleavage of other essential molecules, such as Notch[1, 2]. As a result, this approach is unlikely to provide a definitive answer, which also prevents us from pursuing it further in vivo. We hope the reviewer understands this limitation and agrees to a discussion of this issue in the revised manuscript instead.

      (2) The possibility that the results of the proteomic studies conducted in primary cultured hippocampal neurons depend in part on Aβ was also not taken into consideration.

      Response: We thank the reviewer for raising this interesting question. In the revised manuscript, we plan to address this experimentally by using a γ-secretase inhibitor to investigate the potential contribution of Aβ in this study.

      Likely impact of the work on the field, and the utility of the methods and data to the community:

      The authors' use of sparse expression to examine the role of β-CTF on spine loss could be a useful general tool for examining synapses in brain tissue.

      Response: We thank the reviewer for these comments. Indeed, it is a very robust assay and we would like to share this method with the scientific community as soon as possible.

      Additional context that might help readers interpret or understand the significance of the work:

      The discovery of BACE1 stimulated an international effort to develop BACE1 inhibitors to treat Alzheimer's disease. BACE1 inhibitors block the formation of β-CTF which, in turn, prevents the formation of Aβ and other fragments. Unfortunately, BACE1 inhibitors not only did not improve cognition in patients with Alzheimer's disease, they appeared to worsen it, suggesting that producing β-CTF actually facilitates learning and memory. Therefore, it seems unlikely that the disruptive effects of β-CTF on endosomes plays a significant role in human disease. Insights from the authors that shed further light on this issue would be welcome.

      Response: We would like to express our gratitude to the reviewer for raising this interesting question. It remains puzzling why BACE1 inhibition has failed to yield benefits in AD patients, while amyloid clearance via Aβ antibodies has been shown to slow disease progression. One possible explanation is that pharmacological inhibition of BACE1 may not be as effective as genetic removal. Indeed, genetic depletion of BACE1 leads to the clearance of existing amyloid plaques[3], whereas its pharmacological inhibition slows plaque growth and prevents the formation of new plaques but does not stop the growth of the existing ones[4]. We think the negative results of BACE1 inhibitors in clinical trials may not be sufficient to rule out the potential contribution of β-CTF to AD pathogenesis. Given that cognitive function continues to deteriorate rapidly in plaque-free patients after 1.5 years of treatment with Aβ antibodies in phase three clinical studies[5], it is important to consider the possible role of other Aβ-related fragments, such as β-CTF. We will include some further discussion in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      In this study, the authors investigate the potential role of other cleavage products of amyloid precursor protein (APP) in neurodegeneration. They combine in vitro and in vivo experiments, revealing that β-CTF, a product cleaved by BACE1, promotes synaptic loss independently of Aβ. Furthermore, they suggest that β-CTF may interact with Rab5, leading to endosomal dysfunction and contributing to the loss of synaptic proteins.

      Response: We would like to thank the reviewer for his/her insightful suggestions. We have addressed the specific comments in the following sections.

      Weaknesses:

      Most experiments were conducted in vitro using overexpressed β-CTF. Additionally, the study does not elucidate the mechanisms by which β-CTF disrupts endosomal function and induces synaptic degeneration.

      Response: We would like to thank the reviewer for this insightful comment. While a significant portion of our experiments were conducted in vitro, the main findings were also confirmed in vivo (Figures 3 and 4). Repeating all the experiments in vivo would be challenging and may not be necessary. Regarding the use of overexpressed β-CTF, we acknowledge that this is a common issue in neurodegenerative disease studies. These diseases progress slowly over many years, sometimes even decades in patients. To model this progression in cell or mouse models within a time frame feasible for research, overexpression of certain proteins is often required. While not ideal, it is sometimes unavoidable. Since β-CTF levels are elevated in AD patients[6], its overexpression is a reasonable approach to investigate its potential effects.

      We did not further investigate the mechanisms by which β-CTF disrupted endosomal function because our preliminary results align with previous findings. Kim et al. demonstrated that β-CTF recruits APPL1 (a Rab5 effector) via the YENPTY motif to Rab5 endosomes, where it stabilizes active GTP-Rab5, leading to pathologically accelerated endocytosis, endosome swelling and selectively impaired transport of Rab5 endosomes[6]. In our manuscript, we observed that co-expression of Rab5S34N with β-CTF effectively mitigated β-CTF-induced spine loss in hippocampal slice cultures (Figures 6I-J), indicating that Rab5 overactivation-induced endosomal dysfunction contributed to β-CTF-induced spine loss, which was consistent with their conclusions.

      Reviewer #3 (Public Review):

      Summary:

      Most previous studies have focused on the contributions of Abeta and amyloid plaques in the neuronal degeneration associated with Alzheimer's disease, especially in the context of impaired synaptic transmission and plasticity which underlies the impaired cognitive functions, a hallmark in AD. But processes independent of Abeta and plaques are much less explored, and to some extent, the contributions of these processes are less well understood. Luo et al. addressed this important question with an array of approaches, and their findings generally support the contribution of beta-CTF-dependent but non-Abeta-dependent process to the impaired synaptic properties in the neurons. Interestingly, the above process appears to operate in a cell-autonomous manner. This cell-autonomous effect of beta-CTF as reported here may facilitate our understanding of some potentially important cellular processes related to neurodegeneration. Although these findings are valuable, it is key to understand the probability of this process occurring in a more natural condition, such as when this process occurs in many neurons at the same time. This will put the authors' findings into a context for a better understanding of their contribution to either physiological or pathological processes, such as Alzheimer's. The experiments and results using the cell system are quite solid, but the in vivo results are incomplete and hence less convincing (see below). The mechanistic analysis is interesting but primitive and does not add much more weight to the significance. Hence, further efforts from the authors are required to clarify and solidify their results, in order to provide a complete picture and support for the authors' conclusions.

      Response: We would like to thank the reviewer for the constructive suggestions. We have addressed the specific comments in the following sections.

      Strengths:

      (1) The authors have addressed an interesting and potentially important question

      (2) The analysis using the cell system is solid and provides strong support for the authors' major conclusions. This analysis has used various technical approaches to support the authors' conclusions from different aspects and most of these results are consistent with each other.

      Response: We would like to thank the reviewer for these comments.

      Weaknesses:

      (1) The relevance of the authors' major findings to the pathology, especially the Abeta-dependent processes is less clear, and hence the importance of these findings may be limited.

      Response: We would like to thank the reviewer for pointing this out. Phase 3 clinical trial data for Aβ antibodies show that cognitive function continues to decline rapidly, even in plaque-free patients, after 1.5 years of treatment[5]. This suggests that plaque-independent mechanisms may drive AD progression. Therefore, it is crucial to consider the potential contributions of other Aβ species or related fragments, such as alternative forms of Aβ and β-CTF. While it is too early to definitively predict how β-CTF contributes to AD progression, it is notable that β-CTF, rather than Aβ, induced synaptic deficits in mice, which recapitulates a key pathological feature of AD. Ultimately, the true role of β-CTF in AD pathogenesis can only be confirmed through clinical studies.

      (2) In vivo analysis is incomplete, with certain caveats in the experimental procedures and some of the results need to be further explored to confirm the findings.

      Response: We would like to thank the reviewer for this suggestion. We plan to correct these caveats in the revised manuscript.

      (3) The mechanistic analysis is rather primitive and does not add further significance.

      Response: We would like to thank the reviewer for this comment. We did not delve further into the underlying mechanisms because our analysis indicates that Rab5 dysfunction underlies β-CTF-induced endosomal dysfunction, which is consistent with another study and has been addressed in detail there[6]. We hope the reviewer could understand that our focus in this paper is on how β-CTF triggers synaptic deficits, which is why we did not investigate the mechanisms of β-CTF-induced endosomal dysfunction further.

      References:

      1. GÜNER G, LICHTENTHALER S F. The substrate repertoire of γ-secretase/presenilin [J]. Seminars in cell & developmental biology, 2020, 105: 27-42.
      2. DOODY R S, RAMAN R, FARLOW M, et al. A phase 3 trial of semagacestat for treatment of Alzheimer's disease [J]. The New England journal of medicine, 2013, 369(4): 341-50.
      3. HU X, DAS B, HOU H, et al. BACE1 deletion in the adult mouse reverses preformed amyloid deposition and improves cognitive functions [J]. The Journal of experimental medicine, 2018, 215(3): 927-40.
      4. PETERS F, SALIHOGLU H, RODRIGUES E, et al. BACE1 inhibition more effectively suppresses initiation than progression of β-amyloid pathology [J]. Acta Neuropathol, 2018, 135(5): 695-710.
      5. SIMS J R, ZIMMER J A, EVANS C D, et al. Donanemab in Early Symptomatic Alzheimer Disease: The TRAILBLAZER-ALZ 2 Randomized Clinical Trial [J]. Jama, 2023, 330(6): 512-27.
      6. KIM S, SATO Y, MOHAN P S, et al. Evidence that the rab5 effector APPL1 mediates APP-βCTF-induced dysfunction of endosomes in Down syndrome and Alzheimer's disease [J]. Molecular psychiatry, 2016, 21(5): 707-16.
    1. Henry George, Progress and Poverty, Selections (1879) In 1879, the economist Henry George penned a massive bestseller exploring the contradictory rise of both rapid economic growth and crippling poverty. This association of poverty with progress is the great enigma of our times. It is the central fact from which spring industrial, social, and political difficulties that perplex the world, and with which statesmanship and philanthropy and education grapple in vain. From it come the clouds that overhang the future of the most progressive and self-reliant nations. It is the riddle which the Sphinx of Fate puts to our civilization, and which not to answer is to be destroyed. So long as all the increased wealth which modern progress brings goes but to build up great fortunes, to increase luxury and make sharper the contrast between the House of Have and the House of Want, progress is not real and cannot be permanent. The reaction must come. The tower leans from its foundations, and every new story but hastens the final catastrophe. To educate men who must be condemned to poverty, is but to make them restive; to base on a state of most glaring social inequality political institutions under which men are theoretically equal, is to stand a pyramid on its apex. … … the evils arising from the unjust and unequal distribution of wealth, which are becoming more and more apparent as modern civilization goes on, are not incidents of progress, but tendencies which must bring progress to a halt; that they will not cure themselves, but, on the contrary, must, unless their cause is removed, grow greater and greater, until they sweep us back into barbarism by the road every previous civilization has trod. But it also shows that these evils are not imposed by natural laws; that they spring solely from social mal-adjustments which ignore natural laws, and that in removing their cause we shall be giving an enormous impetus to progress. 
… Equality of political rights will not compensate for the denial of the equal right to the bounty of nature. Political liberty, when the equal right to land is denied, becomes, as population increases and invention goes on, merely the liberty to compete for employment at starvation wages. This is the truth that we have ignored. And so there come beggars in our streets and tramps on our roads; and poverty enslaves men whom we boast are political sovereigns; and want breeds ignorance that our schools cannot enlighten; and citizens vote as their masters dictate; and the demagogue usurps the part of the statesman; and gold weighs in the scales of justice; and in high places sit those who do not pay to civic virtue even the compliment of hypocrisy; and the pillars of the republic that we thought so strong already bend under an increasing strain. We honor Liberty in name and in form. We set up her statues and sound her praises. But we have not fully trusted her. And with our growth so grow her demands. She will have no half service! Liberty! it is a word to conjure with, not to vex the ear in empty boastings. For Liberty means Justice, and Justice is the natural law—the law of health and symmetry and strength, of fraternity and co-operation. They who look upon Liberty as having accomplished her mission when she has abolished hereditary privileges and given men the ballot, who think of her as having no further relations to the every-day affairs of life, have not seen her real grandeur—to them the poets who have sung of her must seem rhapsodists, and her martyrs fools! As the sun is the lord of life, as well as of light; as his beams not merely pierce the clouds, but support all growth, supply all motion, and call forth from what would otherwise be a cold and inert mass, all the infinite diversities of being and beauty, so is liberty to mankind. 
It is not for an abstraction that men have toiled and died; that in every age the witnesses of Liberty have stood forth, and the martyrs of Liberty have suffered. … The fiat has gone forth! With steam and electricity, and the new powers born of progress, forces have entered the world that will either compel us to a higher plane or overwhelm us, as nation after nation, as civilization after civilization, have been overwhelmed before. It is the delusion which precedes destruction that sees in the popular unrest with which the civilized world is feverishly pulsing only the passing effect of ephemeral causes. Between democratic ideas and the aristocratic adjustments of society there is an irreconcilable conflict. Here in the United States, as there in Europe, it may be seen arising. We cannot go on permitting men to vote and forcing them to tramp. We cannot go on educating boys and girls in our public schools and then refusing them the right to earn an honest living. We cannot go on prating of the inalienable rights of man and then denying the inalienable right to the bounty of the Creator. Even now, in old bottles the new wine begins to ferment, and elemental forces gather for the strife!   Source: Henry George, Progress and Poverty: An Inquiry into the Cause of Industrial Depressions and of Increase of Want with Increase of Wealth: The Remedy (1879).

      The contradiction between increasing economic growth and rising poverty is examined in Henry George's Progress and Poverty. He contends that the unfair distribution of wealth, especially land ownership, is the root cause of economic inequality. George cautions that if society does not correct this imbalance, it could collapse due to the growing concentration of wealth within a small number of people. He claims that economic justice, especially equitable access to natural resources, is necessary for true liberty in addition to political rights. George's writings serve as an appeal for societal systems to be changed in order to stop the negative effects of unbridled inequality.

    1. What is evidence? It is a moment remembered from a novel, a story overheard, a movie, an experience. It’s anything you use to think through your concepts.

      I think from this concept we all have different stories and experiences but may have similarities in the way we are as humans.

    1. This may be due to some low level of introgression via gene flow between species, or some remnants of unsorted loci. The one sample from Fiji does contain some genetic material from the blue cluster (Fig. 2) and is unlikely to have experienced recent gene flow with Ulithi individuals. It is therefore more likely that the small amount of genetic material from alternative genetic clusters, as seen in a few individuals, is the result of unsorted ancestral shared loci.

      It is interesting to think about the possibility of gene flow between coral groups. We commonly think of gene flow more in mobile organisms, but it very much is possible with corals through breakage and broadcast spawning events (albeit much less likely in corals than it is in mobile organisms).

    1. this might

      This use of tentative language is something that appears a lot in math education research for learner-centered environments. Math processes are too often presented as certainties - "Here is the way to solve this type of problem." "You did not follow the correct procedure." To open up a more creative, sense-making, problem solving culture, we should increase our usage of tentative language - "What are some possible ways to solve this type of problem?" Number Talks are a great structure for opening learners up to creative new possibilities for solving problems. Many teachers and community members may criticize applying an open, creative approach to mathematics as inefficient. In reality, mathematics is an excellent vehicle for learning how to think in open, creative ways, to notice patterns and structure, to create logical arguments. When math is only taught with efficiency in mind, we end up excluding some of the most creative minds in heavy favor of those who are strong memorizers and/or rule-followers.

    1. Fighting climate change involves large, upfront costs in the form of foregone goods and services. Whether it taxes emissions or imposes a shrinking cap, the government takes away options from producers. Thus, measured GDP and per capita income will be lower, at least compared to what they would have been in the absence of emission curbs.

      I think what the author is trying to say here is that fighting climate change is expensive. It is often inefficient and substantially lowers productivity, and therefore output. This is a huge dilemma for countries. We talked in class about the saying: "First get rich then get green". Developing countries like China or India simply will not use more resources to slow down economic growth, because it hinders their progress as a country. On the other hand, developed countries like the US or EU often face huge competition across the globe, so a tax on emissions may weaken their ability to compete with other countries.

    1. “The culture of this extreme dissection of TV that recaps started has grown. There are just so many different formats where you can be doing that,” Emami says. At Vulture, recaps are “still a very big part of what we do, but I also think it’s now just one part of what we do. It’s one part of a coverage plan, and that can include explainers, think pieces, what are the biggest questions asked after this episode of Westworld.” Recaps were just one expression of an idea that still holds sway over the internet, and how audiences talk about TV in general: essentially, that it’s worth talking about — publicly, rigorously, and joyfully. As long as that philosophy remains intact, its execution is both flexible and secondary. Netflix shows may not make for good recaps, but they can still spawn a meme like Barb, a perfect fusion of internet weirdos and the unwitting object of their passion that followed the spirit of recaps, if not their letter. The permission to honor something you love by unpacking it, and the idea that affection itself is reason itself for unpacking, is a difficult dam to unburst.

      The passage reflects on the evolution of television criticism and audience engagement, highlighting the importance of discussion and analysis in a variety of formats while emphasizing the joy and affection that drive these conversations.

    1. we fear the visibility without which we cannot truly live

      I believe this part of the text refers to the anxiety people may have when exposing who they are to others. That happens because we, as humans, care about what others think of us and tend to fear rejection. However, if we're not being visible, we're not achieving any personal fulfillment.

    1. I wonder if there's a copy anywhere of the Macey business system book that they sold to explain how to use it?

      reply to u/atomicnotes at https://old.reddit.com/r/Zettelkasten/comments/1fa0240/early_1900s_3_x_5_inch_card_index_filing_cabinet/

      This is an excellent question. I strongly suspect you won't find a booklet or book from Macey after 1906 that does this, though there may have been something before that.

      You'll notice that on page 9, the 1906 Macey Catalog takes what I consider to be a pot shot at their Shaw-Walker competition in the section "Not a kindergarten". Shaw-Walker was selling not just furniture, but a more specific system, as well as a magazine. Since there's something to be learned for current knowledge managers and zettel-casters in the historical experience of these companies and the systems and methods they were selling, I'll quote that section here (substitute references to enterprise and business for yourself):

      Not a Kindergarten

      Every successful enterprise knows its own requirements best, and develops the best system for its own purpose. We manufacture business machinery. Our appliances and supplies are boiled down to a few parts, and simple forms, and will accommodate any system in any business. The office boy can understand and use them. If we undertook to teach the whole world how to run its business, we would have to saddle the cost on those who buy for what we tried to teach those who do not.

      System in business is desirable, but no system can make a business successful, where the management is deficient. So called ‘Systems’ often result in useless expense and disappointment. We retain what experience proves useful and practical; so far as possible, eliminating all complicated and useless features. This explains how we can employ the best workmanship and material, combined with pleasing designs, and sell our goods with profit at lower prices than the inferior articles offered by others.

      There may have been some booklets at some point, but I've not run across them for any of the major manufacturers of the time. (I've only loosely searched this area.) Some of the general principles were covered in various articles in System Magazine which was published by Shaw-Walker, a filing cabinet manufacturer, in the early twentieth century. System Magazine was sold to McGraw-Hill which renamed it Business Week, but it is now better known as Bloomberg Business Week. In the December 1906 issue of System, W. K. Kellogg, the President of the Toasted Corn Flake Company, is quoted touting the invaluable nature of the Shaw-Walker filing system at a time when his company was using 640 drawers of their system.

      To some extent the smaller discrete "system" was really a part of a broader range of information and knowledge of business and competition. This can be seen in the fact that System Magazine still exists, just under an alternate name, along with a much broader area of business schools and business systems. We've just "forgotten" (or take for granted) the art of the smaller systems and processes which seemed new in the late 1800s and early 1900s.

      Other companies had "systems" they sold or taught, much like Tiago Forte teaches his "Second Brain" method or Nick Milo teaches "Linking Your Thinking". However, most of them were really in the business of selling goods: furniture, filing cabinets, desks, index cards, card dividers, etc. and this was where the real money was to be found at the time.

      A similar example in the space is the Memindex System booklet that came with their box and index cards. The broad principles of the system can be described in a few paragraphs so that the average person can read it and modify it to their particular needs or use case. The company never felt the need to write an entire book along the lines of David Allen's Getting Things Done or Ryder Carroll's Bullet Journal Method. Allen and Carroll are selling systems by way of books or classes. Admittedly, Carroll does have custom printed notebooks for using his methods, but I suspect these are a tiny fraction of the overall notebook sales for those who use his method.

      Here's evidence of a correspondence course from the Library Bureau some time after 1927, which was when they'd been purchased by Remington Rand: https://www.ebay.com/itm/335534180049 . Library Bureau had an easier time as their system was standardized for libraries, though they did have efforts to cater to business concerns the way Shaw-Walker, The Macey Company, Globe-Wernicke and others certainly did.

      I think the best examples in broader book form from that time period are Kaiser's two books which still stand up pretty well today for those creating knowledge management systems, zettelkasten, commonplace books, getting things done/productivity systems, second brains, etc.

      Kaiser, J. Card System at the Office. The Card System Series 1. London: Vacher and Sons, 1908. http://archive.org/details/cardsystematoffi00kaisrich.

      ———. Systematic Indexing. The Card System Series 2. London: Sir Isaac Pitman & Sons, Ltd., 1911. http://archive.org/details/systematicindexi00kaisuoft.

    1. Author Response

      We thank the reviewers for their positive comments and constructive feedback following their thorough reading of the manuscript. In this provisional reply we will briefly address the reviewers’ comments and suggestions point by point. In the forthcoming revised manuscript, we will more thoroughly address the reviewers’ comments and provide additional supporting data.

      (1) The expression 'randomly clustered networks' needs to be explained in more detail given that in its current form risks to indicate that the network might be randomly organized (i.e., not organized). In particular, a clustered network with future functionality based on its current clustering is not random but rather pre-configured into those clusters. What the authors likely meant to say, while using the said expression in the title and text, is that clustering is not induced by an experience in the environment, which will only be later mapped using those clusters. While this organization might indeed appear as randomly clustered when referenced to a future novel experience, it might be non-random when referenced to the prior (unaccounted) activity of the network. Related to this, network organization based on similar yet distinct experiences (e.g., on parallel linear tracks as in Liu, Sibille, Dragoi, Neuron 2021) could explain/configure, in part, the hippocampal CA1 network organization that would appear otherwise 'randomly clustered' when referenced to a future novel experience.

      As suggested by the reviewer, we will revise the text to clarify that the random clustering is random with respect to any future, novel environment. The cause of clustering could be prior experiences (e.g. Bourjaily M & Miller P, Front. Comput. Neurosci. 5:37, 2011) or developmental programming (e.g. Perin R, Berger TK, & Markram H, Proc. Natl. Acad. Sci. USA 108:5419, 2011).

      (2) The authors should elaborate more on how the said 'randomly clustered networks' generate beyond chance-level preplay. Specifically, why was there preplay stronger than the time-bin shuffle? There are at least two potential explanations:

      (2.1) When the activation of clusters lasts for several decoding time bins, temporal shuffle breaks the continuity of one cluster's activation, thus leading to less sequential decoding results. In that case, the preplay might mainly outperform the shuffle when there are fewer clusters activating in a PBE. For example, activation of two clusters must be sequential (either A to B or B to A), while time bin shuffle could lead to non-sequential activations such as a-b-a-b-a-b where a and b are components of A and B;

      (2.2) There is a preferred connection between clusters based on the size of overlap across clusters. For example, if pair A-B and B-C have stronger overlap than A-C, then cluster sequences A-B-C and C-B-A are more likely to occur than others (such as A-C-B) across brain states. In that case, authors should present the distribution of overlap across clusters, and whether the sequences during run and sleep match the magnitude of overlap. During run simulation in the model, as clusters randomly receive a weak location cue bias, the activation sequence might not exactly match the overlap of clusters due to the external drive. In that case, the strength of location cue bias (4% in the current setup) could change the balance between the internal drive and external drive of the representation. How does that parameter influence the preplay incidence or quality?

      Based on our finding that preplay occurs only in networks that sustain cluster activity over multiple decoding time bins (Figure 5d-e), our understanding of the model’s function is consistent with the reviewer's first explanation. In the forthcoming revised manuscript we will provide additional analysis that directly tests the first explanation, and we will also test the intriguing possibility that the reviewer’s second suggestion contributes to above-chance preplay.
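      The reviewer's first explanation (2.1) can be illustrated with a toy example. The sketch below is not the analysis pipeline of the manuscript; it assumes a population burst in which a hypothetical cluster A is active for five decoding bins and is followed by cluster B for five bins, and it simply counts how often the active cluster changes between adjacent bins before and after shuffling the bins in time. All names and bin counts are illustrative assumptions.

```python
import random

def count_switches(labels):
    """Number of adjacent time bins in which the active cluster changes."""
    return sum(1 for a, b in zip(labels, labels[1:]) if a != b)

def time_bin_shuffle(labels, rng):
    """Return a copy of the event with its time bins randomly permuted."""
    shuffled = list(labels)
    rng.shuffle(shuffled)
    return shuffled

rng = random.Random(0)

# Hypothetical burst: cluster A sustained for 5 bins, then cluster B for 5 bins.
event = ["A"] * 5 + ["B"] * 5
original_switches = count_switches(event)

# Shuffling fragments the sustained activations (a-b-a-b...), so the decoded
# trajectory alternates between clusters far more often than in the real event.
shuffled_switches = [count_switches(time_bin_shuffle(event, rng)) for _ in range(1000)]
mean_shuffled = sum(shuffled_switches) / len(shuffled_switches)

print(original_switches, round(mean_shuffled, 2))
```

      In this toy case the unshuffled event contains a single transition, whereas shuffled events contain several on average, which is the sense in which sustained cluster activity alone can make an event look more sequential than its time-bin shuffle.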

      (3) The manuscript is focused on presenting that a randomly clustered network can generate preplay and place maps with properties similar to experimental observations. An equally interesting question is how preplay supports spatial coding. If preplay is an intrinsic dynamic feature of this network, then it would be good to study whether this network outperforms other networks (randomly connected or ring lattice) in terms of spatial coding (encoding speed, encoding capacity, tuning stability, tuning quality, etc.)

      We agree that this is an interesting future direction, but we see it as outside the scope of the current work. There are two interesting avenues of future work: 1) Our current model does not include any plasticity mechanisms, but a future model could study the effects of synaptic plasticity during preplay on long-term network dynamics, and 2) Our current model does not include alternative approaches to constructing the recurrent network, but future studies could systematically compare the spatial coding properties of alternative types of recurrent networks.

      (4) The manuscript mentions the small-world connectivity several times, but the concept still appears too abstract and how the small-world index (SWI) contributes to place fields or preplay is not sufficiently discussed.

      For a more general audience in the field of neuroscience, it would be helpful to include example graphs with high and low SWI. For example, you can show a ring lattice graph and indicate that there are long paths between points at opposite sides of the ring; show randomly connected graphs indicating there are no local clustered structures, and show clustered graphs with several hubs establishing long-range connections to reduce pair-wise distance.

      How this SWI contributes to preplay is also not clear. Figure 6 showed preplay is correlated with SWI, but maybe the correlation is caused by both of them being correlated with cluster participation. The balance between cluster overlap and cluster isolation is well discussed. In the Discussion, the authors mention "...Such a balance in cluster overlap produces networks with small-world characteristics (Watts and Strogatz, 1998) as quantified by a small-world index..." (Lines 560-561). I believe the statement is not entirely appropriate, a network similar to ring lattice can still have the balance of cluster isolation and cluster overlap, while it will have small SWI due to a long path across some node pairs. Both cluster structure and long-range connection could contribute to SWI. The authors only discuss the necessity of cluster structure, but why is the long-range connection important should also be discussed. I guess long-range connection could make the network more flexible (clusters are closer to each other) and thus increase the potential repertoire.

      We agree that the manuscript would benefit from a more concrete explanation of the small-world index. We will revise the text and add illustrative figures.

      We note that while our most successful clustered networks are indeed those with small-world characteristics, there are other ways of producing small-world networks which may not show good place fields or preplay. We will test another type of small-world network if time permits.

      Our discussion of “cluster overlap” is specific to our type of small-world network, in which there is no pre-determined spatial dimension (unlike the ring network of Watts and Strogatz). Because clusters map randomly to location once a particular spatial context is imposed, the random overlap between clusters produces long-range connections in that context (and in any other context). One can therefore think of the amount of overlap between clusters as representing the number of long-range connections in a Watts-Strogatz model, except that, we wish to reiterate, such models involve a spatial topology within the network, which we do not include.
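      For readers less familiar with small-world measures, the following sketch contrasts a ring lattice, a partially rewired (Watts-Strogatz style) graph, and a fully rewired graph. It is an illustration only: the network size, degree, and rewiring probability are arbitrary assumptions, and the ratio used here, (C/C_rand)/(L/L_rand), is one common small-world coefficient that is not necessarily the small-world index used in the manuscript.

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring of n nodes, each linked to its k nearest neighbours (k even)."""
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for d in range(1, k // 2 + 1):
            j = (i + d) % n
            adj[i].add(j)
            adj[j].add(i)
    return adj

def rewire(adj, p, rng):
    """Move each original edge to a random new target with probability p."""
    new = {i: set(nb) for i, nb in adj.items()}
    nodes = list(adj)
    edges = sorted({(min(i, j), max(i, j)) for i in adj for j in adj[i]})
    for i, j in edges:
        if rng.random() < p:
            candidates = [t for t in nodes if t != i and t not in new[i]]
            if candidates:
                t = rng.choice(candidates)
                new[i].discard(j)
                new[j].discard(i)
                new[i].add(t)
                new[t].add(i)
    return new

def clustering(adj):
    """Mean local clustering coefficient (nodes of degree < 2 count as 0)."""
    total = 0.0
    for nb in adj.values():
        k = len(nb)
        if k < 2:
            continue
        nb = list(nb)
        links = sum(1 for a in range(k) for b in range(a + 1, k) if nb[b] in adj[nb[a]])
        total += links / (k * (k - 1) / 2)
    return total / len(adj)

def mean_path_length(adj):
    """Mean shortest-path length over all reachable ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

rng = random.Random(1)
lattice = ring_lattice(100, 6)               # high clustering, long paths
small_world = rewire(lattice, 0.1, rng)      # a few long-range shortcuts
random_like = rewire(lattice, 1.0, rng)      # low clustering, short paths

def sigma(graph):
    """Small-world coefficient relative to the fully rewired reference graph."""
    c_ratio = clustering(graph) / clustering(random_like)
    l_ratio = mean_path_length(graph) / mean_path_length(random_like)
    return c_ratio / l_ratio

for name, g in [("lattice", lattice), ("small-world", small_world)]:
    print(name, round(clustering(g), 3), round(mean_path_length(g), 3), round(sigma(g), 3))
```

      The lattice keeps its local clustering but has long paths between opposite sides of the ring, whereas a small number of long-range connections shortens paths while leaving most of the clustering intact. In the clustered networks discussed above, overlap between clusters plays the role of those long-range connections.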

      (5) What drives PBE during sleep? Seems like the main difference between sleep and run states is the magnitude of excitatory and inhibitory inputs controlled by scaling factors. If there are bursts (PBE) in sleep, do you also observe those during run? Does the network automatically generate PBE in a regime of strong excitation and weak inhibition (neural bifurcation)?

      During sleep simulations, the PBEs are spontaneously generated by the recurrent connections in the network. The constant-rate Poisson inputs drive low-rate stochastic spiking in the recurrent network, which then randomly generates population events when there is sufficient internal activity to transiently drive additional spiking within the network.

      During run simulations, the spatially-tuned inputs drive greater activity in a subset of the cells at a given point on the track, which in turn suppress the other excitatory cells through the feedback inhibition.

      (6) Is the concept of 'cluster' similar to 'assemblies', as in Peyrache et al, 2010; Farooq et al, 2019? Does a classic assembly analysis during run reveal cluster structures?

      Yes, we are highly confident that the clusters in our network would correspond to the functional assemblies that have been studied through assembly analysis and will present the relevant data in a revision.

      (7) Can the capacity of the clustered network to express preplay for multiple distinct future experiences be estimated in relation to current network activity, as in Dragoi and Tonegawa, PNAS 2013?

      We agree this is an interesting opportunity to compare the results of our model to what has been previously found experimentally and will test this if time permits.

      Reviewer # 2

      Weaknesses:

      My main critiques of the paper relate to the form of the input to the network.

      First, because the input is the same across trials (i.e. all traversals are the same duration/velocity), there is no ability to distinguish a representation of space from a representation of time elapsed since the beginning of the trial. The authors should test what happens e.g. with traversals in which the animal travels at different speeds, and in which the animal's speed is not constant across the entire track, and then confirm that the resulting tuning curves are a better representation of position or duration.

      We agree that this is an important question, and we plan to run further simulations where we test the effects of varying the simulated speed. We will present results in the resubmission.

      Second, it's unclear how much the results depend on the choice of a one-dimensional environment with ramping input. While this is an elegant idealization that allows the authors to explore the representation and replay properties of their model, it is a strong and highly non-physiological constraint. The authors should verify that their results do not depend on this idealization. Specifically, I would suggest the authors also test the spatial coding properties of their network in 2-dimensional environments, and with different kinds of input that have a range of degrees of spatial tuning and physiological plausibility. A method for systematically producing input with varying degrees of spatial tuning in both 1D and 2D environments has been previously used in (Fang et al 2023, eLife, see Figures 4 and 5), which could be readily adapted for the current study; and behaviorally plausible trajectories in 2D can be produced using the RatInABox package (George et al 2022, bioRxiv), which can also generate e.g. grid cell-like activity that could be used as physiologically plausible input to the network.

      We agree that testing the robustness of our results to different models of feedforward input is important and we plan to do this in our revised manuscript for the linear track and W-track.

      Testing the model in a 2D environment is an interesting future direction, but we see it as outside the scope of the current work. To our knowledge there are no experimental findings of preplay in 2D environments, but this presents an interesting opportunity for future modeling studies.

      Finally, I was left wondering how the cells' spatial tuning relates to their cluster membership, and how the capacity of the network (number of different environments/locations that can be represented) relates to the number of clusters. It seems that if clusters of cells tend to code for nearby locations in the environment (as predicted by the results of Figure 5), then the number of encodable locations would be limited (by the number of clusters). Further, there should be a strong tendency for cells in the same cluster to encode overlapping locations in different environments, which is not seen in experimental data.

      Thank you for making this important point and giving us the opportunity to clarify. We do find that subsets of cells with identical cluster membership have correlated place fields, but as we show in Figure 7b the network place map as a whole shows low remapping correlations across environments, which is consistent with experimental data (Hampson RE et al, Hippocampus 6:281, 1996; Pavlides C, et al, Neurobiol Learn Mem 161:122, 2019). Our model includes a relatively small number of cells and clusters compared to CA3, and with a more realistic number of clusters, the level of correlation across network place maps should reduce even further in our model network. The reason for the low level of correlation is that cluster membership is combinatorial: cells that share membership in one cluster can also belong to separate, distinct other clusters, rendering their activity less correlated than might be anticipated. In our revised manuscript we will address this point more carefully and cite the relevant experimental support.

      Reviewer # 3

      Weaknesses:

      To generate place cell-like activity during a simulated traversal of a linear environment, the authors drive the network with a combination of linearly increasing/decreasing synaptic inputs, mimicking border cell-like inputs. These inputs presumably stem from the entorhinal cortex (though this is not discussed). The authors do not explore how the model would behave when these inputs are replaced by or combined with grid cell inputs which would be more physiologically realistic.

      We chose the linearly varying spatial inputs as the minimal model of providing spatial input to the network so that we could focus on the dynamics of the recurrent connections. We agree our results will be strengthened by testing alternative types of border-like input so will present such additional results in our revised version. However, given that a sub-goal of our model was to show that place fields could arise in locations at which no neurons receive a peak in external input, whereas combining input from multiple grid cells produces peaked place-field like input, adding grid cell input (and the many other types of potential hippocampal input) is beyond the scope of the paper.

      Even though the authors claim that no spatially-tuned information is needed for the model to generate place cells, there is a small location-cue bias added to the cells, depending on the cluster(s) they belong to. Even though this input is relatively weak, it could potentially be driving the sequential activation of clusters and therefore the preplays and place cells. In that case, the claim for non-spatially tuned inputs seems weak. This detail is hidden in the Methods section and not discussed further. How does the model behave without this added bias input?

      First, we apologize for a lack of clarity if we have caused confusion about the type of inputs (linear and cluster-dependent as we had attempted to portray prominently in Figure 1, where it is described in the caption, l. 156-157, and Results, l. 189-190 & l. 497-499, as well as in the Methods, l. 671-683) and if we implied an absence of spatially-tuned information in the network. In the revision we will clarify that for reliable place fields to appear, the network must receive spatial information and that one point of our paper is that the information need not arrive as peaks of external input already resembling place cells or grid cells. We chose linearly ramping boundary inputs as the minimally place-field like stimulus (that still contains spatial information) but in our revision we will include alternatives. We should note that during sleep, when “preplay” occurs, there is no such spatial bias (which is why preplay can equally correlate with place field sequences in any context). In the revision, we will update Figure 1 to show more clearly the cluster-dependent linearly ramping input received by some specific cells with both similar and different place fields.

      Unlike excitation, inhibition is modeled in a very uniform way (uniform connection probability with all E cells, no I-I connections, no border-cell inputs). This goes against a long literature on the precise coordination of multiple inhibitory subnetworks, with different interneuron subtypes playing different roles (e.g. output-suppressing perisomatic inhibition vs input-gating dendritic inhibition). Even though no model is meant to capture every detail of a real neuronal circuit, expanding on the role of inhibition in this clustered architecture would greatly strengthen this work.

      This is an interesting future direction, but we see it as outside the scope of our current work. While inhibitory microcircuits are certainly important physiologically, we focus here on a minimal model that produces the desired place cell activity and preplay, as measured in excitatory cells.

      For the modeling insights to be physiologically plausible, it is important to show that CA3 connectivity (which the model mimics) shares the proposed small-world architecture. The authors discuss the existence of this architecture in various brain regions but not in CA3, which is traditionally thought of and modeled as a random or fully connected recurrent excitatory network. A thorough discussion of CA3 connectivity would strengthen this work.

      We agree this is an important point that is missing, and we will revise the text to specifically address CA3 connectivity (Guzman et al., Science 353 (6304), 1117-1123 2016) and the small-world structure therein due to the presence of “assemblies”.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      1. Point-by-point description of the revisions


      Reviewer #1 (Evidence, reproducibility, and clarity (Required)): This is an interesting manuscript from the Jagannathan laboratory, which addresses the interaction proteome of two satellite DNA-binding proteins, D1 and Prod. To prevent a bias by different antibody affinities they use GFP-fusion proteins of D1 and Prod as baits and purified them using anti GFP nanobodies. They performed the purifications in three different tissues: embryo, ovary and GSC enriched testes. Across all experiments, they identified 500 proteins with surprisingly little overlap among tissues and between the two different baits. Based on the observed interaction of prod and D1 with members of the canonical piRNA pathway the authors hypothesized that both proteins might influence the expression of transposable elements. However, neither by specific RNAi alleles or mutants that lead to a down regulation of D1 and Prod in the gonadal soma nor in the germline did they find an effect on the repression of transposable elements. They also did not detect an effect of a removal of piRNA pathway proteins on satellite DNA clustering, which is mediated by Prod and D1. However, they do observe a mis-localisation of the piRNA biogenesis complex to an expanded satellite DNA in absence of D1, which presumably is the cause of a mis-regulation of transposable elements in the F2 generation. This is an interesting finding linking satellite DNA and transposable element regulation in the germline. However, I find the title profoundly misleading as the link between satellite DNA organization and transgenerational transposon repression in Drosophila has not been identified by multi-tissue proteomics but by a finding of the Brennecke lab that the piRNA biogenesis complex has a tendency to localise to satellite DNA when the localisation to the piRNA locus is impaired.
      Nevertheless, the investigation of the D1 and Prod interactome is interesting and might reveal insights into the pathways that drive the formation of centromeres in a tissue specific manner.

      We thank the reviewer for the overall positive comments on our manuscript. As noted above, we have performed a substantial number of revision experiments and improved our text. We believe that our revised manuscript demonstrates a clear link between our proteomics data and the transposon repression. We would like to make three main points:

      1. Our proteomics data identified that D1 and Prod co-purified transposon repression proteins in embryos. To test the functional significance of this association, we have used Drosophila genetics to generate flies lacking embryonic D1. In adult ovaries from these flies, we observe a striking elevation in transposon expression and Chk2-dependent gonadal atrophy. Along with the results from the control genotypes (F1 D1 mutant, F2 D1 het), our data clearly indicate that embryogenesis (and potentially early larval development) are a period when D1 establishes heritable TE silencing that can persist throughout development.
      2. Based on the newly acquired RNA-seq and small RNA seq data, we have edited our title to more accurately reflect our data. Specifically, we have substituted the word 'transgenerational' with 'heritable', meaning that the presence of D1 during early development alone is sufficient to heritably repress TEs at later stages of development.
      3. In addition, our RNA seq and small RNA seq experiments demonstrate that D1 makes a negligible contribution to piRNA biogenesis and TE repression in adults, despite the significant mislocalization of the RDC complex. In this regard, our results are substantially different from the recent Kipferl study from the Brennecke lab (Baumgartner et al. 2022).

      Major comments

      Unfortunately, the proteomic data sets are not very convincing. Not even the corresponding baits are detected in all assays. I wonder whether the extraction procedure really allows the authors to analyze all functionally relevant interactions of Prod and D1. It would be good to see a western blot or an MS analysis of the soluble nuclear extract they use for purification compared to the insoluble chromatin. It may well be that a large portion of Prod or D1 is lost in this early step. I also find the description of the proteomic results hard to follow as the authors mostly list which proteins they find as interactors of Prod and D1 without stating in which tissue or with what bait they could purify them (i.e. p7: Importantly, our hits included known chromocenter-associated or pericentromeric heterochromatin-associated proteins, such as Su(var)3-9[52], ADD1[23,24], HP5[23,24], mei-S332[53], Mes-4[23], Hmr[24,39,54], Lhr[24,39], and members of the chromosome passenger complex, such as borr and Incenp[55]). It would be interesting to at least discuss tissue specific interactions.

      Out of six total AP-MS experiments in this manuscript (D1 x 3, Prod x 2 and Piwi), we observe a strong enrichment of the bait in 5/6 attempts (log2FC between 4-12). In the initial submission, the lack of a third high-quality biological replicate for the D1 testis sample meant that only the p-value (0.07) did not meet the cutoff. To address this, we have repeated this experiment with an additional biological replicate (Fig. S2A), and our data now clearly show that D1 is significantly enriched in the testis sample.

      As suggested by the reviewer, we have also assessed our lysis conditions (450mM NaCl and benzonase) and the solubilization of our baits post-lysis. In Fig. S1D, we have blotted equivalent fractions of the soluble supernatant and insoluble pellet from GFP-Piwi embryos and show that both GFP-Piwi and D1 are largely solubilized following lysis. In Fig. S1E, we also show that our IP protocol works efficiently.

      GFP-Prod pulldown in embryos is the only instance in which we do not detect the bait by mass spectrometry. Here, one reason could be relatively low expression of GFP-Prod in comparison to GFP-D1 (Fig. S1E), which may lead to technical difficulties in detecting peptides corresponding to Prod. However, we note that Prod IP co-purified proteins such as Bocks that were previously suggested as Prod interactors from high-throughput studies (Giot et al. 2003; Guruharsha et al. 2011). In addition, Prod IP from embryos also co-purified proteins known to associate with chromocenters such as Hcs and Saf-B. Finally, the concordance between D1 and Prod co-purified proteins from embryo lysates (Fig. 2A, C) suggests that the Prod IP from embryos is of reasonable quality.

      We also acknowledge the reviewer's comment that the description of the proteomic data was hard to follow. Therefore, we have revised our text to clearly indicate which bait pulled down which protein in which tissue (lines 148-156). We have also highlighted and discussed bait-specific and tissue-specific interactions in the text (lines 162-173).

      Minor comments

      The authors may also want to provide a bit more information on the quantitation of the proteomic data such as how many values were derived from the match-between-runs function and how many were imputed as this can severely distort the quantification.

      Figure 1: Distribution of data after imputation in embryo (left), ovary (middle) and testis (right) datasets. Imputation is performed with random sampling from the 1% least intense values generated by a normal distribution.

      To ensure the robustness of our data analysis, we considered only those proteins that were consistently identified in all replicates for at least one bait (GFP-D1, GFP-Prod or NLS-GFP). This approach resulted in a relatively low number of missing values. However, it is also important to bear in mind that in an AP-MS experiment, the number of missing values is higher, as interactors are not identified in the control pulldown. Importantly, the imputation of missing values during the data analysis did not alter the normal distribution of the dataset (Fig. 1, this document). Detailed information regarding the imputed values is also provided (Table 1, this document). The coding script used for the data analysis is available in the PRIDE submission of the dataset (Table 2, this document). This information has been added to our methods section under data availability.
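      As an illustration of the imputation step described in the Figure 1 caption, the sketch below shows one possible reading of "random sampling from the 1% least intense values generated by a normal distribution": a normal distribution is fitted to the observed log2 intensities, and each missing value is replaced by a draw from the lowest 1% of that distribution. This is an assumption about the procedure, not the analysis script deposited in PRIDE, and the intensity values are invented for the example.

```python
import random
from statistics import NormalDist, mean, stdev

def impute_low_tail(values, rng, tail=0.01):
    """Replace missing values (None) with draws from the lowest `tail`
    fraction of a normal distribution fitted to the observed values."""
    observed = [v for v in values if v is not None]
    dist = NormalDist(mean(observed), stdev(observed))
    cutoff = dist.inv_cdf(tail)  # intensity below which the lowest 1% lies
    filled = [
        # Inverse-CDF sampling restricted to the lower tail of the fit.
        v if v is not None else dist.inv_cdf(rng.uniform(1e-9, tail))
        for v in values
    ]
    return filled, cutoff

rng = random.Random(0)
# Hypothetical log2 intensities for one sample; None marks a missing value.
log2_intensity = [24.1, 25.3, None, 26.0, 23.8, None, 25.1, 24.7]
filled, cutoff = impute_low_tail(log2_intensity, rng)
print([round(v, 2) for v in filled], round(cutoff, 2))
```

      Because imputed values fall in the low tail, a protein that is absent from the control pulldown receives a low control intensity, which is why the fraction of imputed values is worth reporting alongside the enrichment statistics.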

      Table 1: Number of match-between-runs and imputations for embryo, ovary and testis datasets

      | Dataset | #match-between-runs | %match-between-runs | %imputation |
      |---|---|---|---|
      | Embryo | 5541/27543 | 20.11% | 8.36% |
      | Ovary | 1936/9530 | 20.30% | 8.18% |
      | Testis | 1748/7168 | 24.39% | 3.12% |

      Table 2: Access to the PRIDE submission of the datasets

      | Name | PRIDE ID | Reviewer username | Reviewer password |
      |---|---|---|---|
      | IP-MS of D1 from Testis tissue | PXD044026 | reviewer_pxd044026@ebi.ac.uk | ydswDQVW |
      | IP-MS of Piwi from Embryo tissue | PXD043237 | reviewer_pxd043237@ebi.ac.uk | TMCoDsdx |
      | IP-MS of Prod and D1 proteins from Ovary tissue | PXD043236 | reviewer_pxd043236@ebi.ac.uk | VOHqPmaS |
      | IP-MS of Prod and D1 proteins from Embryo tissue | PXD043234 | reviewer_pxd043234@ebi.ac.uk | L77VXdvA |

      **Referee Cross-Commenting** I agree with the two other reviewers that the connection between the interactome and the transgenerational phenotype is unclear. This is also what I meant in my comment that the title is somewhat misleading. A systematic analysis of the D1 and Prod knock down effect on mRNAs and small RNAs would indeed be helpful to better understand the interesting effect.

      As suggested by the reviewer, we have performed RNA seq and small RNA seq in control and D1 mutant ovaries (Fig. 4) to fully understand the contribution of D1 in piRNA biogenesis and TE repression. Briefly, the mislocalization of RDC complex in D1 mutant ovaries does not significantly affect TE-mapping piRNA biogenesis (Fig. 4C, E). In addition, loss of D1 does not substantially alter TE expression in the ovaries (Fig. 4B) or alter the expression of genes involved in TE repression (Fig. 4F). Along with the results presented in Fig. 5 and Fig. 6, our data clearly indicate that embryogenesis (and potentially early larval development) is a critical period during which D1 makes an important contribution to TE repression.

      Reviewer #1 (Significance (Required)): Nevertheless, the investigation of the D1 and Prod interactome is interesting and might reveal insights into the pathways that drive the formation of centromeres in a tissue specific manner. It may be mostly interesting for the Drosophila community but could also be exciting for a broader audience interested in the connection of heterochromatin and its indirect effect on the regulation of transposable elements.

      We thank the reviewer again for the helpful and constructive comments, which have enabled us to significantly improve our study. We are excited by the results from our study, which illuminate unappreciated aspects of transcriptional silencing in constitutive heterochromatin.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): Chavan et al. set out to enrich our compendium of pericentric heterochromatin-associated proteins - and to learn some new biology along the way. An ambitious AP-Mass baited with two DNA satellite-binding proteins (D1 and Prod), conducted across embryos, ovaries, and testes, yielded hundreds of candidate proteins putatively engaged at chromocenters. These proteins are enriched for a restricted number of biological pathways, including DNA repair and transposon regulation. To investigate the latter in greater depth, the authors examine D1 and prod mutants for transposon activity changes using reporter constructs for multiple elements. These reporter constructs revealed no transposon activation in the adult ovary, where many proteins identified in the mass spec experiments restrict transposons. However, the authors suggest that the D1 mutant ovaries do show disrupted localization of a key member of a transposon restriction pathway (Cuff), and infer that this mislocalization triggers a striking, transposon derepression phenotype in the F2 ovaries.

      The dataset produced by the AP-Mass Spec offers chromosome biologists an unprecedented resource. The PCH is a long-ignored chromosomal region that has historically received minimal attention; consequently, the pathways that regulate heterochromatin are understudied. Moreover, attempting to connect genome organization to transposon regulation is a new and fascinating area. I can easily envision this manuscript triggering a flurry of discovery; however, there is quite a lot of work to do before the data can fully support the claims.

      We appreciate the reviewer taking the time to provide thoughtful comments and constructive suggestions to improve the manuscript. We believe that we have addressed all the comments raised to the significant advantage of our paper.

      Major comments 1. The introduction requires quite a radical restructure to better highlight the A) importance of the work and B) limit information whose relevance is not clear early in the manuscript. A. Delineating who makes up heterochromatin is a long-standing problem in chromosome biology. This paper has huge potential to contribute to this field; however, it is not the first. Others are working on this problem in other systems, for example PMID:29272703. Moreover, we have some understanding of the distinct pathways that may impact heterochromatin in different tissues (e.g., piRNA biology in ovaries vs the soma). Also, the mutant phenotypes of prod and D1 are different. Fleshing out these three distinct points could help the reader understand what we know and what we don't know about heterochromatin composition and its special biology. Understanding where we are as a field will offer clear predictions about who the interactors might be that we expect to find. For example, given the dramatically different D1 and prod mutant phenotypes (and allele swap phenotypes), how might the interactors with these proteins differ? What do we know about heterochromatin formation differences in different tissues? And how might these differences impact heterochromatin composition?

      The reviewer brings up a fair point and we have significantly reworked our introduction. We share the reviewer's opinion that improved knowledge of the constitutive heterochromatin proteome will reveal novel biology.

      1. The attempt to offer background on the piRNA pathway and hybrid dysgenesis in the Introduction does not work. As a naïve reader, it was not clear why I was reading about these pathways - it is only explicable once the reader gets to the final third of the Results. Moreover, the reader will not retain this information until the TE results are presented many pages later. I strongly urge the authors to shunt the two TE restriction paragraphs to later in the manuscript. They are currently a major impediment to understanding the power of the experiment - which is to identify new proteins, pathways, and ultimately, biology that are currently obscure because we have so little handle on who makes up heterochromatin.

      We agree with this suggestion. We have introduced the piRNA pathway in the results section (lines 205 - 216), right before this information is needed. We've also removed the details on hybrid dysgenesis, since our new data argues against a maternal effect from the D1 mutant.

      The implications of the failure to rescue female fertility by the tagged versions of both D1 and Prod are not discussed. Consequently, the reader is left uneasy about how to interpret the data.

      We understand this point raised by the reviewer. However, in our proteomics experiments, we have used GFP-D1 and GFP-Prod ovaries from ~1 day old females (line 579, methods). These ovaries are morphologically similar to the wild type (Fig. S1C) and their early germ cell development appears to be intact. Moreover, chromocenter formation in female GSCs is comparable to the wild type (Fig. 1C-D). These data, along with the rescue of the viability of the Prod mutant (Fig. S1A-B), suggest that the presence of a GFP tag is not compromising D1 or Prod function in the early stages of germline development and is consistent with published and unpublished data from our lab. In addition, D1 and Prod from ovaries co-purify proteins such as Rfc38 (D1), Smn (D1), CG15107 (Prod), which have been identified in previous high-throughput screens (Guruharsha et al. 2011; Tang et al. 2023). For these reasons, we believe that GFP-D1 and GFP-Prod ovaries are a good starting point for the AP-MS experiment. We speculate that the failure to completely rescue female fertility may be due to improper expression levels of GFP-D1 or GFP-Prod at later stages of oogenesis, which are not present in ovaries from newly eclosed females and are therefore unlikely to affect our proteomic data.

      1. How were the significance cut-offs determined? Is the p-value reported the adjusted p-value? As a non-expert in AP-MS, I was surprised to find that the p-value, at least according to the Methods, was not adjusted based on the number of tests. This is particularly relevant given the large/unwieldy(?) number of proteins that were identified as significant in this study. Moreover, the D1 hit in Piwi pull down is actually not significant according to their criteria of p We used a standard cutoff of log2FC>1 and p2FC and low p-value) since these are more likely to be bona fide interactors. Third, we have used string-DB for functional analyses where we can place our hits in existing protein-protein interaction networks. Using this approach, we detect that Prod (but not D1) pulls down multiple members of the RPA complex in the embryo (RPA2 and RpA-70, Fig. S2B) while embryonic D1 (but not Prod) pulls down multiple components of the origin recognition complex (Orc1, lat, Orc5, Orc6, Fig. S2C) and the condensin I complex (Cap-G, Cap-D2, barr, Fig. S2D). Altogether, these filtering strategies allow us to eliminate as many false positives as possible while making sure to minimize the loss of true hits through multiple testing correction.
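      The trade-off discussed above, between a cutoff on fold change plus raw p-value and a correction for the number of tests, can be illustrated with a minimal sketch. This is not the authors' pipeline: the protein names and values are toy placeholders, the p-value cutoff is passed in explicitly because the exact threshold is not legible in the text above (the 0.05 used below is only an assumed placeholder), and Benjamini-Hochberg is chosen here simply as one common adjustment method.

```python
from typing import NamedTuple


class Hit(NamedTuple):
    protein: str    # placeholder identifier, not a real hit
    log2fc: float   # log2 fold change, bait versus control
    p_value: float  # unadjusted p-value


def filter_hits(hits, min_log2fc, max_p):
    """Keep hits that pass both the fold-change and raw p-value cutoffs."""
    return [h for h in hits if h.log2fc > min_log2fc and h.p_value < max_p]


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest to the smallest p-value so that the
    # adjusted values stay monotonic in the rank.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * n / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data (hypothetical values chosen only for illustration).
hits = [
    Hit("protA", 3.2, 0.01),
    Hit("protB", 1.4, 0.04),
    Hit("protC", 0.6, 0.03),
    Hit("protD", 2.1, 0.20),
]

# The 0.05 cutoff is an assumption, not a value taken from the manuscript.
passing = filter_hits(hits, min_log2fc=1.0, max_p=0.05)
adjusted = benjamini_hochberg([h.p_value for h in hits])
```

      In this toy case protA and protB pass the raw cutoffs, whereas after adjustment only protA stays below 0.05, which shows how a correction can remove borderline hits that additional filters (replicate consistency, network context) might otherwise retain.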

      How do we know if the lack of overlap across tissues is indeed germline- or soma-specialization rather than noise?

      To address this part of the comment, we have amended our text (lines 162-173) as follows,

      'We also observed a substantial overlap between D1- and Prod-associated proteins (yellow points in Fig. 2A, B, Table S1-3), with 61 hits pulled down by both baits (blue arrowheads, Fig. 2C) in embryos and ovaries. This observation is consistent with the fact that both D1 and Prod occupy sub-domains within the larger constitutive heterochromatin domain in nuclei. Surprisingly, only 12 proteins were co-purified by the same bait (D1 or Prod) across different tissues (magenta arrowheads, Fig. 2C, Table S1-3). In addition, only a few proteins such as an uncharacterized DnaJ-like chaperone, CG5504, were associated with both D1 and Prod in embryos and ovaries (Fig. 2D). One interpretation of these results is that the protein composition of chromocenters may be tailored to cell- and tissue-specific functions in Drosophila. However, we also note that the large number of unidentified peptides in AP-MS experiments means that more targeted experiments are required to validate whether certain proteins are indeed tissue-specific interactors of D1 and Prod.'
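      The overlap counts quoted above (hits shared between baits, or shared by one bait across tissues) are set intersections. A minimal sketch of that bookkeeping follows; the set names and protein identifiers are hypothetical placeholders and do not come from Tables S1-3.

```python
def shared_hits(*hit_sets):
    """Return the proteins present in every supplied set of hits."""
    sets = [set(s) for s in hit_sets]
    return set.intersection(*sets) if sets else set()


# Hypothetical hit lists per bait and tissue (placeholder identifiers).
d1_embryo = {"p1", "p2", "p3", "p4"}
prod_embryo = {"p2", "p3", "p5"}
d1_ovary = {"p3", "p4", "p6"}
prod_ovary = {"p3", "p7"}

# Hits pulled down by both baits within one tissue.
both_baits_embryo = shared_hits(d1_embryo, prod_embryo)

# Hits pulled down by the same bait across tissues.
d1_across_tissues = shared_hits(d1_embryo, d1_ovary)

# Hits associated with both baits in both tissues.
in_all_four = shared_hits(d1_embryo, prod_embryo, d1_ovary, prod_ovary)
```

      A small intersection across tissues is compatible with tissue specialization, but it is also what incomplete peptide sampling would produce, which is why the amended text calls for targeted validation.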

      To make this inference, conducting some validation would be required. More generally, I was surprised to see no single interactor validated by reciprocal IP-Westerns to validate the Mass-Spec results, though I am admittedly only adjacent to this technique. Note that colocalization, to my mind, does not validate the AP-MS data - in fact, we would a priori predict that piRNA pathway members would co-localize with PCH given the enrichment of piRNA clusters there.

      Here, we would point out that we have conducted multiple validation experiments with a specific focus on the functional significance of the associations between D1/Prod and TE repression proteins in embryos. While the reviewer suggests that piRNA pathway proteins may be expected to enrich at the pericentromeric heterochromatin, this is not always the case. For example, Piwi and Mael are present across the nucleus (pulled down by D1/Prod in embryos) while Cutoff, which is present adjacent to chromocenters in nurse cells, was not observed to interact with either D1 or Prod in the ovary samples.

      Therefore, for Piwi, we performed a reciprocal AP-MS experiment in embryos due to the higher sensitivity of this method (GFP-Piwi AP-MS, Fig. 3B). Excitingly, this experiment revealed that four largely uncharacterized proteins (CG14715, CG10208, Ugt35D1 and Fit) were highly enriched in the D1, Prod and Piwi pulldowns and these proteins will be an interesting cohort for future studies on transposon repression in Drosophila (Fig. 3C).

      Furthermore, we believe that determining the localization of proteins co-purified by D1/Prod is an important and orthogonal validation approach. For Sov, which is implicated in piRNA-dependent heterochromatin formation, we observe foci that are in close proximity to D1- and Prod-containing chromocenters (Fig. 3A).

      As for the suggestion to validate by IP-WBs, we would point out that chromocenters exhibit properties associated with phase-separated biomolecular condensates. Based on the literature, these condensates tend to associate with other proteins/condensates through low affinity or transient interactions that are rarely preserved in IP-WBs, even if there are strong functional relationships. One example is the association between D1 and Prod, which do not pull each other down in an IP-WB (Jagannathan et al. 2019), even though D1 and Prod foci dynamically associate in the nucleus and mutually regulate each other's ability to cluster satellite DNA repeats (Jagannathan et al. 2019). Similarly, IP-WB using GFP-Piwi embryos did not reveal an interaction with D1 (Fig. S4B). However, our extensive functional validations (Figures 4-6) have revealed a critical role for D1 in heritable TE repression.

      The AlphaFold2 data are very interesting but seem to lack a negative control. Is it possible to incorporate a dataset of proteins that are not predicted to interact physically to elevate the impact of the ones that you have focused on? Moreover, the structural modeling might suggest a competitive interaction between D1 and piRNAs for Piwi. Is this true? And even if not, how does the structural model contribute to your understanding of how D1 engages with the piRNA pathway? The Cuff mislocalization?

      In the revised manuscript, we have generated more structural models using AlphaFold Multimer (AFM) for proteins (log2FC>2, p0.5 and ipTM>0.8), now elaborated in lines 175-177. Despite the extensive disorder in D1 and Prod, we identified 22 proteins, including Piwi, that yield interfaces with ipTM scores >0.5 (Table S4, Table S8). These hits are promising candidates to further understand D1 and Prod function in the future.

      For the predicted model between Prod/D1 and Piwi (Fig. S4A), one conclusion could indeed be competition between D1/Prod and piRNAs for Piwi. Another possibility is that a transient interaction mediated by disordered regions on D1/Prod could recruit Piwi to satellite DNA-embedded TE loci in the pericentromeric heterochromatin. These types of interactions may be especially important in the early embryonic cycles, where repressive histone modifications such as H3K9me2/3 must be deposited at the correct loci for the first time. We suggest that mutating the disordered regions on D1 and Prod to potentially abrogate the interaction with Piwi will be important for future studies.

      The absence of a TE signal in D1 and Prod mutant ovaries would be much more compelling if investigated more agnostically. The observation that not all TE reporter constructs show a striking signal in the F2 embryos makes me wonder if Burdock and gypsy are not regulated by these two proteins but possibly other TEs are. Alternatively, small RNA-seq would more directly address the question of whether D1 and Prod regulate TEs through the piRNA pathway.

      We completely agree with this comment from the reviewer. We have performed RNA seq on D1 heterozygous (control) and D1 mutant ovaries in a chk26006 background. Since Chk2 arrests germ cell development in response to TE de-repression and DNA damage (Ghabrial and Schüpbach 1999; Moon et al. 2018), we reasoned that the chk2 mutant background would prevent developmental arrest of potential TE-expressing germ cells and we observed that both genotypes exhibited similar gonad morphology (Fig. 4A). From our analysis, we do not observe a significant effect on TE expression in the absence of D1, except for the LTR retrotransposon tirant (Fig. 4B). We also do not observe differential expression of TE repression genes (Fig. 4F).

      We have complemented our RNA seq experiment with small RNA profiling from D1 heterozygous (control) and D1 mutant ovaries. Here, overall piRNA production and antisense piRNAs mapping to TEs were largely unperturbed (Fig. 4C-E).

      Overall, our data is consistent with the TE reporter data (Fig. S7) and suggests that zygotic depletion of D1 does not have a prominent role in TE repression. However, we have uncovered that the presence of D1 during embryogenesis is critical for TE repression in adult gonads (Fig. 6, Fig. S9).

      I had trouble understanding the significance of the Cuff mis-localization when D1 is depleted. Given Cuff's role in the piRNA pathway and close association with chromatin, what would the null hypothesis be for Cuff localization when a chromocenter is disrupted? What is the null expectation of % Cuff at chromocenter given that the chromocenter itself expands massively in size (Figure 4D). The relationship between these two factors seems rather indirect and indeed, the absence of Cuff in the AP would suggest this. The biggest surprise is the absence of TE phenotype in the ovary, given the Cuff mutant phenotype - but we can't rule out given the absence of a genome-wide analysis. I think that these data leave the reader unconvinced that the F2 phenotype is causally linked to Cuff mislocalization.

      We apologize that this data was not more clearly represented. In a wild-type context, Cuff is distributed in a punctate manner across the nurse cell nuclei, with the puncta likely representing piRNA clusters (Fig. 5A-B). We find that a small fraction of Cuff (~5%) is present adjacent to the nurse cell chromocenter (inset, Fig. 5A and Fig. 5D). In the absence of D1, the nurse cell chromocenters increase ~3-4 fold in size. However, the null expectation is still that ~5% of total Cuff would be adjacent to the chromocenter, since the piRNA clusters are not expected to expand in size. In reality, we observe ~27% of total Cuff is mislocalized to chromocenters. Our data indicate that the satellite DNA repeats at the larger chromocenters must be more accessible to Cuff in the D1 mutant nurse cells. This observation is corroborated by the significant increase in piRNAs corresponding to the 1.688 satellite DNA repeat family (Fig. 4E).
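      The comparison against the null expectation described above is a simple ratio, sketched here for clarity. The function name is hypothetical, and the inputs are the approximate fractions quoted in the response (~5% expected, ~27% observed), not re-measured values.

```python
def fold_over_null(observed_fraction, null_fraction):
    """Ratio of the observed fraction to the fraction expected under the null."""
    if null_fraction <= 0:
        raise ValueError("null_fraction must be positive")
    return observed_fraction / null_fraction


# Approximate fractions of total Cuff adjacent to the chromocenter,
# as quoted in the response above.
null_fraction = 0.05      # wild-type fraction, used as the null expectation
observed_fraction = 0.27  # fraction observed in D1 mutant nurse cells

cuff_fold = fold_over_null(observed_fraction, null_fraction)  # roughly 5.4
```

      Under these numbers the chromocenter-adjacent fraction of Cuff rises by roughly 5.4-fold, which is larger than the ~3-4 fold increase in chromocenter size reported above.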

      The lack of TE expression in the F1 D1 mutant was indeed surprising based on the Cuff mislocalization but as the reviewers pointed out, we only analyzed two TE reporter constructs in the initial submission. In the revised manuscript, we have performed both RNA seq and small RNA seq. Surprisingly, our data reveal that the Cuff mislocalization does not significantly affect piRNA biogenesis (Fig. 4C, D) and piRNAs mapping to TEs. As a result, both TE repression (Fig. 4B) and fertility (Fig. 6D) are largely preserved in the absence of D1 in adult ovaries.

      Finally, we thank the reviewer for their excellent suggestion to incorporate the F2 D1 heterozygote (Fig. S9) in our analysis! This important control has revealed that the maternal effect of the D1 mutant is negligible for gonad development and fertility (Fig. 6B-D). Rather, our data clearly emphasize embryogenesis (or early larval development) as a key period during which D1 can promote heritable TE repression. Essentially, D1 is not required during adulthood for TE repression if it was present in the early stages of development.

      Apologies if I missed this, but Figure 5 shows the F2 D1 mutant ovaries only. Did you look at the TM6 ovaries as well? These ovaries should lack the maternally provisioned D1 (assuming that females are on the right side) but have the zygotic transcription.

      As mentioned above, this was a great suggestion and we've now characterized this important control in the context of germline development and fertility, to the significant advantage of our paper.

      Minor comments 9. Add line numbers for ease of reference

      We apologize for this. Line numbers have been added in the full revision.

      1. The function of satellite DNA itself is still quite controversial - I would recommend being a bit more careful here - the authors could refer instead to genomic regions enriched for satellite DNA are linked to xyz function (see Abstract line 2 and 7, for example.)

      The abstract has been rewritten and does not directly implicate satellite DNA in a specific cellular function. However, we have taken the reviewer's suggestion in the introduction (line 57-58).

      "Genetic conflicts" in the introduction needs more explanation.

      This sentence has been removed from the introduction in the revised manuscript.

      "In contrast" is not quite the right word. Maybe "However" instead (1st line second paragraph of Intro)

      Done. Line 57 of the revised manuscript.

      Results: what is the motivation for using GSC-enriched testis?

      We use GSC-enriched testes for practical reasons. First, they contain a relatively uniform population of mitotically dividing germ cells unlike regular testes which contain a multitude of post-mitotic germ cells such as spermatocytes, spermatids and sperm. Second, GSC-enriched testes are much larger than normal testes and reduced the number of dissections that were needed for each replicate.

      1. Clarify sentence about the 500 proteins in the Results section - it's not clear from context that this is the union of all experiments.

      Done. Lines 145-149 in the revised manuscript.

      The data reported are not the first to suggest that satellite DNA may have special DNA repair requirements. e.g., PMID: 25340780

      We apologize if we gave the impression that we were making a novel claim. Specialized DNA repair requirements at repetitive sequences have indeed been previously hypothesized (Charlesworth et al. 1994) and studied, and we have altered the text to better reflect this (lines 193-195). We have cited the study suggested by the reviewer as well as studies from the Chiolo (Chiolo et al. 2011; Ryu et al. 2015; Caridi et al. 2018) and Soutoglou (Mitrentsi et al. 2022) labs, which have also addressed this fascinating question.

      Page 10: indicate-> indicates.

      Done.

      1. Page 14: revise for clarity: "investigate a context whether these interactions could not take place"

      We've incorporated this suggestion in the revised text (lines 383-386).

      1. Might be important to highlight the 500 interactions are both direct and indirect. "Interacting proteins" alone suggests direct interactions only.

      Done. Lines 145-149.

      The effect of the aub mutant on chromocenter foci did not seem modest to me - however, the bar graphs obscure the raw data - consider plotting all the data not just the mean and error?

      Done. This data is now represented by a box-and-whisker plot (Fig. S5), which shows the distribution of the data.

      Reviewer #2 (Significance (Required)):

      The dataset produced by the AP-Mass Spec offers chromosome biologists an unprecedented resource. The PCH is a long-ignored chromosomal region that has historically received minimal attention; consequently, the pathways that regulate heterochromatin are understudied. Moreover, attempting to connect genome organization to transposon regulation is a new and fascinating area. I can easily envision this manuscript triggering a flurry of discovery; however, there is quite a lot of work to do before the data can fully support the claims.

      This manuscript represents a significant contribution to the field of chromosome biology.

      We thank the reviewer for the positive evaluation of our manuscript, and we really appreciate the great suggestion for the F2 D1 heterozygote control! Overall, we believe that our manuscript is substantially improved with the addition of RNA seq, small RNA seq and important genetic controls. Moreover, we are excited by the potential of our study to open new doors in the study of pericentromeric heterochromatin.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): In the manuscript entitled "Multi-tissue proteomics identifies a link between satellite DNA organization and transgenerational transposon repression", the authors used two satellite DNA-binding proteins, D1 and Prod, as baits to identify chromocenter-associated proteins through quantitative mass spectrometry. The proteomic analysis identified ~500 proteins across embryos, ovaries, and testes, including several piRNA pathway proteins. Depletion of D1 or Prod did not directly contribute to transposon repression in the ovary. However, in the absence of maternal and zygotic D1, there was a dramatic increase in agametic ovaries and transgenerational transposon de-repression. Although the study provides a wealth of proteomic data, it lacks mechanistic insights into how satellite DNA organization influences the interactions with other proteins and their functional consequences.

      We thank the reviewer for highlighting that this study will be a valuable resource for future studies on the composition and function of pericentromeric heterochromatin. Based on the reviewer's request for more mechanistic knowledge into how satellite DNA organization affects transposon repression, we have performed RNA seq and small RNA seq, added important genetic controls and significantly improved our text. In the revised manuscript, our data clearly demonstrate that embryogenesis (and potentially early larval development) is a critical time period when D1 contributes to heritable TE repression. Flies lacking D1 during embryogenesis exhibit TE expression in germ cells as adults, which is associated with Chk2-dependent gonadal atrophy. We are particularly excited by these data since TE loci are embedded in the satellite DNA-rich pericentromeric heterochromatin and our study demonstrates an important role for a satellite DNA-binding protein in TE repression.

      Major comments 1. While the identification of numerous interactions is significant, it would be helpful to acknowledge that many of these proteins were already known to bind DNA or heterochromatin regions. To strengthen the study, I recommend conducting functional validation of the identified interactions, in addition to the predictions made by AlphaFold2.

      We are happy to take this comment on board. In fact, we believe that the large number of DNA-binding and heterochromatin-associated proteins identified in this study are a sign of quality for the proteomic datasets. In the revised manuscript, we have highlighted known heterochromatin proteins co-purified by D1/Prod in lines 148-151 as well as proteins previously suggested to interact with D1/Prod from high-throughput studies in lines 153-156 (Guruharsha et al. 2011; Tang et al. 2023). In this study, we have focused on the previously unknown associations between D1/Prod and TE repression proteins and functionally validated these interactions as presented in Figures 3-6.

      The observation of transgenerational de-repression is intriguing. However, to better support this finding, it would be better to provide a mechanistic explanation based on the data presented.

      We appreciate this comment from the reviewer, which is similar to major comment #6 raised by reviewer #2. To generate mechanistic insight into the underlying cause of gonad atrophy in the F2 D1 mutant, we have performed RNA seq, small RNA seq and analyzed germline development and fertility in the F2 D1 heterozygous control (Fig. S9).

      For the RNA seq, we used D1 heterozygous (control) and D1 mutant ovaries in a chk26006 background. Since Chk2 arrests germ cell development in response to TE de-repression and DNA damage (Ghabrial and Schüpbach 1999; Moon et al. 2018), we reasoned that the chk2 mutant background would prevent developmental arrest of potential TE-expressing germ cells and we observed that both genotypes exhibited similar gonad morphology (Fig. 4A). From our analysis, we do not observe a significant effect on TE expression in the absence of D1, except for the LTR retrotransposon tirant (Fig. 4B). We also do not observe differential expression of TE repression genes (Fig. 4F).

      We have complemented our RNA seq experiment with small RNA profiling from D1 heterozygous (control) and D1 mutant ovaries. Here, overall piRNA production and antisense piRNAs mapping to TEs were largely unperturbed (Fig. 4C-E). Together, these data are consistent with the TE reporter data (Fig. S7) and suggest that zygotic depletion of D1 does not have a prominent role in TE repression.

      However, we have uncovered that the presence of D1 during embryogenesis is critical for TE repression in adult gonads (Fig. 6, Fig. S9). Essentially, either maternally deposited D1 alone (F1 D1 mutant) or zygotically expressed D1 alone (F2 D1 het) was sufficient to ensure TE repression and fertility. In contrast, a lack of D1 during embryogenesis (F2 D1 mutant) led to elevated TE expression and Chk2-mediated gonadal atrophy.

      Our results also explain why previous studies have not implicated either D1 or Prod in TE repression, since D1 likely persists during embryogenesis when using depletion approaches such as RNAi-mediated knockdown or F1 generation mutants.

      Minor comments 3. Given the maternal effect of the D1 mutant, in Figure 4, I suggest analyzing not only nurse cells but also oocytes to gain a more comprehensive understanding.

      We agree with the reviewer that this experiment can be informative. In the F2 D1 mutant ovaries, germ cell development does not proceed to a stage where oocytes are specified, thus limiting microscopy-based approaches. Nevertheless, we have gauged oocyte quality in all the genotypes using a fertility assay (Fig. 6D) since even ovaries that have a wild-type appearance can produce dysfunctional gametes. In this experiment, we observe that fertility is largely intact in the F1 D1 mutant and F2 D1 heterozygote strains, suggesting that it is the presence of D1 during embryogenesis (or early larval development) that is critical for heritable TE repression and proper oocyte development.

      The conclusion that D1 and Prod do not directly contribute to the repression of transposons needs further analysis from RNA-seq data. Alternatively, the wording could be adjusted to indicate that D1 and Prod are not required for specific transposon silencing, such as Burdock and gypsy.

      Agreed. We have performed RNA-seq in D1 heterozygous (control) and D1 mutant ovaries in a chk26006 background (Fig. 4A, B) as described above.

      As D1 mutation affects Cuff nuclear localization, it would be insightful to sequence the piRNA in the ovaries.

      Agreed. We have performed small RNA profiling from D1 heterozygous (control) and D1 mutant ovaries. Despite the significant mislocalization of the RDC complex, overall piRNA production and antisense piRNAs mapping to TEs were largely unaffected (Fig. 4C-E). However, we observed significant changes in piRNAs mapping to complex satellite DNA repeats (Fig. 4D), but these changes were not associated with a maternal effect on germline development or fertility (F2 D1 heterozygote, Fig. 6B-D).

      **Referee Cross-Commenting**

      I appreciate the valuable insights provided by the other two reviewers regarding this manuscript. I concur with their assessment that substantial improvements are needed before considering this manuscript for publication.

      1. The proteomics data have the potential to be a valuable resource for the wider scientific community. However, I share the concerns raised by reviewer 1 about the current quality of the data sets. To address this, the manuscript should be augmented with additional data showing that the precipitation was successful. Moreover, as reviewer 2 and I suggested, additional co-IP validations of the IP-MS results are needed to enhance the reliability.

      In the revised manuscript, we have performed multiple experiments to address the quality of the MS datasets based on comments raised by reviewer #1. They are as follows,

      Out of six total AP-MS experiments in this manuscript (D1 x 3, Prod x 2 and Piwi), we observe a strong enrichment of the bait in 5/6 attempts (log2FC between 4-12, Fig. 2A, B, Fig. S2A, Table S1-S3, Table S7). In the D1 testis sample from the initial submission, the lack of a third biological replicate meant that only the p-value (0.07) was not meeting the cutoff. To address this, we have repeated this experiment with an additional biological replicate (Fig. S2A), and our data now clearly show that D1 is also significantly enriched in the testis sample.

      As suggested by the reviewer #1, we have assessed our lysis conditions (450mM NaCl and benzonase) and the solubilization of our baits post-lysis. In Fig. S1D, we have blotted equivalent fractions of the soluble supernatant and insoluble pellet from GFP-Piwi embryos and show that both GFP-Piwi and D1 are largely solubilized following lysis. In Fig. S1E, we also show that our IP protocol works efficiently.

      The only instance in which we do not detect the bait by mass spectrometry is for GFP-Prod pulldown in embryos. Here, one reason could be relatively low expression of GFP-Prod in comparison to GFP-D1 (Fig. S1E), which may lead to technical difficulties in detecting peptides corresponding to Prod. However, we note that Prod IP from embryos co-purified proteins such as Bocks that were previously suggested as Prod interactors from high-throughput studies (Giot et al. 2003; Guruharsha et al. 2011). In addition, Prod IP from embryos also co-purified proteins known to associate with chromocenters such as Hcs (Reyes-Carmona et al. 2011) and Saf-B (Huo et al. 2020). Finally, the concordance between D1 and Prod co-purified proteins from embryo lysates (Fig. 2A, C) suggests that the Prod IP from embryos is of reasonable quality.

      As for the IP-WB validations, we would point out that chromocenters exhibit properties associated with phase-separated biomolecular condensates. In our experience, these condensates tend to associate with other proteins/condensates through low affinity or transient interactions that are rarely preserved in IP-WBs, even if there are strong functional relationships. One example is the association between D1 and Prod, which do not pull each other down in an IP-WB (Jagannathan et al. 2019), even though D1 and Prod foci dynamically associate in the nucleus and mutually regulate each other's ability to cluster satellite DNA repeats (Jagannathan et al. 2019). Similarly, IP-WB using GFP-Piwi embryos did not reveal an interaction with D1 (Fig. S4B). However, our extensive functional validations (Figures 4-6) have revealed a critical role for D1 in heritable TE repression.

      I agree with reviewer 2 that the present conclusion that D1 and Prod do not contribute to transposon silencing is not appropriate. As reviewer 2 and I suggested, a systematic analysis using both mRNA-seq and small RNA-seq is required to draw this conclusion.

      Agreed. We have performed RNA seq and small RNA seq as elaborated above. Our conclusions on the role of D1 in TE repression are now much stronger.

      1. The transgenerational phenotype is an intriguing aspect of the study. I agree with reviewer 2 that the inclusion of TM6 ovaries would be a nice control. Further, it is hard to connect this phenotype with the interactions identified in this manuscript. It would be appreciated if the author could provide a mechanistic explanation.

      We have significantly improved these aspects of our study in the revised manuscript. Through the analysis of germline development in the F2 D1 heterozygotes as suggested by reviewer #2, in addition to the recommended RNA seq and small RNA seq, we have now identified embryogenesis (and potentially early larval development) as a time period when D1 makes an important contribution to TE repression. Loss of D1 during embryogenesis leads to TE expression in adult germline cells, which is associated with Chk2-dependent gonadal atrophy. Taken together, we have pinpointed the specific contribution of a satellite DNA-binding protein to transposon repression.

      Reviewer #3 (Significance (Required)):

      Although this study successfully identified several interactions, the authors did not fully elucidate how these interactions contribute to the transgenerational silencing of transposons or ovary development.

      We thank the reviewer for the thoughtful comments and overall positive evaluation of our study, especially the proteomic dataset. We are confident that the revised manuscript has improved our mechanistic understanding of the contribution made by a satellite DNA-binding protein in TE repression.

      References

      Baumgartner L, Handler D, Platzer SW, Yu C, Duchek P, Brennecke J. 2022. The Drosophila ZAD zinc finger protein Kipferl guides Rhino to piRNA clusters eds. D. Bourc'his, K. Struhl, and Z. Zhang. eLife 11: e80067.

      Caridi CP, D'Agostino C, Ryu T, Zapotoczny G, Delabaere L, Li X, Khodaverdian VY, Amaral N, Lin E, Rau AR, et al. 2018. Nuclear F-actin and myosins drive relocalization of heterochromatic breaks. Nature 559: 54-60.

      Charlesworth B, Sniegowski P, Stephan W. 1994. The evolutionary dynamics of repetitive DNA in eukaryotes. Nature 371: 215-220.

      Chiolo I, Minoda A, Colmenares SU, Polyzos A, Costes SV, Karpen GH. 2011. Double-strand breaks in heterochromatin move outside of a dynamic HP1a domain to complete recombinational repair. Cell 144: 732-744.

      Ghabrial A, Schüpbach T. 1999. Activation of a meiotic checkpoint regulates translation of Gurken during Drosophila oogenesis. Nat Cell Biol 1: 354-357.

      Giot L, Bader JS, Brouwer C, Chaudhuri A, Kuang B, Li Y, Hao YL, Ooi CE, Godwin B, Vitols E, et al. 2003. A protein interaction map of Drosophila melanogaster. Science 302: 1727-1736.

      Guruharsha KG, Rual JF, Zhai B, Mintseris J, Vaidya P, Vaidya N, Beekman C, Wong C, Rhee DY, Cenaj O, et al. 2011. A protein complex network of Drosophila melanogaster. Cell 147: 690-703.

      Huo X, Ji L, Zhang Y, Lv P, Cao X, Wang Q, Yan Z, Dong S, Du D, Zhang F, et al. 2020. The Nuclear Matrix Protein SAFB Cooperates with Major Satellite RNAs to Stabilize Heterochromatin Architecture Partially through Phase Separation. Molecular Cell 77: 368-383.e7.

      Jagannathan M, Cummings R, Yamashita YM. 2019. The modular mechanism of chromocenter formation in Drosophila eds. K. VijayRaghavan and S.A. Gerbi. eLife 8: e43938.

      Mitrentsi I, Lou J, Kerjouan A, Verigos J, Reina-San-Martin B, Hinde E, Soutoglou E. 2022. Heterochromatic repeat clustering imposes a physical barrier on homologous recombination to prevent chromosomal translocations. Molecular Cell 82: 2132-2147.e6.

      Moon S, Cassani M, Lin YA, Wang L, Dou K, Zhang ZZ. 2018. A Robust Transposon-Endogenizing Response from Germline Stem Cells. Dev Cell 47: 660-671.e3.

      Pascovici D, Handler DCL, Wu JX, Haynes PA. 2016. Multiple testing corrections in quantitative proteomics: A useful but blunt tool. PROTEOMICS 16: 2448-2453.

      Reyes-Carmona S, Valadéz-Graham V, Aguilar-Fuentes J, Zurita M, León-Del-Río A. 2011. Trafficking and chromatin dynamics of holocarboxylase synthetase during development of Drosophila melanogaster. Molecular Genetics and Metabolism 103: 240-248.

      Ryu T, Spatola B, Delabaere L, Bowlin K, Hopp H, Kunitake R, Karpen GH, Chiolo I. 2015. Heterochromatic breaks move to the nuclear periphery to continue recombinational repair. Nat Cell Biol 17: 1401-1411.

      Tang H-W, Spirohn K, Hu Y, Hao T, Kovács IA, Gao Y, Binari R, Yang-Zhou D, Wan KH, Bader JS, et al. 2023. Next-generation large-scale binary protein interaction network for Drosophila melanogaster. Nat Commun 14: 2162.

    2. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      Chavan et al. set out to enrich our compendium of pericentric heterochromatin-associated proteins - and to learn some new biology along the way. An ambitious AP-Mass Spec experiment baited with two DNA satellite-binding proteins (D1 and Prod), conducted across embryos, ovaries, and testes, yielded hundreds of candidate proteins putatively engaged at chromocenters. These proteins are enriched for a restricted number of biological pathways, including DNA repair and transposon regulation. To investigate the latter in greater depth, the authors examine D1 and prod mutants for transposon activity changes using reporter constructs for multiple elements. These reporter constructs revealed no transposon activation in the adult ovary, where many proteins identified in the mass spec experiments restrict transposons. However, the authors suggest that the D1 mutant ovaries do show disrupted localization of a key member of a transposon restriction pathway (Cuff), and infer that this mislocalization triggers a striking transposon derepression phenotype in the F2 ovaries.

      The dataset produced by the AP-Mass Spec offers chromosome biologists an unprecedented resource. The PCH is a long-ignored chromosomal region that has historically received minimal attention; consequently, the pathways that regulate heterochromatin are understudied. Moreover, attempting to connect genome organization to transposon regulation is a new and fascinating area. I can easily envision this manuscript triggering a flurry of discovery; however, there is quite a lot of work to do before the data can fully support the claims.

      Major

      1. The introduction requires quite a radical restructure to A) better highlight the importance of the work and B) limit information whose relevance is not clear early in the manuscript. A. Delineating who makes up heterochromatin is a long-standing problem in chromosome biology. This paper has huge potential to contribute to this field; however, it is not the first. Others are working on this problem in other systems, for example PMID:29272703. Moreover, we have some understanding of the distinct pathways that may impact heterochromatin in different tissues (e.g., piRNA biology in ovaries vs the soma). Also, the mutant phenotypes of prod and D1 are different. Fleshing out these three distinct points could help the reader understand what we know and what we don't know about heterochromatin composition and its special biology. Understanding where we are as a field will offer clear predictions about who the interactors might be that we expect to find. For example, given the dramatically different D1 and prod mutant phenotypes (and allele swap phenotypes), how might the interactors with these proteins differ? What do we know about heterochromatin formation differences in different tissues? And how might these differences impact heterochromatin composition? B. The attempt to offer background on the piRNA pathway and hybrid dysgenesis in the Introduction does not work. As a naïve reader, it was not clear why I was reading about these pathways - it is only explicable once the reader gets to the final third of the Results. Moreover, the reader will not retain this information until the TE results are presented many pages later. I strongly urge the authors to shunt the two TE restriction paragraphs to later in the manuscript. They are currently a major impediment to understanding the power of the experiment - which is to identify new proteins, pathways, and ultimately, biology that are currently obscure because we have so little handle on who makes up heterochromatin.
      2. The implications of the failure to rescue female fertility by the tagged versions of both D1 and Prod are not discussed. Consequently, the reader is left uneasy about how to interpret the data.
      3. How were the significance cut-offs determined? Is the p-value reported the adjusted p-value? As a non-expert in AP-MS, I was surprised to find that the p-value, at least according to the Methods, was not adjusted based on the number of tests. This is particularly relevant given the large/unwieldy(?) number of proteins that were identified as significant in this study. Moreover, the D1 hit in Piwi pull down is actually not significant according to their criteria of p < 0.05 (D1 is p = 0.05).
      4. How do we know if the lack of overlap across tissues is indeed germline- or soma-specialization rather than noise? To make this inference, conducting some validation would be required. More generally, I was surprised to see no single interactor validated by reciprocal IP-Westerns to validate the Mass-Spec results, though I am admittedly only adjacent to this technique. Note that colocalization, to my mind, does not validate the AP-MS data - in fact, we would a priori predict that piRNA pathway members would co-localize with PCH given the enrichment of piRNA clusters there.
      5. The AlphaFold2 data are very interesting but seem to lack a negative control. Is it possible to incorporate a dataset of proteins that are not predicted to interact physically to elevate the impact of the ones that you have focused on? Moreover, the structural modeling might suggest a competitive interaction between D1 and piRNAs for Piwi. Is this true? And even if not, how does the structural model contribute to your understanding of how D1 engages with the piRNA pathway? The Cuff mislocalization?
      6. The absence of a TE signal in D1 and Prod mutant ovaries would be much more compelling if investigated more agnostically. The observation that not all TE reporter constructs show a striking signal in the F2 embryos makes me wonder if Burdock and gypsy are not regulated by these two proteins but possibly other TEs are. Alternatively, small RNA-seq would more directly address the question of whether D1 and Prod regulate TEs through the piRNA pathway.
      7. I had trouble understanding the significance of the Cuff mis-localization when D1 is depleted. Given Cuff's role in the piRNA pathway and close association with chromatin, what would the null hypothesis be for Cuff localization when a chromocenter is disrupted? What is the null expectation of % Cuff at the chromocenter given that the chromocenter itself expands massively in size (Figure 4D)? The relationship between these two factors seems rather indirect and indeed, the absence of Cuff in the AP would suggest this. The biggest surprise is the absence of a TE phenotype in the ovary, given the Cuff mutant phenotype - but we can't rule one out given the absence of a genome-wide analysis. I think that these data leave the reader unconvinced that the F2 phenotype is causally linked to Cuff mislocalization.
      8. Apologies if I missed this, but Figure 5 shows the F2 D1 mutant ovaries only. Did you look at the TM6 ovaries as well? These ovaries should lack the maternally provisioned D1 (assuming that females are on the right side) but have the zygotic transcription.

      Minor

      1. Add line numbers for ease of reference
      2. The function of satellite DNA itself is still quite controversial - I would recommend being a bit more careful here - the authors could instead state that genomic regions enriched for satellite DNA are linked to xyz function (see Abstract lines 2 and 7, for example).
      3. "Genetic conflicts" in the introduction needs more explanation.
      4. "In contrast" is not quite the right word. Maybe "However" instead (1st line second paragraph of Intro)
      5. Results: what is the motivation for using GSC-enriched testis?
      6. Clarify sentence about the 500 proteins in the Results section - it's not clear from context that this is the union of all experiments.
      7. The data reported are not the first to suggest that satellite DNA may have special DNA repair requirements. e.g., PMID: 25340780
      8. Page 10: indicate-> indicates.
      9. Page 14: revise for clarity: "investigate a context whether these interactions could not take place"
      10. Might be important to highlight the 500 interactions are both direct and indirect. "Interacting proteins" alone suggests direct interactions only.
      11. The effect of the aub mutant on chromocenter foci did not seem modest to me - however, the bar graphs obscure the raw data - consider plotting all the data not just the mean and error?

      Significance

      The dataset produced by the AP-Mass Spec offers chromosome biologists an unprecedented resource. The PCH is a long-ignored chromosomal region that has historically received minimal attention; consequently, the pathways that regulate heterochromatin are understudied. Moreover, attempting to connect genome organization to transposon regulation is a new and fascinating area. I can easily envision this manuscript triggering a flurry of discovery; however, there is quite a lot of work to do before the data can fully support the claims.

      This manuscript represents a significant contribution to the field of chromosome biology.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary:

      The question of whether eyespots mimic eyes has certainly been around for a very long time and led to a good deal of debate and contention. This isn't purely an issue of how eyespots work either, but more widely an example of the potential pitfalls of adopting 'just-so-stories' in biology before conducting the appropriate experiments. Recent years have seen a range of studies testing eye mimicry, often purporting to find evidence for or against it, and not always entirely objectively. Thus, the current study is very welcome, rigorously analysing the findings across a suite of papers based on evidence/effect sizes in a meta-analysis.

      Strengths:

      The work is very well conducted, robust, objective, and makes a range of valuable contributions and conclusions, with an extensive use of literature for the research. I have no issues with the analysis undertaken, just some minor comments on the manuscript. The results and conclusions are compelling. It's probably fair to say that the topic needs more experiments to really reach firm conclusions but the authors do a good job of acknowledging this and highlighting where that future work would be best placed.

      Weaknesses:

      There are few weaknesses in this work, just some minor amendments to the text for clarity and information.

      We greatly appreciate Reviewer 1’s positive comments on our manuscript. We also revised our manuscript text and a figure in accordance with Reviewer 1’s recommendations.

      Reviewer #2 (Public Review):

      Many prey animals have eyespot-like markings (called eyespots) which have been shown in experiments to hinder predation. However, why eyespots are effective against predation has been debated. The authors attempt to use a meta-analytical approach to address the issue of whether eye-mimicry or conspicuousness makes eyespots effective against predation. They state that their results support the importance of conspicuousness. However, I am not convinced by this.

      There have been many experimental studies that have weighed in on the debate. Experiments have included manipulating target eyespot properties to make them more or less conspicuous, or to make them more or less similar to eyes. Each study has used its own set of protocols. Experiments have been done indoors with a single predator species, and outdoors where, presumably, a large number of predator species predated upon targets. The targets (i.e., prey with eyespot-like markings) have varied from simple triangular paper pieces with circles printed on them to real lepidopteran wings. Some studies have suggested that conspicuousness is important and eye-mimicry is ineffective, while other studies have suggested that more eye-like targets are better protected. Therefore, there is no consensus across experiments on the eye-mimicry versus conspicuousness debate.

      The authors enter the picture with their meta-analysis. The manuscript is well-written and easy to follow. The meta-analysis appears well-carried out, statistically. Their results suggest that conspicuousness is effective, while eye-mimicry is not. I am not convinced that their meta-analysis provides strong enough evidence for this conclusion. The studies that are part of the meta-analysis are varied in terms of protocols, and no single protocol is necessarily better than another. Support for conspicuousness has come primarily from one research group (as acknowledged by the authors), based on a particular set of protocols.

      Furthermore, although conspicuousness is amenable to being quantified, e.g., using contrast or size of stimuli, assessment of 'similarity to eyes' is inherently subjective. Therefore, manipulation of 'similarity to eyes' in some studies may have been subtle enough that there was no effect.

      There are a few experiments that have indeed supported eye-mimicry. The results from experiments so far suggest that both eye-mimicry and conspicuousness are effective, possibly depending on the predator(s). Importantly, conspicuousness can benefit from eye-mimicry, while eye-mimicry can benefit from conspicuousness.

      Therefore, I argue that generalizing based on a meta-analysis of a small number of studies that conspicuousness is more important than eye-mimicry is not justified. To summarize, I am not convinced that the current study rules out the importance of eye-mimicry in the evolution of eyespots, although I agree with the authors that conspicuousness is important.

      We understand Reviewer 2’s concerns and have addressed them by adding some sentences to the Discussion (L506-508, L538-540). In addition, our findings, which were guided by current knowledge, support the conspicuousness hypothesis, but we acknowledge the two hypotheses are not mutually exclusive (L110-112). We also do not reject the eye mimicry hypothesis. As we have demonstrated, there are still several gaps in the current literature and our understanding (L501-553). Our aim is for this research to stimulate further studies on this intriguing topic and to foster more fruitful discussions.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Minor comments

      Lines 59/60: "it is possible that eyespots do not involve mimicry of eyes..."

      The sentence was revised (L59). To enhance readability, we have integrated Reviewer 1's suggestions by simplifying the relevant section instead of using the suggested sentence.

      Line 61: not necessarily aposematism. They might work simply through neophobia, unfamiliarity, etc even without unprofitability

      We changed the text in line with the comment from Reviewer 1 (L61-63).

      Lines 62/63 - this is a little hard to follow because I think you really mean both studies of real lepidopterans as well as artificial targets. Need to explain a bit more clearly.

      We provided an additional explanation of our included primary study type (L64-65).

      Lines 93/94 - not quite that they have nothing to do with predator avoidance, but more that any subjective resemblance to eyes is coincidental, or simply as a result of those marking properties being more effective through conspicuousness in their own right.

      Line 94 - similarly, not just aposematism. You explain the possible reasons above on l92 as also being neophobia, etc.

      We agreed with Reviewer 1’s comments and added more explanations about the conspicuousness hypothesis (L96-97). We have also rewritten the sentences that could be misleading to readers (L428).

      Line 96 - this is perhaps a bit misleading as it seems to conflate mechanism and function. The eye mimicry vs conspicuousness debate is largely about how the so-called 'intimidation' function of eyespots works. That is, how eyespots prevent predators from attacking. The deflection hypothesis is a second function of eyespots, which might also work via conspicuousness or eye mimicry (e.g. if predators try to peck at 'eyes') but has been less central to the mimicry debate.

      The explanations and suggestions from Reviewer 1 are very helpful. We revised this part of our manuscript (L103-108) and Figure 1 and its legend to make it clearer that the eyespot hypothesis and the conspicuousness hypothesis explain anti-predator functions from a different perspective than the deflection hypothesis.

      There is a third function of eyespots too, that being as mate selection traits. Note that Figure 1 should also be altered to reflect these points.

      We wanted to focus on explaining why eyespot patterns can contribute to prey survival. Therefore, we did not state that eyespot patterns function as mate selection traits in this paragraph. Instead, we mention this in the Discussion (L455-465), which we have rewritten to be clearer (L456).

      Were there enough studies on non-avian predators to analyse in any way? 

      We found a few studies on non-avian predators (e.g. fish, invertebrates, or reptiles), but not enough to conduct a meta-analysis.

      Line 171/72 - why? Can you explain, please.

      The reason we excluded studies that used bright or contrasting patterns as control stimuli in our meta-analysis is to ensure comparability across primary studies. We added an explanation in the text (L180-181).

      Line 177 - can you clarify this?

      Without control stimuli, it is challenging to accurately assess the effect of eyespots or other conspicuous patterns on predation avoidance. Control stimuli allow for a comparison of the effect of eyespots or patterns. We added a more detailed explanation to clarify here (L186-188).

      Line 309 - presumably you mean 33 papers, each of which may have multiple experiments? I might have missed it, but how many individual experiments in total? 

      There were 164 individual experiments. We have now added that information in the manuscript (L320).

      Line 320 - paper shaped in a triangle mostly?

      We cannot say that most artificial prey were triangular. After excluding the caterpillar type, 57.4% were triangular, while the remaining 43.6% were rectangular (Figure 2b).

      Line 406: Stevens.

      We fixed this name, thank you (L417).

      Discussion - nice, balanced and thorough. Much of the work done has been in Northern Europe where eyespot species are less common. Perhaps things may differ in areas where eyespots are more prevalent.

      We appreciate Reviewer 1’s kind words and comments. We agree with your comments and reflected them in our manuscript (L542-545).

      Line 477 - True, and predators often have forward-facing eyes making it likely both would often be seen, but a pair of eyes may not be absolutely crucial to avoidance since sometimes a prey animal may only see one eye of a predator (e.g. if the other is occluded, or only one side of the head is visible).

      We were grateful for Reviewer 1's comment. We added a sentence noting that the eyespots do not necessarily have to be in pairs to resemble eyes (L490-L492).

    1. of both race and gender that remained in place—particularly among its women employees known as computers. Darden’s arrival at Langley coincided with the early days of digital computing. Although Langley could claim one of the most advanced computing systems of the time—an IBM 704, the first computer to support floating-point math—its resources were still limited. For most data analysis tasks, Langley’s Advanced Computing Division relied upon human computers like Darden herself. These computers were all women, trained in math or a related field, and tasked with performing the calculations that determined everything from the best wing shape for an airplane, to the best flight path to the moon. But despite the crucial roles they played in advancing this and other NASA research, they were treated like unskilled temporary workers.
They were brought into research groups on a project-by-project basis, often without even being told anything about the source of the data they were asked to analyze. Most of the engineers, who were predominantly men, never even bothered to learn the computers’ names. These women computers have only recently begun to receive credit for their crucial work, thanks to scholars of the history of computing—and to journalists like Margot Lee Shetterly, whose book, Hidden Figures: The American Dream and the Untold Story of the Black Women Who Helped Win the Space Race, along with its film adaptation, is responsible for bringing Christine Darden’s story into the public eye.2 Her story, like those of her colleagues, is one of hard work under discriminatory conditions. Each of these women computers was required to advocate for herself—and some, like Darden, chose also to advocate for others. It is because of both her contributions to data science and her advocacy for women that we have chosen to begin our book, Data Feminism, with Darden’s story.
For feminism begins with a belief in the “political, social, and economic equality of the sexes,” as the Merriam-Webster Dictionary defines the term—as does, for the record, Beyoncé.3 And any definition of feminism also necessarily includes the activist work that is required to turn that belief into reality. In Data Feminism, we bring these two aspects of feminism together, demonstrating a way of thinking about data, their analysis, and their display, that is informed by this tradition of feminist activism as well as the legacy of feminist critical thought. As for Darden, she did not only apply her skills of data analysis to spaceflight trajectories; she also applied them to her own career path. After working at Langley for a number of years, she began to notice two distinct patterns in her workplace: men with math credentials were placed in engineering positions, where they could be promoted through the ranks of the civil service, while women with the same degrees were sent to the computing pools, where they languished until they retired or quit. She did not want to become one of those women, nor did she want others to experience the same fate.
So she gathered up her courage and decided to approach the chief of her division to ask him why. As Darden, now seventy-five, told Shetterly in an interview for Hidden Figures, his response was sobering: “Well, nobody’s ever complained,” he told Darden. “The women seem to be happy doing that, so that’s just what they do.” In today’s world, Darden might have gotten her boss fired—or at least served with an Equal Employment Opportunity Commission complaint. But at the time that Darden posed her question, stereotypical remarks about “what women do” were par for the course. In fact, challenging assumptions about what women could or couldn’t do—especially in the workplace—was the central subject of Betty Friedan’s best-selling book, The Feminine Mystique.
Published in 1963, The Feminine Mystique is often credited with starting feminism’s so-called second wave.4 Fed up with the enforced return to domesticity following the end of World War II, and inspired by the national conversation about equality of opportunity prompted by the civil rights movement, women across the United States began to organize around a wide range of issues, including reproductive rights and domestic violence, as well as the workplace inequality and restrictive gender roles that Darden faced at Langley. That said, Darden’s specific experience as a Black woman with a full-time job was quite different than that of a white suburban housewife—the central focus of The Feminine Mystique. And when critics rightly called out Friedan for failing to acknowledge the range of experiences of women in the United States (and abroad), it was women like Darden, among many others, whom they had in mind. In Feminist Theory: From Margin to Center, another landmark feminist book published in 1984, bell hooks puts it plainly: “[Friedan] did not discuss who would be called in to take care of the children and maintain the home if more women like herself were freed from their house labor and given equal access with white men to the professions. She did not speak of the needs of women without men, without children, without homes.
She ignored the existence of all non-white women and poor white women. She did not tell readers whether it was more fulfilling to be a maid, a babysitter, a factory worker, a clerk, or a prostitute than to be a leisure-class housewife.”5 In other words, Friedan had failed to consider how those additional dimensions of individual and group identity—like race and class, not to mention sexuality, ability, age, religion, and geography, among many others—intersect with each other to determine one’s experience in the world. Although this concept—intersectionality—did not have a name when hooks described it, the idea that these dimensions cannot be examined in isolation from each other has a much longer intellectual history.6 Then, as now, key scholars and activists were deeply attuned to how the racism embedded in US culture, coupled with many other forms of oppression, made it impossible to claim a common experience—or a common movement—for all women everywhere. Instead, what was needed was “the development of integrated analysis and practice based upon the fact that the major systems of oppression are interlocking.”7 These words are from the Combahee River Collective Statement, written in 1978 by the famed Black feminist activist group out of Boston.
In this book, we draw heavily from intersectionality and other concepts developed through the work of Black feminist scholars and activists because they offer some of the best ways for negotiating this multidimensional terrain. Indeed, feminism must be intersectional if it seeks to address the challenges of the present moment. We write as two straight, white women based in the United States, with four advanced degrees and five kids between us. We identify as middle-class and cisgender—meaning that our gender identity matches the sex that we were assigned at birth. We have experienced sexism in various ways at different points of our lives—being women in tech and academia, birthing and breastfeeding babies, and trying to advocate for ourselves and our bodies in a male-dominated health care system. But we haven’t experienced sexism in ways that other women certainly have or that nonbinary people have, for there are many dimensions of our shared identity, as the authors of this book, that align with dominant group positions. This fact makes it impossible for us to speak from experience about some oppressive forces—racism, for example. But it doesn’t make it impossible for us to educate ourselves and then speak about racism and the role that white people play in upholding it. Or to challenge ableism and the role that abled people play in upholding it.
Or to speak about class and wealth inequalities and the role that well-educated, well-off people play in maintaining those. Or to believe in the logic of co-liberation. Or to advocate for justice through equity. Indeed, a central aim of this book is to describe a form of intersectional feminism that takes the inequities of the present moment as its starting point and begins its own work by asking: How can we use data to remake the world?8 This is a complex and weighty task, and it will necessarily remain unfinished. But its size and scope need not stop us—or you, the readers of this book—from taking additional steps toward justice. Consider Christine Darden, who, after speaking up to her division chief, heard nothing from him but radio silence. But then, two weeks later, she was indeed promoted and transferred to a group focused on sonic boom research. In her new position, Darden was able to begin directing her own research projects and collaborate with colleagues of all genders as a peer. Her self-advocacy serves as a model: a sustained attention to how systems of oppression intersect with each other, informed by the knowledge that comes from direct experience. It offers a guide for challenging power and working toward justice.

What Is Data Feminism?

Christine Darden would go on to conduct groundbreaking research on sonic boom minimization techniques, author more than sixty scientific papers in the field of computational fluid dynamics, and earn her PhD in mechanical engineering—all while “juggling the duties of Girl Scout mom, Sunday school teacher, trips to music lessons, and homemaker,” Shetterly reports.
But even as she ascended the professional ranks, she could tell that her scientific accomplishments were still not being recognized as readily as those of her male counterparts; the men, it seemed, received promotions far more quickly.

Darden consulted with Langley’s Equal Opportunity Office, where a white woman by the name of Gloria Champine had been compiling a set of statistics about gender and rank. The data confirmed Darden’s direct experience: that women and men—even those with identical academic credentials, publication records, and performance reviews—were promoted at vastly different rates. Champine recognized that her data could support Darden in her pursuit of a promotion and, furthermore, that these data could help communicate the systemic nature of the problem at hand. Champine visualized the data in the form of a bar chart, and presented the chart to the director of Darden’s division.9 He was “shocked at the disparity,” Shetterly reports, and Darden received the promotion she had long deserved.10 Darden would advance to the top rank in the federal civil service, the first Black woman at Langley to do so. By the time that she retired from NASA, in 2007, Darden was a director herself.11

Although Darden’s rise into the leadership ranks at NASA was largely the result of her own knowledge, experience, and grit, her story is one that we can only tell as a result of the past several decades of feminist activism and critical thought. 
It was a national feminist movement that brought women’s issues to the forefront of US cultural politics, and the changes brought about by that movement were vast. They included both the shifting gender roles that pointed Darden in the direction of employment at NASA and the creation of reporting mechanisms like the one that enabled her to continue her professional rise. But Darden’s success in the workplace was also, presumably, the result of many unnamed colleagues and friends who may or may not have considered themselves feminists. These were the people who provided her with community and support—and likely a not insignificant number of casserole dinners—as she ascended the government ranks. These types of collective efforts have been made increasingly legible, in turn, because of the feminist scholars and activists whose decades of work have enabled us to recognize that labor—emotional as much as physical—as such today.

As should already be apparent, feminism has been defined and used in many ways. Here and throughout the book, we employ the term feminism as a shorthand for the diverse and wide-ranging projects that name and challenge sexism and other forces of oppression, as well as those which seek to create more just, equitable, and livable futures. Because of this broadness, some scholars prefer to use the term feminisms, which clearly signals the range of—and, at times, the incompatibilities among—these various strains of feminist activism and political thought. For reasons of readability, we choose to use the term feminism here, but our feminism is intended to be just as expansive. 
It includes the work of regular folks like Darden and Champine, public intellectuals like Betty Friedan and bell hooks, and organizing groups like the Combahee River Collective, which have taken direct action to achieve the equality of the sexes. It also includes the work of scholars and other cultural critics—like Kimberlé Crenshaw and Margot Lee Shetterly, among many more—who have used writing to explore the social, political, historical, and conceptual reasons behind the inequality of the sexes that we face today.

In the process, these writers and activists have given voice to the many ways in which today’s status quo is unjust.12 These injustices are often the result of historical and contemporary differentials of power, including those among men, women, and nonbinary people, as well as those among white women and Black women, academic researchers and Indigenous communities, and people in the Global North and the Global South. Feminists analyze these power differentials so that they can change them. Such a broad focus—one that incorporates race, class, ability, and more—would have sounded strange to Friedan or to the white women largely credited for leading the fight for women’s suffrage in the nineteenth century.13 But the reality is that women of color have long insisted that any movement for gender equality must also consider the ways in which privilege and oppression are intersectional.

Because the concept of intersectionality is essential for this whole book, let’s get a bit more specific. 
The term was coined by legal theorist Kimberlé Crenshaw in the late 1980s.14 In law school, Crenshaw had come across the antidiscrimination case of DeGraffenreid v. General Motors. Emma DeGraffenreid was a Black working mother who had sought a job at a General Motors factory in her town. She was not hired and sued GM for discrimination. The factory did have a history of hiring Black people: many Black men worked in industrial and maintenance jobs there. They also had a history of hiring women: many white women worked there as secretaries. These two pieces of evidence provided the rationale for the judge to throw out the case. Because the company did hire Black people and did hire women, it could not be discriminating based on race or gender. But, Crenshaw wanted to know, what about discrimination on the basis of race and gender together? This was something different, it was real, and it needed to be named. Crenshaw not only named the concept, but would go on to explain and elaborate the idea of intersectionality in award-winning books, papers, and talks.15

Key to the idea of intersectionality is that it does not only describe the intersecting aspects of any particular person’s identity (or positionalities, as they are sometimes termed).16 It also describes the intersecting forces of privilege and oppression at work in a given society. Oppression involves the systematic mistreatment of certain groups of people by other groups. 
It happens when power is not distributed equally—when one group controls the institutions of law, education, and culture, and uses its power to systematically exclude other groups while giving its own group unfair advantages (or simply maintaining the status quo).17 In the case of gender oppression, we can point to the sexism, cissexism, and patriarchy that is evident in everything from political representation to the wage gap to who speaks more often (or more loudly) in a meeting.18 In the case of racial oppression, this takes the form of racism and white supremacy. Other forms of oppression include ableism, colonialism, and classism. Each has its particular history and manifests differently in different cultures and contexts, but all involve a dominant group that accrues power and privilege at the expense of others. Moreover, these forces of power and privilege on the one hand and oppression on the other mesh together in ways that multiply their effects.

The effects of privilege and oppression are not distributed evenly across all individuals and groups, however. For some, they become an obvious and unavoidable part of daily life, particularly for women and people of color and queer people and immigrants: the list goes on. If you are a member of any or all of these (or other) minoritized groups, you experience their effects everywhere, shaping the choices you make (or don’t get to make) each day. 
These systems of power are as real as rain. But forces of oppression can be difficult to detect when you benefit from them (we call this a privilege hazard later in the book). And this is where data come in: it was a set of intersecting systems of power and privilege that Darden was intent on exposing when she posed her initial question to her division chief. And it was that same set of intersecting systems of power and privilege that Darden sought to challenge when she approached Champine. Darden herself didn’t need any more evidence of the problem she faced; she was already living it every day.19 But when her experience was recorded as data and aggregated with others’ experiences, it could be used to challenge institutional systems of power and have far broader impact than on her career trajectory alone.

In this way, Darden models what we call data feminism: a way of thinking about data, both their uses and their limits, that is informed by direct experience, by a commitment to action, and by intersectional feminist thought. The starting point for data feminism is something that goes mostly unacknowledged in data science: power is not distributed equally in the world. 
Those who wield power are disproportionately elite, straight, white, able-bodied, cisgender men from the Global North.20 The work of data feminism is first to tune into how standard practices in data science serve to reinforce these existing inequalities and second to use data science to challenge and change the distribution of power.21 Underlying data feminism is a belief in and commitment to co-liberation: the idea that oppressive systems of power harm all of us, that they undermine the quality and validity of our work, and that they hinder us from creating true and lasting social impact with data science.

We wrote this book because we are data scientists and data feminists. Although we speak as a “we” in this book, and share certain identities, experiences, and skills, we have distinct life trajectories and motivations for our work on this project. If we were sitting with you right now, we would each introduce ourselves by answering the question: What brings you here today? Placing ourselves in that scenario, here is what we would have to say.

Catherine: I am a hacker mama. I spent fifteen years as a freelance software developer and experimental artist, now professor, working on projects ranging from serendipitous news-recommendation systems to countercartography to civic data literacy to making breast pumps not suck. I’m here writing this book because, for one, the hype around big data and AI is deafeningly male and white and technoheroic and the time is now to reframe that world with a feminist lens. 
The second reason I’m here is that my recent experience running a large, equity-focused hackathon taught me just how much people like me—basically, well-meaning liberal white people—are part of the problem in struggling for social justice. This book is one attempt to expose such workings of power, which are inside us as much as outside in the world.22

Lauren: I often describe myself as a professional nerd. I worked in software development before going to grad school to study English, with a particular focus on early American literature and culture. (Early means very early—like, the eighteenth century.) As a professor at an engineering school, I now work on research projects that translate this history into contemporary contexts. For instance, I’m writing a book about the history of data visualization, employing machine-learning techniques to analyze abolitionist newspapers, and designing a haptic recreation of a hundred-year-old visualization scheme that looks like a quilt. Through projects like these, I show how the rise of the concept of “data” (which, as it turns out, really took off in the eighteenth century) is closely connected to the rise of our current concepts of gender and race. So one of my reasons for writing this book is to show how the issues of racism and sexism that we see in data science today are by no means new. The other reason is to help translate humanistic thinking into practice and, in so doing, create more opportunities for humanities scholars to engage with activists, organizers, and communities.23

We both strongly believe that data can do good in the world. 
But for it to do so, we must explicitly acknowledge that a key way that power and privilege operate in the world today has to do with the word data itself. The word dates to the mid-seventeenth century, when it was introduced to supplement existing terms such as evidence and fact. Identifying information as data, rather than as either of those other two terms, served a rhetorical purpose.24 It converted otherwise debatable information into the solid basis for subsequent claims. But what information needs to become data before it can be trusted? Or, more precisely, whose information needs to become data before it can be considered as fact and acted upon?25 Data feminism must answer these questions, too.

The story that begins with Christine Darden entering the gates of Langley, passes through her sustained efforts to confront the structural oppression she encountered there, and concludes with her impressive array of life achievements, is a story about the power of data. Throughout her career, in ways large and small, Darden used data to make arguments and transform lives. But that’s not all. Darden’s feel-good biography is just as much a story about the larger systems of power that required data—rather than the belief in her lived experience—to perform that transformative work. 
An institutional mistrust of Darden’s experiential knowledge was almost certainly a factor in Champine’s decision to create her bar chart. Champine likely recognized, as did Darden herself, that she would need the bar chart to be believed.

In this way, the alliance between Darden and Champine, and their work together, underscores the flaws and compromises that are inherent in any data-driven project. The process of converting life experience into data always necessarily entails a reduction of that experience—along with the historical and conceptual burdens of the term. That Darden and Champine were able to view their work as a success despite these inherent constraints underscores even more the importance of listening to and learning from people whose lives and voices are behind the numbers. No dataset or analysis or visualization or model or algorithm is the result of one person working alone. Data feminism can help to remind us that before there are data, there are people—people who offer up their experience to be counted and analyzed, people who perform that counting and analysis, people who visualize the data and promote the findings of any particular project, and people who use the product in the end. There are also, always, people who go uncounted—for better or for worse. And there are problems that cannot be represented—or addressed—by data alone. 
And so data feminism, like justice, must remain both a goal and a process, one that guides our thoughts and our actions as we move forward toward our goal of remaking the world.

Data and Power

It took five state-of-the-art IBM System/360 Model 75 machines to guide the Apollo 11 astronauts to the moon. Each was the size of a car and cost $3.5 million. Fast forward to the present. We now have computers in the form of phones that fit in our pockets and—in the case of the 2019 Apple iPhone XR—can perform more than 140 million more instructions per second than a standard IBM System/360.26 That rate of change is astounding; it represents an exponential growth in computing capacity (figure 0.2a). We’ve witnessed an equally exponential growth in our ability to collect and record information in digital form—and in the ability to have information collected about us (figure 0.2b).

Figure 0.2: (a) The time-series chart included in the original paper on Moore’s law, published in 1965, which posited that the number of transistors that could fit on an integrated circuit (and therefore contribute to computing capacity) would double every year. Courtesy of Gordon Moore. (b) Several years ago, researchers concluded that transistors were approaching their smallest size and that Moore’s law would not hold. Nevertheless, today’s computing power is what enabled Dr. Katie Bouman, a postdoctoral fellow at MIT, to contribute to a project that involved processing and compositing approximately five petabytes of data captured by the Event Horizon Telescope to create the first ever image of a black hole. 
After the publication of this photo in April 2019 showing her excitement—as one of the scientists on the large team that worked for years to capture the image—Bouman was subsequently trolled and harassed online. Courtesy of Tamy Emma Pepin/Twitter.

But the act of collecting and recording data about people is not new at all. From the registers of the dead that were published by church officials in the early modern era to the counts of Indigenous populations that appeared in colonial accounts of the Americas, data collection has long been employed as a technique of consolidating knowledge about the people whose data are collected, and therefore consolidating power over their lives.27 The close relationship between data and power is perhaps most clearly visible in the historical arc that begins with the logs of people captured and placed aboard slave ships, reducing richly lived lives to numbers and names. It passes through the eugenics movement, in the late nineteenth and early twentieth centuries, which sought to employ data to quantify the superiority of white people over all others. 
It continues today in the proliferation of biometrics technologies that, as sociologist Simone Browne has shown, are disproportionately deployed to surveil Black bodies.28

When Edward Snowden, the former US National Security Agency contractor, leaked his cache of classified documents to the press in 2013, he revealed the degree to which the federal government routinely collects data on its citizens—often with minimal regard to legality or ethics.29 At the municipal level, too, governments are starting to collect data on everything from traffic movement to facial expressions in the interests of making cities “smarter.”30 This often translates to reinscribing traditional urban patterns of power such as segregation, the overpolicing of communities of color, and the rationing of ever-scarcer city services.31

But the government is not alone in these data-collection efforts; corporations do it too—with profit as their guide. The words and phrases we search for on Google, the times of day we are most active on Facebook, and the number of items we add to our Amazon carts are all tracked and stored as data—data that are then converted into corporate financial gain. The most trivial of everyday actions—searching for a way around traffic, liking a friend’s cat video, or even stepping out of our front doors in the morning—are now hot commodities. 
This is not because any of these actions are exceptionally interesting (although we do make an exception for Catherine’s cats) but because these tiny actions can be combined with other tiny actions to generate targeted advertisements and personalized recommendations—in other words, to give us more things to click on, like, or buy.32

This is the data economy, and corporations, often aided by academic researchers, are currently scrambling to see what behaviors—both online and off—remain to be turned into data and then monetized. Nothing is outside of datafication, as this process is sometimes termed—not your search history, or Catherine’s cats, or the butt that Lauren is currently using to sit in her seat. To wit: Shigeomi Koshimizu, a Tokyo-based professor of engineering, has been designing matrices of sensors that collect data at 360 different positions around a rear end while it is comfortably ensconced in a chair.33 He proposes that people have unique butt signatures, as unique as their fingerprints. In the future, he suggests, our cars could be outfitted with butt-scanners instead of keys or car alarms to identify the driver.

Although datafication may occasionally verge into the realm of the absurd, it remains a very serious issue. Decisions of civic, economic, and individual importance are already and increasingly being made by automated systems sifting through large amounts of data. 
For example, PredPol, a so-called predictive policing company founded in 2012 by an anthropology professor at the University of California, Los Angeles, has been employed by the City of Los Angeles for nearly a decade to determine which neighborhoods to patrol more heavily, and which neighborhoods to (mostly) ignore. But because PredPol is based on historical crime data and US policing practices have always disproportionately surveilled and patrolled neighborhoods of color, the predictions of where crime will happen in the future look a lot like the racist practices of the past.34 These systems create what mathematician and writer Cathy O’Neil, in Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, calls a “pernicious feedback loop,” amplifying the effects of racial bias and of the criminalization of poverty that are already endemic to the United States.

O’Neil’s solution is to open up the computational systems that produce these racist results. Only by knowing what goes in, she argues, can we understand what comes out. This is a key step in the project of mitigating the effects of biased data. Data feminism additionally requires that we trace those biased data back to their source. 
PredPol and the “three most objective data points” that it employs certainly amplify existing biases, but they are not the root cause.35 The cause, rather, is the long history of the criminalization of Blackness in the United States, which produces biased policing practices, which produce biased historical data, which are then used to develop risk models for the future.36 Tracing these links to historical and ongoing forces of oppression can help us answer the ethical question, Should this system exist?37 In the case of PredPol, the answer is a resounding no.

Understanding this long and complicated chain reaction is what has motivated Yeshimabeit Milner, along with Boston-based activists, organizers, and mathematicians, to found Data for Black Lives, an organization dedicated to “using data science to create concrete and measurable change in the lives of Black communities.”38 Groups like the Stop LAPD Spying coalition are using explicitly feminist and antiracist methods to quantify and challenge invasive data collection by law enforcement.39 Data journalists are reverse-engineering algorithms and collecting qualitative data at scale about maternal harm.40 Artists are inviting participants to perform ecological maps and using AI for making intergenerational family memoirs (figure 0.3a).41

All these projects are data science. Many people think of data as numbers alone, but data can also consist of words or stories, colors or sounds, or any type of information that is systematically collected, organized, and analyzed (figures 0.3b, 0.3c).42 The science in data science simply implies a commitment to systematic methods of observation and experiment. 

Throughout this book, we deliberately place diverse data science examples alongside each other. They come from individuals and small groups, and from across academic, artistic, nonprofit, journalistic, community-based, and for-profit organizations. This is due to our belief in a capacious definition of data science, one that seeks to include rather than exclude and does not erect barriers based on formal credentials, professional affiliation, size of data, complexity of technical methods, or other external markers of expertise. Such markers, after all, have long been used to prevent women from fully engaging in any number of professional fields, even as those fields—which include data science and computer science, among many others—were largely built on the knowledge that women were required to teach themselves.43 An attempt to push back against this gendered history is foundational to data feminism, too.

Throughout its own history, feminism has consistently had to work to convince the world that it is relevant to people of all genders. We make the same argument: that data feminism is for everybody. (And here we borrow a line from bell hooks.)44 You will notice that the examples we use are not only about women, nor are they created only by women. That’s because data feminism isn’t only about women. It takes more than one gender to have gender inequality and more than one gender to work toward justice. Likewise, data feminism isn’t only for women. Men, nonbinary, and genderqueer people are proud to call themselves feminists and use feminist thought in their work. 
Moreover, data feminism isn’t only about gender. Intersectional feminists have keyed us into how race, class, sexuality, ability, age, religion, geography, and more are factors that together influence each person’s experience and opportunities in the world. Finally, data feminism is about power—about who has it and who doesn’t. Intersectional feminism examines unequal power. And in our contemporary world, data is power too. Because the power of data is wielded unjustly, it must be challenged and changed.

Data Feminism in Action

Data is a double-edged sword. In a very real sense, data have been used as a weapon by those in power to consolidate their control—over places and things, as well as people. Indeed, a central goal of this book is to show how governments and corporations have long employed data and statistics as management techniques to preserve an unequal status quo. Working with data from a feminist perspective requires knowing and acknowledging this history. To frame the trouble with data in another way: it’s not a coincidence that the institution that employed Christine Darden and enabled her professional rise is the same that wielded the results of her data analysis to assert the technological superiority of the United States over its communist adversaries and to plant an American flag on the moon. But this flawed history does not mean ceding control of the future to the powers of the past. Data are part of the problem, to be sure. But they are also part of the solution. 
Another central goal of this book is to show how the power of data can be wielded back.

Figure 0.3: We define data science expansively in this book—here are three examples. (a) Not the Only One by Stephanie Dinkins (2017), is a sculpture that features a Black family through the use of artificial intelligence. The AI is trained and taught by the underrepresented voices of Black and brown individuals in the tech sector. (b) Researcher Margaret Mitchell and colleagues, in “Seeing through the Human Reporting Bias” (2016), have worked on systems to infer what is not said in human speech for the purposes of image classification. For example, people say “green bananas” but not “yellow bananas” because yellow is implied as the default color of the banana. Similarly, people say “woman doctor” but do not say “man doctor,” so it is the words that are not spoken that encode the bias. (c) A gender analysis of Hollywood film dialogue, “Film Dialogue from 2,000 Screenplays Broken Down by Gender and Age,” by Hanah Anderson and Matt Daniels, created for The Pudding, a data journalism start-up (2017).

To guide us in this work, we have developed seven core principles. Individually and together, these principles emerge from the foundation of intersectional feminist thought. Each of the following chapters is structured around a single principle. The seven principles of data feminism are as follows:

Examine power. Data feminism begins by analyzing how power operates in the world.
Challenge power. Data feminism commits to challenging unequal power structures and working toward justice.
Elevate emotion and embodiment. Data feminism teaches us to value multiple forms of knowledge, including the knowledge that comes from people as living, feeling bodies in the world.
Rethink binaries and hierarchies. Data feminism requires us to challenge the gender binary, along with other systems of counting and classification that perpetuate oppression.
Embrace pluralism. Data feminism insists that the most complete knowledge comes from synthesizing multiple perspectives, with priority given to local, Indigenous, and experiential ways of knowing.
Consider context. Data feminism asserts that data are not neutral or objective. They are the products of unequal social relations, and this context is essential for conducting accurate, ethical analysis.
Make labor visible. The work of data science, like all work in the world, is the work of many hands. Data feminism makes this labor visible so that it can be recognized and valued.

Each of the following chapters takes up one of these principles, drawing upon examples from the field of data science, expansively defined, to show how that principle can be put into action. Along the way, we introduce key feminist concepts like the matrix of domination (Patricia Hill Collins; see chapter 1), situated knowledge (Donna Haraway; see chapter 3), and emotional labor (Arlie Hochschild; see chapter 8), as well as some of our own ideas about what data feminism looks like in theory and practice. To this end, we introduce you to people at the cutting edge of data and justice. These include engineers and software developers, activists and community organizers, data journalists, artists, and scholars. This range of people, and the range of projects they have helped to create, is our way of answering the question: What makes a project feminist? 
As will become clear, a project may be feminist in content, in that it challenges power by choice of subject matter; in form, in that it challenges power by shifting the aesthetic and/or sensory registers of data communication; and/or in process, in that it challenges power by building participatory, inclusive processes of knowledge production. What unites this broad scope of data-based work is a commitment to action and a desire to remake the world.

Our overarching goal is to take a stand against the status quo—against a world that benefits us, two white college professors, at the expense of others. To work toward this goal, we have chosen to feature the voices of those who speak from the margins, whether because of their gender, sexuality, race, ability, class, geographic location, or any combination of those (and other) subject positions. We have done so, moreover, because of our belief that those with direct experience of inequality know better than we do about what actions to take next. For this reason, we have attempted to prioritize the work of people in closer proximity to issues of inequality over those who study inequality from a distance. In this book, we pay particular attention to inequalities at the intersection of gender and race. This reflects our location in the United States, where the most entrenched issues of inequality have racism at their source. Our values statement, included as an appendix to this book, discusses the rationale for these authorial choices in more detail.

Any book involves making choices about whose voices and whose work to include and whose voices and work to omit. 
We ask that those who find their perspectives insufficiently addressed or their work insufficiently acknowledged view these gaps as additional openings for conversation. Our sincere hope is to contribute in a small way to a much larger conversation, one that began long before we embarked upon this writing process and that will continue long after these pages are through.

This book is intended to provide concrete steps to action for data scientists seeking to learn how feminism can help them work toward justice, and for feminists seeking to learn how their own work can carry over to the growing field of data science. It is also addressed to professionals in all fields in which data-driven decisions are being made, as well as to communities that want to resist or mobilize the data that surrounds them. It is written for everyone who seeks to better understand the charts and statistics that they encounter in their day-to-day lives, and for everyone who seeks to communicate the significance of such charts and statistics to others.

Our claim, once again, is that data feminism is for everyone. It’s for people of all genders. It’s by people of all genders. And most importantly: it’s about much more than gender. Data feminism is about power, about who has it and who doesn’t, and about how those differentials of power can be challenged and changed using data. 
We invite you, the readers of this book, to join us on this journey toward justice and toward remaking our data-driven world.

Connections

A Translation of this Pub (Persian): مقدمه: چرا علم داده به فمینیسم احتیاج دارد, by Catherine D'Ignazio and Lauren Klein. Published on Mar 07, 2024. data-feminism.mitpress.mit.edu. Description: Translated by امیرحسین پی‌براه.

A Translation of this Pub (Spanish): Introducción: por qué la ciencia de datos necesita feminismo, by Catherine D'Ignazio and Lauren Klein. Published on Apr 23, 2023. data-feminism.mitpress.mit.edu. Description: DataGénero (Coordination: Mailén García. Translators: Ivana Feldfeber, Sofía García, Gina Ballaben, Giselle Arena, and Mariángela Petrizzo. Review: Helena Suárez Val. With the help of Diana Duarte Salinas, Ana Amelia Letelier, and Patricia Maria Garcia Iruegas).

Footnotes: 44

License: Creative Commons Attribution 4.0 International License (CC-BY 4.0)

Comments (168)

Happy Polarbear: This passage describing the attitude of most male engineers towards their work is both painfully accurate and poignant, portraying them not as respected individuals deserving recognition for their achievements, but merely as inanimate objects, tools for calculation.
Cynthia Lisee: Such a fertile approach”
Cynthia Lisee: There is somethig immeasurable in lived experience, somethind stat would never reach. data not subject to an ethic of human relations based on "welcoming the Other" are mere abstractions and sources of violence
Jamia Williams: Thank you! 
Reframing is essential when many of these events were deemed “riots” when it was Black folks rising up against various systems.
Jamia Williams: Still happening today!
Jillian McCarten: The context in which numbers are collected
Jillian McCarten: The idea that some areas, and therefore some people don’t need to be monitored feels immoral.
Jillian McCarten: I’ve been thinking about how it’s not what you’re doing but what your goal is, and corporations using our data to make more money off us definitely does not feel the same as collecting data on gender discrimination to stop the practice.
Jillian McCarten: curious what examples it’s better
Jillian McCarten: It’s interesting what we need evidence to believe, and what we willingly believe without evidence
Jillian McCarten: the word data originally meant to communicate that the fact is confirmed to be true, to shut down disputes
Jillian McCarten: I love linguistic history, I’d like to learn more about this
Jillian McCarten: Yes, I’m afraid of how biases are baked into AI, and then reinforced
Jillian McCarten: This reminds me of how privilege is a lot less visible to those who hold it.
Jillian McCarten: I wonder if she also had access to data on promotions across race. There’s all kinds of discrimination, and the kinds of data seen as worth collecting also reveal bias. I wonder if the white woman who collected the data focused on gender and missed other identities experiencing discrimination.
Jillian McCarten: I appreciate how the authors directly state their most salient identities; this should be the norm. Oftentimes when I read a book like this I have to research the authors to learn their identities. Identities always influence the way we think and see the world.
Jillian McCarten: Compelling quote about power
Jillian McCarten: It’s interesting to me that Darden’s story and the book are the two examples given so far. 
When I took Intro to Women’s Studies in undergrad, this book was heavily criticized for mostly speaking on white feminist issues. I appreciate the author giving a more nuanced intersectional framing in the next paragraph.
Jamia Williams: Love to know this!
Jamia Williams: And it is still far from being accomplished
Jillian McCarten: I’m curious which numbers would help communicate that, and how research can help illustrate the prevalence of this type of sexism.
Jillian McCarten: This is a compelling example of how in our systems of power some people are seen as more valuable than others, and that likely connects to what data sources are seen as valuable.
Jamia Williams: “Hidden figure”
Jillian McCarten: I think data is especially important in communicating how segregation persists, and how unofficial segregation is often harder to confront.
Jillian McCarten: I think it’s important to confront the differences between the image of the US presented and the realities that people live in. I resonate with this statement: growing up I was told over and over how the US is the best place to live, and in the past few years I’ve been learning more about the historical and current harms perpetuated by our government
Jillian McCarten: So many decisions and judgement-calls that go into telling historical events, especially a quick summary like this. I’m glad that this author presents the police this way; I think a lot of authors I’ve read will ignore this reality.
Amanda Christopher: This is a new term for me!
Amanda Christopher: This makes me wonder how many women before her advocated for themselves, or if she was the first woman at NASA to do so as her supervisor claimed. If she was not, why was her case different? What about the culture of the time at NASA allowed for her to be promoted? 
If she was the first, what would have happened if other women before her had the courage like Christine to speak up.
Melinda Rossi: Perfect for educators!
Melinda Rossi: I like that the authors are working to offer this knowledge to all.
Melinda Rossi: I like this. Giving credit where credit is due…what a concept!
Melinda Rossi: Ok, here’s the good-for-humanity stuff!
Melinda Rossi: The sad part is that it’s mostly used for financial gains, and not for the good of society/humanity.
Melinda Rossi: This is sad and terrifying…and yet also seems about right.
Melinda Rossi: I like this. Data can never capture all and that’s important to remember when we are looking at data and generalizing as if all are spoken for.
Tegan Lewis: This sums up our education system-using data and test scores to maintain the inequity in our school system.
Melinda Rossi: Yes! THIS!
Tegan Lewis: Data is more than numbers. What other data could be gathered in a school system?
Tegan Lewis: Does it have to?
Tegan Lewis: Would this be considered a misuse of data? Or more of the root of bias?
Tegan Lewis: data feminism-can be used to expose inequity and challenge systems of power.
Esmeralda Orrin: ‘Ah, capitalism,’
Tegan Lewis: gender oppression-was evident in the case of Darden
Tegan Lewis: Identity
Tegan Lewis: Would this apply to all forms of sexism, regardless of gender?
Amanda Christopher: I would say absolutely, yes. I think one large misconception about feminism is that it only focuses on women, not all genders and sexes.
Esmeralda Orrin: somehow I’m not surprised that men know what women are happy doing
Melinda Rossi: Finding a supportive community is key!
Melinda Rossi: I think this part is so important. 
Being willing to educate themselves on issues that they might unconsciously contribute to is critical.
Melinda Rossi: We are not a monolith!
Melinda Rossi: bell hooks coming in hot with the truth.
Melinda Rossi: Hidden Figures was (sadly) the first time I had ever heard of Black women at NASA.
Fagana Stone: The article could have had more power had the authors also included a note about countless studies that show invaluable contribution of diverse backgrounds and perspectives to innovation and progress.
Fagana Stone: Not applicable to all cultures, as there are cultures ruled by matriarchs.
Amanda Christopher: Yes and in those cultures feminism may look different, as feminism is focused on equal rights for all genders. Many of the matriarchal cultures have more than two genders. And just about all societies have some form of gender inequalities.
Fagana Stone: Wouldn’t the algorithm update itself as more surveillance data is available rather than fixate on old historical data?
Melinda Rossi: That’s a good point. You would think it would be able to update with technology advancing as much as it has.
Fagana Stone: In a capitalist country, it should be expected to have wealth inequalities… Not everyone can be wealthy nor can everyone struggle financially. Yes, there are systemic injustices, but it takes all parties involved to improve access to and understand importance of education. Dominated by two political parties running on opposing views, I can’t help but feel very pessimistic about significant progress on these issues in the near future (while the country is enacting backward looking policies and laws).
Fagana Stone: “Racism” is a learned concept. Born and raised in Azerbaijan, we did not have a concept of racism, to which I was exposed after having moved to the states. 
Amanda Christopher: Great point to add to the authors’; that it is “impossible to claim a common experience… for all women, everywhere.”
Fagana Stone: It is important to note that men too struggle with sufficient paternity leave. It is critical to shift the thought from women being the only ones fit for childcare role to include men as well.
Fagana Stone: Women in some states still fight for their reproductive rights!
Melinda Rossi: Fagana, that’s exactly what I was thinking. Some things change, and some things stay the same.
Fagana Stone: Critical lesson in articulating the needs with the hope to identify and operationalize solutions.
Fagana Stone: Excellent film! I highly recommend it.
Fagana Stone: “The Soviet Union was responsible for launching the first human to space, carrying out the first spacewalk, sending the first woman to space, assembling the first modular space station in orbit around Earth (Mir) — and most of these achievements were accomplished using the same space capsule used in the 1960s.”
Fagana Stone: Being from one of the former Soviet Union countries, it is also important to note that the Soviet Union had a more considerable tolerance for risk, hence the advancements mentioned in the field of astronautics.
Kaiyun Zheng: I’ve listened to a podcast before, which is called What happens when an algorithm gets it wrong, In Machines We Trust, MIT Technology Review. It mainly talks about the technology of the use of facial recognition in public and where it can go wrong. The podcast begins with a story about a man who is accused of stealing because a computer matches his photo with a picture of the thief caught on a public camera. But in fact, it was a computer error. The computer can't tell whether the thief is a man or a black man, and the police blindly trust the computer's judgment, and moreover, he says that historically black people steal a lot. 
And based on the conversation in the podcast, the facial recognition technology isn't perfect, it makes mistakes and matches the wrong people. Such problems are not rare, and involve both privacy violations and potential discrimination. It made me realize that we have a lot more to do in data science.
Kaiyun Zheng: We’ve learned about the differences between information and data in the very beginning lessons, and this makes me think about why we emphasize “data” instead of “info” here before the term "feminism".
Kaiyun Zheng: The mention of the uneven distribution of power in this book piques my curiosity about how the topic will be addressed. I have previously read a book called "Foundation of Information," which discusses the relationship between power and information. The book suggests that when power is concentrated, the information gathered can sometimes deviate from the truth. As a result, I am curious about how data feminism ensures the authenticity and effectiveness of information collection. Additionally, the information of researching history is also mentioned in the later interview, which makes me curious about how the information of the past can be useful in the present so that it can be used as part of data feminism.
Kaiyun Zheng: Intersectionality as a new term which appears after feminism is really interesting. I like how it is introduced here which talks about the example of a black woman since I thought it is the manifestation of a much broader phenomenon in the society. From Google, it is defined as "the interconnected nature of social categorizations such as race, class, and gender, regarded as creating overlapping and interdependent systems of discrimination or disadvantage" which strongly linked to the topic "feminism" (actually closer to equal rights). Each person has multiple identities. For example, I am a university student, an employee at a company, and a kid at home. 
These are just a few of the many labels that can be applied to an individual, including larger categories such as race, gender, and education. In an information-oriented society, labels can often obscure our understanding of the true nature of things and the individuality of a person can be overlooked. Intersectionality, while still categorizing individuals, does so in a more nuanced manner by connecting multiple labels to form a more specific and accurate representation. This can help individuals overcome challenges and reduce the oppression of vulnerable groups by dominant societal forces. Although from my personal point of view, classifying people is not a very good behavior after all, its emergence also reflects the response to various situations, so as to reduce the oppression of the dominant group of society on the vulnerable group.
Yuanxi Li: It's heartening that the value women create in terms of data has ultimately been validated by data itself, and this result has been achieved through mutual assistance among women.
Yuanxi Li: Intersectionality is an important term that shows how race, class, gender, and other individual characteristics affect each other
Joe Masnyy: This story has shown the possibilities of this sort of advocacy, though as stated earlier this is clearly not the norm. I appreciate the value of anecdotes such as these, although this text would benefit from hard data to show the scope and magnitude of the issue. Hopefully this is something that is explored further on in the text.
Joe Masnyy: This reality was, in the grand scheme of things, not very long ago. You could argue this still persists even today, with many STEM fields still being largely male in demographics. Even still, women tend to make less than men on average in the exact same fields.
Kotaro Garvin: We have so much more capability than before, but why does it seem like we are not making the same kind of progress? Is it not happening? or is it just unrecognized? 
Kotaro Garvin: I think this is one of the greatest ideas I have ever read, but it also shows why data is so important, everybody is unique but we can still be categorized using data.
Justine Smith: taking a stand against a system that benefits you
Seng Aung Sein Myint: The decision making process is always opaque. Hope there is some kind of US federal law which pushes the school to be a little bit more transparent than before.
Seng Aung Sein Myint: This kind of statistic of average, also make something very simple. No, I am not arguing about this data.
Seng Aung Sein Myint: Hmm. It is strange to read now.
Finch Brown: This is such a great line! No wonder someone has already commented on it. I have been thinking a lot recently about how subjective human experiences align and diverge, and how insufficient language and data are in describing experiences. A cool article I just read that reminds me of this is from the New Yorker: How We Should Think About Different Styles of Thinking. One main draw for me in data science is tackling the challenge of most accurately representing data and the stories it tells, given its inescapable constraints.
Yasin Chowdhury: Skill is important everywhere but in different ways. So it’s good to have skills.
Yasin Chowdhury: Without this line the entire story would not exist. But still nowadays we do not see that courage, especially in black women who are really talented but chose non-STEM fields because of the difference in ratio.
Jayri Ramirez: I believe that it is important to understand that it is more than one’s gender that can affect the experiences of women. I think this statement is a good description of how there are many dimensions which affect racism and other forms of oppression. 
Roujia Wang: This shows that feminism can meet two kinds of human needs, the first is the detailed technical needs of NASA space agency, and the other is to meet the need of women also need equal status and need the same rights as men to achieve their dreams. In this process, feminism and data science are inextricably linked to each other's achievements.
Seyoon Ahn: As it was discussed in comment above, this part demonstrates the needs of feminism in data science and how not just the individuals but the society as a whole can benefit from data science with an approach of feminism.
Roujia Wang: In that world, the stereotype of women was that women were not allowed to work in the sciences and that women were more at home with young children and taking care of the family than working outside the home. But such stereotypes prevented many talented women from having a chance to make a career out of it.
Roujia Wang: When people are misogynistic, female scientists contribute to data science research, because women can make up for the shortcomings of men in many ways. Women also use their abilities to change the perception of women in the world
Monserrat Padilla: I am really eager to learn and practice more methodically these principles. The key value in being able to analyze data holistically and seeing the subject matter as a whole at the intersections. Putting these principles into practice will allow for a more complete truth to be available while producing data and/or reading data.
Caroline Hayes: I think it is really moving that they decided to use someone as powerful as Darden’s story to start this textbook. As such a strong, smart women she was able to work in an intellectual field and challenge norms like she did in this instance. In a way she is breaking from the data so commonly released on women in and out of the work field. 
Instead of becoming one of the computers like 100% of the women before her, she became a part of the 1% who changed it for everyone.
Vibha Sathish Kumar: I agree, this part also resounded with me as well. It also makes you wonder about those other women who were stuck in the same situation for years. Many of those women likely didn’t have access to data or have the means to stand up for themselves in the environment set-up for them. I wonder if this issue is also relevant today, where some women do not have the opportunity to share their experience or have it accounted as data. It takes time to have others recognize their privilege and use it to bring others up - maybe data feminism could be a way to do that.
Natalie Pei Xu: It is sad to notice that there are still many women being ignored and staying silent for some reasons.
Natalie Pei Xu: First hand resource will be more helpful.
Natalie Pei Xu: This conscious awareness of “product of unequal social relation” is important while collecting, analyzing and concluding, since there has already been a lens filtering the primary source.
Natalie Pei Xu: Besides using data as a powerful tool to pursue justice, personal privacy is also a critical concern.
Natalie Pei Xu: This is a very inclusive and thoughtful description of feminism which opens it up to various people, across physical and mental features, who are aiming at the same thing: justice.
Fagana Stone: If we were to focus on collecting unbiased data, then why would the authors even mention “priority” in qualifying it? 
Eva Maria Chavez: collective power
Fagana Stone: Qualitative data can be so powerful!
nyah bean: yes
Yolanda Yang: We should know that “We are under this situation.”
Yolanda Yang: Very personally, I am always shocked by how precise the content they suggest “what I may also interested.” Also reminds me of Health on the phone, that it reminds us of our next coming period time, and usually also precise.
Melinda Rossi: Yes!
Yolanda Yang: People with privilege cannot recognize, even if they do, they are less likely to make any change, as this would decrease their benefit
Jillian McCarten: One quote that I think of often is “when one has held a position of privilege for so long, equality feels like oppression.”
Yolanda Yang: “Speak” and MeToo. Makes it visible.
Yolanda Yang: Looking for equality = we need make efforts ahead to it. Need to uncover it.
Yolanda Yang: Reminds me of china girl or china head, that used at the beginning of analog films, those are females without names that contribute to film industry, but they were not even supposed to be presented to the audiences.
Yolanda Yang: Even though this has been desegregated for years, it still exists among people’s unconsciousness.
Jeraldynne Gomez: systematically designed so that women were stagnant in their positions. The disparity of power and the assertion of such system is correlated as it benefits the men who are implementing it
Michela Banks: Important
Annabel DeLair-Dobrovolny: Converting people into data as a means to assert power and dehumanize the “other”.
Michela Banks: definition
Michela Banks: At least 50 years later. 
Why at this time?
Michela Banks: power distance between men and women
Michela Banks: were not recognized for intelligence
Michela Banks: indicates perception of women in workplace
Michela Banks: note segregation during time of education
Michela Banks: describes environment
ethan chang: Shows how much has changed since then… even though can still be seen to this day.
Annabel DeLair-Dobrovolny: Power imbalances contributing to the dehumanization of women in the workplace.
athmar al-ghanim: exactly!!! some individuals have such a negative connotation toward “feminism”. but here, it proves that feminism is just a group of like-minded individuals peacefully going after what they want. all feminists want is change, because for so long, there has been none. and it is about time we stopped neglecting the minority and start appreciating and uplifting them.
athmar al-ghanim: it’s quite sad to see how barely anything has changed in regard to men having the upper hand in workforces, especially those in STEM related fields.
athmar al-ghanim: this passage resonates with me as it is a big fear of mine, a woman, going into STEM, that I will constantly have to fight twice as hard as a man, just to show that I am worthy of a position that I am qualified for.
Angela Li: I question how long this took and whether there was an internal fight for Darden to receive her long deserved promotion. The reason being is that I find it hard to believe that the men in power are so ready to accept change in which they lose power or control that benefits them. Earlier in this text, when Darden was working as a calculator with no respect or recognition, her supervisor said that the reason women and men lead such different career paths despite having the same credentials was because no one had ever complained. 
Through these quotes it sounds like the narrative being pushed is that the main reason women are oppressed is because men are unaware of the disparate treatment and effects of their actions, which seems too excusable to not be questioned.
Fagana Stone: I read this as the systemic discrimination against women was so normalized that it was essentially on everyone’s blindspot. Having such data showed a trend, a factual analysis that no one could ignore. Also, it takes a lot of courage to challenge the status quo, and these ladies found the way to communicate it to their superiors - through numbers!
Angela Li: I’d like to expand and connect on this idea to reaffirm the highlighted statement. I’m connecting it to the text “Feminism is for Everybody” by Bell Hooks. In early stages of feminism there were a select few types of feminism that were identified. Of these types there were reformist and visionary feminism. Reformist feminism focused mainly on equality with men in the workforce, which overshadowed the original radical foundations of contemporary feminism, which called for reform and restructuring of society to form a fundamentally anti-sexist nation. While white supremacist capitalist patriarchy suppressed visionary feminism, reformist feminists were also eager to silence them because they could maximize their freedom within the existing system and exploit the lower class of subordinated women.
Cynthia Lisee: Thank you for this important insight
Kat Rohrmeier: The definition of dehumanizing.
Melinda Rossi: Right? Gross.
Aneta Swianiewicz: data to expose inequality
g m: “institutional mistrust”
g m: Not only looking @ data, but the how. How was it collected? How has it been processed, and by who?
Melinda Rossi: ^^^ Yes! 
Great point!?g m: Why data is important: challenges privileged hazard by making invisible systems visible.?Lena Zlock: Power dynamics and access to knowledge // needs an equitable foundation, clear statement of relations?Lena Zlock: DH as a countercultural phenomenon?Peem Lerdp: Target goals and audiecnes.?Peem Lerdp: Theme 2?Peem Lerdp: Theme 1?Vibha Sathish Kumar: I find it interesting that the authors mention this explicitly to the readers. A clear stated point that everyone is involved with change. ?Peem Lerdp: Insight on “science” in the phrase data science.?Peem Lerdp: Problems with distinction between what is data and what is information involve deciding who holds the power to make those distinction.Fagana Stone: It is important to add that how we interpret data matters as well.?Peem Lerdp: Def’n?Peem Lerdp: Using data to corroborate lived exp.?Peem Lerdp: Dissociating the identity of the author with the ideas discussed by the author.?Peem Lerdp: Intersectionality and its historic roots.?Peem Lerdp: History of gender inequality in workplace.?Megan Foesch: I think this is such an important lens to have when analyzing the world and what is important. Often times, we get caught up in trivial things that are not important in the bigger picture. We must remind ourselves that issues like justice, race, feminism, equality, and power are all crucial everyday issues that we must solve in order to live as a flourishing community. In order to have justice, each individual must be heard and seen which is currently not happening and needs to. ?Megan Foesch: Throughout this whole article I think that this sentence is one of the most important. The authors reflect on how data feminism is truly about power and how the lack of power between genders signifies that there is an inequality. It is important for us to acknowledge and address this inequality so women can feel as empowered, strong, and safe, as men feel. 
I think it is also important to point out that data feminism isn’t only for women but “men, nonbinary, and genderqueer people”. In order for a change to be made everyone must accept and acknowledge the imbalance of power that occurs in society. ?Megan Foesch: Before taking this class, I had very rarely heard the term Data Feminism, therefore this idea was somewhat new to me. I am familiar with the ideas of feminism however thinking about feminism from a scientific standpoint is one that can help reinforce popular opinions about lack of equality among genders. It is very difficult to argue something when it is science especially when focusing on systems of power and who holds that power as it is backed by scientific data and evidence.?Nick Klagge: It appears that a word or phrase is missing from the end of this sentence. Perhaps “lived experience” or something like that??Sara Blumenstein: What makes a project feminist??Sara Blumenstein: Data as “consolidating power over lives”?Sara Blumenstein: “Data feminism” as goal and process?Sara Blumenstein: Data vs. fact?Sara Blumenstein: Aggregating data to challenge institutional systems of power?will richardson: This is a very deep statement about feminism. It is also very relevent to the readings.?Sara Blumenstein: Defining “feminism” + 1 more...Data FeminismMIT PressRSSLegalPublished withCommunityData FeminismCollectionDData FeminismPubIntroduction: Why Data Science Needs FeminismcollectionData FeminismCite as D’Ignazio, C., & Klein, L. (2020). Introduction: Why Data Science Needs Feminism. In Data Feminism. 
Retrieved from https://data-feminism.mitpress.mit.edu/pub/frfa9szd

      This is another example of how we need more women in STEM. There are so many officially desegregated organizations, but segregation is embedded in behavior, and that is what needs coaching.

    1. Note: This preprint has been reviewed by subject experts for Review Commons. Content has not been altered except for formatting.

      Learn more at Review Commons


      Referee #2

      Evidence, reproducibility and clarity

      In this manuscript, Kehrer et al. use an elegant Apex2 BioID method to identify novel putative microneme proteins by mass spectrometry and pick one candidate for further characterization. They identify a novel putative microneme protein, which they name Akratin and characterize through targeted gene deletion and a series of complementation experiments. This reveals first that akratin appears to function in male gamete egress and, through complementation using a putative trafficking mutant, also in midgut traversal.

      Overall the study is thoroughly performed but some of the conclusions are not fully supported.

      1)The newly identified microneme protein is still putative in my mind as the authors have not co-localized it with another marker. This is crucial for conclusions about its putative function and crucial for the trafficking experiment as explained below. It is also important given the high number of putative false positives in the BioID experiment.

      2)I would consider it essential to also localise the Apex2-tagged SOAP protein, as the authors cannot be sure that there is no partial mislocalisation of the protein leading to false positives.

      3)I am not convinced by the trafficking defect. This could be because the localisation in the images is not easy to distinguish and it may be much clearer looking down the microscope. I think co-localisation with another microneme marker would go a long way, and demonstrating that akratin upon mutation actually localises elsewhere is important. It is even more important since there is no phenotype in male egress, but then later in ookinetes, which is a bit surprising if this is a proper conserved trafficking motif.

      4)The candidate selection section is poorly described. A flow chart or clearer inclusion/ exclusion criteria would be useful.

      5)I understand the approach to focus on more abundant biotinylated proteins; however, I think peptide counting may not be the best approach. Apex2 labelling, as the authors rightfully say, is mainly based on tyrosine labelling of surface-exposed areas, so the abundance of proteins in the IP will depend on accessible tyrosines, protein abundance, distance from the bait, size of the protein and how many tryptic peptides can be generated. Reproducible results between 2 conditions are more likely to show true positives and may be the best way to prioritize, or assign confidence. Also: could the authors use mean intensity values for the peptides covering proteins as a metric for abundance using label-free quantification? This is not a requirement but may allow quantification in a slightly better way. I am not sure about the Table S1 colour scheme (the legend does not explain green, purple and blue shading). Are all green ones confirmed microneme proteins? Please add a proper description of the table and columns.

      6)Figure 2C and D are from PlasmoDB and should ideally not be included as figure panels. This is misleading; the data could either be mentioned in the text or put into supplementary data with a clear note that the authors have not acquired these data. I would also suggest moving figures 3A-C into figure 2 and presenting the KO with the complementation data for a direct comparison.

      Minor:

      1)When the authors say "numbers of peptides identified": is this unique peptides or does it include non-unique peptides?

      2)Figures 1 I-K could move into supplementary as they are somewhat non-informative given the nature of BioID described in the main points.

      3)Line 253: Whether akratin is involved in membrane lysis directly, or important for microneme secretion so this is a knock-on effect is not known. This could be added to the discussion, but there is no evidence for this statement in the results section.

      4)Line 274: Refers to Figure 3F, which does not exist.

      5)Line 333: Overall I think this is a bit of an overstatement. The use of Apex2 in these conditions is definitely nice to see, but for now the authors have validated none of the microneme proteins by co-localization. So we are still a bit in the dark about how well the method works in terms of false positives. The targeting motif in my mind is not yet confirmed in the absence of co-localisations with other markers. An alternative explanation could be that the C-terminus of the protein is important for its function in one stage but not another, while trafficking is not, or only marginally, affected.

      Significance

      The significance of the manuscript in my mind lies in the application of Apex2 in Plasmodium parasites, which will be an advance for the field. However, we do not learn about labeling times, how short it can be so its potential is not fully looked at.

      The list of the putative micronemes will of course be of high interest for the community, but because of the limited validation in this study will require further validation by others.

      The identification of the dual function of this protein in transmission, in egress and ookinete traversal, is interesting and will surely lead to further studies. The identification of a putative differential trafficking motif is intriguing if, as stated in the major concerns, this can be validated.

      My expertise lies in Plasmodium biology with good knowledge of mass-spectrometry approaches.

      Referees Cross-commenting

      I agree with the assessment of the other reviewer; a slightly more detailed discussion of the hits would be desirable (exported proteins, why are they there). This could be a drawback of the system used, and should be mentioned.

      A Western blot of the GFP is a very good idea to clarify whether the localization is maybe, in part, GFP that is not fused to the full-length protein, either through cleavage or as a breakdown product.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Manuscript number: RC-2024-02378

      Corresponding author(s): Angelika Böttger


      1. General Statements [optional]

      After we have carefully studied the four reviews we have received, we made some major revisions to the manuscript. These included the following main points:

      • Concerns regarding clarity of the manuscript: we have substantially edited the abstract, introduction and discussion part of the manuscript and added many more references to previous work by other authors, especially Cazet 2021, Tursch 2022 and Gahan 2017. We focused our introduction and discussion on organizer function and on the Gierer-Meinhardt-Model for pattern formation. We think that the conclusions are of great general interest because they suggest a function of the Hydra head organizer according to the original definition by Hans Spemann, that is “harmonious interlocking of separate processes which makes up development”. Notch signaling, in our interpretation, is an instrument for this function of the organizer. Comparison with Craspedacusta compellingly illustrates this idea.
      • Concerns regarding Craspedacusta experiments: we have isolated four Craspedacusta transcripts (CsSp5, CsWnt3, CsAlx and CsNOWA) and analyzed their response to DAPT during head regeneration in Craspedacusta. This revealed that DAPT did not inhibit CsWnt3 expression, in accordance with it not having an effect on head regeneration in Craspedacusta. However, DAPT inhibited expression of the other potential CsNotch target genes, confirming that DAPT generally works as a Notch inhibitor in Craspedacusta polyps.
      • Concerns regarding HyKayak function: we have conducted a rescue experiment to test the function of HyKayak as a target for Notch-regulated repressor genes and a local inhibitor of Wnt-3 expression. This revealed that the expected up-regulation of HyWnt3 in DAPT-treated animals was very weak and did not rescue the DAPT regeneration phenotype. This is discussed, but the data were not included.

      2. Point-by-point description of the revisions



      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      Major: • The introduction is lacking a full description of what is known about transcriptional changes during Hydra regeneration and in particular the role of Wnt signalling in this process. Of note the authors do not cite several important studies from recent years including (but not limited to):

      https://doi.org/10.1073/pnas.2204122119

      https://doi.org/10.1186/s13072-020-00364-6

      https://doi.org/10.1101/587147

      https://doi.org/10.7554/eLife.60562

      This problem is further compounded later when the authors do not cite or discuss work that has performed the same or similar analyses to their own. The authors should endeavor to give more comprehensive background knowledge.

      Answer:

      Our work focuses on the role of Notch-signalling during Hydra head regeneration, specifically when the head is removed at an apical position. We therefore now have included additional information about transcriptional changes during this process in the introduction. In addition, we have included the suggested citations in the introduction to give a more general background knowledge.

      e.g.: “Following decapitation, the expression of Hyβ-catenin and HyTcf was upregulated earliest, followed by local activation of Wnt genes. Among these, HyWnt3 and HyWnt11 appeared within 1.5 h of head removal, followed by HyWnt1, HyWnt9/10c, HyWnt16, and HyWnt7, indicating their role in the formation of the Hydra head organizer (Hobmayer et al., 2001; Lengfeld et al., 2009; Philipp et al., 2009; Tursch et al., 2022).”

      • The authors do not cite or reference at all the study by Cazet et al., which used iCRT14 along with RNAseq and ATACseq to probe the role of Wnt signaling during early regeneration. This is a major issue. Although I appreciate that the authors have done much longer time courses and that their data therefore add something to our understanding, it will still be important to discuss this here. For example, the authors show that Wnt3 is activated normally in iCRT14 animals. Is this also seen in the RNAseq from Cazet et al.?

      Answer:

      iCRT14 was used in Hydra regeneration experiments by Gufler et al. (which we did cite) and Cazet et al., but the specific aspects of hypostome and tentacle regeneration were not addressed. Cazet et al. analyzed HyWnt3 expression after iCRT14 treatment during the first 12 hrs of regeneration. Our data show, in addition, that HyWnt3 is not controlled by TCF-dependent transcription during Hydra head regeneration after apical cuts throughout the whole regeneration process, including the morphogenesis stage. Nevertheless, the other Wnt genes, which are implicated in canonical Wnt signalling, are also affected by iCRT14 in our study.

      We have now included comparison of Cazet- and our data, we wrote:

      “HyWnt3 and HyWnt9/10c expression are swiftly induced by injuries. When HyWnt3 and HyWnt9/10c activities are sustained, organizers can be formed, which induce ectopic heads when the original organizer tissue (the head) is removed (Cazet et al., 2021; Tursch et al., 2022).”

      The effect of iCRT14 had been analyzed in previous studies (Cazet et al., 2021; Gufler et al., 2018; Tursch et al., 2022). All showed β-catenin dependency for the down-regulation of head-specific genes in foot regenerates at time points up to 12 hrs after head removal, including HyWnt3. They also reported a failure of head regeneration in the presence of iCRT14 but, in accordance with our study, did not reveal that HyWnt3 expression at future heads depended on β-catenin. None of these studies analyzed the regeneration of tentacles and hypostomes separately, and they did not report whether the regeneration of hypostomes 48 hrs after head removal occurred normally upon iCRT14 treatment.

      • The visualizations used in Figure 3 are quite difficult to interpret and do not in all cases match the descriptions in the text. The way the same type of data is displayed in Figure 5 is much nicer. It is also better to treat the same types of data in the same manner consistently throughout the paper. For Hes, for example, the authors state that there is a reduction, although the data show that this is very small and, taking into account the 95% confidence interval, it does not seem to be significant. If this is the case, then the positive control is not working in this experiment. This would be much clearer if individual time points were compared as in Figure 5 and statistical tests shown. The authors then state that Alx is not affected, but there is actually a larger effect than what they deemed significant for Hes (the axes are notably different between these two, and I think a more consistent axis would make the genes more comparable). Similarly, Gsc is described as being not affected at 8 hours, but it again appears to change more than the positive control Hes. Given this, I would call into question the validity of this dataset and/or the interpretation by the authors. A more thorough analysis, including taking statistical significance better into account, would go a long way to increasing confidence in these data.

      • The same issues in interpretation described for Figure 3 also apply to Figure 4. The authors state that Wnt7 is affected less than Wnt1 and 3, but this is not evident in the figure and no comparative analysis is performed to confirm this. The same holds for Wnt11 and 9/10c, where what the authors describe is very difficult to see in the figure. Sp5 is apparently upregulated, but this is not discussed. Again, the axes are notably different, making it even more difficult to compare between samples.

      Answer:

      We have now presented the data as simple scatter plots with significance information for every data point. This allows comparison between samples as requested by the reviewer. The GAMs were moved to the supplement. We believe that some readers may appreciate the GAM representation of the data because it makes the confidence interval over time accessible.

      Concerning DAPT:

      “We now performed RT-qPCR analysis to compare gene expression dynamics of these genes during head regeneration 0, 8, 24, 36 and 48 hrs after head removal. Animals were either treated with 30 µM DAPT in 1% DMSO, or 1% DMSO as control for the respective time frames. Timepoint 0 was measured immediately after head removal. The results of these analyses revealed that HyHes expression was clearly inhibited by DAPT during the first 36 hrs after head removal (Fig. 3B), confirming previously published data which had indicated HyHes as a direct target for NICD (Münder et al., 2010). HyAlx expression levels were slightly up-regulated after 24 hrs, but later partially inhibited by DAPT (Fig. 3C). CnGsc expression under DAPT treatment initially (8hrs) was comparable to control levels, but then it was strongly inhibited (Fig. 3D). This goes along with the observed absence of organizer activity in regenerating Hydra tips (Münder et al., 2013). Interestingly, a similar result was seen for HySp5 expression, which was also normal at 8 hrs but was then inhibited by DAPT at later time points (Fig. 3E). HyKayak, while expression is normal after 8 hrs, was strongly overexpressed between 24 and 36 hrs of regeneration in DAPT-treated polyps in comparison to control regenerates (Fig. 3F).

      Concerning iCRT14:

      Next, following the same procedure as described for DAPT, we compared the gene expression dynamics of iCRT14-treated regenerates with control regenerates. We found that the expression of HyWnt3 was not inhibited by iCRT14. In fact, it even appeared slightly up-regulated at the 8 hrs time point (Fig. 4A). Normal HyWnt3-expression at the end of the regeneration period was confirmed by in-situ hybridization for HyWnt3 as shown in Fig. 1D, indicating that HyWnt3 expression patterns and expression levels in ecto- and endodermal cells of the hypostome were faithfully regenerated (Fig. 4A). In contrast, HyAlx expression was completely abolished by iCRT14 (Fig. 4B), consistent with the observation that iCRT14-treated head regenerates did not regenerate any tentacles (Fig. 1A). HySp5 expression was not significantly affected by iCRT14 treatment at any time point (Fig. 4C).

      Furthermore, we found that CnGsc levels in iCRT14 remained similar to control regenerates up to 24 hrs, but were attenuated at later time points (Fig. 4D), very similar to the expression dynamics of the Notch-target gene HyHes (Fig. 4E). The expression of HyKayak was decreased at 8 hrs after head removal in the presence of iCRT14, but then increased above control levels after 48 hrs (Fig. 4F). There were no significant changes in the expression dynamics of HyBMP2/4 and HyBMP5/8b between iCRT14-treated regenerates and controls (Fig. 4G, H).”

      The precise number of biological replicates can be seen in the individual diagrams, they included for most genes three biological replicates, with always three technical replicates for each data point. Biological replicates were obtained over several years by different researchers. For some genes, we obtained very consistent data with high confidence in every experiment (e.g. HyWnt3, HyBMP4). We illustrate this in table 1, where three arrows indicate all such cases. With some genes we observed greater variation, which we interpret as no effect or a minor effect in table 1. Some of these variations may be explained by our observation of wave-like patterns in the expression dynamics. Therefore, we have included the following statement:

      “In addition, the gene expression dynamics for many of the analyzed genes appears in wave-like patterns in some experiments (see Figs S3 and S4). As we have only four time points measured, we cannot draw strong conclusions from these observations, except that some of the deviations in our data points (e.g. 48 hrs HyHes)”

      • In their description of Figure 4 the authors completely omit to discuss the Cazet et al. dataset, which has the exact same early time points for iCRT14 treatment. This must be discussed and compared, and any differences noted.

      Answer:

      We included the iCRT14 results from Cazet et al., in our revised manuscript (see above).

      • End of page 11: The authors propose a model whereby the role of Notch in Wnt3 expression may be due to the presence of a repressor. However, I don't see any putative evidence at that stage. The authors also do not cite relevant work from both Cazet et al. and Tursch et al., which show that Wnt3 is likely upregulated by bZIP TFs. In both these cases the authors show evidence of bZIP TF binding sites in the Wnt3 promoter along with other analyses. This is very relevant to the model presented by the authors here and must be discussed and compared. In particular, the authors put forward HyKayak as an inhibitor of Wnt3, and this should be discussed along with the previous work.

      Answer:

      Tursch et al. 2022 did not claim that HyWnt3 is upregulated by bZiP TFs. They showed that HyWnt3 was strongly upregulated in a position-independent manner upon inhibition of the p38 or JNK (c-Jun N-terminal kinase) pathways (i.e., stress-induced MAPK pathways). This would rather support our hypothesis that HyKayak (AP-1 protein) might be a repressor of Wnt3-expression.

      Cazet et al. have indicated that injury-responsive bZIP TFs are the most plausible regulators of canonical Wnt-signalling components during the early generic wound response. They identified CRE elements, which can be bound by bZIP TFs, in the putative regulatory sequences of HyWnt3. However, they focused on the early stage of regeneration (0-12 hpa), showed that bZIP TFs, including jun, fos and creb, are transiently upregulated at 3 hpa, and hypothesised that these could induce the upregulation of HyWnt3 at this stage as an injury response. We have to point out that the Hydra fos homolog HyKayak, which our work is concerned with, is not identical with the fos gene described in Cazet’s paper. In addition, the HyKayak gene was downregulated by Notch signalling during the morphogenesis stage of regeneration (24-36 hrs), which is not the same stage investigated by Cazet et al. To avoid confusion, we have now included the Cazet fos sequence in our sequence comparison in Fig. S1 (fos_Cazet_HYDVU). Moreover, we have included more information about fos_Cazet in the context of a comparison with HyKayak.


      Different bZIP transcription factors (TFs) may have different effects on the expression of Wnt genes, and these effects are context-dependent. In previous research, Cazet et al. identified another Hydra fos gene (referred to as fos_Cazet) and bZIP TF binding sites in the putative regulatory sequences of HyWnt3 and HyWnt9/10c. They showed that bZIP TF genes, including jun and fos, were transiently upregulated 3 hrs after amputation, and therefore hypothesized that bZIP TFs could induce TCF-independent upregulation of HyWnt3 during the early generic wound response (Cazet et al., 2021). However, in our study HyKayak expression continuously increased throughout the entire head regeneration process (Fig. 3E and 4E), including the morphogenesis stages (24-48 hrs post-amputation). Another study reported that inhibition of the JNK pathway (which disrupts formation of the AP-1 complex) resulted in upregulation of HyWnt3 expression in both head and foot regenerates (Tursch et al., 2022). This result might support our hypothesis, but it only covered the first 6 hours after amputation, similar to Cazet’s research. Therefore, it appears that HyKayak and fos_Cazet may have opposing roles in the regulation of Wnt gene expression and are possibly activated by different signaling pathways depending on the stage of regeneration.

      • On page 12 the authors conclude, based on gene expression under inhibitor treatment, that there is a “change in complex composition of the two transcription factors.” This is something which would require biochemical evidence, and I therefore suggest they remove this entirely.

      Answer:

      We have removed this sentence.

      • The authors use experiments in Craspedacusta to test their hypothesis of the role of Wnt and Notch signaling in Hydra. There is, in my opinion, an incorrect logic here. Regardless of the outcome, the roles of Wnt and Notch could conceivably be different in the two species, and therefore testing a hypothesis from one is not possible in the other. The authors should reframe their discussion of this to be more of a comparative framework. Moreover, the results do not necessarily indicate what the authors say. In Hydra, Notch signaling is required to form the hypostome/mouth, and this is not the case in Craspedacusta, while Wnt signaling is required. The authors do not cite an important study from another hydrozoan, Hydractinia (Gahan et al., 2017). In that study the authors show that DAPT inhibits tentacles during regeneration but that the hypostome (or at least the arrangement of neurons and cnidocytes around the mouth) forms normally. This would indicate that in Hydractinia the process of head formation is different from that in Hydra and would be congruent with what is shown here in Craspedacusta. This should be more thoroughly discussed, and all relevant literature cited.

      Answer:

      We have now concentrated our Craspedacusta work on Notch signalling. We only show that DAPT does not inhibit the regeneration of Craspedacusta heads. We have included new data showing that it nevertheless has an effect on the expression of hypothetical Notch target genes, but not on CsWnt3 (new Fig. 7). We have rewritten our discussion accordingly and included the Hydractinia work on Notch (Gahan et al., 2017). Although the Hydractinia paper lacks gene expression studies, making it difficult to compare directly with the Hydra data, it supports our claim that Notch is required for regeneration of polyps with head and tentacles. We indeed do not know anything about Wnt signalling in Craspedacusta. Our new results show that CsWnt3 is probably expressed in the head, because we observe very low levels of expression in the polyps after head removal, which increase considerably during regeneration of the head. This was included in the results:

      Results:

      “Finally, we investigated the expression of the Craspedacusta Wnt3 gene and its response to DAPT treatment during head regeneration. We observed a low expression level of CsWnt3 after head removal (t=0), which increased dramatically as the head regenerated, suggesting that Wnt3 is expressed in the head of Craspedacusta polyps as it is in the head of other cnidarians including Hydra, Hydractinia and Nematostella (Hobmayer et al., 2000; Kusserow et al., 2005; Plickert et al., 2006). In accordance with having no effect on head regeneration, DAPT also did not inhibit CsWnt3 expression during this process in Craspedacusta. This is opposite to the situation in Hydra. If CsWnt3 were involved in Craspedacusta head regeneration, this could explain the failure of DAPT to interfere with this process.”

      Discussion part

      “Head regeneration also occurs in the colonial sea water hydrozoan Hydractinia. Colonies consist of stolons covering the substrate and connecting polyps, including feeding polyps, which have hypostomes and tentacles, and are capable of head regeneration, similar to Hydra polyps. Wnt3 is expressed at the tip of the head and by RNAi mediated knockdown it was shown that this gene is required for head regeneration (Duffy et al., 2010). In the presence of DAPT, Gahan et al observed that proper heads did not regenerate, similar to Hydra. However, they observed regeneration of the nerve ring around the hypostome indicating the possibility that hypostomes had been regenerated. Unfortunately, this study did not include gene expression data and therefore it is not clear whether Wnt3 expression was affected or not (Gahan et al., 2017).

      …..

      An interesting question was whether regeneration of cnidarian body parts, which are only composed of one module, also requires Notch-signalling. This is certainly not the case for the Hydra foot, which regenerates fine in the presence of DAPT (Käsbauer et al. 2007). Moreover, we tested head regeneration in Craspedacusta polyps, which do not have tentacles, and show that DAPT does not have an effect on this regeneration process. This corroborates our idea that Notch is required for regeneration in cnidarians when this process involves two pattern forming processes to produce two independent structures, which are controlled by different signalling modules. This would be the case for the Hydra and for the Hydractinia heads, but not for Craspedacusta.”

      Moreover, we indicate at the end of our discussion that further studies about head regeneration in Craspedacusta and the genes involved would be desirable. We believe this would be beyond the scope of the current paper and we are working on a new Craspedacusta study.

      “Future studies on expression patterns of the genes that control formation of the Hydra head, including Sp5 and Alx in Craspedacusta could provide insights into the evolution of cnidarian body patterns. Sp5 and Alx appear to be conserved targets of Notch-signalling in the two cnidarians we have investigated. Wnt-3, while being inhibited by Notch-inhibition in Hydra head regenerates, is not a general target of Notch signalling. It was not affected by DAPT in our comparative transcriptome analysis (Moneer et al. 2021b) on uncut Hydra polyps, and it was also not affected by DAPT in regenerating heads of Craspedacusta.”

      • From reading the manuscript I do not fully understand the model the authors put forward. It is unclear what "coordinating two independent pattern forming systems" really means. It might be beneficial to make a schematic illustration of the model and how it goes wrong in both sets of inhibitor treatments.

      Answer:

      We have edited the manuscript considerably and explained what we mean with the two pattern forming systems. It starts with the abstract:

      “Hydra head regeneration consists of two parts, hypostome/organizer and tentacle development.”

      Thus, in accordance with regeneration of two head structures we find two signaling and gene expression modules with HyWnt3 and HyBMP4 part of a hypostome/organizer module, and BMP5/8, HyAlx and b-catenin part of a tentacle module. We conclude that Notch functions as an inhibitor of tentacle production in order to allow regeneration of hypostome/head organizer.

      “Polyps of Craspedacusta do not have tentacles and thus, after head removal only regenerate a hypostome with a crescent of nematocytes around the mouth opening. This corroborates the idea that Notch-signaling mediates between two pattern forming processes during Hydra head regeneration”

      We have included the description of the organizer concept in the introduction, because we consider this relevant for our model:

      “The “organizer effect” entails a “harmonious interlocking of separate processes which makes up development”, or a side-by-side development of structures independently of each other (Spemann, 1935). In addition to inducing the formation of such structures, the organizer must ensure their patterning (Anderson and Stern, 2016). With reference to Hydra’s hydranth formation after head removal or transplantation, this involves the side-by-side induction of hypostome tissue and tentacle tissue. Moreover, it includes the establishment of a regularly organized ring of tentacles with the hypostome doming up in the middle. The function of the Hydra “center of organization” would then be to pattern hypostome and tentacles and to allow for their harmonious re-formation after head removal.”

      In the discussion we integrate the organizer concept with the Gierer-Meinhardt reaction-diffusion models which still explain many aspects of Hydra development.

      Is Notch part of the organizer? The organizer is defined as a piece of tissue with inductive and structuring capacity. Notch is expressed in all cells of Hydra polyps (Prexl et al., 2011) and overexpression of NICD does not induce second axes all over the Hydra body column (Pan et al., 2024), as seen with overexpression of stabilized b-catenin (Gee et al., 2010). Moreover, Notch functions differently during regeneration after apical and basal cuts. Phenotypically during head regeneration in DAPT, we clearly recognize a missing inhibition of tentacle tissue after apical cuts and missing inhibition of head induction after basal cuts (Pan et al., 2024). We would thus rather suggest that the organizer activity of Hydra tissue utilizes Notch-signaling as a mediator of inhibition. As our study of transgenic NICD overexpressing and knockdown polyps had suggested, the localization of Notch signaling cells depends on relative concentrations of Notch- and Notch-ligand proteins, which are established by gradients of signaling molecules that define the Hydra body axis (Pan et al., 2024; Sprinzak et al., 2010). This is in very good agreement with a “reaction-diffusion model” provided by Alfred Gierer and Hans Meinhardt (Gierer and Meinhardt, 1972; Meinhardt and Gierer, 1974) suggesting a gradient of positional values across the Hydra body column. This gradient may determine the activities of two activation/inhibition systems, one for tentacles and one for the head. When the polyps regenerate new heads, Notch could provide inhibition for either system, depending on the position of the cut.

      We provide a new Fig. 8, which clearly illustrates the effects of DAPT and iCRT14 on hypostome and tentacle regeneration.

      Minor: • The abstract could be rewritten to have more of an introduction to the problem rather than jumping directly into results. It would also be beneficial if the abstract followed the logic of the paper.

      Answer: We agree and have re-written the abstract.

      • In Figures 3 and 4 it is not clear why they are divided into A and B. It appears that the categorization of genes into different groups lacks a clear rationale. This seems totally unnecessary. In addition, the order in which the genes are described in the text does not match what is seen in the figure, making it confusing to follow.

      • In Figure 5 the authors use two different types of charts and I would stick with one. B is much better as it shows the individual data points as well as other information. I would use this throughout, including in Figures 3 and 4.

      Answer:

      We changed Fig. 3, 4 and 5 according to these comments and now present the data in one format over all three figures, in scatterplots (more detailed answer above).

      We are now describing the results in the order of the figures, with A and B omitted.

      Figure S3 is missing a description of panel C.

      In Figure S3 it is not clear why the inhibitor was removed and not kept on throughout the experiment. Please discuss.

      Answer:

      Fig. S3 was removed.

      Figure S4 has no A or B in the figure, only in the legend.

      Answer:

      We have included A and B…

      Reviewer #1 (Significance (Required)):

      Although some of the authors' data appear to be novel, I find the study makes only minor progress on the questions. In particular, the authors do not properly cite the relevant literature or put their manuscript into the correct context. The new model proposed by the authors is based entirely on qPCR data, which are not thoroughly analyzed and are not strong enough in the absence of information about the spatial expression of the genes they discuss. The proposal of HyKayak as a negative regulator of Wnt3 is interesting, but the authors do not provide any solid direct evidence for this (ChIP, EMSA etc.) and it is somewhat in disagreement with other models of bZIP function in the literature (which again are not discussed).

      The manuscript is of limited general interest. It has a number of interesting observations which would be of interest to the Hydra community and the broader cnidarian community. The study lacks contextualization within a broader framework, whether it be in the context of regeneration or Wnt/Notch signaling. This limitation may narrow the overall interest in it.

      Answer:

      Our previous analysis of the effect of Notch on head regeneration in Hydra (Münder 2013) had suggested the inhibition model, which is part of Fig. 8. We show now that during head regeneration in Hydra formation of two structures is guided by different signaling/transcription modules, one using Wnt3 and BMP4, but not b-catenin; and one using BMP5/8 and b-catenin. We suggest that Notch functions as an inhibitor “of use” to the organizer when the “two-part” head structure is regenerated.

      We agree that our original manuscript was not well enough written to clearly put it into developmental context. We now focus the discussion of our work sharply on the organizer problem and think that the conclusions are of great general interest. In a simple view they suggest that the function of the Hydra head organizer is to allow harmonious development of head and tentacles, which we consider separate, and on a molecular basis independently regulated parts of the Hydra head. Notch signaling, in our interpretation, is an instrument of the organizer. Our comparison with Craspedacusta illustrates this idea. Craspedacusta only regenerates one head structure, which is possible in the absence of this instrument (also see reviewers 3 and 4).

      Concerning HyKayak, there is no disagreement with other authors as we analyze a fos-gene different from the one discussed by Cazet et al (see above). We have conducted rescue experiments as suggested by reviewer 3 with the Kayak-inhibitor and with HyKayak shRNAi knockdown; however, rescue of the phenotype was not achieved although HyWnt3 was upregulated after DAPT treatment in the knockdown group. We attribute this to the very strong effect of DAPT. We have adjusted our hypothesis and only suggest that HyKayak could be a target for the Notch-induced repressor genes (e.g. HyHes). We mentioned this failed rescue in the manuscript (see answer for reviewer 3). Further experiments, e.g. ChIP/EMSA, constitute a new project on the basis of these ideas and should be reserved for further studies of the Kayak-function in Hydra.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      The study investigates the role of Notch and beta-catenin signaling in coordinating head regeneration in Hydra. It combines gene expression dynamics, inhibitor treatments, and comparisons with Craspedacusta polyps to propose a lateral inhibition model for Notch function during Hydra head regeneration, mediating between two pattern-forming systems.

      Three main concerns arise from this work:

      • Lack of spatial expression data: The study proposes a model based on pattern-forming systems but falls short of providing direct spatial expression data for the genes under consideration in both control and treated scenarios. This gap weakens the empirical support for the proposed model.

      Answer:

      The expression patterns for most of the presented genes including HyAlx and HyWnt3 in the presence and absence of DAPT have been published before (Münder 2013). Expression patterns for all other genes during regeneration (except HyKayak) are already known from literature. For HyKayak we have included expression data from Siebert et al (single cell transcriptome analysis) in the supplementary material. For iCRT14 treatment, we carried out a FISH-experiment and showed that HyWnt3 is expressed in the normal pattern at the hypostome. For further genes after DAPT and iCRT-treatment, in situ hybridisation data are indeed lacking (e.g. BMP5/8). However, we have analyzed some very strongly downregulated genes (e.g. HyAlx completely downregulated by iCRT14, all HyWnts and BMP2-4 completely downregulated by DAPT) and for those in situ hybridisation could (1) be difficult due to low expression in treated samples and (2) not be informative.

      • Clarity and relevance of Craspedacusta comparisons: The section discussing the regeneration in Craspedacusta polyps appears somewhat disjointed from the main narrative, with its contribution to the overarching story of Hydra regeneration remaining unclear.

      Answer:

      We had not intended to explain gene expression during Craspedacusta head regeneration but wanted to prove our hypothesis that Notch is needed to allow side-by-side development of two newly arising structures, which use different signalling modules during head regeneration. That Notch is not needed for the regeneration of Craspedacusta, a polyp without tentacles, appears to strengthen our main hypothesis. In order to connect this point more clearly to the narrative we have included new data. We show that CsWnt3 expression lowers after head removal and rises when the head regenerates, indicating CsWnt3-expression in the head of Craspedacusta polyps. Moreover, we show now that Notch in Craspedacusta may have similar target genes as in Hydra (e.g. Sp5 and Alx), might also affect nematocyte differentiation as in Hydra, but does not inhibit Wnt3 expression. We also acknowledge that a precise understanding of the molecular pathways for head regeneration in Craspedacusta requires further work and have removed the results of iCRT14 treatment because of our lack of knowledge about the role of b-catenin in Craspedacusta patterning. Citations from our changed text are found in the answer to reviewer 1.

      • Accessibility of the text: The study's presentation, including its title, abstract, and main text, presents challenges in terms of clarity and accessibility, making it difficult for readers to follow and understand the research's scope, methodologies, and conclusions.

      Answer:

      We agree and have completely re-written the abstract, and large parts of the introduction and discussion (also see above answer for reviewer 1).

      Reviewer #2 (Significance (Required)):

      In conclusion, while the study aims to advance our understanding of the complex signaling pathways governing Hydra head regeneration, it necessitates significant revisions. Enhancing the empirical evidence through detailed spatial patterning data, clarifying the comparative analysis with Craspedacusta polyps, and refining the narrative to improve accessibility are critical steps needed to solidify the study's contributions to the field.

      Answer:

      By including Kayak-expression data from Siebert et al and indicating the places of major expression of all analysed genes schematically in the Figs describing the qPCR data we revised our manuscript. We have added new data about Craspedacusta and believe that our re-written manuscript refines the narrative by focusing on the organizer (see answer to reviewer 1).



      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Major comments:

      - In the abstract, the authors assert that their findings "indicate competing pathways for hypostome and regeneration." However, the nature of this competition and its resolution is not adequately elucidated within the manuscript. The term "competition" lacks context and clarity, leaving the reader without a clear understanding of what pathways are competing, for what, and how this competition is resolved during regeneration. Furthermore, this concept is not further explored or referenced throughout the remainder of the manuscript, leaving it somewhat disconnected from the main body of the research. It is recommended that the authors either revise the statement in the abstract to provide more clarity on the competing pathways and their implications for regeneration, or alternatively, if the authors believe there is sufficient evidence to support the claim of competing pathways, they should expand upon this point within the main body of the manuscript. Additional argumentation and evidence would be necessary to substantiate such a claim and provide a deeper understanding of the mechanisms underlying regeneration in Hydra.

      Answer:

      We agree and have removed any reference to “competing” pathways from the abstract and the main text.

      - The abstract makes a significant assertion regarding the mechanism by which Notch signaling impacts the expression of HyWnt3, suggesting that it operates by inhibiting HyKayak-mediated repression of HyWnt3 rather than directly activating transcription at the HyWnt3 promoter. This claim is central to the goals outlined in the study, which aim to elucidate the functioning of Notch signaling in HyWnt3 expression. To bolster this assertion, it would be prudent for the authors to conduct experiments demonstrating the mediating role of Kayak. Specifically, demonstrating that downregulation of Kayak through RNAi can rescue the DAPT-mediated downregulation of Wnt3 would provide strong support for the authors' claim. Additionally, while not strictly necessary, it would be beneficial to investigate whether chemical inhibition of Wnt can rescue the phenotype resulting from Kayak RNAi. Conducting and analyzing such experiments within a 2-3-month revision period should be feasible given that the authors already possess all necessary materials and have developed the required methods. These additional experiments would not only strengthen the evidence supporting the authors' claim but also provide further insights into the regulatory mechanisms at play in Notch signaling and HyWnt3 expression.

      Answer:

      We have conducted the suggested rescue experiments with the kayak-inhibitor; however, rescue was not achieved. We also tried rescue experiments by combining DAPT treatment and Kayak shRNA knockdown. HyWnt3 was slightly upregulated after DAPT treatment in the Kayak knockdown group but the phenotype could not be rescued. We therefore now only state that HyKayak could be a target for the Notch-induced repressor genes (e.g. HyHes). We mentioned the failed rescue experiments in the manuscript:

      Results:

      *The up-regulation of HyKayak by DAPT suggests that HyKayak may serve as a potential target gene for Notch-regulated repressors including HyHes and CnGsc, potentially acting as a repressor of HyWnt3 gene transcription. *

      Discussion:

      We therefore suggest that the Hydra Fos-homolog HyKayak inhibits HyWnt3 expression and can be a target for a Notch-induced transcriptional repressor (like HyHes) in the regenerating Hydra head. Nevertheless, we were not able to rescue the DAPT-phenotype by inhibiting HyKayak, neither by using the inhibitor nor by shRNA-treatment, probably due to the strength of the DAPT effect. Therefore, we cannot exclude that Notch activates HyWnt3 directly, or that it represses unidentified Wnt-inhibitors through HyHes or CnGsc.

      - The usage of the term "lateral inhibition" in the title and abstract of the manuscript carries specific implications, as it is commonly associated with distinct mechanisms in the context of Notch signaling and reaction-diffusion systems. Notably, in the Notch signaling context, lateral inhibition typically refers to the amplification of small differences between neighboring cells through direct interactions, facilitated by the limitations of Notch signaling to immediate neighbors. Conversely, in reaction-diffusion systems, such as the Gierer-Meinhardt model, lateral inhibition describes long-range inhibition associated with pattern formation.

      Given this discrepancy, it is crucial for the authors to clarify their interpretation of "lateral inhibition" to avoid ambiguity and ensure accurate understanding. If they are referring to Notch-specific lateral inhibition, they should provide evidence of adjacent localization of Notch and Delta cells to support their argument. Alternatively, if they are invoking the concept of long-range inhibition described by the Gierer-Meinhardt model, they must explain how a membrane-tethered ligand like Notch can exert effects beyond one cell diameter from the signaling center.

      Regardless of the interpretation chosen by the authors, addressing this clarification will have significant implications for the subsequent treatment of their arguments. Depending on their chosen interpretation, experimental demonstrations may be necessary to substantiate their claims, which could be laborious and time-consuming. However, such demonstrations are essential for establishing the validity of the term "lateral inhibition" as used in the title and abstract of the manuscript.

      Answer:

      We agree with the reviewer concerning the term “lateral inhibition” and have now removed it. Instead, we have emphasized that our data clearly show during apical regeneration a Notch-mediated inhibition of tentacle tissue formation. We also discuss data from our most recent publication (Pan 2024) showing that this is the opposite at basal cuts, where the loss of Notch function leads to the regeneration of two heads. We then discuss this in the context of the Gierer-Meinhardt Model and in the context of the organizer (also see above in answer to reviewer 1):

      It is true that it is difficult to reconcile the long-range signaling processes, on which the Gierer-Meinhardt model is based with the cell-cell interactions mediated by Notch-signaling. We have now published a mathematical model to explain our understanding of this for the role of Notch during budding and in steady state animals (Pan2024), which is based on work by Sprinzak et al 2010. For head regeneration, we do not have such a model yet. Here we are looking at expression patterns changing over time. Therefore, we assume waves of gene expression, relying on the autoinhibitory function of the HyHes-repressor. This is included in the discussion:

      In addition, the gene expression dynamics for many of the analyzed genes appear in wave-like patterns in some experiments (see Figs S3 and S4). As we have only four time points measured, we cannot draw strong conclusions from these observations, except that some of the deviations in our data points (e.g. 48 hrs HyHes) might be caused by oscillations. Nevertheless, we propose that the dynamic development of gene expression patterns over the time course of regeneration hints at a wave-like expression of Notch-target genes (e.g. HyAlx, (Münder et al., 2013; Smith et al., 2000)). Hes-genes have been implicated in mediating waves of gene expression, e.g. during segmentation and as part of the circadian clock (Kageyama et al., 2007). This property is due to the capability of Hes-proteins to inhibit their own promoter. Future models for head regeneration in Hydra should consider the function of Notch to inhibit either module of the regeneration process and the potential for the Notch/Hes system to cause waves of gene expression. Such waves intuitively seem necessary to change the gene expression patterns underlying morphogenesis during the time course of head regeneration.
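      The statement that a repressor inhibiting its own promoter can generate waves of expression can be illustrated with a generic toy model. The following is only a minimal sketch of delayed autorepression, not a model fitted to Hydra or taken from the manuscript: the single-variable equation, the function names and all parameter values (`beta`, `gamma`, `K`, `n`, `tau`) are assumptions chosen purely for illustration.

```python
# Toy model of a self-repressing gene (illustrative only, not fitted to any data):
#   dp/dt = beta / (1 + (p(t - tau) / K)**n) - gamma * p(t)
# p is the repressor level, tau a lumped production delay. All values are hypothetical.

def simulate_autorepressor(beta=10.0, gamma=1.0, K=1.0, n=4, tau=2.0,
                           dt=0.01, t_end=100.0, p0=0.1):
    """Integrate the delayed autorepression equation with the explicit Euler method."""
    delay_steps = int(round(tau / dt))
    steps = int(round(t_end / dt))
    # Constant history: p(t) = p0 for all t <= 0.
    p = [p0] * (delay_steps + 1)
    for _ in range(steps):
        delayed = p[-1 - delay_steps]            # p(t - tau)
        production = beta / (1.0 + (delayed / K) ** n)
        p.append(p[-1] + dt * (production - gamma * p[-1]))
    return p[delay_steps:]                       # trajectory from t = 0 onwards


def count_peaks(series):
    """Count local maxima in a sampled trajectory."""
    return sum(
        1
        for i in range(1, len(series) - 1)
        if series[i - 1] < series[i] >= series[i + 1]
    )


if __name__ == "__main__":
    trace = simulate_autorepressor()
    late = trace[len(trace) // 2:]               # discard the initial transient
    print("peaks in second half:", count_peaks(late))
    print("late amplitude:", max(late) - min(late))
```

      With these assumed parameters the delay is long enough that the steady state is unstable, so the simulated repressor level keeps oscillating instead of settling. Whether the Notch/Hes system in Hydra behaves this way is exactly the open question raised above and would need denser time sampling to address.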

      - The utilization of Craspedacusta as a comparative model in the argumentation of the manuscript appears somewhat unclear. The authors posit that Notch is essential for organizer emergence in Hydra, while Wnt is not necessary, as indicated by the observed effects of iCRT14 beta-catenin/TCF inhibition. However, in Craspedacusta, which lacks tentacles but possesses an organizer, one might anticipate a conserved requirement for organizer formation but not tentacle development. Therefore, it would be reasonable to expect that Craspedacusta would still form an organizer under iCRT14 treatment but would not depend on Notch signaling, as the necessity to separate tentacle formation from organizer formation is absent. The authors' observation that Craspedacusta fails to form an organizer under iCRT14 treatment partially aligns with these expectations. However, the complexity of the results suggests a need for a deeper understanding of the involvement of different pathways in Craspedacusta. Before applying inhibitors, it would be crucial to elucidate the spatiotemporal differences in the expression of relevant Wnt and Notch pathway components between Hydra and Craspedacusta. This knowledge would provide valuable insights into the specific roles of these pathways in organizer formation and tentacle development in both species, helping to clarify the observed differences in response to iCRT14 treatment. Additionally, considering the possibility of differential sensitivity to iCRT14 (see comment below) between Hydra and Craspedacusta would be essential for accurately interpreting the results and drawing meaningful conclusions regarding the involvement of Notch and Wnt signaling pathways in these processes.

      Answer:

      We have clarified in our re-written manuscript that the organizer functions in Hydra heads and head regeneration to harmonize the development of two independent structures (see answer for reviewer 1) and that Notch-signalling is an instrument to achieve this. Craspedacusta polyps do not have tentacles, thus we do not see two independent structures. Correspondingly, we see that they do not need Notch-signaling. We do not know whether they have organizer tissue, because they are too small to perform transplantation experiments. Similarly, in situ hybridisation experiments to look for CsWnt expression are hard to envisage. What we have now done during the revision of this paper are RT-qPCR experiments to follow the expression of CsWnt3 after head removal until a new head is formed. This indicated the localization of CsWnt3 expression in the head (citations in response to reviewer 1).

      We agree that the role of Wnt/b-catenin for Craspedacusta cannot be sufficiently described with our iCRT14 experiment and therefore removed it. To strengthen the DAPT data, we also examined Craspedacusta homologs of the Hydra Notch-target genes that we had previously identified (Moneer2021). We found that expression of CsSp5 and CsAlx were inhibited by DAPT. This was also true for the nematocyte gene NOWA (see new Fig. 7). In Hydra, DAPT blocks one important differentiation step of nematocytes and therefore the expression of all genes expressed in differentiating capsule precursors, including NOWA is inhibited, while the number of mature capsules does not change. To see the same DAPT effect on NOWA-expression in Craspedacusta reassured us that DAPT had entered the animals and might also have a similar effect on nematocytes as in Hydra.

      Minor comments:

      - The concentration-dependent effects of iCRT14 on beta-catenin signaling, as demonstrated by Gufler et al. 2018, suggest that the efficacy of inhibition may vary depending on the concentration used. Specifically, Gufler et al. found that a concentration of 10µM was sufficient for efficient inhibition of beta-catenin signaling. However, in the current study, the authors utilized a concentration of 5µM of iCRT14. Given the central role of the observed effects, particularly the persistence of Wnt3 expression, in the argumentation of the manuscript, it is plausible that these effects could be attributed to partial inhibition resulting from the lower concentration of iCRT14 used in the study. To address this potential limitation, the authors could consider conducting a quick examination of the effects of 10µM iCRT14 or utilizing other inhibitors of beta-catenin/TCF interaction, such as iCRT3. By comparing the effects of different concentrations or alternative inhibitors, the authors could ascertain whether the observed effects are indeed attributable to partial inhibition from 5µM iCRT14, or if they persist despite higher concentrations or alternative inhibitors. This additional experimentation would provide valuable insights into the specificity and efficacy of the inhibition and strengthen the validity of the conclusions drawn regarding the role of beta-catenin signaling in the observed phenomena.

      Answer:

      The iCRT14 concentration was adjusted to 5 µM because the initial 10µM proved to be too toxic. 5µM also produced the same phenotypes and results as seen before. Cazet et al. also used 5 µM iCRT14 in their study.

      - The use of Generalized Additive Models (GAMs) in Figures 3 and 4 to present the time series qPCR results may introduce some challenges in interpretation due to the potential for distortion of values at specific time points based on neighboring ones. Given the relatively low time resolution of the data, this approach could lead to a distorted depiction of the temporal dynamics. For instance, in Figure 3B, where Wnt3 peaks at 10 hours, the absence of measurements between 8 and 24 hours introduces uncertainty regarding the accuracy and reliability of this peak at 10 hours.

      To address these concerns and enhance clarity, it may be advisable for the authors to consider presenting the data using simple boxplots instead of GAMs. Boxplots provide a more straightforward visualization of the distribution of data at each time point, allowing for a clearer interpretation of trends and fluctuations over time. This approach would mitigate the potential for distortion introduced by GAMs and provide a more accurate representation of the temporal dynamics observed in the qPCR results.

      Answer:

      We agree and have changed the data representation to simple scatterplots (see answers for reviewer 1).

      - The comparison of the effects of iCRT14 versus DAPT treatments would benefit from having consistent gene expression data across both treatments. However, in Figure 4A, there are fewer genes tested compared to Figure 3A, with Hes and Kayak omitted. While the authors' interpretation suggests that these genes may not change after iCRT14 treatment due to their upstream position in the signaling pathways, it is essential to empirically demonstrate this relationship, as it is central to the conclusions drawn. To address this gap in the analysis, it would be valuable for the authors to provide a time series of differential expression for Hes and Kayak following iCRT14 treatment.

      Answer:

      We have provided a time series for expression of HyHes and HyKayak in responses to iCRT14 treatment during regeneration (see Fig.4).

      “We found that the expression of the Notch-target gene HyHes remained similar to control regenerates up to 24 hrs, but then was attenuated (Fig. 4A), possibly due to failure of tentacle boundary formation, the tissue where HyHes is strongly expressed…The expression of HyKayak was decreased at 8 hrs after head removal in the presence of iCRT14, came back to normal up to 36 hrs and was suddenly increased after 48 hrs (Fig. 4E), correlating with inhibition of the HyHes repressor. There were no significant changes in the expression dynamics of HyBMP2/4 and HyBMP5/8b between iCRT14-treated regenerates and controls (Fig. 4F, G).”

      - The analysis of the impact of chemical inhibition of Notch and Wnt signaling in Figure 7 schematic highlights changes in spatial expression patterns of the target genes. However, the interpretation of their impact primarily relies on qPCR data. As evident from Figure 7, when Notch is inhibited, it is anticipated that Kayak expression will shift from the area of the tentacles to the tip. This spatial shift in expression patterns is a critical aspect of the authors' arguments, especially considering the centrality of Kayak in their findings. Notably, similar spatial expression patterns have been demonstrated for Alx using FISH in Pan et al., available on BioRxiv. Given the importance of Kayak in the presented arguments, it is advisable to also investigate its spatial expression patterns using techniques such as FISH.

      Answer:

      We have, instead of FISH-experiments, included expression data for HyKayak from Siebert et al 2019 (single cell transcriptome data) in Fig. S1D, which show its expression in head- and battery cells (tentacle cells). This is similar to HyAlx. Therefore, Kayak-FISH would be expected to reveal expression of the gene at the tip of the regenerate the whole time, similar to HyAlx, because tentacle gene inhibition or patterning does not occur (see Münder 2013). Due to the failure of our rescue experiment to demonstrate the function of kayak we have omitted kayak from Fig. 8 and only mention in the discussion that it could be a target for Notch activated transcriptional repressors, like HyHes or CnGsc.

      Reviewer #3 (Significance (Required)):

      The paper introduces novelties to the field of regeneration and developmental biology by leveraging Craspedacusta polyp as a novel model system for investigating the evolutionary and developmental dynamics of tentacles. In doing so, it sheds new light on the intricate mechanisms underlying tentacle formation and patterning. Furthermore, the study implicates Kayak in the regulation of Wnt3, adding a fresh perspective to our understanding of the molecular pathways governing Hydra regeneration. Notably, the results of the research challenge the prevailing notion of autoregulation of Wnt3, which has long been considered fundamental to organizer formation in Hydra. While these findings offer intriguing insights, further investigation will be crucial to conclusively ascertain the validity of this assertion.


      Despite the clarity of the data presented, the interpretation and integration of these findings in the manuscript are lacking. The narrative at times feels disjointed, with different storylines loosely connected. While the findings are intriguing and merit publication, a substantial revision of the manuscript is necessary to provide a more coherent and illuminating interpretation of the results. The implications of this research extend beyond the specific confines of Craspedacusta polyp and Hydra biology. It holds significant relevance for both the Hydra biology community and the broader field of Notch signaling research.

      By highlighting the pivotal role of Notch signaling in regeneration and patterning within Hydra, the study enriches our comprehension of this model organism and its evolutionary adaptations. Moreover, it provides a valuable lens through which the evolution of Notch signalling cascades can be examined. This interdisciplinary approach underscores the interconnectedness of diverse biological systems and underscores the importance of exploring novel model organisms to unravel the complexities of evolution and development.


      Answer:

      We have edited the manuscript considerably and re-written the introduction and the discussion parts. We are focusing on integrating this work with the organizer concept in developmental biology, and on the Gierer-Meinhardt-model, and point out that Notch-signaling is required for the development of two head structures by inhibiting the development of either one during head regeneration, which is necessary to enable the development of the other one. Which one is inhibited depends on the positional value of the tissue where the cut occurs. Craspedacusta polyps have only one structure. We suggest that this is why head regeneration does not require Notch-signaling in Craspedacusta. In contrast, as we have now included in our discussion, Hydractinia polyps, again with head/mouth and tentacles, require Notch-signaling for head regeneration (according to Gahan 2019); see also the answers for reviewers 1 and 2.



      Reviewer #4 (Evidence, reproducibility and clarity (Required)):

      Major comments:

      The conclusions from the experiments are drawn accurately, not overstating the results. The main conclusion, that in Hydra Notch pathway mediates between two patterning modules, hypostome and tentacle forming modules, is supported by in situ hybridization and qPCR analyses of hypostome and tentacle specific genes.

      OPTIONAL. Authors hypothesize that Notch maintains expression of Wnt3 via its targets, the transcriptional repressors Goosecoid or Hes, which halt the expression of the Wnt3 repressor HyKayak. Epistatic relationships between Notch, Goosecoid or Hes and HyKayak could be tested, first, by combining pharmacological inhibition of Notch by DAPT with shRNA-mediated knockdown and second, in double knockdowns generated by electroporating shRNAs for two genes simultaneously. If the relationships proposed in the pathway are correct, the repressive effect of DAPT treatment on organizer regeneration should be rescued in the HyKayak shRNA-mediated knockdown. Regeneration of an organizer also should occur in Notch/HyKayak and Goosecoid (Hes)/HyKayak shRNA-mediated double knockdowns. Electroporation of shRNAs for multiple genes is an effective and quick way to generate double and triple knockdowns. The proposed experiments will much strengthen the conclusions drawn from this study. Given that the authors have successfully used the shRNA-mediated technique to generate HyKayak knockdown animals, they should be able to complete the proposed experiments within a couple of months.

      Answer:

      We very much like the suggested strategy to probe the regeneration pathways by shRNA-mediated knockdown experiments- this will be a basis for future investigations.

      We conducted the suggested rescue experiment by combining the DAPT treatment and Kayak shRNA knockdown. HyWnt3 was slightly upregulated after DAPT treatment in the Kayak knockdown group. However, this upregulation did not rescue the organizer’s regeneration. We think that the effect of DAPT is too strong. We have included this in the discussion of our results (see answer for reviewer 2).

      • The data are presented in a logical and clear manner. The paper is easy to read, and the conclusions are explicit for each experimental section. The methodology is described in detail and should be easy to reproduce.

      • All experiments are done with multiple biological and technical replicates. However, the description of the statistical analysis used in each case is missing, and p values and error bars are missing in Fig. 2B and Fig. S4. Authors should add this information in the main text or in the figure legends.

      Answer:

      The statistical information has now been added in the methods section.

      Minor comments:

      • Fig. 1E. It would be more convincing to present tentacle and hypostome regeneration data separately, comparing hypostome regeneration in treated animals with DMSO control, and in a separate analysis comparing tentacle regeneration with control. Provide the description of the statistical method, p values and error bars. If authors prefer to stick to the current way of presenting, they should also provide a description of the statistical analysis used and the statistical data.

      Answer:

      We changed the representation in Fig. 1E. We now use scatter plots in the main text with p-values added, and explained the statistics of the GAM representation in the supplementary material.

      • Results, section 4 Kayak. Authors use the T5424 inhibitor to block the potential interactions between HyKayak and HyJun. The resulting increase in Wnt3 expression measured by qPCR clearly supports the idea of HyKayak being a repressor of Wnt3. However, authors go further and present the phenotype of T5424 treatment, shortening of the tentacles. Many factors can influence the length of the tentacles. For example, shortening of tentacles is a strong indication of poisoning or of the animal being generally unwell. At double the concentration used in the experiment, T5424 causes disintegration of the animals (Fig. S3). It would be more convincing if the authors could provide an in situ hybridization image showing an expansion of the Wnt3 expression domain down the hypostome. This is the result one would expect from the inhibition of HyKayak which, according to the proposed mechanism, restricts Wnt3 spatial expression to the most apical portion of the regenerating tip. Alternatively, authors could try to see if T5424 rescues the inhibition of organizer formation caused by DAPT treatment. The latter experiment might be difficult to perform due to a possible toxic effect of multidrug treatment. I suggest that authors either include the proposed experiments or leave the results of Fig. S3 out.

      Answer:

      According to this suggestion we have removed the phenotypes of polyps after treatment with T5424.

      • Results, section 3.2, paragraph 4. 'This also applies for the suggested Hydra organizer gene CnGsc, and BMP2/4 (Broun, Sokol et al. 1999).' Please insert the citation for BMP2/4.

      Answer:

      We inserted the citation for BMP2/4 (Watanabe 2014).


      Reviewer #4 (Significance (Required)):

      Significance:

      The current study is a continuation of the authors' previous work where they have characterized the Notch pathway in Hydra and showed its role in the regeneration of an organizer and patterning of the Hydra head. Here, they present the study of the Notch pathway in the context of the β-catenin pathway, a pathway that has been shown to be essential for axis induction and patterning in Hydra. The authors challenge this dogma and show that during head regeneration β-catenin transcriptional activity is required neither to maintain the expression of Wnt3 nor to acquire the inductive activity of the regenerating organizer. Second, they show that the fos-related transcription factor Kayak is negatively regulated by Notch-signaling and, in turn, represses transcription of Wnt3. Based on those findings, the authors propose a function of the Notch pathway in Hydra head regeneration, particularly in spatial separation of the hypostome/organizer module from the tentacle module. The role of the Notch pathway in lateral inhibition is well documented in bilaterians. However, in Cnidaria, a sister group to Bilateria, the function of Notch was so far restricted to neurogenesis. This study is very important for our understanding of the evolution of morphogenesis as it shows the ancient role that the Notch pathway plays in axial patterning, possibly through lateral inhibition.

      This study can be of great interest both to researchers specializing in cnidarian development and to a broader audience interested in the evolution of morphogenesis.

    1. Reviewer #1 (Public Review):

      Kreeger and colleagues have explored the balance of excitation and inhibition in the cochlear nucleus octopus cells of mice using morphological, electrophysiological, and computational methods. On the surface, the conclusion, that synaptic inhibition is present, does not seem like a leap. However, the octopus cells have been in the past portrayed as devoid of inhibition. This view was supported by the seeming lack of glycinergic fibers in the octopus cell area and the lack of apparent IPSPs. Here, Kreeger et al. used beautiful immunohistochemical and mouse genetic methods to quantify the inhibitory and excitatory boutons over the complete surface of individual octopus cells and further analysed the proportions of the different subtypes of spiral ganglion cell inputs. I think the analysis stands as one of the most complete descriptions of any neuron, leaving little doubt about the presence of glycinergic boutons.

      Kreeger et al then examined inhibition physiologically, but here I felt that the study was incomplete. Specifically, no attempt was made to assess the actual, biological values of synaptic conductance for AMPAR and GlyR. Thus, we don't really know how potent the GlyR could be in mediating inhibition. Here are some numbered comments:

      (1) "EPSPs" were evoked either optogenetically or with electrical stimulation. The resulting depolarizations are interpreted to be EPSPs. However previous studies from Oertel show that octopus cells have tiny spikes, and distinguishing them from EPSPs is tricky. No mention is made here about how or whether that was done. Thus, the analysis of EPSP amplitude is ambiguous.

      (2) For this and later analysis, a voltage clamp of synaptic inputs would have been a simple alternative to avoid contaminating spikes or shunts by background or voltage-gated conductances. Yet only the current clamp was employed. I can understand that the authors might feel that the voltage clamp is 'flawed' because of the failure to clamp dendrites. But that may have been a good price to pay in this case. The authors should have at least justified their choice of method and detailed its caveats.

      (3) The modeling raised several concerns. First, there is little presentation of assumptions, and of course, a model is entirely about its assumptions. For example, what excitatory conductance amplitudes were used? The same for inhibitory conductance? How were these values arrived at? The authors note that EPSGs and IPSGs had peaks at 0.3 and 3 ms. On what basis were these numbers obtained? The model's conclusions entirely depend on these values, and no measurements were made here that could have provided them. Parenthetical reference is made to Figure S5 where a range of values are tested, but with little explanation or justification.

      (4) In experiments that combined E and I stimulation, what exactly were the time courses of the conductance changes, and how 'synchronous' were they, given the different methods to evoke them? (Had the authors done voltage clamp, they would know the answers.)

      (5) Figure 4G is confusing to me. Its point, according to the text, is to show that changes in membrane properties induced by a block of Kv and HCN channels would not be expected to alter the amplitudes of EPSCs and IPSCs across the dendritic expanse. Now we are talking about currents (not shunting effects), and the presumption is that the blockers would alter the resting potential and thus the driving force for the currents. But what was the measured membrane potential change in the blockers? Surely that was documented. To me, the bigger concern (stated in the text) is whether the blockers altered exocytosis, and thus the increase in IPSP amplitude in blockers is due BOTH to loss of shunting and increase in presynaptic spike width. Added to this is that 4AP will reduce the spike threshold, thus allowing more ChR2-expressing axons to reach the threshold. Figure 4G does not address this point.

      (6) Figure 5F is striking as the key piece of biological data that shows that inhibition does reduce the amplitude of "EPSPs" in octopus cells. Given the other uncertainties mentioned, I wondered if it makes sense as an example of shunting inhibition. Specifically, what are the relative synaptic conductances, and would you predict a 25% reduction given the actual (not modeled) values?

      (7) Some of the supplemental figures, like 4 and 5, are hardly mentioned. Few will glean anything from them unless the authors direct attention to them and explain them better. In general, the readers would benefit from more complete explanations of what was done.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.



      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      I have mixed feelings regarding this manuscript. On the one hand, the authors did an impressive amount of work. On the other hand, the manuscript seems overly descriptive (writing should be more concise) without a clear message or hypothesis that is cohesive to all the presented evidence. Below, I will outline my concerns.

      We appreciate the comment about missing a cohesive presentation. We worked extensively to improve that in the revised manuscript.

      Reviewer #1- first part

      1. I am not an expert in the field of viral biology and immunology. I wonder how well the IFN treatment mimics the cellular response to infection (yet without the virus). Also, how good is ruxolitinib at blocking the IFN response ? I would appreciate it if you could explain both with one or two sentences and provide the necessary references.

      The reviewer is correct that we cannot claim that interferon treatment mimics exactly the cellular response. However, the expression of interferon-stimulated genes (ISGs) is a major arm of the antiviral response to HCMV (cf. doi:10.3390/v10090447, doi:10.2217/fvl-2018-0189). In addition, ruxolitinib is a potent and selective Janus kinase 1 and 2 inhibitor (doi:10.1021/ol900350k), and we have shown in the past that it very effectively reduces the expression of many ISGs (doi:10.1038/s41590-018-0275-z). Since ISGs constitute a major part of the host response to HCMV infection, the fact that their expression leads to minor changes in the tRNA pool strongly suggests that it is mainly the virus (as opposed to the host cell) that mediates the changes seen in the tRNA pools during HCMV infection. In the revised version, these claims were amended, and relevant references were added (page 5, lines 132-136).

      (MAJOR) Can these two treatments really allow the effects of host response and viral infection to be separated? OR in other words, are these two effects really orthogonal? In my opinion, they are NOT. Fig. 1E seems to support my opinion, as the changes seen for the "IFN" sample relative to the "uninfected" sample (referred to as "changes-A" below), are parallel to the changes seen for the "24hpi + ruxo" sample relative to the "24hpi" sample ("changes-B"). More specifically, changes-A represent the host response, as argued by the author, whereas changes-B represent the elimination of the host response (due to ruxo, conditioned on the virus-driven effect). If the virus-driven effect and the host response could really be separated, one would expect changes-A and changes-B are more or less opposite. However, they appeared to be parallel, suggesting that uninfected versus infected conditions can have totally different (even opposite) host responses. More importantly, if one cannot separate the host response from virus-driven effects, the conclusion of "tRNA changes are driven by virus, not host response" is then unfounded.

      This is an important point to clarify. Changes-A indeed represent the effect of the host antiviral response on the tRNA pool. Changes-B, however, represent a mix of two effects. 1: counteracting the effect of the host antiviral response on the tRNA pool, which we show is a minor effect, and 2: The enhanced effect of the virus, since ruxolitinib, by inhibiting the host antiviral response, enhances the viral infection. It may indeed be that both the virus and the host antiviral effects are in the same direction. However, it is clear that the antiviral effect is minor. Thus, it is likely that the second effect of ruxolitinib (i.e., allowing enhanced viral infection) is the more substantial one. Therefore, it seems as though the viral effect and the elimination of the host effect are in the same direction. This point was clarified in the revised version (page 6, lines 145-146).

      Even if we let go of this previous point and accept that these results indeed offer some support for the notion that the virus-driven effects are the main contributor to the shifts in the tRNA pool, the support is at best moderate. A big gap here is "how?" I suggest the authors should at least give some insight into how the virus can do that in the Discussion (and mention it with one sentence in Results).

      We certainly welcome the challenge, which we now meet in the revision. In short, the transcriptional regulation of tRNAs, particularly upon viral infection, is poorly studied. Unlike other herpesviruses, HCMV does not cause a shut-off of host transcripts. Upon HCMV infection, the tRNA transcription machinery is upregulated significantly, which probably contributes to the upregulation of pre-tRNA (doi.org/10.1016/j.semcdb.2023.01.011). However, the viral factors that promote upregulation of the tRNA transcription machinery are still unknown. We now address this point in the results (page 6, lines 147-148) and discuss the known effects of viral infection on tRNA expression in the discussion section (page 15, lines 447-451).

      The authors compared the HCMV codon usage to the proliferation and differentiation signatures of human cells. But these two signatures are not compared with measured tRNA expression. It might shed some light on the general characteristics of tRNA pool shifts due to infection (towards a proliferation-like or differentiation-like signature). This fits in the general topic of virus-host interaction and might give more evidence for the point that HCMV is adapted to a differentiation signature (as it drives the host into that state).

      We performed the analysis suggested by the reviewer. We found that the tRNA pool of uninfected HFF cells correlated to the same extent with proliferation codon usage (r=0.29, p-value=0.029) and differentiation codon usage (r=0.26, p-value=0.05). Similar correlations to the proliferation and differentiation signature were found when analyzing the tRNA pool 72h post-infection (proliferation r=0.33, p-value=0.011, differentiation r=0.28, p-value=0.034). This result suggests no general shift in the tRNA pool towards a specific codon usage signature.
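
      The comparison described above can be illustrated with a minimal sketch. It assumes a Pearson correlation computed per codon between tRNA supply and each codon usage signature; the response does not state which correlation statistic was used, and all names and numbers below are hypothetical placeholders rather than data from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical per-codon values, for illustration only (not study data).
trna_supply = [0.9, 0.4, 0.7, 0.2, 0.5]
proliferation_usage = [0.8, 0.5, 0.6, 0.3, 0.4]
differentiation_usage = [0.7, 0.3, 0.8, 0.4, 0.5]

# Similar coefficients for both signatures would indicate no preferential shift.
r_prolif = pearson(trna_supply, proliferation_usage)
r_diff = pearson(trna_supply, differentiation_usage)
```

      Running the same function on the uninfected and the 72 h post-infection pools, as in the response, would show whether either coefficient moves appreciably with infection.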

      How is the dashed box in Fig3A/B chosen?

      We determined the dashed lines based on the most prominent groups of transcripts best adapted to proliferation or differentiation codon usage signatures. Figure S3A clearly shows the two groups without viral genes. We emphasize this point in the legend of Figure S3A (page 36, line 1157).

      The tAI values shown in Fig3C-E are extremely low (compared to other reports I am aware of). Does this mean that the adaptation of viral codon usage to human cell supply is actually very weak? This is in opposition to the major claims made in this section.

      We acknowledge that the tAI values presented here are lower than typically presented. However, this is due to how tAI was calculated rather than the potential weak adaptation between viral genes and tRNA supply. Specifically, unlike previous works that estimate tRNA availability based on tRNA gene copy number, here we calculated tAI using tRNA sequencing (in order to capture the dynamics in the tRNA pool during infection). Indeed, the value of tAI calculated by tRNA read counts is lower than tAI calculated by tRNA copy number. This is due to the skewed distribution of tRNA read counts (some tRNAs are highly expressed, and others are lowly expressed), while tRNA copy number is distributed more evenly. Thus, due to the mathematical nature of the tAI (computing geometric rather than arithmetic average of tRNA availability), the skewed distribution observed in the data results in lower tAI values. When computing tAI based on gene copy number, we get higher tAI values (0.3 on average). Nevertheless, as all tAI calculations here were done similarly, the comparisons between gene groups or genes are valid.
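
      The effect of the geometric mean can be made concrete with a simplified sketch. It assumes one availability value per codon, scaled by the maximum, and omits the wobble-pairing terms of the full tAI definition; the codons and numbers are hypothetical and chosen only to contrast an even distribution (copy number) with a skewed one (read counts).

```python
import math

def relative_weights(availability):
    """Scale each availability value by the maximum, giving weights in (0, 1]."""
    top = max(availability.values())
    return {codon: value / top for codon, value in availability.items()}

def tai(codons, weights):
    """Geometric mean of the weights of the codons in a sequence."""
    logs = [math.log(weights[c]) for c in codons]
    return math.exp(sum(logs) / len(logs))

# Hypothetical availabilities: evenly distributed vs. strongly skewed.
copy_number = {"AAA": 10, "CCC": 8, "GGG": 6, "TTT": 5}
read_counts = {"AAA": 10000, "CCC": 400, "GGG": 50, "TTT": 20}

# Hypothetical coding sequence as a list of codons.
gene = ["AAA", "CCC", "GGG", "TTT", "AAA", "GGG"]

tai_copy = tai(gene, relative_weights(copy_number))
tai_reads = tai(gene, relative_weights(read_counts))
```

      Because a few highly abundant tRNAs set the maximum, most read-count weights fall far below 1 and the geometric mean drops accordingly, whereas the ranking of genes scored with the same weights remains comparable.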

      I believe that the part about SARS-CoV-2 could be made more concise. It is sufficient to mention that results may differ from those obtained with HCMV in one paragraph.

      The section on SARS-CoV-2 is now made rather succinct. This virus is mainly given as a comparison to the primary virus studied in this paper - HCMV.

      Line 299 on page 11 - I do not believe codon usage between different viruses can be directly compared, let alone reaching such a conclusion. Some viruses have low CAI or tAI to humans, but they have co-evolved with humans for a long time. Furthermore, there are viruses that infect multiple hosts, but their CAI for a host with which they have long co-evolved is higher while their CAI for a host that is relatively new is lower.

      We agree with the reviewer that a direct link between co-evolution time and tAI may not always exist. Indeed, other factors might explain the observation that SARS-CoV-2 genes are less adapted than HCMV genes. These may include effective population sizes and mutation rates that vary substantially. We, therefore, removed this conclusion from the manuscript.

      (MAJOR) A more general comment is that there is a difference between tRNA expression and the abundance of translation-ready tRNA. The process of charging tRNA with amino acids may take a long time. It is the abundance of the charged-tRNA (the ternary complex of aminoacylated tRNA and EF-Tu-GTP) that is of biological importance. In this regard, the use of tRNA expression falls short.

      The reviewer raises a valid point. Indeed, our tRNA sequencing protocol measures both charged and uncharged tRNAs that constitute the cell's mature tRNA pool. Compared to previous studies that focus on the transcription process of tRNAs in viral-infection models by sequencing the pre-tRNAs, here we look at the mature tRNA pool that accounts for both transcription and post-transcription processes. Therefore, we changed the use of "tRNA expression" to "mature-tRNA levels" and "highly" or "lowly-abundant tRNAs" rather than “highly” or “lowly expressed tRNAs” in the manuscript. We note, however, that although limited in the ability to differentiate between charged and uncharged tRNAs, the tRNA sequencing protocol used here is commonly used and validated as a state-of-the-art protocol in tRNA sequencing (10.1016/j.molcel.2021.01.028, 10.1038/s41467-020-17879-x, etc.), mainly because it addresses the level of "ready-to-use" tRNA.

      Reviewer #1- second part

      1. (MAJOR) Prior to the actual competition assay in the first high-throughput screen (cell competition assay), the authors applied two days of antibiotic selection and two days of recovery. This could result in a serious problem of false negatives or drop outs. Specifically, an sgRNA targeting an essential gene with high efficiency would kill the cells, leaving no (or a small number of) cells in the ancestor population at the beginning of the competition process. A sgRNA's enrichment in competing populations cannot be reliably estimated in such situations. I am not certain that the FDR used in Figure 5B is sufficient to address this issue. Please clarify whether it could. Providing raw counts for competing and ancestor populations would also be helpful.

      As is customary in CRISPR screens, the step of lentiviral transduction and antibiotic selection is necessary to ensure that only CRISPR-edited cells are left in the population. Indeed, essential genes like housekeeping genes are probably removed from the competing population relatively quickly, which might result in their dropout. We could have lost some tRNA hits in the cell growth CRISPR screen (Figure 5B-C) because of their overall essentiality for cell growth. The MAGeCK tool we used, the state-of-the-art in the field, filters out sgRNAs with low read counts to be able to calculate false discovery rates. Indeed, we identified 15 tRNAs that were depleted from the competing cells. We believe that our procedure minimizes the concern of dropouts. tRNA dropout in the HCMV infection CRISPR screen (Figure 6B-C) can also happen, which means our screen underestimates the essentiality of tRNAs to HCMV infection. However, this concern does not affect the significance of the hits we did find. We acknowledge this inherent difficulty in CRISPR screens and will provide the raw read counts of all samples upon full submission. We emphasize, though, that while valid, this concern applies to essentially any CRISPR screen of the kind that is commonplace in genomics these days.

      It is also highly questionable to me the nearly negligible effects of tRNA modification enzymes. This may be explained by the point above. Indeed, the dots of tRNA modification enzymes in general appear to have higher FDR (lower y values) when compared to red dots with similar enrichment levels.

      This is a valid point. We found a lack of essentiality of tRNA modification enzymes in both screens. We analyzed additional CRISPR screens and compared the effect of tRNA modification enzyme knockouts relative to the restriction and dependency factors we used in the library. The tested screens included 34 knockout CRISPR screens we downloaded from the BioGRID ORCS database that have similar parameters to our screen. Namely, they all test cell proliferation in a time-course manner, using a pooled sgRNA library and using the MAGeCK tool for data analysis. Overall, the screens use different human cell lines and diverse sgRNA libraries. Although potentially surprising, we found that the lack of essentiality of tRNA modification enzymes was also observed in the analyzed CRISPR screens (Figure S5B and on page 11, lines 322-330, and on page 18, lines 539-541). One potential reason is if some of these enzymes were "backed up" by others, which we mentioned. Another explanation is that most tRNA modification enzymes are indeed not essential for growth and for viral infection (now described in the Discussion, page 18, lines 544-545). Alternatively, dropouts can explain this result, as suggested by the reviewer. To examine the likelihood of the dropout option, we examined the average raw read count of the tRNA modification enzyme in the ancestor samples. We compared it to that of other sub-groups. We found that raw read counts of the tRNA modification enzymes are not different than other sub-groups in the CRISPR library. Thus, the dropout issue cannot explain our screens' lack of essentiality of tRNA modification enzymes.
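
      The dropout check described above amounts to comparing mean raw read counts across library sub-groups in the ancestor sample. A minimal sketch follows, assuming counts are already grouped by sub-group; the group names and counts are hypothetical, and the response does not specify which statistical test, if any, was applied.

```python
from statistics import mean

# Hypothetical ancestor-sample raw read counts, keyed by library sub-group.
ancestor_counts = {
    "tRNA_modification_enzymes": [410, 385, 432, 398],
    "tRNAs": [402, 377, 441, 420],
    "control_genes": [395, 408, 388, 415],
}

def mean_counts_by_group(counts):
    """Mean raw read count for each sub-group."""
    return {group: mean(values) for group, values in counts.items()}

def relative_to_rest(counts, group):
    """Ratio of one group's mean count to the mean over all other groups."""
    rest = [v for g, vals in counts.items() if g != group for v in vals]
    return mean(counts[group]) / mean(rest)

group_means = mean_counts_by_group(ancestor_counts)
ratio = relative_to_rest(ancestor_counts, "tRNA_modification_enzymes")
```

      A ratio close to 1 would be consistent with the statement that the modification-enzyme sgRNAs were not depleted before the competition started.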

      The screen based on IE2-GFP labeled HCMV measures a phenotype that is very difficult to interpret. Particularly, I am not sure if GFP2 and GFP3 are good controls for comparing GFP4 (GFP1 might be better). Various factors can affect GFP levels, including, but not limited to, dilution caused by a rapidly dividing host cell, unhealthy translational machinery resulting from infection or microenvironment. My point is supported by some observations in Fig6B. For example, SEC61B, a restriction factor for HCMV infection, is enriched in the GFP2 group, contrary to expectations. It is necessary for the authors to prove with firm evidence that their choice of GFP signal thresholds is appropriate.

      We acknowledge the concern. Specifically, the translation of the GFP gene itself could be affected by the tRNA manipulation done. To account for this potential concern, we tested the codon usage of the eGFP gene (which is the GFP version we used in the system) and compared it with tRNA essentiality, as determined by the cell growth CRISPR screen. We report this in the revised manuscript (page 13, lines 390-392, and added Figure S6D). We found that GFP does not tend to significantly use codons that correspond to essential or less essential tRNAs. The same lack of correlation was also found for the tRNA essentiality upon HCMV infection (not shown).

      More generally, we show that GFP intensity does correlate with viral genome copies (Figure S6A). Also, in mRNA-seq data of temporal HCMV infection (10.1016/j.celrep.2022.110653), IE2 (UL122) shows dynamic expression: a high expression peak in early infection, then a decline in expression level followed by a gradual increase.

      Altogether, we believe that the IE2-GFP level provides a good estimation for viral load.

      Regarding SEC61B, which served as a control in our screen – the referee is rightly asking why it behaves oppositely from what's expected, given that this was supposed to be a restriction factor of HCMV infection. We returned to the literature on the essentiality of this gene upon HCMV infection. In Weissman's paper (10.1038/384432a0), which was the reference for choosing control genes in our system, this gene was targeted through two different CRISPR technologies, once with CRISPR knockout and once with CRISPRi. Interestingly, only upon CRISPRi did this gene prove to be a restriction factor (i.e., improved infection upon reduction of the gene). We comment on this peculiar fact in the revised manuscript (page 13, lines 370-374). However, we note that the rest of our positive and negative controls deliver the expected results – increasing or reducing infection as expected from their role, thus lending considerable support to our experimental system. It is possible, especially in light of our screen, and since other positive and negative controls behave as expected, that the status of the SEC61B gene as a "restriction factor" of viral infection needs to be reconsidered, as we now suggest.

      I would appreciate more information regarding why restriction factors of cell growth have a high GFP2/GFP4. Intuitively, a KO of restriction factors of cell growth should result in better growth and higher GFP, thus leading to enrichment in GFP4, not GFP2.

      The reviewer raises an interesting question (although not at the heart of this work, as sgRNAs for the cell growth restriction factors mainly serve as controls for the CRISPR screen). HCMV has a complex interaction with the cell cycle. Specifically, it establishes a unique G1/S arrest that is both stimulatory and inhibitory: on the one hand, arresting the cell cycle serves the virus and is a critical step for viral genome replication; on the other hand, the virus needs many of the resources that serve cell growth. Both p53 and CDKN1A are important regulators at this stage; therefore, their interaction with the virus may indeed be complex. For example, p53 is upregulated by viral infection. However, it is sequestered in the viral replication compartments, and its transcriptional targets are down-regulated, yet its absence harms viral propagation (doi: 10.1128/mBio.02934-21, doi: 10.1128/jvi.72.3.2033-2039.1998, doi: 10.1128/jvi.00505-06). Therefore, it is not surprising that genes related to cell growth and cell cycle have complex effects on HCMV infection. We mention the essentiality of p53 for HCMV infection in the results (page 14, line 404).

      Line 404 "nonetheless"

      We appreciate the reviewer for noticing the typo. We corrected it.

      Reviewer #1 (Significance (Required)):

      The relation between human tRNA supply and viral translation is a topic of profound biological and biomedical importance. In this study, the authors used HCMV infection as the primary model to investigate this question. Results fall into two major parts: (i) changes in the tRNA pool during viral infection, and (ii) the impact of tRNA-related gene KO on viral infection.

      We appreciate the detailed report. We addressed the major points raised in the revised manuscript.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study by Aharon-Hefetz et al., the researchers examined changes in tRNA pools during virus infections. The translation machinery plays a crucial role in virus replication. Consequently, host cells have developed sensors and effectors within this compartment to counteract viral mechanisms. The translation apparatus serves as a pivotal point in the virus-host conflict. Therefore, investigating alterations in the translation machinery during infections is vital for gaining a comprehensive understanding of the infection process. This study offers a thorough and high-quality analysis of data in a relevant cell culture system involving two different viruses. By conducting tRNA sequencing, the researchers studied the human tRNA pool following infections with human Cytomegalovirus (HCMV) and SARS-CoV-2. Changes in tRNA expression induced by HCMV were mainly driven by the virus infection itself, with minimal impact from the cellular immune response. Interestingly, specific tRNA post-transcriptional modifications seemed to influence stability and were subject to manipulation by HCMV. Conversely, SARS-CoV-2 did not lead to significant alterations in tRNA expression or post-transcriptional modifications. Moreover, a systematic CRISPR screen targeting human tRNA genes and modification enzymes allowed the identification of specific tRNAs and enzymes that either enhanced or reduced HCMV infectivity and cellular growth. This information enabled them to control the development of HCMV-specific tRNA modifications, highlighting the importance of these tRNA epitranscriptome modifications in virus replication. The authors concluded that the observed differences between the viruses are consistent with HCMV genes aligning with differentiation codon usage and SARS-CoV-2 genes reflecting proliferation codon usage. 
This observation's connection to the biology of HCMV and SARS-CoV-2 lies in the codon usage of structural and gene expression-related viral genes, showing a significant adaptation to host cell tRNA pools. Notably, these genes from both viruses demonstrated the highest adaptation to the tRNA pool of infected cells. The reason behind this phenomenon remains unclear. One hypothesis suggests that a high level of structural gene expression is necessary during activation. Testing this hypothesis could involve examining if hindering tRNA modifications affects virus morphogenesis. In summary, this study presents an interesting and innovative perspective on how viruses modify the translation machinery. The meticulous analysis sheds light on a central interaction point between viruses and their host cells.

      Reviewer #2 (Significance (Required)):

      In summary, this study presents an interesting and innovative perspective on how viruses modify the translation machinery. The meticulous analysis sheds light on a central interaction point between viruses and their host cells.

      We thank the reviewer for finding our work interesting, innovative, and well analyzed.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      Summary

      Aharon-Hefetz et al. present the expression dynamics and modification signatures of tRNAs using DM-tRNA-seq in human foreskin fibroblasts or Calu3 cells during infections with two diverse viruses, HCMV and SARS-CoV2, respectively. They also use a newly designed tRNA-centric CRISPR library to screen the essentiality of tRNA and tRNA factors during HCMV-GFP infection. They find several tRNAs that are differentially expressed during HCMV infection, and most closely resemble the set of tRNAs shown to be used during cellular differentiation. Additionally, tRNA differential expression does not resemble that following interferon treatment, implying that virus modulation of tRNAs is unique to the general interferon response. They compare codon usage signatures during infection to their prior-defined sets of proliferation/differentiation tRNA genes. In their CRISPR screen, they find that different tRNAs can promote or restrict HCMV infection levels, as measured by the intensity of GFP fluorescence marker in their virus. Surprisingly, there were few tRNA modification factor hits that contributed to growth or infection.

      Reviewer #3- major comments

      1. The topic of this work is important, and the analysis performed here is assumed to be top quality, based on the previous work by the last author. The weakness with this body of work is a lack of rigor, specifically regarding validation and follow-up studies. Without these experiments, the reader lacks confidence in stated conclusions. For example: There is no validation or clue to how penetrant CRISPR is against tRNA genes. Given how duplicated some tRNA families are, it is possible that CRISPR is more effective against certain families compared to others. While this is likely an inherent caveat in all CRISPR screens, it would lend confidence in this approach to see some validation of tRNA KO by northern blot or RT-qPCR or sequencing.

      We thank the reviewer for raising this important issue. Indeed, many tRNA genes appear in multiple copies in the human genome. Yet, based on our previous work, we expect parallel editing of multiple copies using the same sgRNA. In our previous work (doi.org/10.7554/eLife.58461), we validated, based on several tRNA families, the ability of our tRNA CRISPR system to successfully target and affect tRNA expression levels. This included sequencing of the edited tRNA genes (i.e., DNA sequencing), in which we observed diverse INDEL mutations that predicted full disruption of the tRNA structure. Furthermore, we sequenced the tRNA pool of CRISPR-edited cells and found the downregulation of the targeted tRNAs to be up to 2-4-fold. This previous work provides foundations and confidence in this tRNA-CRISPR approach.

      Nevertheless, to further mitigate the reviewer's concern, we also plan to perform additional experiments in the current settings. We will choose individual tRNAs from our CRISPR screen as representatives to validate CRISPR editing. We will target each tRNA independently and test expression reduction by sequencing. We shall share the results in the full revision if granted.

      2. There is no validation that tRNA modification factor knockouts alter tRNA modification levels. Without this knowledge, the lack of essentiality cannot be confidently and fully interpreted. If the group does not validate whether individual tRNA modification factor knockouts alter modification profiles, then all possible explanations should be posited. For example, it is possible that 1) there could be major redundancy among tRNA modification enzymes, as the authors posit in the Discussion, 2) tRNA modification enzymes are not essential for growth because their activity/the modification they place is non-essential for growth, OR 3) the knockouts are not fully penetrant. I think this Discussion should be expanded to make caveats clearer. Perhaps referencing whether tRNA modification factors have been shown to be essential in other CRISPR screens would be helpful.

      Regarding the possible explanations for the lack of essentiality of tRNA modification enzymes, we agree with all three possibilities the reviewer raised. Reviewer #1 raised an additional option, in which tRNA modification enzymes are essential for HCMV infection and cell growth; thus, we cannot detect them in the screens because they drop out early in the process (before collecting the ancestor samples). We checked this possibility and found that read counts of sgRNAs targeting tRNA modification enzymes were comparable to those of other sub-libraries. This result suggests that early drop-out of these sgRNAs is unlikely in our screens.

      Furthermore, as the reviewer asked, we analyzed additional CRISPR screens and compared the effect of tRNA modification enzyme knockouts relative to the restriction and dependency factors we used in the library. The tested screens included 34 knockout CRISPR screens we downloaded from the BioGRID ORCS database that have similar parameters to our screen. Namely, they all test cell proliferation in a time-course manner, using a pooled sgRNA library and using the MAGeCK tool for data analysis. Overall, the screens use different human cell lines and diverse sgRNA libraries. Although potentially surprising, we found that the lack of essentiality of tRNA modification enzymes was also observed in the analyzed CRISPR screens (Figure S5B and on page 11, lines 322-330, and on page 18, lines 539-541).

      3. There is no validation that factors modulating GFP intensity in the HCMV screen actually impact virus replication. This is the point most important to this body of work. While GFP intensity does correlate to genome copies as shown by the authors, GFP read-out on a case-by-case basis could be simply due to factors required for expression/translation of GFP. Are any of the tRNA hits enriched or not represented in GFP reporter sequence? Either way, this information is informative.

      We acknowledge the concern. Specifically, the translation of the GFP gene itself could be affected by the tRNA manipulation done. To account for this potential concern, we tested the codon usage of the eGFP gene (which is the GFP version we used in the system) and compared it with tRNA essentiality, as determined by the cell growth CRISPR screen. We report this in the revised manuscript (page 13, lines 390-392, and added Figure S6D). We found that GFP does not tend to significantly use codons that correspond to essential or less essential tRNAs. The same lack of correlation was also found for the tRNA essentiality upon HCMV infection (not shown).

      Additionally, given that the hits are cross-compared ONLY to other infected (low intensity "GFP+") cells, and not to an uninfected population, there is no guarantee that these primarily drive HCMV infection. The top hits should be validated in HFFs, infected with HCMV, with resulting titers/viral gene expression/genome copies measured. Additionally, the reasons for not using a GFP- population as a control should be clarified.

      We agree that additional experiments on some hits may be warranted. We plan to examine for such an effect on infection using an individual gene version of the assay. In particular, we will target individually candidate tRNA genes following validation (as described previously in point 1). We will then infect the tRNA-targeted cells with HCMV and measure the effectiveness of HCMV infection using a standard titer assay.

      The reviewer also suggests comparing GFP1/2/3 to an ancestor in addition to comparing them to GFP4. To that end, we now show a GFP2 vs ancestor comparison (shown below). The results look very similar and are now added to the supplemental material of the revised manuscript (page 13, lines 385-387, Figure S6B).

      Though careful codon usage analysis for HCMV versus the human host was analyzed, it seems pertinent to analyze whether the differentially expressed tRNAs during infection correlate to either codon usage profiles. Figure 3C and S3C intend to address this point for viral gene groups; however, I would encourage the authors to expand the description of these results to make them easier to interpret, especially for those not in the tRNA field. For example, "tRNA adaptation index (tAI)" is not defined in the text, but simply referenced. For clarity, you should include a brief explanation of what this measure describes. Following, when reporting results from Figure 3, the results can then be delivered with more specific and interpretable language. These steps will ensure maximal scientific communication to the audience.

      We appreciate the reviewer's comment regarding the importance of scientific communication and making this manuscript easier to interpret, especially for readers unfamiliar with the world of tRNAs and translation efficiency. We added a description of our motivation to use tAI and the meaning of the measurement (page 9, lines 241-243). We also elaborated on the results part and made the results more interpretable (page 9, lines 245 and 249-250).
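
      For readers outside the tRNA field, the measure can be illustrated as follows. In its standard definition, the tAI of a gene is the geometric mean of the relative adaptiveness values of its codons, where each value reflects how well the tRNA pool serves that codon. The sketch below is a minimal illustration only: the function name and the toy weights are hypothetical, wobble pairing is not modelled, and it does not reproduce how the weights in the manuscript were derived from the tRNA-seq data.

```python
import math


def tai(cds, weights):
    """Geometric mean of per-codon relative adaptiveness values.

    `weights` maps a codon to a value in (0, 1]; codons missing from
    the mapping (e.g. stop codons) are skipped. Weights must be > 0.
    """
    usable = len(cds) - len(cds) % 3           # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    scored = [weights[c] for c in codons if c in weights]
    if not scored:
        raise ValueError("no scorable codons in sequence")
    # Average in log space, then exponentiate: the geometric mean.
    return math.exp(sum(math.log(w) for w in scored) / len(scored))


# Hypothetical weights for three codons, chosen only for illustration.
toy_weights = {"GCT": 1.0, "GCC": 0.5, "AAA": 0.25}
score = tai("GCTGCCAAA", toy_weights)
```

      A gene enriched in codons that are well served by the tRNA pool scores close to 1, whereas a gene relying on poorly served codons scores lower.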

      Finally, given that changes are most visible at 72 hpi, the analysis should include expression based on this time point for comparison.

      Regarding the time point used for tAI calculation (Figure 3), we tested the tAI measured by the tRNA pool at 72hpi and got very similar results to those obtained using the tRNA pool measured at 24hpi. As 24hpi represents the peak of HCMV infection, we decided to present this analysis. In the current revised version, we also added the analysis done using the tRNA pool measured at 72hpi, as suggested by the reviewer (Figure S3D).

      Reviewer #3- minor comments

      1. I would recommend more care in terminology used for the CRISPR screen (Figures 5 and 6) to make the manuscript easier to digest. Labeling sgRNA-containing cells as "Reduced Growth/Infection" or "Increased Growth/Infection" is not immediately easy to understand. For example, saying this sgRNA "increased growth" could refer to the knockdown increasing growth OR could mean that this sgRNA was enriched in cells with increased growth, which are opposing. It might be clearer to use depleted/enriched terminology in these figure labels. This also applies to the text: be sure to plainly describe the terminology and what it means each time you refer to the CRISPR results.

      This is a good point. Indeed, focusing on the significant enrichment of the sgRNAs, rather than their effect on growth or infection, is more straightforward. We changed the terminology in Figures 5C and 6C and the text in the current version.

      Is there actual evidence that the new tRNA sgRNA library is more effective than that used previously? State if so.

      We assume the referee refers to our previous paper on the smaller-scale library (doi.org/10.7554/eLife.58461). The addition here is that the library is much more comprehensive (the previous one targeted only 20 tRNAs). We point it out in the revised manuscript (page 17, lines 499-501).

      Fig 1A-C: The cutoff for "red" symbol distinction is not stringent enough. 1.05 would be red, but that is not convincingly upregulated. The cutoff should be at least FC>1.2.

      We thank the reviewer for bringing our attention to this point. In the current version, we changed the cutoff to an absolute fold change higher than 1.2 in Figures 1A-C and S1A (also in the legend).
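
      The effect of such a cutoff can be sketched as follows. This is an illustration under an assumption: "absolute fold change higher than 1.2" is read here as symmetric on the log scale, so that a tRNA is flagged when its fold change is above 1.2 or below 1/1.2. The function name and labels are hypothetical and are not taken from the manuscript.

```python
import math


def classify_fold_change(fold_change, cutoff=1.2):
    """Label a fold change as up, down, or unchanged.

    The cutoff is applied symmetrically on the log2 scale, so
    cutoff=1.2 flags values above 1.2 or below 1/1.2.
    """
    log_fc = math.log2(fold_change)
    threshold = math.log2(cutoff)
    if log_fc > threshold:
        return "up"
    if log_fc < -threshold:
        return "down"
    return "unchanged"


labels = [classify_fold_change(fc) for fc in (1.05, 1.3, 0.8)]
```

      With this reading, the reviewer's example of a fold change of 1.05 is no longer marked as upregulated.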

      Need thorough description of tRNA bioinformatics and modification analysis (citing past work is not appropriate here-need to make accessible to your audience).

      Further thorough descriptions of tRNA bioinformatics and modification analysis are added in the revised version (page 6, lines 149-151, page 7, lines 178-183).

      Line 182- Result headings could be more informative, even with small adjustments. For example "Specific tRNA modifications are modulated in response to HCMV infection" is more clear and accurate, as there are only a few measurable changes in tRNA modification. Limitations of using sequencing techniques to analyze modifications (versus MS) should also be discussed.

      We changed that heading accordingly.

      We also mentioned the advantages and disadvantages of using sequencing to assess tRNA modification levels (page 7, lines 184-187).

      It is not immediately clear why the viral plot looks different in Fig S3B compared to Fig 3B.

      We thank the referee for spotting this. We employed different length cutoffs on the genes in each panel and have now fixed that in the revised manuscript.

      Line 254-255. This point is not immediately clear-please include more specific language detailing the logic leading to this conclusion.

      Indeed, the logic here was missing. The idea was that longer genes are associated with gene conservation, hence functionality. Thus, non-canonical HCMV genes that are both long and codon-optimized might have a function during HCMV infection. We added this explanation to the text (pages 8-9, lines 235-238).

      Line 408- "may be essential"-I would modify the language here. Especially given there is no true comparison with uninfected cells.

      We improved the language throughout the revised manuscript.

      There are a number of recent publications profiling tRNA expression in herpesviruses. These should be mentioned and discussed in the context of this work. I know some were included in the reference list, but the body of work as a whole, and how this work fits in and pushes the horizon, could be further emphasized. It is quite impressive that this is a conserved feature of herpesvirus infection. a. PMID: 36752632 b. PMID: 35110532 c. PMID: 34535641 d. PMID: 33986151 e. PMID: 33323507 f. PMID: 35458509

      We thank the reviewer for highlighting these works. We added a discussion item regarding tRNA expression in HCMV and other herpesviruses with the references (pages 15-16, lines 447-458).

      CoV2 Discussion point-The lack of tRNA expression regulation might have more to do with the length of the infection (6 hpi cov2- also didn't see much a change at 5hpi with hcmv). This should be proposed as a possibility.

      It is possible that, due to the high stability of tRNAs, regulation of tRNA expression will not affect the tRNA pool in a short infection such as that of SARS-CoV-2. We added this explanation to the discussion (page 15, lines 441-442).

      Line 582. Misspelled schlafen in Discussion. (SLFN, not SFLN)

      The point is fixed in the revised manuscript.

      Reviewer #3 (Significance (Required)):

      General assessment: I found this paper exciting to read, given the dearth of knowledge regarding viral modulation of tRNA expression.

      We appreciate the reviewer's comment.

      However, the work is highly descriptive, with a complete absence of follow-up or validation studies. At the very least, I would have hoped that the authors validated that viral titer (and not just GFP intensity) was impacted by some of the hits. The lack of confirmation and quality control overall diminishes confidence in the stated conclusions.

      However, I think the topic is timely, important, and that this manuscript offers tools to the community at large to learn more about viral manipulation or other drivers of tRNA regulation. Once follow-up/validation experiments are added to the work, as detailed below, this manuscript will be of broad importance and highly impactful.

      As mentioned above, we plan to add such validations to the fully revised manuscript.

      Advance: While there have been many studies suggesting tRNA regulation occurs during viral infection (these pubs should be referenced as mentioned above), this is an advance due to the fact that it begins to address whether tRNA expression changes functionally impact viral replication. This will be much more solid with follow-up experiments confirming that hits alter HCMV replication (rather than GFP intensity).

      Audience: This will be of broad interest to those with interest in virology and gene expression. The new sub-libraries of tRNA-related factors might be useful to be tested in other cell types and settings. Again, as the work stands, it is descriptive and hypothesis-stimulating, but the conclusions need validation and further support.

      We thank the referee for the encouraging words and the suggested analyses. We already implemented most of the suggestions in the current revised version and hope to add further experiments in a fully revised manuscript.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations for the Authors):

      Major 

      (a) In the study the authors focus on the RALF1 peptide. But according to expression data and the study from Abarca et al., 2021, RALF1 is not the only peptide that is expressed in the root and affects root growth. Similarly, the primary sequence of RALF1 does not differ much chemically from that of other RALFs such as RALF33, RALF23, RALF22, etc. So, does the cell wall pectin methylation status also influence the effect of other RALFs on root growth, or is that specific to RALF1?

      (b) In addition, is the internalization of FER depending only on RALF1 upon the methylation status of cell wall pectins? Or can other RALFs cause a similar effect potentially?

      (c) The authors propose that RALF1 associates with de-esterified pectin through electrostatic interactions. To test this, they perform biolayer interferometry assays using a buffer at pH 7.4. Is that a relevant pH at the cell wall? It is possible that the authors thought that this may not change the charges of R and K residues; however, it will affect the overall charge of the peptide, given that it contains quite a few N and Q residues on the exposed surface. The authors may want to consider that.

      (d) Moreover, the authors do not use their peptide RALF1KR, suggested as a peptide not binding OGs, as a control in their OG binding assays. That biochemical experiment should also be included to validate their results and conclusions.

      We thank reviewer #1 for these comments. In this work, we focused on RALF1, but the majority of AtRALF peptides, when applied exogenously as synthetic peptides, induce RALF1-like effects in Arabidopsis (Abarca et al., 2021; PMID: 34608971). Moreover, all RALF peptides display clusters of R and K residues and are positively charged (Abarca et al., 2021; PMID: 34608971). For comparison with RALF1, we now also use RALF34, because it was suggested to also interact via the Catharanthus roseus receptor-like kinase 1-like (CrRLK1L) THESEUS1 (THE1). Notably, RALF34 also induced the internalization of FER-GFP. Moreover, interference with PME also disrupted this activity of RALF34. Therefore, we assume that other RALF peptides display the same or similar signalling modalities. Nevertheless, it remains to be addressed whether all RALF family members require PME activity.

      We appreciated these comments and incorporated this aspect in the revised version of the manuscript. The pH was chosen for technical reasons associated with the used BLI buffer. As requested, we also included the RALF1-KR peptide in our OG binding assays. Under these conditions, the mutated peptides were not able to interact with the OGs anymore. Accordingly, we conclude that the K and R residues in RALF1 are crucial for its binding to demethylesterified OGs.  

      (e) Another important aspect concerns the designed RALF1KR mutant and its effect in planta. The authors report the following: "RALF1-KR peptides are not bioactive, because they did neither affect root growth, nor cell wall integrity, nor did they induce the ligand-induced endocytosis of FER in epidermal root cells (Figure 5D-I). These findings suggest that the positively charged residues in RALF1 are essential for its activity in roots." According to the structure published by Xiao et al., 2019, the R in the alpha helix of RALF peptides (YISYQSLKR... in the RALF1 sequence) is directly involved in the interaction with LLGs. So, a mutation in that R may impair the interaction of RALF1 with LLG and therefore the complex formation with FER. So, it is quite possible that the effect that the authors are seeing on FER signaling and endocytosis, using this peptide variant, is not due to the impaired capacity of the peptide to bind de-esterified pectin but to its inability to be sensed by the membrane complex directly. To verify that, the authors should test, either biochemically or by CoIP in planta, that their RALF1KR variant can still be perceived by the LLG-FER complex.

      We agree with reviewer #1 and do not doubt that the positive charges in RALF1 likely interact with several entities. The respective sites were also covered in Liu et al., 2024 (Cell). It would be interesting to understand how the charge-dependent interaction with pectin modulates RALF binding to the LLG-FER complex, but these experiments are beyond the scope of this manuscript. We confirmed that the positive charges in RALF1 are essential for OG binding as well as for its bioactivity. However, we do not rule out that they have additional structural functions beyond pectin binding. We clarified this aspect in the revised version. It is conceivable that the pectin binding and the receptor complex binding of RALF1 are molecularly and mechanistically related.

      (f) The authors propose in this study that this effect of RALF1-pectin mode of action on FER is independent from LRXs. That is a very interesting observation which also aligns with similar observations from other independent studies (Moussu et al., 2020; Schoenaers et al. Nat Plants, 2024; Franck et al., 2018). However, that seems to be in conflict with the previous mode of action that the authors had described in Dunser et al., 2019. In that last study the authors had described that FER constitutively interacts with LRX proteins in a direct way to sense cell wall changes. In my view the authors do not critically elaborate to explain these two contradicting results, which are key to understand the mode of action they are describing. This relevant aspect should be addressed more in depth by the authors in their discussion.

      Thank you for the comment. We do not see that our findings contradict our previous work (from Dünser et al., 2019). There we concluded that LRX and FER directly interact to sense cell wall characteristics. However, the loss of LRX function abolished the cell wall sensing mechanism, but the respective loss-of-function and dominant negative lines were still able to detect RALF peptides. We hence proposed that the LRX/FER function is at least partially independent of the FER function in RALF perception. This is in agreement with our current study where we conclude again that FER shows LRX-dependent but also -independent modes of action. 

      Minor

      (g) In the introduction (first page), the authors write the following sentence: "RALF peptides are involved in multiple physiological and developmental processes, ranging from organ growth and pollen tube guidance to modulation of immune responses (Stegmann et al., 2017; Abarca et al., 2021)". RALFs are not involved in pollen tube guidance but in pollen tube growth, so that should be changed in the Introduction sentence. In addition, the authors could cite further references here to support the sentence, such as Mecchia et al., 2017 or Ge et al., 2017.

      Thank you for pointing this out; we apologize for the error. We corrected the statement in the revised version of the manuscript and added the citations as requested.

      (h) The new study of Schoenaers et al. Nat Plants, 2024 should now be included in the revised version.

      Thank you. We implemented this reference in the revised manuscript.

      Reviewer #2 (Public Review):

      The genetic material used by the authors to strengthen the connection of RALF signalling and PME activity might not be as suitable as an acute inhibition of PME activity. The PMEI3ox line generated by Peaucelle et al., 2008 is alcohol-inducible. Was expression of the PMEI induced during the experiments? As ethanol inducible systems can be rather leaky, it would not be surprising if PME activity would be reduced even without induction, but maybe this would warrant testing whether PMEI3 is actually overexpressed and/or whether PME activity is decreased. On a similar note, the PMEI5ox plants do not appear to show the typical phenotype described for this line. I personally don't think these lines are necessary to support the study. Short-term interference with PME activity (such as with EGCG) might be more meaningful than life-long PMEI overexpression, in light of the numerous feedback pathways and their associated potential secondary effects. This might also explain why EGCG leads to an increase in pH, as one would expect from decreased PME activity, while PMEI expression (caveats from above apply) apparently does not (Fig 3A-D).

      We agree with reviewer #2. The PMEI3ox line from Peaucelle et al., 2008 is ethanol-inducible, but we observed a strong phenotype (at seedling and adult stage) without ethanol induction. We performed all experiments (root growth assays and confocal observations) with as well as without ethanol induction, leading to similar results. We concluded from this that the line is either leaky or that overexpression of PMEI3 is already induced upon seed sterilisation with ethanol. Accordingly, we did not intend to use the lines for acute inhibition of PME but rather used them to genetically confirm our data derived from acute pharmacological inhibition. We do show in Figure 1G that the levels of de-methylesterified pectin are decreased in the PMEI3ox mutant compared to WT seedlings. It is exactly this alteration that we are exploiting to assess the necessity of charged pectin for RALF1 signalling. Since the apoplastic pH in the PMEI3ox line is not altered compared to WT, we can conclude that the observed effect on RALF1 signalling is entirely due to the altered pectin charge.

      We would like to note that the PMEI5ox line indeed shows the reported root-bending phenotype when grown on plates. We started to perform RALF application assays in liquid medium, because EGCG does not show activity on MS plates. Moreover, liquid medium allows us to perform the assays with low amounts of synthetic peptides. The seedling images in our root growth assay might hence be misleading, since the assay was done in liquid MS medium and the seedlings were carefully straightened on MS plates before imaging. This transfer makes it difficult to observe the root-bending or -curling phenotype, which is typical for PMEI5ox.

      At least at first sight, the observation that OGs are able to titrate RALF from pectin binding seems at odds with the idea of cooperative binding with low affinity, leading to high avidity oligomers. Perhaps the authors can provide a speculative conceptual model of these interactions?

      We added a high concentration of OGs in the media and observed a strong repression of RALF1 activity at the root surface. We assume the OGs form oligomers with RALF peptides in the media, preventing them from penetrating the roots.

      I could not find a description of the OG treatment/titration experiments, but I think it would be important to understand how these were performed with respect to OG concentration, timing of the application, etc.

      Thank you for pointing this out. The description of the OG RALF titration is added in the methods section.

      Reviewer #2 (Recommendations for the Authors):

      Page 3: "and can bind to extracellular pectin": Liu et al., 2024 should maybe also be cited here.

      Amended.

      I am not so sure about the use of "conceptualizing" in the last sentence of the abstract and elsewhere in the manuscript.

      I would suggest adding a few sentences that describe and differentiate what this study and other recently published works (e.g. Dünser, Liu, Mossou, Lin) have revealed about the pectin association of RALFs, LRXs, and FER to help the non-expert reader to navigate this increasingly complex area. May also be worth mentioning that the previously described pectin sensing function of FER is physically separated from the RALF binding domain (Gronnier et al., 2022)

      Thank you for your constructive comments. We followed your suggestions and further improved the discussion in the revised version of our manuscript.

      Reviewer #3 (Recommendations for the Authors): 

      (1) The authors claim that pectin is something like an extracellular signaling scaffold. In other fields, signalling scaffold refers to proteins that tether the signalling components and regulate/are involved in the signal transduction. Here, pectin is a cell wall structural component whose molecular status is sensed and perceived rather than a functional signaling component. To me, it is FERONIA to be called a signalling scaffold in this case. However, this is my view, and the authors may present their concept. 

      RALF peptides as well as FERONIA bind to de-methylesterified pectin, which is essential for their signalling output. Albeit not being a protein, we propose that pectin functions like a scaffold tethering both signalling components and thereby enabling signalling. FERONIA has indeed also been proposed to function as a scaffold when tethering other signalling components.

      (2) I have no problem with authors using the more general term pectin instead of homogalacturonan throughout the text. Still, authors should, at some point in the text, specify that by pectin, they mean homogalacturonan; the authors did not analyze other pectic types on binding. 

      We followed your suggestion.

      (3) The authors show that RALF1 binds to OGs with a high avidity. Given the fact that OGs released from homogalacturonan upon pathogen infection are Damage-Associated Molecular Patterns (DAMPs), this opens the possibility that this particular activity of RALF1 might actually function in modulation of immune response. I suggest that authors should not exclude this possibility. 

      We fully agree with this possibility for FER-dependent signalling.

      (4) Are there any indications that a similar mechanism can be extrapolated to other FERONIA homologs, such as THESEUS or HERCULES? Although it is not essential to comment, I think this could enrich the discussion.

      This is a highly interesting research question, which we may follow up in our upcoming studies. RALF34, which is considered a ligand for THESEUS, also induced FER internalization, which was also sensitive to PME inhibition. While this requires further investigation, this finding hints at a common mechanism for FER- and THE-dependent RALF peptides.

      (5) I suggest using the model scheme currently in the supplement as a main figure to provide an immediate accessible summary of the findings.

      Thank you for the suggestion to add the summary scheme in the main figures. We followed your suggestion.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      In this manuscript by Wu et al., the authors present the high-resolution cryo-EM structures of the WT Kv1.2 voltage-gated potassium channel. Along with this structure, the authors have solved several structures of mutants or experimental conditions relevant to the slow inactivation process that these channels undergo and which is not yet completely understood.

      One of the main findings is the determination of the structure of a mutant (W366F) that is thought to correspond to the slow inactivated state. These experiments confirm results in similar mutants in different channels from Kv1.2 that indicate that inactivation is associated with an enlarged selectivity filter. 

      Another interesting structure is the complex of Kv1.2 with the pore blocking toxin Dendrotoxin 1. The results shown in the revised version indicate that the mechanism of block is similar to that of related blocking-toxins, in which a lysine residue penetrates in the pore. Surprisingly, in these new structures, the bound toxin results in a pore with empty external potassium binding sites. 

      The quality of the structural data presented in this revised manuscript is very high and allows for unambiguous assignment of side chains. The conclusions are supported by the data. This is an important contribution that should further our understanding of voltage-dependent potassium channel gating. In the revised version, the authors have addressed my previous specific comments, which are appended below. 

      (1) In the main text's reference to Figure 2d residues W18' and S22' are mentioned but are not labeled in the insets. 

      This has been fixed: line 229, p. 9.

      (2) On page 8 there is a discussion of how the two remaining K+ ions in binding sites S3 and S4 prevent K+ permeation in molecular dynamics. However, in Shaker, inactivated W434F channels can sporadically allow K+ permeation with normal single-channel conductance but very reduced open times and open probability at not very high voltages.

      This is noted in the Discussion, lines 497-500, p. 18.

      (3) The structure of WT in the absence of K+ shows a narrower selectivity filter; however, Figure 4 does not convey this finding. In fact, the structure in Figure 4B is shown at such an angle that it looks as if the carbonyl distances are increased; perhaps this should be fixed. Also, it is not clear how the distances between carbonyls given in the text on page 12 are measured. Is it between adjacent or kitty-corner subunits?

      We have changed Fig. 4B to show the same view as in Fig. 4A. In the legend we explain that opposing subunits are shown. We no longer give distances, in view of the lack of detectable carbonyl densities.

      (4) It would be really interesting to know the authors' opinion on the driving forces behind slow inactivation. For example, potassium flux seems to be necessary for channels to inactivate, which might indicate that a local conformational change is the trigger for the main twisting events proposed here.

      We address this in the Discussion, lines 506-523, pp. 18-19.

      Reviewer #2 (Public Review)

      Cryo-EM structures of the Kv1.2 channel in the open, inactivated, toxin complex and in Na+ are reported. The structures of the open and inactivated channels are merely confirmatory of previous reports. The structures of the dendrotoxin-bound Kv1.2 and the channel in Na+ are new findings that will be of interest to the general channel community.

      Review of the resubmission: 

      I thank the authors for making the changes in their manuscript as suggested in the previous review. The changes in the figures and the additions to the text do improve the manuscript. The new findings from a further analysis of the toxin channel complex are welcome information on the mode of the binding of dendrotoxin. 

      A few minor concerns: 

      (1) Lines 93-96, 352: I am not sure what the authors are referring to when they say NaK2P. It is either NaK or NaK2K. I don't think that it has been shown in the reference suggested that either of these channels changes conformation based on the K+ concentration. Please check if there is a mistake and that the Nichols et al. reference is what is being referred to.

      Thank you for noticing the error. We meant NaK2K and we have changed this throughout.

      (2) Line 365: In the study by Cabral et al., Rb+ ions were observed by crystallography in the S1, S3 and S4 sites, not the S2 site. Please correct.

      Thank you. We have re-written this section, lines 364-381, pp. 13-14.

      Reviewer #3 (Public Review): 

      Wu et al. present cryo-EM structures of the potassium channel Kv1.2 in open, C-type inactivated, toxin-blocked and presumably sodium-bound states at 3.2 Å, 2.5 Å, 2.8 Å, and 2.9 Å. The work builds on a large body of structural work on Kv1.2 and related voltage-gated potassium channels. The manuscript presents a plethora of structural work, and the authors are commended on the breadth of the studies. The structural studies are well-executed. Although the findings are mostly confirmatory, they do add to the body of work on this and related channels. Notably, the authors present structures of DTx-bound Kv1.2 and of Kv1.2 in a low concentration of potassium (which may contain sodium ions bound within the selectivity filter). These two structures add considerable new information. The DTx structure has been markedly improved in the revised version and the authors arrive at well-founded conclusions regarding its mechanism of block. Regarding the Na+ structure, the authors claim that the structure with sodium has "zero" potassium - I caution them against making this claim. It is likely that some K+ persists in their sample and that some of the density in the "zero potassium" structure may be due to K+ rather than Na+. This can be clarified by revisions to the text and discussion. I do not think that any additional experiments are needed. Overall, the manuscript is well-written, a nice addition to the field, and a crowning achievement for the Sigworth lab.

      Most of this reviewer's initial comments have been addressed in the revised manuscript. Some comments remain that could be addressed by revisions of the text. 

      Specific comments on the revised version: 

      Quotations indicate text in the manuscript. 

      (1) "While the VSD helices in Kv1.2s and the inactivated Kv1.2s-W17'F superimpose very well at the top (including the S4-S5 interface described above), there is a general twist of the helix bundle that yields an overall rotation of about 3° at the bottom of the VSD."

      Comment: This seemed a bit confusing. I assume the authors aligned the complete structures - the differences they indicate seem to be slight VSD repositioning relative to the pore rather than differences between the VSD conformations themselves. The authors may wish to clarify. As they point out in the subsequent paragraph, the VSDs are known to be loosely associated with the pore. 

      We aligned the VSDs alone, and it is a twist of the VSD helix bundle.

      This is now clarified in lines 269-273, p. 10.

      (2) Comment: The modeling of DTx into the density is a major improvement in the revision. Figure 3 displays some interactions between the toxin and Kv1.2 - additional side views of the toxin and the channel might allow the reader to appreciate the interactions more fully. The overall fit of the toxin structure into the density is somewhat difficult to assess from the figure. (The authors might consider using ChimeraX to display density and model in this figure.) 

      We have added new panels, and stereo pairs, to Figure 3.

      (3) "We obtained the structure of Kv1.2s in a zero K+ solution, with all potassium replaced with sodium, and were surprised to find that it is little changed from the K+ bound structure, with an essentially identical selectivity filter conformation (Figure 4B and Figure 4-figure supplement 1)." 

      Comment: It should be noted in the manuscript that K+ and Na+ ions cannot be distinguished by the cryo-EM studies - the densities are indistinguishable. The authors are inferring that the observed density corresponds to Na+ because the protein was exchanged from K+ into Na+ on a gel filtration (SEC) column. It is likely that a small amount of K+ remains in the protein sample following SEC. I caution the authors against claiming that there is zero K+ in solution without measuring the K+ content of the protein sample. Additionally, it should be considered that K+ may be present in the blotting paper used for cryo-EM grid preparation (our laboratory has noted, for example, a substantial amount of Ca2+ in blotting paper). The affinity of Kv1.2 for K+ has not been determined, to my knowledge - the authors note in the Discussion that the Shaker channel has "tight" binding for K+. It seems possible that some portion of the density in the selectivity filter could be due to residual K+. This caveat should be clearly stated in the main text and discussion. More extensive exchange into Na+, such as performing the entire protein purification in NaCl, or by dialysis (as performed for obtaining the structure of KcsA in low K+ by Y. Zhou et al. & Mackinnon 2001), would provide more convincing removal of K+, but I suspect that the Kv1.2 protein would not have sufficient biochemical stability without K+ to endure this treatment. One might argue that reduced biochemical stability in NaCl could be an indication that there was a meaningful amount of K+ in the final sample used for cryo-EM (or in the particles that were selected to yield the final high-resolution structure).

      We now explain in more detail, in the Methods section, the steps taken to avoid any residual K+ contamination during purification (lines 683-687, pp. 24-25). We have changed the text to point out that the ion species cannot be distinguished in the maps, and note results in NaK2K and KcsA (lines 368-381, pp. 13-14).

      We note that the same procedures to remove K+ were used for the Kv1.2s-W17’F structure (line 385, p. 14). We qualify the ion replacement to say that Na+ replaces “essentially” all K+ (line 607, p. 21).

      (4) Referring to the structure obtained in NaCl: "The ion occupancy is also similar, and we presume that Kv1.2 is a conducting channel in sodium solution." 

      Comment: Stating that "Kv1.2 is a conducting channel in sodium solution" and implying that conduction of Na+ is achieved by an analogous distribution of ion binding sites as observed for K+ are strong statements to make - and not justified by the experiments provided. Electrophysiology would be required to demonstrate that the channel conducts sodium in the absence of K+. More complete ionic exchange, better control of the ionic conditions (Na+ vs K+), and affinity measurements for K+ would be needed to determine the distribution of Na+ in the filter (as mentioned above). At minimum, the authors should revise and clarify the intended meaning of the statement "we presume that Kv1.2 is a conducting channel in sodium solution". As mentioned above, it seems possible/likely that a portion of the density in the filter may be due to K+.

      We now present a more detailed argument (lines 376-381, pp. 13-14).

      Recommendations for the authors: 

      Reviewing Editor: 

      After consultation, the reviewers agree that, although the authors have answered most of the comments raised in the previous review, there remains a concern about the structure obtained in the presence of Na+. Given that Kv1.2 is more reluctant to slow inactivation, the conducting structure in Na+ could be due to this fact or to the channel really having a higher affinity for K+ than Na+. In the presence of even a small contamination by K+, this ion could thus occupy the selectivity filter, resulting in an open conformation. The authors should clearly state the steps taken to ensure no contamination by K+. It is also possible that the open structure indeed occurs even in the presence of Na+ in the selectivity filter. This should also be discussed, given that this has been observed in other potassium channel structures.

      Reviewer #1 (Recommendations For The Authors): 

      In this revised version of the manuscript, the authors have adequately addressed my previous points and improved the clarity and readability of the manuscript. This is a compelling work that shows inactivated structures of the Kv1.2 potassium channel. Especially interesting is a structure in the absence of extracellular potassium ions, which can help explain how a reduction in the availability of these ions speeds up entrance into the inactivated state in these ion channels.

      I would just recommend that, in the absence of functional data (current recordings) when potassium is removed, the authors use caution in ascribing this structure to an inactivated state. Also, it should be mentioned that the ion densities observed in the pore cannot unambiguously be identified as sodium ions.

      Reviewer #3 (Recommendations For The Authors): 

      (1)  "The nearby Leu9 is also important as its substitution by alanine also decreases affinity 1000-fold, but we observe no contacts between this residue and residues of the Kv1.2s channel." 

      Comment: It seems early in the text to mention the potential interaction of Leu9 to the channel structure. The authors may wish to discuss Leu9 later in the manuscript - a figure showing the location of Leu9 would strengthen the statement. Any hypothesis on why mutation of it has such a profound effect? 

      Add a figure panel showing Leu9 position.

      We have rewritten the text as suggested, and have identified Leu9 in several panels of Fig. 3.

      (2)  "The X-ray structure of α-DTX (Figure 3A)"

      Comment: The authors may wish to cite a reference to this X-ray structure. 

      We now cite Skarzynski (1992) on line 321, p. 12.

    1. Weland the blade-winder     suffered woe. That steadfast man     knew misery. Sorrow and longing     walked beside him, wintered in him,     kept wearing him down after Nithad     hampered and restrained him, lithe sinew-bonds     on the better man. That passed over,     this can too. For Beadohilde     her brother’s death weighed less heavily     than her own heartsoreness once it was clearly     understood she was bearing a child.     Her ability to think and decide     deserted her then. That passed over,     this can too. We have heard tell     of Mathilde’s laments, the grief that afflicted     Geat’s wife. Her love was her bane,     it banished sleep. That passed over,     this can too. For thirty winters–     it was common knowledge– Theodric held     the Maerings’ fort. That passed over,     this can too. Earmonric     had the mind of a wolf, by all accounts     a cruel king, lord of the far flung     Gothic outlands. Everywhere men sat     shackled in sorrow, expecting the worst,     wishing often he and his kingdom     would be conquered. That passed over,     this can too. A man sits mournful,     his mind in darkness, so daunted in spirit     he deems himself ever after     fated to endure. He may think then     how throughout this world the Lord in his wisdom     often works change– meting out honor,     ongoing fame to many, to others     only their distress. Of myself, this much     I have to say: for a time I was poet     of the Heoden people, dear to my lord.     Deor was my name. For years I enjoyed     my duties as minstrel and that lord’s favor,     but now the freehold and land titles     he bestowed upon me once he has vested in Heorrenda,     master of verse-craft. That passed over,     this can too. Welund him be wurman      wræces cunnade, anhydig eorl     earfoþa dreag, hæfde him to gesiþþe     sorge and longaþ, wintercealde wræce,     wean oft onfond siþþan hine Niðhad on     nede legde, swoncre seonobende     on syllan monn. 
Þæs ofereode,     þisses swa mæg. Beadohilde ne wæs     hyre broþra deaþ on sefan swa sar     swa hyre sylfre þing, þæt heo gearolice     ongietan hæfde þæt heo eacen wæs;     æfre ne meahte þriste geþencan     hu ymb þæt sceolde. Þæs ofereode,     þisses swa mæg. We þæt Mæðhilde      mone gefrugnon wurdon grundlease     Geates frige, þæt hi seo sorglufu     slæp ealle binom. Þæs ofereode,     þisses swa mæg. Ðeodric ahte      þritig wintra Mæringa burg;     þæt wæs monegum cuþ. Þæs ofereode,     þisses swa mæg. We geascodan     Eormanrices wylfenne geþoht;     ahte wide folc Gotena rices;     þæt wæs grim cyning. Sæt secg monig     sorgum gebunden, wean on wenan,     wyscte geneahhe þæt þæs cynerices     ofercumen wære. Þæs ofereode,     þisses swa mæg. Siteð sorgcearig,     sælum bidæled, on sefan sweorceð,     sylfum þinceð þæt sy endeleas     earfoða dæl, mæg þonne geþencan     þæt geond þas woruld witig Dryhten     wendeþ geneahhe, eorle monegum     are gesceawað, wislicne blæd,     sumum weana dæl. Þæt ic bi me sylfum     secgan wille, þæt ic hwile wæs     Heodeninga scop, dryhtne dyre;     me wæs Deor noma. Ahte ic fela wintra     folgað tilne, holdne hlaford,     oþ þæt Heorrenda nu, leoðcræftig monn,     londryht geþah þæt me eorla hleo     ær gesealde. Þæs ofereode,     þisses swa mæg.

      I loved how it sounded. They did say the same things over and over again, and it was so cool that they had an Old English part.

    1. When the reading brain skims like this, it reduces time allocated to deep reading processes. In other words, we don’t have time to grasp complexity, to understand another’s feelings, to perceive beauty, and to create thoughts of the reader’s own.

      When reading so quickly, the brain has little time to process all of the information in the text. This affects the way you think about a specific topic, because you may not take in everything the text covers.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      My main point of concern is the precision of dissection. The authors distinguish cells isolated from the tailbud and different areas in the PSM. They suggest that the cell-autonomous timer is initiated, as cells exit the tailbud.

      This is also relevant for the comparison of single cells isolated from the embryo and cells within the embryo. The dissection will always be less precise and cells within the PSM4 region could contain tailbud cells (as also indicated in Figure 1A), while in the analysis of live imaging data cells can be selected more precisely based on their location. This could therefore contribute to the difference in noise between isolated single cells and cells in the embryo. This could also explain why there are "on average more peaks" in isolated cells (p. 6, l. 7).

      This aspect should be considered in the interpretation of the data and mentioned at least in the discussion. (It does not contradict their finding that more anterior cells oscillate less often and differentiate earlier than more posterior ones.)

      Reviewer #1 rightly points out that selecting cells in a timelapse is more precise than manual dissection. Manual dissection is inherently variable but we believe in general it is not a major source of noise in our experiments. To control for this, we compared the results of 11 manual dissections of the posterior quarter of the PSM (PSM4) with those of the pooled PSM4 data. In general, we did not see large differences in the distributions of peak number or arrest timing that would markedly increase the variability of the pooled data above that of the individual dissections (Figure 1 – supplement figure 7). We have edited the text in the Results to highlight this control experiment (page 6, lines 13-17).

      It is of course possible that we picked up adjacent TB cells when dissecting PSM4; however, the reviewer’s assertion that inclusion of TB cells “could also explain why there are "on average more peaks" in isolated cells” is incorrect. Later in the paper we show that cells from the TB have almost identical distributions to PSM4 (mean ± SD, PSM4 4.36 ± 1.44; TB 4.26 ± 1.35; Figure 4 – supplement 1). Thus, inadvertent inclusion of TB cells while dissecting would in fact not increase the number of peaks.

      Here, the authors focus on the question of how cells differentiate. The reverse question is not addressed at all. How do cells maintain their oscillatory state in the tailbud? One possibility is that cells need external signals to maintain that as indicated in Hubaud et al. 2014. In this regard, the definition of tailbud is also very vague. What is the role of neuromesodermal progenitors? The proposal that the timer is started when cells exit the tailbud is at this point a correlation and there is no functional proof, as long as we do not understand how cells maintain the tailbud state. These are points that should be considered in the discussion.

      The reviewer asks “How do cells maintain their oscillatory state in the tailbud?”. This is a very interesting question, but as recognized by the reviewer, beyond the scope of our current paper.

      We now further emphasize the point “One possibility is that cells need external signals to maintain … as indicated in Hubaud et al. 2014” in the Discussion and added a reference to the review Hubaud and Pourquié 2014 (Signalling dynamics in vertebrate segmentation. Nat Rev Mol Cell Biol 15, 709–721 (2014). https://doi.org/10.1038/nrm3891) (page 18, lines 19-22).

      To clarify the definition of the TB, we have stated more clearly in the Results (page 15, lines 8-12) that we defined TB cells as all cells posterior to the notochord (minus skin) and analyzed those that survived >5 hours post-dissociation, did not divide, and showed transient Her1-YFP dynamics.

      The reviewer asks: What is the role of neuromesodermal progenitors? In responding to this, we refer to Attardi et al., 2018 (Neuromesodermal progenitors are a conserved source of spinal cord with divergent growth dynamics. Development. 2018 Nov 9;145(21):dev166728. doi: 10.1242/dev.166728).

      Around the stage of dissection in zebrafish in our work, there is a small remaining group of cells characterized as NMPs (Sox2+, Tbxta+ expression) in the dorsal-posterior wall of the TB. These NMPs rarely divide and are not thought to act as a bipotential pool of progenitors for the elongating axis, as is the case in amniotes; rather, they contribute to the developing spinal cord. How this particular group of cells behaves in culture is unclear, as we did not subdivide the TB tissue before culturing. It would be possible to specifically investigate these NMPs regarding a role in TB oscillations, but given the results of Attardi et al., 2018 (small number of cells, low bipotentiality), we argue that it would not be significant for the conclusions of the current work. To indicate this, we included a sentence and a citation of this paper in the Results towards the beginning of the section on the tail bud (page 15, lines 8-12).

      The authors observe that the number of oscillations in single cells ex vivo is more variable than in the embryo. This is presumably due to synchronization between neighbouring cells via Notch signalling in the embryo. Would it be possible to add low doses of Notch inhibitor to interfere with efficient synchronization, while at the same time keeping single cell oscillations high enough to be able to quantify them?

      It is a formal possibility that Delta-Notch signaling may have some impact on the variability in the number of oscillations. However, we argue that the significant amount of cell tracking work required to carry out the suggested experiments would not be justified, considering what has been previously shown in the literature. If Delta-Notch signaling was a major factor controlling the variability of the intrinsic program that we describe, then we would expect that in Delta-Notch mutants the anterior-posterior limits of cyclic gene expression in the PSM would extend beyond those seen in wildtype embryos. Specifically, we might expect to see her1 expression extending more anteriorly in mutants to account for the dramatic increase in the number of cells that have 5, 6, 7 and 8 cycles in culture (Fig. 1E versus Fig. 1I). However, as shown in Holley et al., 2002 (Fig. 5A, B; her1 and the notch pathway function within the oscillator mechanism that regulates zebrafish somitogenesis. Development. 2002 Mar;129(5):1175-83. doi: 10.1242/dev.129.5.1175), the anterior limit of her1 expression in the PSM in DeltaD mutants (aei) is not different to WT. Thus, Delta-Notch signaling may exert a limited control over the number of oscillations, but likely not in excess of one cycle difference.

      In the same direction, it would be interesting to test if variation is decreased, when the number of isolated cells is increased, i.e. if cells are cultured in groups of 2, 3 or 4 cells, for instance.

      This is a great proposal – however the culture setup used here is a wide-field system that doesn’t allow us to accurately follow more than one cell at a time. Cells that adhere to each other tend to crawl over each other, blurring their identity in Z. This is also why we excluded dividing cells in culture from the analysis. Experiments carried out with a customized optical setup will be needed to investigate this in the future.

      It seems that the initiation of Mesp2 expression is rather reproducible and less noisy (+/- 2 oscillation cycles), while the number of oscillations varies considerably (and the number of cells continuing to oscillate after Mesp2 expression is too low to account for that). How can the authors explain this apparent discrepancy?

      The observed tight linkage of the Mesp onset and Her1 arrest argues for a single timing mechanism that is upstream of both gene expression events; indeed, this is one of the key implications of the paper. However, the infrequent dissociation of these events observed in FGF-treated cells suggests that more than one timing pathway could be involved, although there are other interpretations. We’ve added more discussion in the text on one vs. multiple timers (page 17, lines 19-23, to page 18, lines 1-8); see next point.

      The observation that some cells continue oscillating despite the upregulation of Mesp2 should be discussed further and potential mechanism described, such as incomplete differentiation.

      This is an infrequent (5 out of 54 cells) and interesting feature of PSM4 cells in the presence of FGF. We imagine that this dissociation of clock arrest from mesp onset timing could be the result of alterations in the thresholds in the sensing mechanisms controlling these two processes. Alternatively, as reviewer 2 argues, it might reflect multiple timing mechanisms at work. We have added a discussion of these alternative interpretations (page 17, lines 19-23, to page 18, lines 1-8).

      Fig. 3 supplement 3 B missing

      It’s there in the bioRxiv downloadable PDF and full text, but seems not to be included when previewing the PDF. Thanks for the heads up.

      Reviewer #2 (Public Review):

      The authors demonstrate convincingly the potential of single mesodermal cells, removed from zebrafish embryos, to show cell-autonomous oscillatory signaling dynamics and differentiation. Their main conclusion is that a cell-autonomous timer operates in these cells and that additional external signals are integrated to tune cellular dynamics. Combined, this underlies the precision required for proper embryonic segmentation in vivo. I think this work stands out for its very thorough, quantitative, single-cell real-time imaging approach, both in vitro and in vivo. Very significant progress and investment in method development, at the level of the imaging setup and also image analysis, were required to achieve this highly demanding task. This work provides new insight into the biology underlying embryo axis segmentation.

      The work is very well presented and accessible. I think most of the conclusions are well supported. Here are my comments and suggestions:

      The authors state that "We compare their cell-autonomous oscillatory and arrest dynamics to those we observe in the embryo at cellular resolution, finding remarkable agreement."

      I think this statement needs to be better placed in context. In absolute terms, the period of oscillations and the timing of differentiation are actually very different in vitro, compared to in vivo. While oscillations have a period of ~30 minutes in vivo, oscillations take twice as long in vitro. Likewise, while the last oscillation is seen after 143 minutes in vivo, the timing of differentiation is very significantly prolonged, i.e. more than doubled, to 373 min in vitro (Supplementary Figure 1-9). I understand what the authors mean with 'remarkable agreement', but this statement is at risk of being misleading. I think the in vitro to in vivo differences (in absolute time scales) need to be stated more explicitly. In fact, the drastic change in absolute timescales, while preserving the relative ones, i.e. the number of oscillations a cell shows before onset of differentiation remains relatively invariant, is a remarkable finding that I think merits more consideration (see below).

      We have changed the text in the abstract (page 1, line 28) to clarify that the agreement is in the relative slowing, intensity increases and peak numbers.

      One timer vs. many timers

      The authors show that the oscillation clock slowing down and the timing of differentiation, i.e. the time it takes to activate the gene mesp, are in principle dissociable processes. In physiological conditions, these are however linked. We are hence dealing with several processes, each controlled in time (and thereby space). Rather than suggesting the presence of ‘a timer’, I think the presence of multiple timing mechanisms would reflect the phenomenology better. I would hence suggest separating the questions more consistently, for instance into the following three:

      a.  what underlies the slowing down of oscillations?

      b.  what controls the timing of onset of differentiation?

      c.  and finally, how are these processes linked?

      Currently, these are discussed somewhat interchangeably, for instance here: “Other models posit that the slowing of Her oscillations arise due to an increase of time-delays in the negative feedback loop of the core clock circuit (Yabe, Uriu, and Takada 2023; Ay et al. 2014), suggesting that factors influencing the duration of pre-mRNA splicing, translation, or nuclear transport may be relevant. Whatever the identity, our results indicate the timer ought to exert control over differentiation independent of the clock.”(page 14). In the first part, the slowing down of oscillations is discussed and then the authors conclude on 'the timer', which however is the one timing differentiation, not the slowing down. I think this could be somewhat misleading.

      To help distinguish the clock’s slowing & arrest from differentiation, we have clarified the text in how we describe our experiments using her1-/-; her7-/- cells (page 10, lines 9-20).

      From this and previous studies, we learn/know that without clock oscillations, the onset of differentiation still occurs. For instance, in clock mutant embryos (mouse, zebrafish), mesp onset still occurs, albeit slightly delayed and not in a periodic but in a smooth progression. This timing of differentiation can occur without a clock, and it is this timer the authors refer to: "Whatever the identity, our results indicate the timer ought to exert control over differentiation independent of the clock." (page 14). This 'timer' is related to what has been previously termed 'the wavefront' in the classic Clock and Wavefront model from 1976, i.e. a 'timing gradient' and smooth progression of cellular change. The experimental evidence, from single-cell measurements, showing that it is cell-autonomous by the time it has been laid down is an important finding, and I would suggest connecting it more clearly to the concept of a wavefront, as per the model from 1976.

      We have been explicit about the connection to the clock & wavefront in the discussion (page 17, line 12-17).

      Regarding question a., clearly, the timer for the slowing down of oscillations is operating in single cells, an important finding of this study. It is remarkable to note in this context that while the overall, absolute timescale of slowing down is entirely changed by going from in vivo to in vitro, the relative slowing down of oscillations, per cycle, is very much comparable in vivo and in vitro.

      We have now pointed out the relative nature of this phenomenon in the abstract, page 1, line 28.

      To me, while this study does not address the nature of this timer directly, the findings imply that the cell-autonomous timer that controls slowing down is, in fact, linked to the oscillations themselves. We have previously discussed such a timer, i.e. a 'self-referential oscillator' mechanism (in mouse embryos, see Lauschke et al., 2013) and it seems the new exciting findings shown here in zebrafish provide important additional evidence in this direction. I would suggest commenting on this potential conceptual link, especially for those readers interested to see general patterns.

      While we posit that the timer provides positional information to the clock to slow oscillations and instruct its arrest, we do not believe that “the findings imply that the cell-autonomous timer that controls slowing down is, in fact, linked to [i.e., governed by] the oscillations themselves.” As we show, in her1-/-; her7-/- embryos lacking oscillations, the timing/positional information across the PSM still exists, as read out by Mesp expression. Is this different positional information than that used by the clock? Possibly, but given the tight linkage between Mesp onset and the timing/positioning of clock arrest, both cell-autonomously and in the embryo, we argue that the simplest explanation is that the timing/positional information used by the clock and by differentiation is the same. Please see page 10, lines 9-20, as well as the discussion (page 17, lines 19-23; page 18, lines 1-8).

      We agree that the timer must communicate with the clock, but this does not mean it is dependent on the clock for positional information.

      Regarding question c., i.e. how the two timing mechanisms are functionally linked, I think concluding that "Whatever the identity, our results indicate the timer ought to exert control over differentiation independent of the clock." (page 14) might be a bit of an oversimplification. It is correct that the timer of differentiation is operating without a clock; however, physiologically, the link to the clock (and hence the dependence on the timescale of clock slowing down) is also evident. As the authors state, without clock input, the precision of when and where differentiation occurs is impacted. I would hence emphasize the need to answer question c. more clearly, so as not to give the impression that the timing of differentiation does not integrate the clock, which the above statement could be interpreted to say.

      As far as we can tell, we do not state that “without clock input, the precision of when and where differentiation occurs is impacted”, and we certainly do not want to give this impression. In contrast, as mentioned above, the her1-/-; her7-/- mutant embryo studies indicate that the lack of a clock signal does not change the differentiation timing, i.e. it does not integrate the clock. Of course, in the formation of a real somite in the embryo, the clock’s input might be expected to cause a given cell to differentiate a little earlier or later so as to be coordinated with its neighbors, for example, along a boundary. But this magnitude of timing difference is within one clock cycle at most, and does not match the large variation seen in the cultured cells that spans over many clock cycles.

      A very interesting finding presented here is that in some rare examples, the arrest of oscillations and onset of differentiation (i.e. mesp) can become dissociated. Again, this shows we deal here with interacting, but independent modules. Just as a comment, there is an interesting medaka mutant, called doppelkorn (Elmasri et al. 2004), which shows a reminiscent phenotype "the Medaka dpk mutant shows an expansion of the her7 expression domain, with apparently normal mesp expression levels in the anterior PSM.". The authors might want to refer to this potential in vivo analogue to their single cell phenotype.

      Thank you, we had forgotten this result. Although we do not agree that this result necessarily means there are two interacting modules, we have included a citation to the paper, along with a discussion of alternative explanations for the dissociation (page 18, lines 2-14).

      One strength of the presented in vitro system is that it enables precise control and experimental perturbations. A very informative set of experiments would be to test the dependence of the cell-autonomous timing mechanisms (plural) seen in isolated cells on ongoing signalling cues, for instance via Fgf and Wnt signaling. The inhibition of these pathways with well-characterised inhibitors, in single cells, would provide important additional insight into the nature of the timing mechanisms, their dependence on signaling and potentially even into how these timers are functionally interdependent.

      We agree and in future experiments we are taking advantage of this in vitro system to directly investigate the effect of signaling cues on the intrinsic timing mechanism.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      O'Leary and colleagues present data identifying several procedures that alter discrimination between novel and familiar objects, including time, environmental enrichment, Rac-1, context reexposure, and brief reminders of the familiar object. This is complemented with an engram approach to quantify cells that are active during learning and to examine how their activation is impacted following each of the above procedures at test. With these behavioral data, the authors apply a modeling approach to understand the factors that contribute to good and poor object memory recall.

      We thank the Reviewer for summarizing the scope and depth of our manuscript, and indeed for recognizing our efforts. We engage below with the Reviewer’s specific criticisms.

      Strengths:

      Authors systematically test several factors that contribute to poor discrimination between novel and familiar objects. These results are extremely interesting and outline essential boundaries of incidental, nonaversive memory. These results are further supported by engram-focused approaches to examine engram cells that are reactivated in states with poor and good object recognition recall.

      We thank the Reviewer for these positive comments.

      Weaknesses:

      For the environmental enrichment, authors seem to suggest objects in the homecage are similar to (or reminiscent of) the familiar object. Thus, the effect of improved memory may not be related to enrichment per se as much as it may be related to the preservation of an object's memory through multiple retrievals, not the enriching experiences of the environment itself. This would be consistent with the brief retrieval figure. Authors should include a more thorough discussion of this.

      This is one of the main issues highlighted by the Editor and the Reviewers. We agree that these results dovetail with the reminder experiments. We have included additional discussion; see lines 510-546.

      Authors should justify the marginally increased number of engram cells in the non-enrichment group that did not show object discrimination at test, especially relative to other figures. More specific cell counting criteria may be helpful for this. For example, was the DG region counted for engram and cfos cells or only a subsection?

      There was a marginal, but non-significant, increase in the number of labelled cells within the standard-housed mice in Figure 3f. The cell counting criteria were the same across experimental groups and conditions: the entire dorsal and ventral blade of the dorsal DG was counted for each animal. This non-statistically significant variance may be due to differences in surgery and viral spread between mice. We have clarified this in the manuscript; see lines 229-232.

      It is unclear why the authors chose a reactivation time point of 1hr prior to testing. While this may be outside of the effective time window for pharmacological interference with reconsolidation for most compounds, it is not necessarily outside of the structural and functional neuronal changes accompanied by reconsolidation-related manipulations.

      A control experiment was performed to demonstrate that a brief reminder exposure of 5 mins on its own was insufficient to induce new learning that formed a lasting memory (Supplementary Figure S4a). Mice given only a brief acquisition period of 5 mins exhibited no preference for the novel object when tested 1 hour after training, suggesting the absence of a lasting object memory (Supplementary Figure S4b & c). We therefore used the 1-hour time point for the brief reminder experiment in Figure 4a. We have clarified this within the manuscript and supplementary data; see lines 258-264.

      Figure 5: Levels of exploration at test are inconsistent between manipulations. This is problematic, as context-only reexposures seem to increase exploration for objects overall in a manner that I'm unsure resembles 'forgetting'. Instead, cross-group comparisons would likely reveal increased exploration time for familiar and novel objects. While I understand 'forgetting' may be accompanied by greater exploration towards objects, this is inconsistent across and within the same figure. Further, this effect is within the period of time that rodents should show intact recognition. Instead, context-only exposures may form a competing (empty context) memory for the familiar object in that particular context.

      The Reviewer raises an important question, and we agree with the Reviewer that there should be caution and qualification around interpreting these results as “forgetting”. Indeed, for the context-only re-exposures, cross-group comparisons show increased exploration time for familiar and novel objects. Because the mice exhibit relatively high exploration of both the novel and familiar objects, an alternative explanation would be that the mice have not truly forgotten the familiar object, but rather that, as the mouse has not seen the familiar object in the last 6 context-only sessions, its reappearance makes it somewhat novel again. This reappearance of the object would then trigger the animal’s curiosity and, in turn, drive exploration. In addition, the context-only exposures may form a competing memory for the familiar object in that particular context. We have highlighted this in the results and also included greater discussion. See lines 306-315.

      I am concerned at the interpretation that a memory is 'forgotten' across figures, especially considering the brief reminder experiments. Typically, if a reminder session can trigger the original memory or there is rapid reacquisition, then this implies there is some savings for the original content of the memory. For instance, multiple context retrievals in the absence of an object reminder may be more consistent with procedures that create a distinct memory and subsequently recruit a distinct engram.

      These findings raise an important question regarding the interpretation of ‘forgetting’. If a reminder trial or experience can trigger the original memory, or there is rapid reacquisition, then this would suggest there is a degree of savings for the original memory content (85, 86). Previous work has emphasized retrieval deficits as a key characteristic of memory impairment, supporting the idea that memory recall or accessibility may be driven by learning feedback from the environment (7, 8, 14–18). Within our behavioral paradigm, a lack of memory expression would still constitute forgetting due to the loss of the learned behavioral response in the presence of natural retrieval cues. The changes in memory expression may therefore underlie the adaptive nature of forgetting. This is consistent with the idea that the engram is intact and available, but not accessible. Here we studied natural forgetting, and our data showing memory retrieval following optogenetic reactivation demonstrate that the original engram persists at a cellular level; otherwise activation of those cells would no longer trigger memory recall. We also agree with the reviewer that multiple context retrievals may indeed lead to the formation of a second distinct engram that competes with the original. Recent work suggests that retroactive interference emerges from the interplay of multiple engrams competing for accessibility (18). We have added clarification and included extra discussion of this interpretation. See lines 589-598.

      Authors state that spine density decreases over time. While that may be generally true, there is no evidence that mature mushroom spines are altered or that this is consistent across figures. Additionally, it's unclear if spine volume is consistently reduced in reactivated and non-reactivated engram cells across groups. This would provide evidence that there is a functionally distinct aspect of engram cells that is altered consistently in procedures resulting in poor recognition memory (e.g. increased spine density relative to spine density of non-reactivated engram cells and non-engram cells)

      We thank the Reviewer for their helpful comments on explaining our engram dendritic spine data. We agree with the Reviewer that an analysis of the changes in spine type, as well as of the differences between engram and non-engram spines and between reactivated and non-reactivated engram spines, would be interesting and may help to further illuminate the morphological changes of forgetting and memory retrieval. Indeed, future analysis could determine if spine density is reduced in reactivated and non-reactivated engram cells, or indeed across engram and non-engram cells, within different learning conditions. This avenue of investigation could determine if there is a functionally distinct aspect of engram cells that is altered following forgetting (67). However, such analysis is beyond the scope of this study. We have highlighted this limitation and included its discussion. See lines 493-499.

      Authors should discuss how the enrichment-neurogenesis results here are compatible with other neurogenesis work that supports forgetting.

      We validated the effectiveness of the enrichment paradigm to enhance neural plasticity by measuring adult hippocampal neurogenesis. The hippocampus has been identified as one of the only regions where postnatal neurogenesis continues throughout life (75). Levels of adult hippocampal neurogenesis do not remain constant throughout life and can be altered by experience (41–43, 57). In addition, adult-born neurons have been shown to contribute to the process of forgetting (74, 78, 79), although the contribution of adult-born neurons to cognition and the memory engram is not fully understood (80, 81). Mishra et al. showed that immature neurons were actively recruited into the engram following a hippocampal-dependent task (67). Moreover, increasing the level of neurogenesis rescued memory deficits by restoring engram activity (67). Augmenting neurogenesis further rescued the deficits in spine density in both immature and mature engram neurons in a mouse model of Alzheimer’s disease (67). Whether neurogenesis alters spine density differentially for reactivated or non-reactivated engram cells remains to be investigated (67, 68). This avenue of research would help to illuminate the morphological changes following forgetting and provide evidence as to whether there is a functionally distinct aspect of engram cells that is altered in forgetting (67, 68). Our engram labelling strategy, which utilized c-fos-tTA transgenic mice combined with an AAV9-TRE-ChR2-eYFP virus, does not necessarily label sufficient immature neurons. Future work could utilize a different engram preparation, such as a genetic labelling strategy (TRAP2), or a different immediate early gene promoter such as Arc, to investigate the contribution of new-born neurons to the engram ensemble. We have added additional discussion of how our work fits with previous literature investigating neurogenesis and forgetting. See lines 547-565.

      Reviewer #2 (Public Review):

      Summary:

      The manuscript examines an important question about how an inaccessible, natural forgotten memory can be retrieved through engram ensemble reactivation. It uses a variety of strategies including optogenetics, behavioral and pharmacological interventions to modulate engram accessibility. The data characterize the time course of natural forgetting using an object recognition task, in which animals can retrieve 1 day and 1 week after learning, but not 2 weeks later. Forgetting is correlated with lower levels of cell reactivation (c-fos expression during learning compared to retrieval) and reduction in spine density and volume in the engram cells. Artificial activation of the original engram was sufficient to induce recall of the forgotten object memory while artificial inhibition of the engram cells precluded memory retrieval. Mice housed in an enriched environment had a slower rate of forgetting, and a brief reminder before the retrieval session promoted retrieval of a forgotten memory. Repeated reintroduction to the training context in the absence of objects accelerated forgetting. Additionally, activation of Rac1-mediated plasticity mechanisms enhanced forgetting, while its inhibition prolonged memory retrieval. The authors also reproduce the behavioral findings using a computational model inspired by Rescorla-Wagner model. In essence, the model proposes that forgetting is a form of adaptive learning that can be updated based on prediction error rules in which engram relevancy is altered in response to environmental feedback.

      We thank the Reviewer for summarizing the scope and depth of our manuscript, and for recognizing our efforts. We engage below with the Reviewer’s specific criticisms of our interpretations.

      Strengths:

      (1) The data presented in the current paper are consistent with the authors' claim that seemingly forgotten engrams sometimes remain accessible. This suggests that retrieval deficits can lead to memory impairments rather than a loss of the original engram (at least in some cases).

      We thank the Reviewer for their positive summary.

      (2) The experimental procedures and statistics are appropriate, and the behavioral effects appear to be very robust. Several key effects are replicated multiple times in the manuscript.

      We thank the Reviewer for their positive comments.

      Weaknesses:

      (1) My major issue with the paper is the forgetting model proposed in Figure 7. Prior work has shown that neutral stimuli become associated in a manner similar to conditioned and unconditioned stimuli. As a result, the Rescorla-Wagner model can be used to describe this learning (Todd & Homes, 2022). In the current experiments, the neutral context will become associated with the unpredicted objects during training (due to a positive prediction error). Consequently, the context will activate a memory for the objects during the test, which should facilitate performance. Conversely, any manipulation that degrades the association between the context and object should disrupt performance. An example of this can be found in Figure 5A. Exposing the mice to the context in the absence of the objects should violate their expectations and create a negative prediction error. According to the Rescorla-Wagner model, this error will create an inhibitory association between the context and the objects, which should make it harder for the former to activate a memory of the latter (Rescorla & Wagner, 1972). As a result, performance should be impaired, and this is what the authors find. However, if the cells encoding the context and objects were inhibited during the context-alone sessions (Figure 5D) then no prediction error should occur, and inhibitory associations would not be formed. As a result, performance should be intact, which is what the authors observe.

      What about forgetting of the objects that occurs over time? Bouton and others have demonstrated that retrieval failure is often due to contextual changes that occur with the passage of time (Bouton, 1993; Rosas & Bouton, 1997, Bouton, Nelson & Rosas, 1999). That is, both internal (e.g. state of the animal) and external (e.g. testing room, chambers, experimenter) contextual cues change over time. This shift makes it difficult for the context to activate memories with which it was once associated (in the current paper, objects). To overcome this deficit, one can simply re-expose animals to the original context, which facilitates memory retrieval (Bouton, 1993). In Figure 2D, the authors do something similar. They activate the engram cells encoding the original context and objects, which enhances retrieval.

      Therefore, the forgetting effects presented in the current paper can be explained by changes in the context and the associations it has formed with the objects (excitatory or inhibitory). The results are perfectly predicted by the Rescorla-Wagner model and the context-change findings of Bouton and others. As a result, the authors do not need to propose the existence of a new "forgetting" variable that is driven by negative prediction errors. This does not add anything novel to the paper as it is not necessary to explain the data (Figures 7 and 8).

      We thank the reviewer for clearly explaining their concern about our model. We are very sorry that we did not sufficiently explain that our model is, in fact, based on the classic Rescorla-Wagner model. The key equation of the model that updates “engram strength”  is equivalent to the canonical Rescorla-Wagner model that is commonly used in research on reinforcement learning and decision-making (105). One potential minor difference is that we crucially assume different learning rates for positive and negative prediction errors. However, this variant of the Rescorla-Wagner model is common in the computational literature and is generally not regarded as a qualitatively different kind of model. In fact, it allows us to capture that establishing an object-context association (after a positive prediction error) is faster than the forgetting process (through negative errors).

      The other equations that are explained in detail in the Methods are necessary to simulate exploration behavior and render the model suitable for model fitting. Concerning exploration behavior, we use the softmax function, which is commonly used in combination with the Rescorla-Wagner model, in order to translate the learned quantity (in our case, engram strength) into behavior (here exploration). The other equations are necessary to fit the model to the data (learning rate α and behavioral variability in exploration behavior).

      Therefore, we fully agree with the reviewer that the Rescorla-Wagner model can explain our empirical results, in particular by assuming that the different manipulations affect the strength of object-context associations, which, in turn, governs forgetting as behaviorally observed.
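
      The update and read-out described above can be illustrated with a minimal sketch. It is not the fitted model from the Methods: the function names, the parameter values, and in particular the mapping from engram strength to object values (novel object fixed at 1, familiar object at 1 minus engram strength) are assumptions chosen only to show how asymmetric learning rates and a softmax read-out interact.

```python
import math


def rw_update(strength, outcome, alpha_pos, alpha_neg):
    """One Rescorla-Wagner step with asymmetric learning rates.

    strength: current engram strength (hypothetical scale, 0 to 1)
    outcome: 1.0 if the object is present in the context, 0.0 otherwise
    alpha_pos / alpha_neg: learning rates for positive / negative errors
    """
    delta = outcome - strength  # prediction error
    alpha = alpha_pos if delta >= 0 else alpha_neg
    return strength + alpha * delta


def p_explore_novel(strength, beta):
    """Two-option softmax probability of exploring the novel object.

    Assumed mapping (illustrative only): the novel object has value 1 and
    the familiar object has value 1 - strength, so the value difference
    equals the engram strength. beta scales behavioral variability.
    """
    return 1.0 / (1.0 + math.exp(-beta * strength))


if __name__ == "__main__":
    alpha_pos, alpha_neg, beta = 0.8, 0.2, 3.0  # arbitrary example values
    s = rw_update(0.0, 1.0, alpha_pos, alpha_neg)  # training with objects
    print("after training:", round(s, 3), round(p_explore_novel(s, beta), 3))
    for session in range(6):  # context-only sessions, no objects present
        s = rw_update(s, 0.0, alpha_pos, alpha_neg)
    print("after context-only:", round(s, 3), round(p_explore_novel(s, beta), 3))
```

      With a larger learning rate for positive than for negative prediction errors, the association is acquired quickly and decays more slowly under repeated context-only exposure, and the preference for the novel object falls back towards chance (0.5) as engram strength approaches zero.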

      In our previous version of the manuscript, we only referred to the Rescorla-Wagner model directly in the Methods. But to make this important point clearer, we now refer to the origin of the model multiple times in the Results section as well. See lines 81, 386-393.

      We also agree with the reviewer that the learning/forgetting process can be described in terms of changes in object-context associations (e.g., inhibitory associations after a negative prediction error). Therefore, we now explicitly refer to the relationship between updated object-context associations and forgetting and highlight that we believe that stronger associations signal higher engram “relevancy”. See lines 386-393.

      We have extended Figure 7 (new panels a and b), where we illustrate the idea that (a) object-context associations govern forgetting and (b) show the key Rescorla-Wagner equation, including a simple explanation of the main terms (engram strength, prediction error, and learning rate). Finally, we have also extended our discussion of the model, where we now directly state that the Rescorla-Wagner model captures the key results of our experiments. See lines 573-580.

      In order to further support a link between our empirical data and computational modeling, we also added extra experiments that showed the modulation of engram cells within the dentate gyrus can regulate these object-context associations. See Supplementary Figure 12a-f and lines 401-404.

      To summarize our reply, we agree with the reviewer’s comment and hope that we have clarified the direct relationship to the Rescorla-Wagner model.

      (2) I also have an issue with the conclusions drawn from the enriched environment experiment (Figure 3). The authors hypothesize that this manipulation alleviates forgetting because "Experiencing extra toys and objects during environmental enrichment that are reminiscent of the previously learned familiar object might help maintain or nudge mice to infer a higher engram relevancy that is more robust against forgetting.". This statement is completely speculative. A much simpler explanation (based on the existing literature) is that enrichment enhances synaptic plasticity, spine growth, etc., which in turn reduces forgetting. If the authors want to make their claim, then they need to test it experimentally. For example, the enriched environment could be filled with objects that are similar or dissimilar to those used in the memory experiments. If their hypothesis is correct, only the similar condition should prevent forgetting.

      We thank the Reviewer for this alternative perspective on our findings. First of all, we agree that this statement is speculative. The effects of enrichment on neural plasticity are well established and it likely contributes to the enhanced memory recall. It is important to emphasize that this process of updating is not necessarily separate from enrichment-induced plasticity at an implementational level, but part of the learning experience within an environment containing multiple objects. The enrichment or, more generally, experience, may therefore enhance memory through the modification of activity of specific engram ensembles. The idea of enrichment facilitating memory updating is consistent with the results obtained by the reminder experiments and further supported by our analysis with the Rescorla-Wagner computational model, where experience updates the accessibility of existing memories, possibly through reactivation of the original engram ensemble.

      We would like to further clarify that our explanation concerns the algorithmic level, in contrast to the neural level. Based on the computational analyses using the Rescorla-Wagner model and in line with the reviewer’s previous comment on the model, we believe that forgetting is governed by the strength of object-context associations (or engram relevancy). Our interpretation is that stronger associations signal that the memory or engram representation is important ("relevant") and should not be forgotten. Accordingly, due to a vast majority of experiences with extra cage objects in the enriched environment, mice might generally learn that such objects are common in their environment and potentially relevant in the future (i.e., the object-context association is strong, preventing forgetting). Our speculation of these results is to help unify our empirical data with the computational model.

      We believe that the Reviewer's alternative explanation in terms of synaptic plasticity and spine growth is not mutually exclusive with the modelling work. It is possible that the computational mechanisms that we explore based on the Rescorla-Wagner model are neuronally related to the biological mechanisms that the reviewer suggests at the implementational level. Therefore, ultimately, the two perspectives might even complement each other. We have included additional discussion to clarify this point. See lines 510-546.
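
      To make the algorithmic-level account concrete, the following is a minimal sketch of a Rescorla-Wagner update, not the authors' fitted model. It assumes a single object-context association V, a single learning rate, and an outcome of 1 when the objects are present in the context and 0 when the context is experienced without them. The function name, parameter values, and trial sequences are hypothetical.

```python
def rescorla_wagner(v0, outcomes, learning_rate=0.2):
    """Return the trajectory of associative strength V across trials.

    v0            -- initial object-context association (hypothetical value)
    outcomes      -- per-trial outcome: 1 if objects are present, 0 if absent
    learning_rate -- combined learning-rate parameter (assumed constant)
    """
    v = v0
    trajectory = [v]
    for outcome in outcomes:
        prediction_error = outcome - v      # negative when objects are missing
        v = v + learning_rate * prediction_error
        trajectory.append(v)
    return trajectory


if __name__ == "__main__":
    # Context-only exposures: negative prediction errors weaken the association.
    print(rescorla_wagner(0.8, [0, 0, 0]))
    # Reminder trials with the objects: positive errors strengthen it again.
    print(rescorla_wagner(0.2, [1, 1, 1]))
```

      Under these assumptions, repeated context-only exposures drive V toward 0 (lower relevancy, more forgetting), whereas reminder trials drive it toward 1. The sketch says nothing about how such updates are implemented neurally.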

      (3) It is well-known that updating can both weaken or strengthen memory. The authors suggest that memory is updated when animals are exposed to the context in the absence of the objects. If the engram is artificially inhibited (opto) during context-only re-exposures, memory cannot be updated. To further support this updating idea, it would be good to run experiments that investigate whether multiple short re-exposures to the training context (in the presence of the objects or during optogenetic activation of the engram) could prevent forgetting. It would also be good to know the levels of neuronal reactivation during multiple re-exposures to the context in the absence versus context in the presence of the objects.

      We thank the Reviewer for their comments. We agree that additional experiments would be helpful to further support the idea of updating. We have performed additional experiments to test the idea that multiple short re-exposures to the training context, in the presence of objects, prevent forgetting. In this paradigm, mice were repeatedly exposed to the original object pair (Supplementary Figure S5a). The results indicate that repeated reminder trials facilitate object memory recall (Supplementary Figure S5b&c). These data indicate that subsequent object reminders over time facilitate the transition of a forgotten memory to an accessible memory. See Supplementary Figure S5 and Lines 279-287.

      (4) There are a number of studies that show boundary conditions for memory destabilization/reconsolidation. Is there any evidence that similar boundary conditions exist to make an inaccessible engram accessible?

      The Reviewer asks an interesting question about boundary conditions and engram accessibility. Boundary conditions could indeed affect the degree of destabilization and reconsolidation, the salience or strength of the memory, as well as the timing of retrieval cues. Future models could focus on understanding the specific boundary conditions in which a memory becomes retrievable and the degree to which it is sufficiently destabilized and liable for updating and forgetting. We have included additional discussion on the potential role of boundary conditions for engram accessibility. See lines 661-666.

      (5) More details about how the quantification of immunohistochemistry (c-fos, BrdU, DAPI) was performed should be provided (which software and parameters were used to consider a fos positive neurons, for example).

      We have added additional information for the parameters of quantification of immunohistochemistry. See lines 796-809.

      (6) Duration of the enrichment environment was not detailed.

      We have highlighted the details for the environmental enrichment duration. See line 756.

      Reviewer #3 (Public Review):

      Summary:

      The manuscript by Ryan and colleagues uses a well-established object recognition task to examine memory retrieval and forgetting. They show that memory retrieval requires activation of the acquisition engram in the dentate gyrus and failure to do so leads to forgetting. Using a variety of clever behavioural methods, the authors show that memories can be maintained and retrieval slowed when animals are reared in environmental enrichment and that normally retrieved memories can be forgotten if exposed to the environment in which the expected objects are no longer presented. Using a series of neural methods, the authors also show that activation or inhibition of the acquisition engram is key to memory expression and that forgetting is due to Rac1.

      We thank the Reviewer for summarizing the scope and depth of our manuscript, and indeed for recognizing our efforts. We engage below the Reviewer’s specific criticisms of our interpretations.

      Strengths:

      This is an exemplary examination of different conditions that affect successful retrieval vs forgetting of object memory. Furthermore, the computational modelling that captures in a formal way how certain parameters may influence memory provides an important and testable approach to understanding forgetting.

      The use of the Rescorla-Wagner model in the context of object recognition and the idea of relevance being captured in negative prediction error are novel (but see below).

      The use of gain and loss of function approaches are a considerable strength and the dissociable effects on behaviour eliminate the possibility of extraneous variables such as light artifacts as potential explanations for the effects.

      We thank the Reviewer for their positive comments.

      Weaknesses:

      Knowing what process (object retrieval vs familiarity) governed the behavioural effect in the present investigation would have been of even greater significance.

      The Reviewer touches on an important issue of the object recognition task. Understanding how experience alters object familiarity versus object retrieval and its impact on learning would help to develop better models of object memory and forgetting. We have added additional discussion. See lines 666-669.

      The impact of the paper is somewhat limited by the use of only one sex.

      We agree that using only male mice limits the impact of the paper. Indeed, the field of behavioural neuroscience is moving to include sex as a variable. Future experiments should include both male and female mice.

      While relevance is an interesting concept that has been operationalized in the paper, it is unclear how distinct it is from extinction. Specifically, in the case where the animals are exposed to the context in the absence of the object, the paper currently expresses this as a process of relevance - the objects are no longer relevant in that context. Another way to think about this is in terms of extinction - the association between the context and the objects is reduced results in a disrupted ability of the context to activate the object engram.

      We thank the reviewer for their insightful comment on the connection between engram relevance and memory extinction. Lacagnina et al. demonstrated that extinction training suppressed the reactivation of a fear engram, while activating a second putative extinction ensemble (59). In another study, these extinction engram cells and reward cells were shown to be functionally interchangeable (92). Moreover, in a study conducted by Lay et al., the balance between extinction and acquisition was disrupted by inhibiting the extinction-recruited neurons in the BLA and CN (93). These results suggested that decision making after extinction can be governed by a balance between acquisition- and extinction-specific ensembles (93). Together, this may suggest that in the present study, when mice are repeatedly exposed to the training context, the association between the context and the objects is reduced, resulting in a disrupted ability of the context to activate the object engram. Therefore, memory relevance and extinction may operate similarly to affect engram accessibility, and in essence ‘forgetting’ of object memories may be due to neurobiological mechanisms similar to those of extinction learning (4). We have included additional discussion on the link between our results and the extinction literature. See lines 642-654.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Additional measures that may help interpretation of and clarify data are:

      A minute-by-minute analysis for training and testing may provide insight about the learning rate and testing temporal dynamics that may shed light substantially on differential levels of exploration. This should be applied across figures and would support conclusions from models in Figures 7-8 as well.

      Locomotion/distance travelled measures.

      We have included additional analysis for a minute-by-minute analysis of training and testing of the object memory test at 24 hr, 2 weeks as well as under the standard housing and enrichment conditions. The results further support the initial finding that novel object recognition is increased in mice that recall the object at 24 hr. Similarly, mice housed in the enriched housing initially explore the novel object more compared to the familiar object. See Supplementary Figure 1 and 2, as well as lines 103-105 and 211-213.

      The appropriate control for the context exposure figure would be to expose to a novel context in one group and the acquisition/testing context for the other.

      We agree with the reviewer that an additional control of a novel context would further support our findings. Indeed, this line of investigation may dovetail with the other reviewer comments on the role of competing engrams and interference. Future work could investigate the degree to which novel contexts and multiple memories can affect the rate of forgetting through engram updating. We have included additional discussion. See lines 643 and 655. However, in our experience it is necessary to pre-expose mice to different contexts before object exposure (e.g. Autore et al ’23), in order to form discriminable object/context associations. Establishing such a paradigm for this study would be at odds with the established paradigms and schedules in the current study. Moreover, whether or not the effect of object displacement on forgetting requires the familiar context does not impact the main conclusions of this study. However, we agree that it is a point for expansion in the future.

      A control virus+light group vs simply a no-light condition.

      For optogenetic experiments, control mice underwent the same surgery procedure with virus and optic fibre implantation. However, no light was delivered to excite or inhibit the respective opsin. Previous papers have shown laser light delivered to tissue expressing an AAV-TRE-EYFP lacking a light-sensitive opsin does cause cellular excitation. We have clarified this in the text. See lines 726-729.

      Reviewer #2 (Recommendations For The Authors):

      Minor details:

      (1) In the pharmacological modification of Rac 1, please specify what percentage of DMSO was used to dissolve Rac1 inhibitor and correct the typo 'DSMO'

      Rac1 inhibitor (Ehop016) was reconstituted and prepared in PBS with 1% Tween-80, 1% DMSO and 30% PEG. We have clarified this in the text and corrected the typo, thank you. See line 767.

      (2) In the penultimate paragraph there is a typo 'predication error'

      This is now corrected. Thank you.

      Reviewer #3 (Recommendations For The Authors):

      I was unable to find information on what the No Light group consisted of. Was there a control virus infused, were the animals implanted with optical fibres (in the presence or absence of a virus), were they surgical controls, etc?

      For optogenetic experiments, No Light Control mice underwent the same surgery procedure with virus and optic fibre implantation. However, no light was delivered to excite or inhibit the respective opsin. We have clarified this in the text. See lines 726-729.

      The discussion lacked specificity in places. For example, the idea of alluding to 'other variables' is somewhat vague (p. 21, middle paragraph). Some examples of what other variables could be relevant would be helpful in capturing what direction or relevance the model may have going forward.

      We have expanded the discussion of other variables which might impact engram relevance and how the model might be developed moving forward. These may include boundary conditions of destabilization and reconsolidation, the salience or strength of the memory, and the timing of retrieval cues or updating experience. Future models could focus on understanding the specific boundary conditions in which a memory becomes retrievable and the degree to which it is sufficiently destabilized and liable for updating and forgetting. The role of perceptual learning in memory retrieval and forgetting may also be an avenue of future investigation. Understanding how experience alters object familiarity versus object retrieval and its impact on learning would also help to develop better models of object memory and forgetting. In the current study, only male mice were utilized. Therefore, future work could also include sex as a variable to fully elucidate the impact of experience on the processes of forgetting. See lines 642-669.

      In the same paragraph (p. 21, middle paragraph) there is mention of multiple engrams and how they can compete. The authors reference Autore et al (2023), but I thought Lacagnina did this really beautifully also in an experimental setting. This idea is also expressed in Lay et al. (2022). So additional references would further strengthen the authors' argument here.

      We thank the reviewer for the additional references for discussing engram competition. We have included these papers in the discussion. See lines 642-654.

      Relatedly, environmental enrichment was considered in terms of object relevance. I wonder if the authors may want to consider thinking about their results in terms of effects on perceptual learning.

      Indeed, perceptual learning may be playing a role in environmental enrichment. We have included additional discussion. See lines 666-669.

    1. "We have: One, a robot may not injure a human being, or, through inaction, allow a human being to come to harm." "Right!" "Two," continued Powell, "a robot must obey the orders given it by human beings except where such orders would conflict with the First Law." "Right!" "And three, a robot must protect its own existence as long as such protection does not conflict with the First or Second Laws."

      In the story, the robots are essentially slaves to the humans. They are built purely to make humans' lives easier. I'm beginning to think this could be a cautionary tale about the way we treat "technology" and "robots" and the possible consequences of expecting them to behave like humans without treating them like humans.

    1. I referred (indirectly) to this in an annotation on https://www.theatlantic.com/magazine/archive/1945/07/as-we-may-think/303881/ as "the PDF". As the first page indicates this is rather a PDF—specifically someone's PDF of the ACM's reprint from 1996 (which can be found hanging off this DOI: https://dl.acm.org/doi/10.1145/227181.227186).

      The Atlantic's PDF can be found here https://cdn.theatlantic.com/media/archives/1945/07/176-1/132407932.pdf (at least for now).

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer #1 (Public Review):

      Summary:

      The study of human intelligence has been the focus of cognitive neuroscience research, and finding some objective behavioral or neural indicators of intelligence has been an ongoing problem for scientists for many years. Melnick et al, 2013 found for the first time that the phenomenon of spatial suppression in motion perception predicts an individual's IQ score. This is because IQ is likely associated with the ability to suppress irrelevant information. In this study, a high-resolution MRS approach was used to test this theory. In this paper, the phenomenon of spatial suppression in motion perception was found to be correlated with the visuo-spatial subtest of gF, while both variables were also correlated with the GABA concentration of MT+ in the human brain. In addition, there was no significant relationship with the excitatory transmitter Glu. At the same time, SI was also associated with MT+ and several frontal cortex FCs.

      Strengths:

      (1) 7T high-resolution MRS is used.

      (2) This study combines the behavioral tests, MRS, and fMRI.

      Weaknesses:

      Major:

      In Melnick (2013) IQ scores were measured by the full set of WAIS-III, including all subtests. However, this study only used visual spatial domain of gF. I wonder why only the visuo-spatial subtest was used not the full WAIS-III? I am wondering whether other subtests were conducted and, if so, please include the results as well to have comprehensive comparisons with Melnick (2013).

      We thank the reviewer for pointing this out. The decision was informed by Melnick’s findings, which indicated high correlations between the surround suppression index (SI) and the Verbal Comprehension, Perceptual Reasoning, Working Memory, and Processing Speed Indexes, with correlation coefficients of 0.69, 0.47, 0.49, and 0.50, respectively. It is well established that the hMT+ region of the brain is a sensory cortex involved in visual perception processing (3D perception). Furthermore, motion surround suppression (SI), a specific function of hMT+, aligns closely with this region's activities. Given this context, the Perceptual Reasoning sub-ability was deemed to have the clearest mechanism for further exploration. Consequently, we selected the most representative subtest of Perceptual Reasoning, the Block Design Test, which primarily assesses 3D visual intelligence. For these reasons, we conducted only the visuo-spatial subtest.

      Minor:

      Comments:

      In the first revised version, we addressed the following recommendations in the 'Author response' file titled 'Recommendation for the authors.' It seems our response may not have reached you successfully. We would like to share and expand upon our response here:

      (1) Table 1 and Table supplementary 1-3 contain many correlation results. But what are the main points of these values? Which values do the authors want to highlight? Why are only p-values shown with significance symbols in Table supplementary 2?

      (1.1) What are the main points of these values?

      We thank the reviewer for pointing this out. These correlations represent the relationship between the behavioral tasks (SI/BDT) and resting-state functional connectivity. They indicate that left hMT+ is involved in the efficient information integration network when it comes to the BDT task. In addition, left hMT+’s surround suppression is associated with several hMT+ - frontal connections. Furthermore, the overlapping regions between the two tasks point to the underlying mechanism.

      (1.2) Which values do the authors want to highlight?

      Table 1 and Table Supplementary 1-3 present the preliminary analysis results for Table 2 and Table Supplementary 4-6, so we generally report all values. Conversely, in Table 2 and Table Supplementary 4-6, we highlight the values that support our main conclusion.

      (1.3) Why are only p-values shown with significance symbols in Table Supplementary 2?

      Thank you for pointing this out; it was a mistake. We have revised the table and deleted the significance symbols.

      (2) Line 27, it is unclear to me what is "the canonical theory".

      We thank the reviewer for pointing this out. We have revised “the canonical theory" to “the prevailing opinion” (line 27).

      (3) Throughout the paper, the authors use "MT+", I would suggest using "hMT+" to indicate the human MT complex, and to be consistent with the human fMRI literature.

      We thank the reviewer for pointing this out. We have revised them.

      (4) At the beginning of the results section, I suggest including the total number of subjects. It is confusing what "31/36 in MT+, and 28/36 in V1" means.

      We thank the reviewer for pointing this out. We have included the total number of subjects at the beginning of the Results section. (line 110, line 128)

      (5) Line 138, "This finding supports the hypothesis that motion perception is associated with neural activity in MT+ area". This sentence is strange because it is a well-established finding in numerous human fMRI papers. I think the authors should be more specific about what this finding implies.

      We thank the reviewer for pointing this out. We have revised it to: “This finding is in line with prior results, which indicate that motion perception is associated with neural activity in hMT+ area, but not in EVC (primarily in V1)” (lines 156-158)

      (6) There are no unit labels for all x- and y-axes in Figure 1. I only see the unit for Conc is mmol per kg wet weight.

      We thank the reviewer for pointing this out. Figure 1 is a schematic and workflow chart, so labels for x- and y-axes are not needed. We believe this confusion might pertain to Figure 3. In Figures 3a and 3b, the MRS spectrum does not have a standard y-axis unit as it varies based on the individual physical conditions of the scanner; it is widely accepted that no y-axis unit is used. The x-axis unit is ppm, which indicates the chemical shift of the different metabolites. In Figure 3c, the BDT represents IQ scores, which do not have a standard unit. Similarly, in Figures 3d and 3e, the Suppression Index does not have a standard unit.

      (7) Although the correlations are not significant in Figure Supplement 2&3, please also include the correlation line, 95% confidence interval, and report the r values and p values (i.e., similar format as in Figure 1C).

      We thank the reviewer for pointing this out. We have revised them and included the correlation line, 95% confidence interval, r values and p values.

      (8) There is no need to separate different correlation figures into Figure Supplementary 1-4. They can be combined into the same figure.

      We thank the reviewer for the suggestion. However, each correlation figure in the supplementary figures has its own specific topic and conclusion. Please note that in the revised version, we have added a figure showing the EVC (primarily in V1) MRS scanning ROI as Supplementary Figure 1. Therefore, the figures the reviewer is concerned about are Supplementary Figures 2-5. The correlation figures in Supplementary Figure 2 indicate that GABA in EVC (primarily in V1) does not show any correlation with BDT and SI, illustrating that inhibition in EVC (primarily in V1) is unrelated to both 3D visuo-spatial intelligence and motion suppression processing. The correlations in Supplementary Figure 3 indicate that the excitation mechanism, represented by Glutamate concentration, does not contribute to 3D visuo-spatial intelligence in either hMT+ or EVC (primarily in V1). Supplementary Figure 4 validates our MRS measurements. Supplementary Figure 5 addresses potential concerns regarding the impact of outliers on correlation significance. Even after excluding two “outliers” from Figures 3d and 3e, the correlation results remain stable.

      (9) Line 213, as far as I know, the study (Melnick et al., 2013) is a psychophysical study and did not provide evidence that the spatial suppression effect is associated with MT+.

      We thank the reviewer for pointing this out. It was a mistake to use this reference, and we have revised it accordingly. (line 242)

      (10) At the beginning of the results, I suggest providing more details about the motion discrimination tasks and the measurement of the BDT.

      We thank the reviewer for pointing this out. We have included a brief description of the tasks at the beginning of the Results section. (lines 116-120)

      (11) Please include the absolute duration thresholds of the small and large sizes of all subjects in Figure 1.

      We thank the reviewer for the suggestion. We have included these results in Figure 3.

      (12) Figure 5 is too small. The items in plot a and b can be barely visible.

      We thank the reviewer for pointing this out. We have increased the size and resolution of the figure.

      Reviewer #3 (Public Review):

      (1) Throughout the manuscript, hMT+ connectivity with the frontal cortex has been treated as an a priori hypothesis/space. However, there is no such motivation or background literature mentioned in the Introduction. Can the authors clarify the necessity of functional connectivity? In other words, can BOLD activity of hMT+ in the localizer task substitute for functional connectivity between hMT+ and the frontal cortex?

      (1.1) Throughout the manuscript, hMT+ connectivity with the frontal cortex has been treated as an a priori hypothesis/space. However, there is no such motivation or background literature mentioned in the Introduction. Can the authors clarify the necessity of functional connectivity?

      We thank the reviewer for pointing this out. We have added motivation and background literature to the introduction: “Frontal cortex is usually recognized as the cognitive core region (Duncan et al., 2000; Gray et al., 2003). Strong connectivity between the cognitive regions suggests a mechanism for large-scale information exchange and integration in the brain (Barbey, 2018; Cole et al., 2012). Therefore, the potential conjunctive coding may overlap with the inhibition and/or excitation mechanism of hMT+. Taken together, we hypothesized that 3D visuo-spatial intelligence (as measured by BDT) might be predicted by the inhibitory and/or excitation mechanisms in hMT+ and the integrative functions connecting hMT+ with frontal cortex (Figure 1a).” (lines 67-74). Additionally, we have included a whole-brain analysis for validation. Functional connectivity reveals the information exchange relationships across regions, enhancing our understanding of how hMT+ and the frontal cortex collaborate when solving visual-spatial intelligence tasks.

      (1.2) In other words, can BOLD activity of hMT+ in the localizer task substitute for functional connectivity between hMT+ and the frontal cortex?

      We thank the reviewer for this question. The localizer task was used solely for defining the hMT+ MRS scanning area. Functional connectivity was measured using resting-state fMRI. Research has shown that resting-state functional connectivity between the frontal cortex and other ROIs can further reveal the neural mechanisms underlying intelligence tasks (Song et al., 2008).

      (2) There is an obvious mismatch between the in-text description and the content of the figure: "In contrast, there was no correlation between BDT and GABA levels in V1 voxels (figure supplement 1a). Further, we show that SI significantly correlates with GABA levels in hMT+ voxels (r = 0.44, P = 0.01, n = 31, Figure 3d). In contrast, no significant correlation between SI and GABA concentrations in V1 voxels was observed (figure supplement 1b)."

      We thank the reviewer for pointing this out. We have revised it. The revised version is: “In contrast, there was no correlation between BDT and GABA levels in V1 voxels (figure supplement 2a). Further, we show that SI significantly correlates with GABA levels in hMT+ voxels (r = 0.44, P = 0.01, n = 31, Figure 3d). In contrast, no significant correlation between SI and GABA concentrations in V1 voxels was observed (figure supplement 2b).” (lines 151-156)

      (3) The authors' response to my previous round of review indicated that the "V1 ROIs" covered a substantial amount of V3 (32%). Therefore, it would no longer be appropriate to call these "V1 ROIs". I'd suggest renaming them as "Early Visual Cortex (EVC) ROIs" to be more accurate. Can the authors justify why choosing the left hemisphere for visual intelligence task, which is typically believed to be right lateralized?

      (3.1) The authors' response to my previous round of review indicated that the "V1 ROIs" covered a substantial amount of V3 (32%). Therefore, it would no longer be appropriate to call these "V1 ROIs". I'd suggest renaming them as "Early Visual Cortex (EVC) ROIs" to be more accurate.

      We thank the reviewer for pointing this out. We have revised our description of the MRS scanning ROIs to Early Visual Cortex (EVC). Since the majority of our EVC ROIs are in V1 (around 70%) and almost no V2 was included, we decided to mark the EVC ROIs with the explanation "primarily in V1" for better clarification. This terminology has been widely used to better emphasize the V1-based experimental design.

      (3.2) Can the authors justify why choosing the left hemisphere for visual intelligence task, which is typically believed to be right lateralized?

      We thank the reviewer for pointing this out. The use of the left MT/V5 as a target was motivated by studies demonstrating that left MT+/V5 TMS is more effective at causing perceptual effects (Tadin et al., 2011). Therefore, we chose to use the left hMT+ as our MRS ROI and maintain consistency across different models' ROIs. Additionally, our results support the notion that the visual intelligence task is right lateralized in the frontal cortex. At the resting-fMRI level, we found that significant ROIs, where functional connectivity is highly correlated with BDT scores, are in the right frontal cortex (Figure 5a, b).

      (4) "Small threshold" and "large threshold" are neither standard descriptions, and it is unclear what "small threshold" refers to in the following figure caption. Additionally, the unit (ms) is confusing. Does it refer to timing? "(f) Pearson's correlation showing significant negative correlations between BDT and small threshold."

      Thank you for pointing this out; we agree with your suggestion. We have revised the terms “small threshold” and “large threshold” to “duration threshold of small grating” and “duration threshold of large grating”, respectively. The unit (ms) refers to timing. The details are described in the methods section: “The duration was adaptively adjusted in each trial, and duration thresholds were estimated using a staircase procedure. Thresholds for large and small gratings were obtained from a 160-trial block that contained four interleaved 3-down/1-up staircases. For each participant, we computed the correct rate for different stimulus durations separately for each stimulus size. These values were then fitted to a cumulative Gaussian function, and the duration threshold corresponding to the 75% correct point on the psychometric function was estimated for each stimulus size”.
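
      As an illustration of the final step described above, the following is a minimal sketch of fitting a cumulative Gaussian to proportion-correct data and reading off the 75% correct duration. It is not the authors' analysis code. It assumes a two-alternative task with a 50% guess rate, durations on a linear scale, and a coarse least-squares grid search in place of whatever fitting routine was actually used. All function names and the example data are hypothetical.

```python
from statistics import NormalDist


def psychometric(duration, mu, sigma, guess=0.5):
    """Cumulative Gaussian rescaled to run from the guess rate up to 1."""
    return guess + (1.0 - guess) * NormalDist(mu, sigma).cdf(duration)


def fit_duration_threshold(durations, prop_correct, guess=0.5, target=0.75):
    """Grid-search least-squares fit; return the duration at `target` correct."""
    lo, hi = min(durations), max(durations)
    span = hi - lo
    best = None
    for i in range(201):                      # candidate means across the tested range
        mu = lo + span * i / 200
        for j in range(1, 101):               # candidate standard deviations (> 0)
            sigma = span * j / 100
            err = sum((psychometric(d, mu, sigma, guess) - p) ** 2
                      for d, p in zip(durations, prop_correct))
            if best is None or err < best[0]:
                best = (err, mu, sigma)
    _, mu, sigma = best
    # Invert the fitted function at the target proportion correct.
    z = NormalDist().inv_cdf((target - guess) / (1.0 - guess))
    return mu + sigma * z


if __name__ == "__main__":
    # Hypothetical noise-free data for one stimulus size (durations in ms).
    durations = [20, 40, 60, 80, 100, 120]
    prop_correct = [psychometric(d, 60.0, 20.0) for d in durations]
    print(fit_duration_threshold(durations, prop_correct))
```

      With a 50% guess rate, the 75% correct point coincides with the mean of the fitted Gaussian. In practice the fit would be run separately for the small and the large grating, and a maximum-likelihood fit on trial counts may be preferable to least squares.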

      (5) In the response letter, the authors mentioned incorporating the neural efficiency hypothesis in the Introduction, but the revised Introduction does not contain such information.

      We thank the reviewer for pointing this out. In our revised version, the second paragraph of the introduction addresses the neural efficiency hypothesis: “The “neuro-efficiency” hypothesis is one explanation for individual differences in gF (Haier et al., 1988). This hypothesis puts forward that the human brain’s ability to suppress irrelevant information leads to more efficient cognitive processing. Correspondingly, using a well-known visual motion paradigm (center-surround antagonism) (Liu et al., 2016; Tadin et al., 2003), Melnick et al found a strong link between suppression index (SI) of motion perception and the scores of the block design test (BDT, a subtest of the Wechsler Adult Intelligence Scale (WAIS), which measures the visuo-spatial component (3D domain) of gF) (Melnick et al., 2013). Motion surround suppression (SI), a specific function of human extrastriate cortical region, middle temporal complex (hMT+), aligns closely with this region's activities (Gautama & Van Hulle, 2001). Furthermore, hMT+ is a sensory cortex involved in visual perception processing (3D domain) (Cumming & DeAngelis, 2001). These findings suggest that hMT+ potentially plays a significant role in 3D visuo-spatial intelligence by facilitating the efficient processing of 3D visual information and suppressing irrelevant information. However, more evidence is needed to uncover how the hMT+ functions as a core region for 3D visuo-spatial intelligence.” (lines 51-66)

      Recommendations for the authors:

      Reviewer #1 (Recommendations for The Authors):

      In the Code availability, it states that "this paper does not report original code". It seems weird because at least the code to reproduce the figures from the data should be provided.

      Thank you for pointing this out. Almost all figures were created using software such as DPABI, BrainNet, and GraphPad Prism 9.5, which are manually operated and do not require code adjustments. However, for the MRS fitting curve, we can provide our MATLAB code for redrawing the MRS fitting. The code has been uploaded to GitHub.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this work, Qiu and colleagues examined the effects of preovulatory (i.e., proestrous or late follicular phase) levels of circulating estradiol on multiple calcium and potassium channel conductances in arcuate nucleus kisspeptin neurons. Although these cells are strongly linked to a role as the "GnRH pulse generator," the goal here was to examine the physiological properties of these cells in a hormonal milieu mimicking late proestrus, the time of the preovulatory GnRH-LH surge. Computational modeling is used to manipulate multiple conductances simultaneously and support a role for certain calcium channels in facilitating a switch in firing mode from tonic to bursting. CRISPR knockdown of the TRPC5 channel reduced overall excitability, but this was only examined in cells from ovariectomized mice without estradiol treatment. The patch clamp experiments are comprehensive and overall solid but a direct demonstration of the role of these conductances in being necessary for surge generation (or at least having a direct physiological consequence on surge properties) is lacking, substantially reducing the impact of the findings.

      Strengths:

      (1) Examination of multiple types of calcium and potassium currents, both through electrophysiology and molecular biology.

      (2) Focus on arcuate kisspeptin neurons during the surge is relatively conceptually novel as the anteroventral periventricular nucleus (AVPV) kisspeptin neurons have received much more attention as the "surge generator" population.

      (3) The modeling studies allow for direct examination of manipulation of single and multiple conductances, whereas the electrophysiology studies necessarily require examination of each current in isolation. The construction of an arcuate kisspeptin neuron model promises to be of value to the reproductive neuroendocrinology field.

      We thank the reviewer for recognizing our comprehensive examination of Kiss1-ARH neurons, through electrophysiological and molecular experiments and computational modeling of their activity during the preovulatory surge, which, as the reviewer pointed out, is “conceptually novel.” With the addition of new experiments, we have bolstered our argument that E2-mediated regulation of channel expression drives the transition of Kiss1-ARH neurons from synchronized firing to burst firing. We have addressed the recommendations as follows:

      Weaknesses:

      (1) The novelty of some of the experiments needs to be clarified. This reviewer's understanding is that prior experiments largely used a different OVX+E2 treatment paradigm mimicking periods of low estradiol levels, whereas the present work used a "high E2" treatment model. However, Figures 10C and D are repeated from a previous publication by the same group, according to the figure legend. Findings from "high" vs. "low" E2 treatment regimens should be labeled and clearly separated in the text. It would also help to have direct comparisons between results from low E2 and high E2 treatment conditions.

      We have revised Figures 10C and 10D to include only new findings on Tac2 and Vglut2 expression in Kiss1ARH neurons from OVX and E2-treated females. Most importantly, our E2 treatment regimen is clearly stated in the Methods and is exactly the same as that used previously (Qiu, eLife 2016 and Qiu, eLife 2018) for the induction of the LH surge in OVX mice (Bosch, Molecular and Cellular Endocrinology 2013).

      (2) In multiple places, links are made between the changes in conductances and the transition from peptidergic to glutamatergic neurotransmission. However, this relationship is never directly assessed. The data that come closest are the qPCR results showing reduced Tac2 and increased Vglut2 mRNA, but in the figure legend, it appears that these results are from a prior publication using a different E2 treatment regimen.

      In the revised Figure 1, we have now included a clear depiction of the transition from synchronized firing driven by NKB signaling in OVX females to burst firing driven by glutamate in E2-treated females. All of the qPCR results in the revised manuscript are new. We have used the same E2 treatment paradigm as previously published (Qiu, eLife 2018).

      (3) Similarly, no recordings of arcuate-AVPV glutamatergic transmission are made so the statements that Kiss1ARH neurons facilitate the GnRH surge via this connection are still only conjecture and not supported by the present experiments.

      Using a horizontal hypothalamic slice preparation, we have shown that Kiss1-ARH neurons excite GnRH neurons via Kiss1ARH glutamatergic input to Kiss1AVPV/PeN neurons (summarized in Fig. 12, Qiu, eLife 2016). We did not think that it was necessary to repeat these experiments for the current manuscript.

      (4) Figure 1 is not described in the Results section and is only tenuously connected to the statement in the introduction in which it is cited. The relevance of panels C and D is not clear. In this regard, much is made of the burst firing pattern that arises after E2 treatment in the model, but this burst firing pattern is not demonstrated directly in the slice electrophysiology examples.

      We have extensively revised Figure 1 to include new whole-cell, current clamp recordings that document burst firing in E2-treated, OVX females, and the figure is now cited in the Results.

      (5) In Figure 3, it would be preferable to see the raw values for R1 and R2 in each cell, to confirm that all cells were starting from a similar baseline. In addition, it is unclear why the data for TTA-P2 is not shown, or how many cells were recorded to provide this finding.

      Before initiating photo-stimulation of each Kiss1-ARH neuron, we adjusted the resting membrane potential to -70 mV through current injection, as noted in each panel of Figure 3. We have now included new findings on the effects of the T-channel blocker TTA-P2 on the slow EPSP in the revised Figure 3. The number of cells tested with each calcium channel blocker is shown in the bar graphs summarizing the effects of the blockers (Figure 3E).

      (6) In Figure 5, panel C lists 11 cells in the E2 condition but panel E lists data from 37 cells. The reason for this discrepancy is not clear.

      In Figure 5D, we measured the L-, N-, P/Q and R channel currents after pretreatment with TTA-P2 to block the T-type current, whereas in Figure 5C, we measured the total current without TTA-P2.

      (7) In all histogram figures, it would be preferable to have the data for individual cells superimposed on the mean and SEM.

      In the revised figures, we have included the individual data points for each neuron and, for the qPCR data, each animal.

      (8) The CRISPR experiments were only performed in OVX mice, substantially limiting interpretation with respect to potential roles for TRPC5 in shaping arcuate kisspeptin neuron function during the preovulatory surge.

      The TRPC5 channels are most important for generating slow EPSPs when expression of NKB is high in the OVX state. Conversely, the glutamatergic response becomes more significant when the expression of NKB and TRPC5 channels is muted in the E2-treated state. Therefore, the CRISPR experiments were specifically conducted in OVX mice to maximize the effects.

      (9) Furthermore, there are no demonstrations that the CRISPR manipulations impair or alter the LH surge.

      In this manuscript, our focus is on the cellular electrophysiological activity of the Kiss1ARH neurons in OVX and E2-treated OVX females. Exploration of CRISPR manipulations related to the LH surge is certainly slated for future experiments, but these in vivo experiments are beyond the scope of these comprehensive cellular electrophysiological and molecular studies.

      (10) The time of day of slice preparation and recording needs to be specified in the Methods.

      We have provided the times of slice preparation and recordings in the revised Methods and Materials.

      Reviewer #2 (Public Review):

      Summary:

      Kisspeptin neurons of the arcuate nucleus (ARC) are thought to be responsible for the pulsatile GnRH secretory pattern and to mediate feedback regulation of GnRH secretion by estradiol (E2). Evidence in the literature, including the work of the authors, indicates that ARC kisspeptin neurons coordinate their activity through reciprocal synaptic interactions and the release of glutamate and of neuropeptide neurokinin B (NKB), which they co-express. The authors show here that E2 regulates the expression of genes encoding different voltage-dependent calcium channels, calcium-dependent potassium channels, and canonical transient receptor potential (TRPC5) channels and of the corresponding ionic currents in ARC kisspeptin neurons. Using computer simulations of the electrical activity of ARC kisspeptin neurons, the authors also provide evidence of what these changes translate into in terms of these cells' firing patterns. The experiments reveal that E2 upregulates various voltage-gated calcium currents as well as 2 subtypes of calcium-dependent potassium currents while decreasing TRPC5 expression (an ion channel downstream of NKB receptor activation), the slow excitatory synaptic potentials (slow EPSP) elicited in ARC kisspeptin neurons by NKB release and expression of the G protein-associated inward-rectifying potassium channel (GIRK). Based on these results, and on those of computer simulations, the authors propose that E2 promotes a functional transition of ARC kisspeptin neurons from neuropeptide-mediated sustained firing that supports coordinated activity for pulsatile GnRH secretion to a less intense, glutamatergic burst-like firing pattern that could favor glutamate release from ARC kisspeptin neurons. The authors suggest that the latter might be important for the generation of the preovulatory surge in females.

      Strengths:

      The authors combined multiple approaches in vitro and in silico to gain insights into the impact of E2 on the electrical activity of ARC kisspeptin neurons. These include patch-clamp electrophysiology combined with selective optogenetic stimulation of ARC kisspeptin neurons, reverse transcriptase quantitative PCR, pharmacology, and CRISPR-Cas9-mediated knockdown of the Trpc5 gene. The addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.

      The authors add interesting information on the complement of ionic currents in ARC kisspeptin neurons and on their regulation by E2 to what was already known in the literature. Pharmacological and electrophysiological experiments appear of the highest standards. Robust statistical analyses are provided throughout, although some experiments (illustrated in Figures 7 and 8) do have rather low sample numbers.

      The impact of E2 on calcium and potassium currents is compelling. Likewise, the results of Trpc5 gene knockdown do provide good evidence that the TRPC5 channel plays a key role in mediating the NKB-mediated slow EPSP. Surprisingly, this also revealed an unsuspected role for this channel in regulating the membrane potential and excitability of ARC kisspeptin neurons.

      We thank the reviewer for recognizing that the “pharmacological and electrophysiological experiments appear of the highest standards” and that “the addition of computer simulations for understanding the impact of E2 on the electrical activity of ARC kisspeptin cells is also a strength.” However, we agree with the reviewer that we needed to provide a direct demonstration of “burst-like” firing of Kiss1-ARH neurons, which we have provided in Figure 1. We have addressed the other recommendations as follows:

      Weaknesses:

      The manuscript also has weaknesses that obscure some of the conclusions drawn by the authors.

      One has to do with the fact that "burst-like" firing that the authors postulate ARC kisspeptin neurons transition to after E2 replacement is only seen in computer simulations, and not in slice patch-clamp recordings. A more direct demonstration of the existence of this firing pattern, and of its prominence over neuropeptide-dependent sustained firing under conditions of high E2 would make a more convincing case for the authors' hypothesis.

      We have provided a more direct demonstration of the existence of this firing pattern in the whole-cell current clamp experiments in the revised Figure 1.

      In addition, and quite importantly, the authors compare here two conditions, OVX versus OVX replaced with high E2, that may not reflect the physiological conditions (the diestrous [low E2] and proestrous [high E2] stages of the estrous cycle) under which the proposed transition between neuropeptide-dependent sustained firing and less intense burst firing might take place. This is an important caveat to keep in mind when interpreting the authors' findings. Indeed, that E2 alters certain ionic currents when added back to OVX females, does not mean that the magnitude of these ionic currents will vary during the estrous cycle.

      We have published that the magnitude of the slow EPSP, which is TRPC5 channel mediated, varies throughout the estrous cycle: the slow EPSP reaches a maximal amplitude during diestrus and is significantly reduced during proestrus, similar to the difference between OVX and E2-treated, OVX females (Figure 2, Qiu, eLife 2016). Moreover, TRPC5 channel mRNA expression, like that of the peptides, is downregulated by an E2 treatment (Figure 10, this manuscript) that mimics proestrus levels of the steroid (Bosch et al., Mol Cell Endocrinology 2013). Furthermore, the magnitude of ionic currents is directly proportional to the number of ion channels expressed in the plasma membrane, which we have found correlates with mRNA expression. Therefore, it is likely that the magnitude of these ionic currents will vary during the estrous cycle.
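
      The proportionality invoked here can be written out with the standard macroscopic-current relation. This is a textbook identity rather than an equation from the manuscript, and the symbols $N$ (number of functional channels in the membrane), $P_o$ (open probability), $i$ (single-channel current), $\gamma$ (single-channel conductance), and $E_{\mathrm{rev}}$ (reversal potential) are introduced only for illustration:

```latex
I = N \, P_o \, i, \qquad i = \gamma \, (V - E_{\mathrm{rev}})
```

      At a fixed voltage and open probability, $I$ scales linearly with $N$. The relation by itself says nothing about how closely $N$ tracks mRNA level; that link rests on the correlation reported above.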

      Lastly, the results of some of the pharmacological and genetic experiments may be difficult to interpret as presented. For example, in Figure 3, although it is possible that blockade of individual calcium channel subtypes suppresses the slow EPSP through decreased calcium entry at the somato-dendritic compartment to sustain TRPC5 activation and the slow depolarization (as the authors imply), a reasonable alternative interpretation would be that at least some of the effects on the amplitude of the slow EPSP result from suppression of presynaptic calcium influx and, thus, decreased neurotransmitter and neuropeptide secretion. Along the same lines, in Figure 12, one possible interpretation of the observed smaller slow EPSPs seen in mice with mutant TRPC5 could be that at least some of the effect is due to decreased neurotransmitter and neuropeptide release due to the decreased excitability associated with TRPC5 knockdown.

      The reviewer raises a good point, but our previous findings clearly demonstrated that chelating intracellular calcium with BAPTA in whole-cell current clamp recordings abolishes the slow EPSP and persistent firing (Qiu et al., J. Neurosci 2021), which we have noted is the rationale for dissecting out the contribution of T, R, N, L and P/Q calcium channels to the slow EPSP in our current studies. The revised Figure 3 also includes the effects of the T-channel blocker TTA-P2.

      However, to further bolster the argument for a postsynaptic contribution of the calcium channels to the slow EPSP, and to rule out the possibility that the blockers reduce the slow EPSP amplitude by decreasing presynaptic calcium influx and hence neurotransmitter release, we have used an additional strategy. Specifically, we measured the response to the externally applied TACR3 agonist senktide under conditions in which extracellular calcium influx, as well as neurotransmitter and neuropeptide release, is blocked (revised Figure 3).

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      (1) The use of optogenetics in Figure 3 to trigger the slow EPSP could be better clarified in the text.

      We have clarified in the Methods the optogenetic protocol for generating the slow EPSP, which we have published previously (Qiu et al., eLife 2016; eLife 2018, J. Neurosci 2021).

      (2) The citation for Figure 4C in the text does not match what is shown in the figure.

      Figure 4C has been removed in the revised manuscript.

      (3) Figure 5 - it would be clearer to have panel D labeled as "model results" or similar to distinguish it from the slice recording data.

      Panel D has been labeled as “Model results”.

      (4) The text in lines 191-197 in the Results may be better suited to the Discussion.

      We have modified the text in order to present the new findings without the discussion points.

      (5) It is somewhat confusing to have figure panels cited out of order in the main text (e.g., 7H before 7G and 8H before 8G).

      We have edited the text to report the findings in the proper order of the panels in Figures 7 and 8.

      Reviewer #2 (Recommendations For The Authors):

      - The observations that E2 treatment of OVX mice has an effect on the magnitude of a number of ionic currents does not necessarily mean that these changes will be seen during the estrous cycle, in response to fluctuations in circulating E2 concentrations. Experiments comparing either different estrous cycle stages or OVX mice treated with low or high E2 would be required to gain insight into this question. As such, the relevance of the authors' findings (however interesting these are as they stand) to any potential physiological endocrine/reproductive state transition is questionable, in the reviewer's opinion. The authors should acknowledge this important caveat and moderate the interpretations of their findings and the conclusions of their manuscript accordingly.

      We have published that the magnitude of the slow EPSP, which is TRPC5 channel mediated, varies throughout the estrous cycle: the slow EPSP is large during diestrus and significantly reduced during proestrus, similar to the difference between OVX and E2-treated, OVX females (Figure 2, Qiu, eLife 2016). Moreover, TRPC5 channel mRNA expression, like that of the peptides, is downregulated by an E2 treatment (Figure 10, this manuscript) that mimics proestrus levels of the steroid (Bosch et al., Mol Cell Endocrinology 2013). Furthermore, the magnitude of ionic currents is directly proportional to the number of ion channels expressed in the plasma membrane, which we have found correlates with mRNA expression. Therefore, it is likely that the magnitude of these ionic currents will vary during the estrous cycle.

      - The bursting firing pattern that the authors refer to and postulate will favor glutamate release under high E2 conditions is only seen in the computer simulations, not in patch-clamp recordings in brain slices (see also comment below). This substantially weakens some of the conclusions of the manuscript. Unless the authors can convincingly demonstrate a change in ARC kisspeptin firing pattern in response to increasing E2 using electrophysiology, these conclusions should be moderated.

      We now include examples of burst firing activity under E2-treatment conditions in Figure 1 and have included a summary figure (pie chart) documenting that a significant percentage of cells exhibit this activity with E2 treatment.

      Other comments:

      - Title: "E2 elicits distinct firing patterns" is not shown in this work. As such, the title needs to be revised.

      We now show these distinct firing patterns in Figure 1, so we think the wording in the title is an accurate reflection of our findings. 

      - Abstract: some of the interpretations are overstated, in the reviewer's opinion.

      Line 23, "... elevating the whole-cell calcium current and contributing to high-frequency firing" should be moderated, as what is shown by the authors is that blockade of calcium channel subtypes suppresses the slow EPSP and associated firing, the frequency of which is not reported (see also a later comment).

      We now include examples of burst firing activity under E2-treatment conditions in Figure 1 and have modified the abstract to state “high frequency burst firing.”

      Lines 26-28, that "mathematical modeling confirmed the importance of TRPC5 channels for initiating and sustaining synchronous firing, while GIRK channels, activated by Dyn binding to kappa opioid receptors, were responsible for repolarization" is simply not what the simulations show, in the reviewer's opinion. Indeed, there is no consideration of synchronous activity in the model, which simulates the firing of a single ARC kisspeptin neuron. Further, the model shows that TRPC5 can contribute to overall excitability (firing in response to current injection, Figure 12G) and that increasing TRPC5 conductance increases firing in response to NKB while this is decreased by adding GIRK conductance to the model (Figure 13A). Therefore, considerations of the importance of TRPC5 channels in initiating synchronous firing and the role of Dyn A-induced GIRK activity should not be included in the interpretations of the mathematical simulations.

      The significance of synchronization lies in the fact that when neuronal networks synchronize, the behavior of each neuron within the network becomes identical. In such scenarios, the firing of a single neuron mirrors the activity of the entire neuronal network. Consequently, our model simulations, based on a single-cell neuronal model, can be utilized to make reliable inferences about synchronized neuronal activity.

      Lines 31-33 (also lines 92-95), that "the transition to burst firing with high, preovulatory levels of E2 facilitates the GnRH surge through its glutamatergic synaptic connection to preoptic Kiss1 neurons" is not supported by the experiments (physiologic or computational) described in the manuscript, and is, therefore, only speculative. These statements should be removed throughout the manuscript.

      Previously, we (Qiu et al., eLife 2016) documented a direct glutamatergic projection from Kiss1-ARH neurons to Kiss1-AVPV/PeN neurons. Moreover, Lin et al. (Frontiers in Endocrinology 2021) demonstrated that low-frequency stimulation of Kiss1-ARH:ChR2 neurons, which is known to release only glutamate, boosts the LH surge, and in a follow-up paper the O’Byrne lab blocked this stimulation with ionotropic glutamate antagonists (Shen et al., Frontiers in Endocrinology 2022). We have included these references in the Introduction and Discussion, but we did not think that it was necessary to cite these papers in the Abstract. However, we have re-worded this final statement in the Abstract to: “the transition to burst firing with high, preovulatory levels of E2 would facilitate the GnRH surge….”

      - Introduction: the usefulness of Figure 1 is questionable. From reading the figure legend, it is the reviewer's understanding that panels A and B are published elsewhere (there is no description of methods or results in the manuscript). Further, panels C and D are meant to illustrate that ARC kisspeptin neurons display different types of firing in OVX vs E2-treated OVX mice. The legend to C indicates that the trace illustrates "synchronous firing" but shows one cell (how can this be claimed as synchronous?) - the legend to D indicates that the trace "demonstrates" burst firing in ARC kisspeptin neurons. This part of the figure is, in the reviewer's opinion, misleading because these are only two examples (no quantifications or replicates are provided) obtained by stimulating firing in two different endocrine conditions by two different agonists. The "demonstration" of differential firing patterns would require a thorough examination of firing patterns in response to current injections (as in Figure 12 E-F) or in response to the two agonists, under the different hormonal conditions.

      Figure 1 has now been completely revised to include new data documenting the different firing patterns. The methods detailing these experiments can be found in the Materials and Methods section.

      The introduction presents a rather incomplete picture of what is known regarding how ARC kisspeptin neurons might coordinate their activity to drive episodic GnRH secretion, and it omits published work showing that blockade of glutamate receptors (in particular AMPA receptors) decreases ARC kisspeptin neuron coordinated activity in the brain slices and in vivo and suppresses pulsatile GnRH/LH secretion in mice.

      If we are not mistaken, the reviewer is referring to fiber photometry recordings of GCaMP activity, which we cite in the Discussion. However, for the Introduction we tried to “set the stage” for our studies on measuring the individual channels underlying the different firing patterns and how they are regulated by E2.

      The introduction is also quite long with extensive descriptions of previous work by the authors and in other brain areas that would be better suited for the discussion.

      Again, we are trying to rationalize why we focused on particular ion channels based on the literature.

      - Results: lines 129-132 should be moderated, as whether calcium channels increase excitability or facilitate TRPC5 channel opening has not been directly assessed here.

      Both high-frequency optogenetic stimulation of Kiss1-ARH neurons and NKB, acting through its cognate receptor (TACR3), activate TRPC5 channels (Qiu et al., eLife 2016; J. Neurosci 2021). BAPTA prevents the opening of TRPC5 channels and abrogates the slow EPSP following high-frequency stimulation. Figure 3 documents that inhibition of voltage-activated calcium channels attenuates the slow EPSP, which results in a decrease in excitability.

      Lines 145-146, one limitation of this experiment is that blockade of calcium channel subtypes will not only affect calcium entry and subsequent actions of calcium on TRPC5 channels but also impair the release of neurotransmitters and neuropeptides from kisspeptin neurons. The interpretation that "calcium channels contribute to maintaining the sustained depolarization underlying the slow EPSP" needs, therefore, to be moderated as it is not possible to extract the direct contribution of calcium channels to the activation of TRPC5 channels from these experiments.

      We cited our previous findings documenting that chelating intracellular calcium with BAPTA abolishes the slow EPSP and persistent firing (Qiu et al., J Neurosci 2021). However, to eliminate the potential effects of calcium channel blockers on the slow EPSP amplitude, which may result from reduced presynaptic calcium influx and subsequently decreased neurotransmitter and neuropeptide secretion, we adopted a different strategy by comparing responses to senktide alone with responses to Cd2+ plus senktide. Our findings revealed that the non-selective Ca2+ channel blocker Cd2+ significantly inhibited the senktide-induced inward current (Figures 3F-H).

      Panel C should be removed from Figure 4, as it is published elsewhere.

      Figure 4C has been removed.

      Lines 168-169, "...E2 treatment led to a significant increase in the peak calcium current density in Kiss1ARH neurons, which was recapitulated as predicted by our computational modeling..." How did the model "predict" this increase in calcium current density? As no information is provided in the methods or supplementary information as to how the effect of E2 was integrated into the model, the authors will need to provide additional narration in the text to explain this statement. The "T-channel inflection" referred to in the figure legend will also need to be explained. Lastly, in Figure 5C, the current density unit should be pA/pF. 

      We have added text in the supplementary information to explain how we used the qPCR and electrophysiological data to inform the model regarding the effect that E2 has on the various ionic currents and noted in the Figure 13 legend that the increase/decrease in the conductances is physiologically mediated by E2. We have eliminated the T-channel inflection point (Figure 5D) and corrected the current density label (Figure 5C).
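
      For readers checking the corrected unit: current density is conventionally the peak whole-cell current divided by the cell's membrane capacitance, which gives pA/pF. A minimal sketch follows; the function name and the example values are hypothetical and are not taken from the manuscript.

```python
def current_density(peak_current_pa: float, capacitance_pf: float) -> float:
    """Return current density in pA/pF (peak current normalized to cell size)."""
    if capacitance_pf <= 0:
        raise ValueError("membrane capacitance must be positive")
    return peak_current_pa / capacitance_pf


# Hypothetical example: a -300 pA peak inward current in a 20 pF cell.
example_density = current_density(-300.0, 20.0)
```

      Normalizing to capacitance allows cells of different sizes to be compared, which is why the axis unit is pA/pF rather than pA.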

      Lines 198-199, please clarify "E2 does not modulate calcium channel kinetics directly but rather alters the mRNA expression to increase the conductance".

      We have clarified “that long-term E2 treatment does not modulate calcium channel kinetics but rather alters the mRNA expression to increase the calcium channel conductance” by referring to the specific figures (i.e., Figures 4, 6) in a previous sentence.

      Figures 7 and 8 titles do not accurately reflect the contents: there is nothing about repolarization in the experiments illustrated in Figure 7 or Figure 8. The sample sizes (3 to 4 cells) are also quite small for these experiments.

      We have modified the Figure titles per the reviewer’s comments and increased the cell numbers.

      The title of Figure 9 also does not fully reflect the figure's contents. Although panel G does suggest that the M current contributes to regulating the membrane potential, the reviewer's reading of this figure panel is that the fractional contribution of the M current does not vary during a short burst of action potentials. The suggestion that "KCNQ channels play a key role in repolarizing Kiss1ARH neurons following burst firing" (line 272) and the statement that "our modeling predicted that M-current contributed to the repolarization following burst firing" (line 273) should be revised accordingly.

      The point is that the M-current contributes, albeit by a small fraction, to the repolarization during burst firing.

      Line 288, please indicate what figure informs this statement.

      We have revised the statement since the modeling (Figure 13) comes later in the Results.

      Line 311-313, this sentence only superficially describes the simulation, in the reviewer's opinion. Does the model inform on how TRPC5 channels/currents do that? The supplementary information indicates that there is a tone of extracellular neurokinin B embedded in the model. This is important information that should be clearly stated in the manuscript. The authors should also consider discussing the influence of this neurokinin B tone on the contribution of TRPC5 to cell excitability. As a neurokinin B tone in the extracellular space will likely alter the firing of kisspeptin neurons in the model, readers will likely need more information about all this.

      In our current ramp simulations of the model (Figure 12G and H), there is no involvement of neurokinin B (i.e., the NKB parameter is set to zero), and the effect on the rheobase is solely due to the decrease of the TRPC5 conductance. In the model, TRPC5 channels are activated by intracellular calcium levels and therefore contribute to cell excitability even in the absence of extracellular NKB. The NKB tone is used for the simulations presented in Figure 13, where we vary the TRPC5 conductance under saturating levels of extracellular NKB.

      Lines 316-318 also read as quite superficial. More explanations of what is illustrated in Figure 13 are needed. In particular, it is unclear from the methods and supplementary information what the different ratios of conductances in OVX+E2 vs in OVX are and how they were varied in the model. Furthermore, it is unclear to the reviewer how the outcome of these simulations matches the authors' postulate that E2 enables a transition to a burst firing pattern that favors glutamate release. Looking at simulated firing in Figure 13B, E2 (by increasing calcium conductances) would tend to enable high-frequency firing within bursts (nearing 50 Hz by eye) and high burst rates (approximately 4 bursts per second), which the reviewer would argue might be expected to cause significant neuropeptide release in addition to that of glutamate.

      We have added to the text: “Furthermore, the burst firing of the OVX+E2 parameterized model was supported by elevated h- and Ca2+-currents (Figure 13B) as well as by the high conductance of Ca2+ channels relative to the conductance of TRPC5 channels (Figure 13C).” We have also provided in the Supplemental Information (Table of Model Parameters) the specific conductances in the OVX and OVX+E2 state and how they are varied to produce the model simulations.

      Granted, the high-frequency firing during a burst could release peptide, but in E2-treated OVX females the expression of the peptides is at “rock bottom.” Therefore, the sustained high-frequency firing during the slow EPSP in the OVX state would generate maximum peptide release.

      In Figure 13C, the reviewer is unclear on the ranges of TRPC5 conductances shown. The in vitro experiments suggest that E2 suppresses Trpc5 gene expression and might suppress TRPC5 currents. The ratio of gTRPC5(OVX+E2)/gTRPC5(OVX) should, thus, be <1.0. This is not represented in the parameter space provided, making the interpretation of this simulation difficult. Please clarify what the effect of decreasing gTRPC5 will be on firing patterns in the model.

      Thank you for pointing out this typographical error. The ratio on the X-axis should be gTRPC5(OVX)/gTRPC5(OVX+E2).

      - Discussion: many statements and conclusions are overreaching and need to be revised; for example lines 320-322, 329-330, 335-338, 369, 371-373, 391-394, 463-464, and 489-494;

      We have tempered these statements, so they are not “overreaching.”

      Lines 489-494: the authors should integrate published observations that i) ablation of ARC kisspeptin neurons results in increased LH surges in mice and rats and that ii) optogenetic stimulation of ARC kisspeptin fibers in the POA is only effective at increasing LH secretion in a surge-like manner when done at high frequencies (20 Hz), in their discussion of the role of ARC kisspeptin neurons and their firing patterns in the preovulatory surge.

      We have included the paper from the O’Byrne lab (Shen et al., Frontiers in Endocrinology, 2022) in the Discussion. However, the NK3-saporin ablation in the Mittelman-Smith paper (Endocrinology, 2016) targeted not only KNDy neurons but also other arcuate neurons that express NK3 receptors. Therefore, we have not cited it in the Discussion.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Major comments: 

      My main concern about the manuscript is the extent of both clinical and statistical heterogeneity, which complicates the interpretation of the results. I don't understand some of the antibiotic comparisons that are included in the systematic review. For instance the study by Paul et al (50), where vancomycin (as monotherapy) is compared to co-trimoxazole (as combination therapy). Emergence (or selection) of co-trimoxazole in S. aureus is in itself much more common than vancomycin resistance. It is logical and expected to have more resistance in the co-trimoxazole group compared to the vancomycin group, however, this difference is due to the drug itself and not due to co-trimoxazole being a combination therapy. It is therefore unfair to attribute the difference in resistance to combination therapy. Another example is the study by Walsh (71) where rifampin + novobiocin is compared to rifampin + co-trimoxazole. There is more emergence of resistance in the rifampin + co-trimoxazole group but this could be attributed to novobiocin being a different type of antibiotic than co-trimoxazole instead of the difference being attributed to combination therapy. To improve interpretation and reduce heterogeneity my suggestion would be to limit the primary analyses to regimens where the antibiotics compared are the same but in one group one or more antibiotic(s) are added (i.e. A versus A+B). The other analyses are problematic in their interpretation and should be clearly labeled as secondary and their interpretation discussed. 

      Thank you for raising these important points and highlighting the need for clarification. We understand that the reviewer has concerns regarding the following points:

      (1) The structure of presenting our analyses, i.e. main analyses and sub-group analyses and their corresponding discussion and interpretation

      Our primary aims were to determine whether combining antibiotics has an overarching effect on resistance and to identify factors that explain potential differences in this effect across pathogens/drugs. Therefore, pooling all studies, and thereby all combinations of antibiotics, is one of our main analyses. The decision to pool all studies that compare a lower number of antibiotics to a higher number of antibiotics was hence predefined in our previously published study protocol (PROSPERO CRD42020187257).

      We indeed find that heterogeneity is high in our statistical analyses. As planned in our study protocol, we performed several prespecified sub-group analyses and added further ones. We now emphasize that several sub-group analyses were performed to investigate heterogeneity (L 119ff): “The overall pooled estimates are based on studies that focus on various clinical conditions/pathogens and compare different antibiotics treatments. To explore the impact of these and other potential sources of heterogeneity on the resistance estimates we performed various sub-group analyses and metaregression.” 

      The sub-group analyses focused on specific pathogens/clinical conditions (figure 3) or explored heterogeneity due to different antibiotics in comparator arms, as suggested by the reviewer (figure 3B, SI section 6). We find that the heterogeneity remains high even if only resistances to antibiotics common to both arms are considered (SI section 6.1.8). With this analysis we excluded comparisons of different antibiotics (e.g., A vs B+C), such as those between vancomycin and co-trimoxazole named by the reviewer. While we aimed to explore heterogeneity and investigate potential factors that modify the effect of combining antibiotics on resistance, we were constrained by the limited evidence and the nature of the data provided by the identified studies. Therefore, interpretability also remains limited for the subgroup analyses, which we highlight in the discussion (L 186 ff: “We accounted for many sources of heterogeneity using stratification and meta-regression, but analyses were limited by missing information and sparse data.”). Further, specific subgroup analyses are discussed in more detail in the SI.

      (2) Difference in resistance development due to the type of the antibiotics or due to combination therapy?

      The reviewer raises an important point, which we also try to make: future studies should be systematically designed to compare antibiotic combination therapy, i.e. identical antibiotics in treatment arms should be used, except for additional antibiotics used in both treatment arms. We already mentioned this point in our discussion but highlight this now by emphasizing how many studies did not have identical antibiotics in their treatment arms. We write in L194ff: “19 (45%) of our included studies compared treatment arms with no antibiotics in common, and 22 studies (52%) had more than one antibiotic not identical in the treatment arms (table 1). To better evaluate the effect of combination therapy, especially more RCTs would be needed where the basic antibiotic treatment is consistent across both treatment arms, i.e. the antibiotics used in both treatment arms should be identical, except for the additional antibiotic added in the comparator arm (table 1).”

      Furthermore, we investigated the importance of the type of antibiotics with several subgroup analyses (e.g. SI sections 6.1.8 and 6.1.10). We now further highlight the concern about the type of antibiotics in the results section of the main manuscript, where we discuss the sub-group analysis with no common antibiotics in the treatment arms (L 131 ff): “Furthermore, a lower number of antibiotics performed better than a higher number if the compared treatment arms had no antibiotics in common (pooled OR 4.73, 95% CI 2.14 – 10.42; I2=37%, SI table S3), which could be due to different potencies or resistance prevalences of antibiotics as discussed in SI (SI section 6.1.10).” As mentioned above, we also performed sub-group analyses in which only resistances to antibiotics common to both arms are considered (SI section 6.1.8). However, as discussed in the corresponding sections, the systematic assessment of antibiotic combination therapy remains challenging because not all resistances against the antibiotics used in the arms were systematically measured and reported. Furthermore, the power of these sub-group analyses is naturally a concern, as they include fewer studies. 

      Another concern is about the definition of acquisition of resistance, which is unclear to me. If for example meropenem is administered and the follow-up cultures show Enterococcus species (which is intrinsically resistant to meropenem), does this constitute acquisition of resistance? If so, it would be misleading to determine this as an acquisition of resistance, as many people are colonized with Enterococci and selection of Enterococci under therapy is very common. If this is not considered as the acquisition of resistance please include how the acquisition of resistance is defined per included study. Table S1 is not sufficiently clear because it often only contains how susceptibility testing was done but not which antibiotics were tested and how a strain was classified as resistant or susceptible. 

      Thank you for pointing out this potential ambiguity. The definition of acquisition of resistance reads now (L 275 ff): “A patient was considered to have acquired resistance if, at the follow-up culture, a resistant bacterium (as defined by the study authors) was detected that was not present in the baseline culture.” We also changed the definition accordingly in the abstract (L 36 ff). We hope that the definition of acquisition is now clearer. Our definition of “acquisition of resistance” is agnostic to bacterial species and hence intrinsically resistant species, as the example raised by the reviewer, can be included if they were only detected during the follow-up culture by the studies. Generally, it was not always clear from the studies, which pathogens were screened for and whether the selection of intrinsically resistant bacteria was reported or not. Therefore, we rely on the studies' specifications of resistant and non-resistant without further distinction from our side, i.e. classifying data into intrinsic and non-intrinsic resistance. Overall, the outcome “acquisition of resistance” can be interpreted as a risk assessment for having any resistant bacterium during or after treatment. In contrast, the outcome “emergence of resistance” is more rigorous, demanding the same species to be detected as more resistant during or after treatment.

      Information on which antibiotic susceptibility tests were performed in each individual study can be found in table 1 of the main text. However, we agree that this information should be better linked and highlighted again in table S1. We therefore now refer to table 1 in the table description of table S1. L134 ff.: “See table 1 in the main text for which antibiotics the antibiotics tested and reported extractable resistance data”. Furthermore, we added the breakpoints for resistant and susceptible classification if specifically stated in the main text of the study. However, we did not search old guidelines, manufacturers' manuals or study protocols when the breakpoints are not specifically stated in the main text, because the main goal of this table, in our opinion, is to justify why the studies could be considered for a resistance outcome. We therefore decided against further breakpoint investigations for studies where the breakpoint is not specifically stated in the main text. 

      Line 85: "Even though within-patient antibiotic resistance development is rare, it may contribute to the emergence and spread of resistance." 

      Depending on the bug-drug combination, there is great variation in the propensity to develop within-patient antibiotic resistance. For example: within-patient development of ciprofloxacin resistance in Pseudomonas is fairly common while within-patient development of methicillin resistance in S. aureus is rare. Based on these differences, large clinical heterogeneity is expected and it is questionable where these studies should be pooled. 

      We agree that our formulation neglects differences in prevalence of within-host resistance emergence depending on bug-drug combinations. We changed our statement in L 86 to: “Within-patient antibiotic resistance development, even if rare, may contribute to the emergence and spread of resistance.”

      Line 114: "The overall pooled OR for acquisition of resistance comparing a lower number of antibiotics versus a higher one was 1.23 (95% CI 0.68 - 2.25), with substantial heterogeneity between studies (I2=77.4%)" 

      What consequential measures did the authors take after determining this high heterogeneity? Did they explore the source of this large heterogeneity? Considering this large heterogeneity, do the authors consider it appropriate to pool these studies?

      Thank you for highlighting this lack of clarity. As mentioned above, we now highlight that we performed several subgroup analyses to investigate heterogeneity. (L 116ff): “The overall pooled estimates are based on studies that focus on various clinical conditions/pathogens and compare different antibiotics treatments. To explore the impact of these and other potential sources of heterogeneity on the resistance estimates we performed various subgroup analyses and meta-regression.” Nevertheless, these analyses faced limitations due to the scarcity of evidence and often still showed a high amount of heterogeneity. Given the lack of appropriate evidence, it is hard to identify the source of heterogeneity. The decision to pool all studies was pre-specified in our previously published study protocol (PROSPERO CRD42020187257) and was motivated by two aims: to determine whether there is a general effect of combination therapy on resistance development, and to identify factors that explain potential differences in the effect of combination therapy across bug-drug combinations. Therefore, we think that the presentation of the overall pooled estimate is appropriate, as it was predefined, and potential heterogeneity is furthermore explored in the subgroup analyses. 
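      As an illustration of how a pooled OR and the I2 statistic quoted above relate to study-level data, the following is a minimal sketch of inverse-variance pooling with a DerSimonian-Laird random-effects estimate. The estimator, the 0.5 continuity correction, the function name and the input layout are all assumptions made for illustration; the response does not state which estimator the authors used, and no study data are reproduced here.

```python
import math

def pooled_or_random_effects(tables):
    """Pool 2x2 tables given as (events_a, n_a, events_b, n_b).

    Hypothetical helper: returns (pooled OR, CI low, CI high, I2 in percent)
    using a DerSimonian-Laird random-effects model on the log-OR scale.
    """
    log_or, var = [], []
    for ea, na, eb, nb in tables:
        # Assumed 0.5 continuity correction keeps the log OR finite for zero cells.
        a, b = ea + 0.5, na - ea + 0.5
        c, d = eb + 0.5, nb - eb + 0.5
        log_or.append(math.log(a * d / (b * c)))
        var.append(1 / a + 1 / b + 1 / c + 1 / d)

    # Fixed-effect weights and Cochran's Q.
    w = [1 / v for v in var]
    mu_fixed = sum(wi * yi for wi, yi in zip(w, log_or)) / sum(w)
    q = sum(wi * (yi - mu_fixed) ** 2 for wi, yi in zip(w, log_or))
    df = len(tables) - 1

    # I2: share of total variation attributed to between-study heterogeneity.
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0

    # Between-study variance (DerSimonian-Laird), truncated at zero.
    c_dl = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c_dl) if c_dl > 0 else 0.0

    # Random-effects weights, pooled estimate and 95% CI.
    w_re = [1 / (v + tau2) for v in var]
    mu = sum(wi * yi for wi, yi in zip(w_re, log_or)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))
    return math.exp(mu), math.exp(mu - 1.96 * se), math.exp(mu + 1.96 * se), i2
```

      Two made-up studies with opposite effects, e.g. pooled_or_random_effects([(10, 50, 2, 50), (2, 50, 10, 50)]), give a pooled OR near 1 together with a high I2, which is the situation in which an overall estimate is hard to interpret without subgroup analyses.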

      Reviewer #1 (Recommendations For The Authors): 

      I want to congratulate the investigators for the rigorous approach followed and the - in my opinion - correct interpretation of the data and analysis. The disappointing outcome is independent of the quality of the approach used. Yet, the consequences of that outcome are rather limited, and will not be surprising for - at least - some in the field of antibiotic resistance. 

      Thank you for your positive and differentiated feedback.

      Reviewer #2 (Recommendations For The Authors): 

      Line 93: "The screening of the citations of the 41 studies identified one additional eligible study, for a total of 42 studies". 

      Why was this study missed in the search strategy? 

      What is the definition of "quasi-RCTs"? Why were these included in the analysis? 

      Thank you for pointing out this lack of clarity. The additional study, which was found through screening the references of included studies, was not identified with our search strategy as neither the abstract nor database-specific identifiers provided any indications that resistance was measured in this study. We added an explanation in the supplementary materials L 792 ff. and refer to this explanation in the main manuscript (L 95). 

      Quasi-randomized trials are trials that use allocation methods, which are not considered truly random. We added this specification in L 95. It now reads: “….two quasi-RCTs, where the allocation method used is not truly random” and in L 252 ff: “Studies were classified as quasi-RCTs if the allocation of participants to study arms was not truly random.” For instance, the study by Macnab et al. (1994) assigned patients alternately to the treatment arms. Quasi-randomized controlled trials can lead to biases, and older studies in particular are more likely to have used quasi-random allocation methods. This can also be seen in our study, where the two quasi-randomized controlled trials were published in 1994 and 1997. The bias is considered in the risk of bias assessment and in our sensitivity analysis regarding the impact of risk of bias on our estimates (supplementary information sections 3.0 and 4.2). Furthermore, one of the two previously conducted meta-analyses comparing beta-lactam monotherapy to beta-lactam and aminoglycoside that assessed resistance development also included quasi-randomized controlled trials (Paul et al., 2014). Overall, while designing the study, we decided to include quasi-randomized controlled trials to increase statistical power, as we expected that limited statistical power might be a concern, and decided to assess potential biases in the risk of bias assessment.  

      Line 100: "Consequently, most studies did not have the statistical power to detect a large effect on within-patient resistance development (figure 2 B, SI p 14).". 

      Small studies actually have more power to detect large effects while smaller power to detect small effects. Please rephrase. 

      Thank you for pointing out this lack of clarity. We rephrased the sentence in order to emphasize our point that the studies are underpowered even if we assume in our power analysis a large effect on resistance development between treatment arms. In this context “the small” studies include too few patients to detect a large difference in resistance development. As resistance development is a rare event, generally studies have to include a larger number of patients to estimate the effect of intervention. We rephrased the sentence in L 101ff to: “Consequently, most studies did not have the statistical power to detect differences in within-patient resistance development even if we assume that the effect on resistance development is large between treatment arms.”
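      The point that small trials stay underpowered for a rare outcome even when the assumed effect is large can be illustrated with a rough power calculation. The sketch below uses a normal approximation for a two-sided comparison of two proportions; the function names, the approximation and the example resistance rates are assumptions for illustration and are not the power analysis reported in the SI.

```python
import math

def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_proportions(p1, p2, n_per_arm, z_alpha=1.959964):
    """Approximate power of a two-sided z-test comparing two proportions.

    p1, p2: assumed true resistance rates in the two arms (hypothetical inputs).
    n_per_arm: patients per arm (equal arm sizes assumed).
    The negligible rejection probability in the opposite tail is ignored.
    """
    p_bar = (p1 + p2) / 2.0
    se_null = math.sqrt(2.0 * p_bar * (1.0 - p_bar) / n_per_arm)
    se_alt = math.sqrt((p1 * (1.0 - p1) + p2 * (1.0 - p2)) / n_per_arm)
    return normal_cdf((abs(p1 - p2) - z_alpha * se_null) / se_alt)

if __name__ == "__main__":
    # Made-up scenario: resistance in 5% vs 15% of patients, a threefold difference.
    for n in (30, 100, 500):
        print(n, round(power_two_proportions(0.05, 0.15, n), 3))
```

      Under these made-up rates, power rises steeply with the number of patients per arm, which is why a trial with a few dozen patients per arm can miss even a large relative difference in a rare event.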

      Line 108: "... and prophylaxis for blood cancer patients with four studies (10%) respectively.". 

      I would suggest using the medical term hematological malignancy patients. 

      Thank you for the suggestion, we changed it as suggested to hematological malignancy patients, also accordingly in the figures, and table 1.

      Line 117: "Since the results for the two resistance outcomes are comparable, our focus in the following is on the acquisition of resistance". 

      The first OR is 1.23 and the second is 0.74, why do you consider these outcomes as comparable? 

      Thank you for pointing out our imprecise formulation. Due to the lack of power, the exact estimates need to be interpreted with care. Here, we wanted to make the point that qualitatively the results of both outcomes do not differ, in the sense that our analysis shows no substantial difference between a higher and a lower number of antibiotics. We rephrased the sentence to be more precise (L 123ff): “The results for the two resistance outcomes are qualitatively comparable in the sense that individual estimates may differ, but show similar absence of evidence to support either the benefit, harm or equivalence of treating with a higher number of antibiotics. Therefore, our …”. A more detailed discussion of differences in estimates can be found in the SI, where the estimates of emergence of resistance are presented (e.g. SI section 2.1).

      Line 123: "Furthermore, a lower number of antibiotics performed better than a higher number if the compared treatment arms had no antibiotics in common (pooled OR 4.73, 95% CI 2.14 - 10.42; I 2 =37%, SI p 7).". 

      How do you explain this? What does this mean? 

      We now added a more detailed explanation in the supplement (L 376ff.): “The result that if the treatment arms had no antibiotics in common a lower number of antibiotics performed better than a higher number of antibiotics could be due to different potencies of antibiotics or resistance prevalences. Further, there could be a bias to combine less potent antibiotics or antibiotics with higher resistance prevalence to ensure treatment efficacy, which could lead to higher chances to detect resistances in the treatment arm with higher number of antibiotics, e.g. by selecting pre-existing resistance due to antibiotic treatment (see also section 6.1.9).” We furthermore already specifically mention this point in the main manuscript and refer then to the detailed explanation in the SI (L134 ff, “which could be due to different potencies or resistance prevalences of antibiotics as discussed in SI (SI section 6.1.10)”).

      Overall, we want to point out that these results need to be interpreted with caution as overall the statistical power is limited to confidently estimate the difference in effect of a higher and lower number of antibiotics.

      Line 125: ". In contrast, when restricting the analysis to studies with at least one common antibiotic in the treatment arms are pooled there was little evidence of a difference (pooled OR 0.55, 95% CI 0.28 - 1.07". 

      The difference was not statistically significant but there does seem to be an indication of a difference, please rephrase. 

      We rephrased the sentence to (L135 ff.): “In contrast, when restricting the analysis to studies with at least one common antibiotic in the treatment arms we found no evidence of a difference, only a weak indication that a higher number of antibiotics performs better (pooled OR 0.55, 95% CI 0.28 – 1.07; I2=74%, figure 3B).” 

      Line 190: "Similarly, today, relevant cohort studies could be analysed collaboratively using various modern statistical methods to address confounding by indication and other biases (66, 67)". 

      However, residual confounding by indication is likely. Please also mention the disadvantages of observational studies compared to RCTs. 

      We now highlight that causal inference with observational data comes with its own challenges and stress that randomized controlled trials are still considered the gold standard. L 204ff now reads: “However, even with appropriate causal inference methods, residual confounding cannot be excluded when using observational data (67). Therefore, RCTs will remain the gold standard to estimate causal relationships.”

      Line 230: "Gram-negative bacteria have an outer membrane, which is absent in gram-positive bacteria for instance, therefore intrinsic resistance against antibiotics can be observed in gram-negative bacteria (11)". 

      Intrinsic resistance is not unique for Gram-negative bacteria but also exists for Gram-positive bacteria. 

      We agree with the reviewer that intrinsic resistance is not unique to gram-negative bacteria and refined our writing. We additionally added that differences between gram-negative and gram-positive bacteria are not only to be expected due to differing intrinsic resistances but also due to potential differences in the mechanistic interactions of antibiotics, i.e., synergy or antagonism. The paragraph reads now (SI L289): “The gram status of a bacterium may potentially determine how effective an antibiotic, or an antibiotic combination is. Differences between gram-negative and gram-positive bacteria such as distinct bacterial surface organisation can lead to specific intrinsic resistances of gram-negative and gram-positive bacteria against antibiotics (55). These structural differences can lead to varying effects of antibiotic combinations between gram-negative and gram-positive bacteria (56).”

    1. Author response:

      The following is the authors’ response to the original reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 127. Provide a few more words describing the voltage protocol. To the uninitiated, panels A and B will be difficult to understand. "The large negative step is used to first close all channels, then probe the activation function with a series of depolarizing steps to re-open them and obtain the max conductance from the peak tail current at -36 mV. "

      We have revised the text as suggested (revision lines 127 to Line 131): “From a holding potential within the gK,L activation range (here –74 mV), the cell is hyperpolarized to –124 mV, negative to EK and the activation range, producing a large inward current through open gK,L channels that rapidly decays as the channels deactivate. We use the large transient inward current as a hallmark of gK,L. The hyperpolarization closes all channels, and then the activation function is probed with a series of depolarizing steps, obtaining the max conductance from the peak tail current at –44 mV (Fig. 1A).”
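      The step from peak tail currents to an activation curve can be sketched as follows: each tail current is converted to a chord conductance, and the conductance-voltage relation is described by a Boltzmann function with a midpoint voltage and a slope factor. The function names, the reversal potential and all numerical values below are hypothetical and only illustrate the calculation; they are not taken from the manuscript.

```python
import math

def tail_conductance(i_tail_pa, v_tail_mv, e_k_mv):
    """Chord conductance in nS from a tail current (pA / mV = nS).

    e_k_mv is an assumed K+ reversal potential, not a measured value.
    """
    return i_tail_pa / (v_tail_mv - e_k_mv)

def boltzmann(v_mv, g_max, v_half_mv, slope_mv):
    """Boltzmann activation curve: half of g_max is reached at v_half_mv."""
    return g_max / (1.0 + math.exp((v_half_mv - v_mv) / slope_mv))

if __name__ == "__main__":
    # Hypothetical parameters for a conductance with a very negative activation range.
    g_max, v_half, slope = 100.0, -80.0, 5.0
    for v in (-124.0, -94.0, -80.0, -64.0, -44.0):
        print(v, round(boltzmann(v, g_max, v_half, slope), 2))
```

      With a very negative midpoint, as in this made-up example, a large fraction of the conductance is already active at a holding potential near -74 mV, which is why a hyperpolarizing prepulse is needed to close the channels before the activation range is probed.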

      Incidentally, why does the peak tail current decay? 

      We added this text to the figure legend to explain this: “For steps positive to the midpoint voltage, tail currents are very large. As a result, K+ accumulation in the calyceal cleft reduces driving force on K+, causing currents to decay rapidly, as seen in A (Lim et al., 2011).”

      The decay of the peak tail current is a feature of gK,L (large K+ conductance) and the large enclosed synaptic cleft (which concentrates K+ that effluxes from the HC). See Govindaraju et al. (2023) and Lim et al. (2011) for modeling and experiments around this phenomenon.

      Line 217-218. For some reason, I stumbled over this wording. Perhaps rearrange as "In type II HCs absence of Kv1.8 significantly increased Rin and tauRC. There was no effect on Vrest because the conductances to which Kv1.8 contributes, gA and gDR activate positive to the resting potential. (so which K conductances establish Vrest???). 

      We kept our original wording because we wanted to discuss the baseline (Vrest) before describing responses to current injection.

      Vrest is presumably maintained by ATP-dependent Na/K exchangers (ATP1a1), HCN, Kir, and mechanotransduction currents. Repolarization is achieved by delayed rectifier and A-type K+ conductances in type II HCs.

      Figure 4, panel C - provides absolute membrane potential for voltage responses. Presumably, these were the most 'ringy' responses. Were they obtained at similar Vm in all cells (i.e., comparisons of Q values in lines 229-230). 

      We added the absolute membrane potential scale. Type II HC protocols all started with 0 pA current injection at baseline, so they were at their natural Vrest, which did not differ by genotype or zone. Consistent with Q depending on expression of conductances that activate positive to Vrest, Q did not co-vary with Vrest (Pearson’s correlation coefficient = 0.08, p = 0.47, n= 85).

      Lines 254. Staining is non-specific? Rather than non-selective? 

      Yes, thanks - Corrected (Line 264).

      Figure 6. Do you have a negative control image for Kv1.4 immuno? Is it surprising that this label is all over the cell, but Kv1.8 is restricted to the synaptic pole? 

      We don’t have a null-animal control because this immunoreactivity was done in rat. While the cuticular plate staining was most likely nonspecific because we see that with many different antibodies, it’s harder to judge the background staining in the hair cell body layer. After feedback from the reviewers, we decided to pull the KV1.4 immunostaining from the paper because of the lack of null control, high background, and inability to reproduce these results in mouse tissue. In our hands, in mouse tissue, both mouse and rabbit anti-KV1.4 antibodies failed to localize to the hair cell membrane. Further optimization or another method could improve that, but for now the single-cell expression data (McInturff et al., 2018) remain the strongest evidence for KV1.4 expression in murine type II hair cells.

      Lines 400-404. Whew, this is pretty cryptic. Expand a bit? 

      We simplified this paragraph (revision lines 411-413): “We speculate that gA and gDR(KV1.8) have different subunit composition: gA may include heteromers of KV1.8 with other subunits that confer rapid inactivation, while gDR(KV1.8) may comprise homomeric KV1.8 channels, given that they do not have N-type inactivation.”

      Line 428. 'importantly different ion channels'. I think I understand what is meant but perhaps say a bit more. 

      Revised (Line 438): “biophysically distinct and functionally different ion channels”.

      Random thought. In addition to impacting Rin and TauRC, do you think the more negative Vrest might also provide a selective advantage by increasing the driving force on K entry from endolymph? 

      When the calyx is perfectly intact, gK,L is predicted to make Vrest less negative than the values we report in our paper, where we have disturbed the calyx to access the hair cell (–80, Govindaraju et al., 2023, vs. –87 mV, here). By enhancing K+ accumulation in the calyceal cleft, the intact calyx shifts EK—and Vrest—positively (Lim et al., 2011), so the effect on driving force may not be as drastic as what you are thinking.
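      The direction of the EK shift described here follows from the Nernst equation: raising the K+ concentration in the cleft makes EK less negative. The sketch below assumes a temperature of 37 °C and uses made-up concentrations; none of the numbers are measurements from the manuscript or the cited studies.

```python
import math

R = 8.314462618      # gas constant, J / (mol K)
F = 96485.33212      # Faraday constant, C / mol

def nernst_mv(conc_out_mm, conc_in_mm, temp_c=37.0, z=1):
    """Equilibrium potential in mV for an ion of valence z."""
    temp_k = temp_c + 273.15
    return 1000.0 * (R * temp_k) / (z * F) * math.log(conc_out_mm / conc_in_mm)

if __name__ == "__main__":
    # Hypothetical cleft K+ concentrations (mM) with a fixed intracellular value.
    k_in = 140.0
    for k_out in (5.0, 10.0, 20.0):
        print(k_out, round(nernst_mv(k_out, k_in), 1))
```

      Because EK depends on the logarithm of the concentration ratio, each doubling of cleft K+ shifts EK positively by the same amount (about 18.5 mV at 37 °C).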

      Reviewer #2 (Recommendations For The Authors):

      (1) Introduction: wouldn't the small initial paragraph stating the main conclusion of the study fit better at the end of the background section, instead of at the beginning? 

      Thank you for this idea, we have tried that and settled on this direct approach to let people know in advance what the goals of the paper are.

      (2) Pg.4: The following sentence is rather confusing "Between P5 and P10, we detected no evidence of a non-gK,L KV1.8-dependent.....". Also, Suppl. Fig 1A seems to show that between P5 and P10 hair cells can display a potassium current having either a hyperpolarised or depolarised Vhalf. Thus, I am not sure I understand the above statement. 

      Thank you for pointing out unclear wording. We used the more common “delayed rectifier” term in our revision (Lines 144-147): “Between P5 and P10, some type I HCs have not yet acquired the physiologically defined conductance, gK,L. No effects of KV1.8 deletion were seen in the delayed rectifier currents of immature type I HCs (Suppl. Fig. 1B), showing that they are not immature forms of the Kv1.8-dependent gK,L channels.”

      (3) For the reduced Cm of hair cells from Kv1.8 knockout mice, could another reason be simply the immature state of the hair cells (i.e. lack of normal growth), rather than less channels in the membrane? 

      There were no other signs to suggest immaturity or abnormal growth in KV1.8–/– hair cells or mice. Importantly, type II HCs did not show the same Cm effect.

      We further discussed the capacitance effect in lines 160-167: “Cm scales with surface area, but soma sizes were unchanged by deletion of KV1.8 (Suppl. Table 2). Instead, Cm may be higher in KV1.8+/+ cells because of gK,L for two reasons. First, highly expressed trans-membrane proteins (see discussion of gK,L channel density in Chen and Eatock, 2000) can affect membrane thickness (Mitra et al., 2004), which is inversely proportional to specific Cm. Second, gK,L could contaminate estimations of capacitive current, which is calculated from the decay time constant of transient current evoked by small voltage steps outside the operating range of any ion channels. gK,L has such a negative operating range that, even for Vm negative to –90 mV, some gK,L channels are voltage-sensitive and could add to capacitive current.”

      (4) Methods: The electrophysiological part states that "For most recordings, we used .....". However, it is not clear what has been used for the other recordings.

      Thanks for catching this error, a holdover from an earlier ms. version.  We have deleted “For most recordings” (revision line 466).

      Also, please provide the sign for the calculated 4 mV liquid junction potential. 

      Done (revision line 476).

      Reviewer #3 (Recommendations For The Authors): 

      (1) Some of the data in panels in Fig. 1 are hard to match up. The voltage protocols shown in A and B show steps from hyperpolarized values to -71mV (A) and -32 mV (B). However, the value from A doesn't seem to correspond with the activation curve in C.

      Thank you for catching this.  We accidentally showed the control I-X curve from a different cell than that in A. We now show the G-V relation for the cell in A.

      Also the Vhalf in D for -/- animals is ~-38 mV, which is similar to the most positive step shown in the protocol.

      The most positive step in Figure 1B is actually –25 mV. The uneven tick labels might have been confusing, so we re-labeled them to be more conventional.

      Were type I cells stepped to more positive potentials to test for the presence of voltage-activated currents at greater depolarizations? This is needed to support the statement on lines 147-148. 

      We added “no additional K+ conductance activated up to +40 mV” (revision line 149-150).  Our standard voltage-clamp protocol iterates up to ~+40 mV in KV1.8–/– hair cells, but in Figure 1 we only showed steps up to –25 mV because K+ accumulation in the synaptic cleft with the calyx distorts the current waveform even for the small residual conductances of the knockouts. KV1.8–/– hair cells have a main KV conductance with a Vhalf of ~–38 mV, as shown in Figure 1, and we did not see an additional KV conductance that activated with a more positive Vhalf up to +40 mV.

      (2) Line 151 states "While the cells of Kv1.8-/- appeared healthy..." how were epithelia assessed for health? Hair cells arise from support cells and it would be interesting to know if Kv1.8 absence influences supporting cells or neurons. 

      We added our criteria for cell health to lines 477-479: “KV1.8–/– hair cells appeared healthy in that cells had resting potentials negative to –50 mV, cells lasted a long time (20-30 minutes) in ruptured patch recordings, membranes were not fragile, and extensive blebbing was not seen.”

      Supporting cells were not routinely investigated. We characterized calyx electrical activity (passive membrane properties, voltage-gated currents, firing pattern) and didn’t detect differences between +/+, +/–, and –/– recordings (data not shown). KV1.8 was not detected in neural tissue (Lee et al., 2013). 

      (3) Several different K+ channel subtypes were found to contribute to inner hair cell K+ conductances (Dierich et al. 2020) but few additional K+ channel subtypes are considered here in vestibular hair cells. Further comments on calcium-activated conductances (lines 310-317) would be helpful since apamin-sensitive SK conductances are reported in type II hair cells (Poppi et al. 2018) and large iberiotoxin-sensitive BK conductances in type I hair cells (Contini et al. 2020). Were iberiotoxin effects studied at a range of voltages and might calcium-dependent conductances contribute to the enhanced resonance responses shown in Fig. 4? 

      We refer you to lines 310-317 in the original ms (lines 322-329 in the revised ms), where we explain possible reasons for not observing IK(Ca) in this study.

      (4) Similar to GK,L erg (Kv11) channels show significant Cs+-permeability. Were experiments using Cs+ and/or Kv11 antagonists performed to test for Kv11? 

      No. Hurley et al. (2006) used Kv11 antagonists to reveal Kv11 currents in rat utricular type I hair cells with perforated patch, which were also detected in rats with single-cell RT-PCR (Hurley et al. 2006) and in mice with single-cell RNAseq (McInturff et al., 2018).  They likely contribute to hair cell currents, alongside Kv7, Kv1.8, HCN1, and Kir. 

      (5) Mechanosensitive ("MET") channels in hair cells are mentioned on lines 234 and 472 (towards the end of the Discussion), but a sentence or two describing the sensory function of hair cells in terms of MET channels and K+ fluxes would help in the Introduction too. 

      Following this suggestion we have expanded the introduction with the following lines  (78-87): “Hair cells are known for their large outwardly rectifying K+ conductances, which repolarize membrane voltage following a mechanically evoked perturbation and in some cases contribute to sharp electrical tuning of the hair cell membrane.  Because gK,L is unusually large and unusually negatively activated, it strongly attenuates and speeds up the receptor potentials of type I HCs (Correia et al., 1996; Rüsch and Eatock, 1996b). In addition, gK,L augments a novel non-quantal transmission from type I hair cell to afferent calyx by providing open channels for K+ flow into the synaptic cleft (Contini et al., 2012, 2017, 2020; Govindaraju et al., 2023), increasing the speed and linearity of the transmitted signal (Songer and Eatock, 2013).”

      (6) Lines 258-260 state that GKL does not inactivate, but previous literature has documented a slow type of inactivation in mouse crista and utricle type I hair cells (Lim et al. 2011, Rusch and Eatock 1996) which should be considered. 

      Lim et al. (2011) concluded that K+ accumulation in the synaptic cleft can explain much of the apparent inactivation of gK,L. In our paper, we were referring to fast, N-type inactivation. We changed that line to be more specific; new revision lines 269-271: “KV1.8, like most KV1 subunits, does not show fast inactivation as a heterologously expressed homomer (Lang et al., 2000; Ranjan et al., 2019; Dierich et al., 2020), nor do the KV1.8-dependent channels in type I HCs, as we show, and in cochlear inner hair cells (Dierich et al., 2020).”

      (7) Lines 320-321 Zonal differences in inward rectifier conductances were reported previously in bird hair cells (Masetto and Correia 1997) and should be referenced here.

      Zonal differences were reported by Masetto and Correia for type II but not type I avian hair cells, which is why we emphasize that we found a zonal difference in I-H in type I hair cells. We added two citations to direct readers to type II hair cell results (lines 333-334): “The gK,L knockout allowed identification of zonal differences in IH and IKir in type I HCs, previously examined in type II HCs (Masetto and Correia, 1997; Levin and Holt, 2012).”

      Also, Horwitz et al. (2011) showed HCN channels in utricles are needed for normal balance function, so please include this reference (see line 171). 

      Done (line 184).

      (8) Fig 6A. Shows Kv1.4 staining in rat utricle but procedures for rat experiments are not described. These should be added. Also, indicate striola or extrastriola regions (if known). 

      We removed KV1.4 immunostaining from the paper, see above.

      (9) Table 6, ZD7288 is listed -was this reagent used in experiments to block Gh? If not please omit. 

      ZD7288 was used to block gH to produce a clean h-infinity curve in Figure 6, which is described in the legend.

      (10) In supplementary Fig. 5A make clear if the currents are from XE991 subtraction. Also, is the G-V data for single cell or multiple cells in B? It appears to be from 1 cell but ages P11-505 are given in legend. 

      The G-V curve in B is from XE991 subtraction, and average parameters in the figure caption are for all the KV1.8–/–  striolar type I hair cells where we observed this double Boltzmann tail G-V curve. I added detail to the figure caption to explain this better.

      (11) Supplementary Fig. 6A claims a fast activation of inward rectifier K+ channels in type II but not type I cells-not clear what exactly is measured here.

      We use “fast inward rectifier” to indicate the inward current that increases within the first 20 ms after hyperpolarization from rest (IKir, characterized in Levin & Holt, 2012) in contrast to HCN channels, which open over ~100 ms. We added panel C to show that the activation of IKir is visible in type II hair cells but not in the knockout type I hair cells that lack gK,L. IKir was a reliable cue to distinguish type I and type II hair cells in the knockout.

      For our actual measurements in Fig 6B, we quantified the current flowing after 250 ms at –124 mV because we did not pharmacologically separate IKir and IH.

      Could the XE991-sensitive current be activated and contributing?

      The XE991-sensitive current could decay (rapidly) at the onset of the hyperpolarizing step, but was not contributing to our measurement of IKir and IH, made after 250 ms at –124 mV, at which point any low-voltage-activated (LVA) outward rectifiers have deactivated. Additionally, the LVA XE991-sensitive currents were rare (only detected in some striolar type I hair cells) and when present did not compete with fast IKir, which is only found in type II hair cells.

      Also, did the inward rectifier conductances sustain any outward conductance at more depolarized voltage steps? 

      For the KV1.8-null mice specifically, we cannot answer the question because we did not use specific blocking agents for inward rectifiers.  However, we expect that there would only be sustained outward IR currents at voltages between EK and ~-60 mV: the foot of IKir’s I-V relation according to published data from mouse utricular hair cells – e.g., Holt and Eatock 1995, Rusch and Eatock 1996, Rusch et al. 1998, Horwitz et al., 2011, etc.  Thus, any such current would be unlikely to contaminate the residual outward rectifiers in Kv1.8-null animals, which activate positive to ~-60 mV. 

      (I-HCN is also not a problem, because it could only be outward positive to its reversal potential at ~-40 mV, which is significantly positive to its voltage activation range.)

    1. Author response:

      Reviewer #1 (Public Review):

      Greter et al. provide an interesting and creative use of lactulose as a "microbial metabolism" inducer, combined with tracking of H2 and other fermentation end products. The topic is timely and will likely be of broad interest to researchers studying nutrition, circadian rhythm, and gut microbiota. However, a couple of moderate to major concerns were noted that may impact the interpretation of the current data:

      (1)  Much of the data relies on housing gnotobiotic mice in metabolic cages, but I couldn't find any details of methods to assess contamination during multiple days of housing outside of gnotobiotic isolators/cages. Given the complexity of the metabolic cage system used, sterility would likely be incredibly challenging to achieve. More details needed to be included about how potential contamination of the mice was assessed, ideally with 16S rRNA gene sequencing data of the endpoint samples and/or qPCR for total colonization levels relative to the more targeted data shown.

      We thank the reviewer for pointing out that we have not made the experimental setup clear in the text. One of the unique features of our metabolic cage setup is that the mice do not need to be housed outside gnotobiotic isolators, but that the whole system is placed inside an isolator. We have developed and published this system recently (Hoces et al, PLOS Biol 2022), including extensive testing for sterility/gnotobiosis. We will improve clarity in a revised version.

      Given that 16S sequencing of germ-free mice will typically produce false positive reads, we used Blautia pseudococcoides as an indicator strain for contaminations. This strain is present in our SPF mouse colony, forms spores that are highly resilient to decontamination measures, and has been the most likely contaminant in our gnotobiotic system. We have checked for presence of this strain in the cecum content of all our animals at the end of each experiment, and only included experiments which had a B. pseudococcoides signal below threshold level.

      (2)  The language could be softened to provide a more nuanced discussion of the results. While lactulose does seem to induce microbial metabolism it also could have direct effects on the host due to its osmotic activity or other off-target effects. Thus, it seems more precise to just refer to lactulose specifically in the figure titles and relevant text. Additionally, the degree to which lactulose "disrupts the diurnal rhythm" isn't clear from the data shown, especially given that the markers of circadian rhythm rapidly recover from the perturbation. It is probably more precise to instead state that lactulose transiently induces fermentation during the light phase or something to that effect. The discussion could also be expanded to address what methods are available or could be developed to build upon the concepts here; for example, the use of genetic inducers of metabolism which may avoid the more complex responses to lactulose.

      The point about language is well taken. We tried to make the argument that what we call disruption of the diurnal rhythm is acute, meaning that it does not disrupt the rhythm "chronically" (i.e., for longer), but that the rhythm recovers rapidly from this transient disruption. Given the confusion this wording is causing, we are rephrasing this in a new version of the manuscript.

      We also appreciate the mention of concepts from our study that can be built on in future studies, and we will add a paragraph on potential further research.

      Despite these concerns, this was still an intriguing and valuable addition to the growing literature on the interface of the microbiome and circadian fields.

      We thank the reviewer for all their encouraging and constructive remarks!

      Reviewer #2 (Public Review):

      Summary:

      The authors aimed to investigate how microbial metabolites, such as hydrogen and short-chain fatty acids (SCFAs), influence feeding behavior and circadian gene expression in mice.

      Specifically, they sought to understand these effects in different microbial environments, including a reduced community model (EAM), germ-free mice, and SPF mice. The study was designed to explore the broader relationship between the gut microbiome and host circadian rhythms, an area that is not well understood. Through their experiments, the authors hoped to elucidate how microbial metabolism could impact circadian clock genes and feeding patterns, potentially revealing new mechanisms of gut microbiome-host interactions.

      Strengths:

      The manuscript presents a well-executed investigation into the complex relationship between microbial metabolites and circadian rhythms, with a particular focus on feeding behavior and gene expression in different mouse models. One of the major strengths of the work lies in its innovative use of a reduced community model (EAM) to isolate and examine the effects of specific microbial metabolites, which provides valuable insights into how these metabolites might influence host behavior and circadian regulation. The study also contributes to the broader understanding of the gut microbiome's role in circadian biology, an area that remains poorly understood. The experiments are thoughtfully designed, with a clear rationale that ties together the gut microbiome, metabolic products, and host physiological responses. The authors successfully highlight an intriguing paradox: the significant influence of microbial metabolites in the EAM model versus the lack of effect in germ-free and SPF mice, which adds depth to the ongoing exploration of microbial-host interactions. Despite some methodological concerns, the manuscript offers compelling data and opens up new avenues for research in the field of microbiome and circadian biology.

      We thank the reviewer for their encouraging remarks, specifically on the surprising findings that microbial metabolism seems to affect circadian clock gene expression and behavior differently in EAM and SPF mice.

      Weaknesses:

      The manuscript, while providing valuable insights, has several methodological weaknesses that impact the overall strength of the findings. First, the process for stool collection lacks clarity, raising concerns about potential biases, such as the risk of coprophagia, which could affect the dry-to-wet weight ratio analysis and compromise the validity of these measurements.

      We thank the reviewer for pointing out that our description of the specific methods used for collecting feces was presented in a somewhat confusing manner. In short, dry and wet fecal weights were determined based on fecal pellets that were freshly produced and directly collected from restrained mice. To determine total fecal output over time, we collected all fecal pellets produced in a 5-hour window in a cage, determined their dry weight, and then used the water content determined for fresh feces to calculate wet weight. Using this method, we cannot account for potential differences in coprophagia between the groups. However, this is not likely to affect the dry-to-wet ratio of fecal output in our results.
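
      The back-calculation described above can be sketched as follows. This is an illustrative sketch only: the function names and units are hypothetical, and it assumes the water content of fresh pellets applies unchanged to the pellets collected from the cage.

```python
def water_fraction(fresh_wet_mg: float, fresh_dry_mg: float) -> float:
    """Fraction of fresh pellet mass that is water."""
    if fresh_wet_mg <= 0 or not 0 <= fresh_dry_mg <= fresh_wet_mg:
        raise ValueError("need 0 <= dry weight <= wet weight and wet weight > 0")
    return (fresh_wet_mg - fresh_dry_mg) / fresh_wet_mg


def estimated_wet_weight(collected_dry_mg: float, water_frac: float) -> float:
    """Estimate wet weight of cage-collected pellets from their dry weight."""
    if not 0 <= water_frac < 1:
        raise ValueError("water fraction must be in [0, 1)")
    # dry = wet * (1 - water_frac), so wet = dry / (1 - water_frac)
    return collected_dry_mg / (1.0 - water_frac)
```

      Because the same water fraction is applied to every pellet in a group, the estimate is insensitive to how many pellets were produced, but it cannot capture pellets lost to coprophagia.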

      Additionally, the use of the term "circadian" in some contexts appears inaccurate, as "diurnal" might be more appropriate, especially given the uncertainty regarding whether the observed microbiome fluctuations are truly circadian.

      Similarly to our answer to reviewer 1 above, we appreciate this remark about imprecise language and have addressed this issue in the text. Indeed, we do not think the microbiota fluctuations are truly circadian, but likely a result of the entrainment through the host's food intake.

      Another significant issue is the unexpected absence of an osmotic effect of lactulose in EAM mice, which contradicts the known properties of lactulose as an osmotic laxative. This finding requires further verification, including the use of a positive control, to ensure it is not artifactual.

      This is a good point. We have used this lactulose dosage specifically to induce microbial metabolism without causing osmotic diarrhea, and went to some lengths to demonstrate this. In response to this comment (and one by reviewer 3 below about transit time), we are planning an experiment that will use a higher lactulose dose as a positive control.

      The presentation of qRT-PCR data as log2-fold changes, with a mean denominator, could introduce bias by artificially reducing variability, potentially leading to spurious findings or increased risk of Type I error. This approach may explain the unexpected activation of both the positive and negative limbs of the circadian clock.

      While we agree that our description of the qPCR method used for measuring circadian clock gene expression was lacking detail, we do not see how log2-fold changes (as opposed to, e.g., fold change) would lead to an increased risk of Type I error. We did not use a mean denominator for analyzing the data but used the housekeeping gene data for the same sample as denominator for the respective circadian clock genes. This will be described more clearly in a revised methods section.
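
      A per-sample normalization of this kind can be sketched as below. This is a hedged illustration, not the analysis used in the study: it assumes cycle threshold (Ct) values as input and perfect amplification efficiency (a doubling per cycle), and the function names are hypothetical.

```python
import math


def relative_expression(ct_target: float, ct_housekeeping: float) -> float:
    """Target expression relative to the housekeeping gene of the SAME sample.

    Assumes 100% amplification efficiency, i.e. one cycle equals a factor of 2.
    """
    return 2.0 ** (ct_housekeeping - ct_target)


def log2_relative_expression(ct_target: float, ct_housekeeping: float) -> float:
    """log2 of the per-sample ratio; equals the negative delta-Ct."""
    return math.log2(relative_expression(ct_target, ct_housekeeping))
```

      Each sample carries its own denominator here, so between-sample variability in the housekeeping gene is retained rather than averaged away, and the log2 transform only rescales the ratio without changing the ordering of samples.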

      Moreover, the lack of detailed information on the primers and housekeeping genes used in the experiments is concerning, particularly given the importance of using non-circadian housekeeping genes for accurate normalization.

      We apologize for this omission. It seems that the resource table got lost in the submission, leading to missing information. It will be included in the revised manuscript.

      The methods for measuring metabolic hormones, such as GLP-1 and GIP, are also not adequately described. If DPP-IV/protease inhibitor tubes were not used, the data could be unreliable due to the rapid degradation of these hormones by circulating proteases.

      We thank the reviewer for spotting this mistake. We will add details of how GLP-1 and GIP were measured to the methods section. While we did not use DPP-IV/protease inhibitor tubes, we added the inhibitors to the syringes when sampling blood, leading to the same effect.

      Finally, the manuscript does not address the collection of hormone levels during both fasting and fed phases, a critical aspect for interpreting the metabolic impact of microbial metabolites.

      We agree that it will be interesting to measure hormone levels also in the fed phase, and we will include this data in a revised version of the manuscript. Even with that data, a more thorough examination of hormone levels over the diurnal cycle, as suggested by reviewer 3, might be relevant for a full-scale follow-up. Given our data, we of course cannot exclude that there may be time-point-specific differences and therefore have softened the language around this conclusion to state that hormone levels are not acutely changed after a lactulose intervention “at the time-points examined”.

      These methodological concerns collectively weaken the robustness of the study's results and warrant careful reconsideration and clarification by the authors.

      Because of these weaknesses, the authors have partially achieved their aims by providing novel insights into the relationship between microbial metabolites and host circadian rhythms. The data do suggest that microbial metabolites can significantly influence feeding behavior and circadian gene expression in specific contexts. However, the unexpected absence of an osmotic effect of lactulose, the potential biases introduced by the log2-fold change normalization in qRT-PCR data, and the lack of clarity in critical methodological details weaken the overall conclusions. While the study provides valuable contributions to understanding the gut microbiome's role in circadian biology, the methodological weaknesses prevent a full endorsement of the authors' conclusions. Addressing these issues would be necessary to strengthen the support for their findings and fully achieve the study's aims.

      We thank the reviewer again for their careful and critical reading of our work, and for their constructive input. We hope that many of the concerns will be addressed by providing more methodological detail and additional experimental data in the revised version of our manuscript.

      Despite the methodological concerns raised, this work has the potential to make a significant impact on the field of circadian biology and microbiome research. The study's exploration of the interaction between microbial metabolites and host circadian rhythms in different microbial environments opens new avenues for understanding the complex interplay between the gut microbiome and host physiology. This research contributes to the growing body of evidence that microbial metabolites play a crucial role in regulating host behaviors and physiological processes, including feeding and circadian gene expression.

      We thank the reviewer for their encouraging remarks!

      Reviewer #3 (Public Review):

      Summary:

      In the manuscript by Greter, et al., entitled "Acute targeted induction of gut-microbial metabolism affects host clock genes and nocturnal feeding" the authors are attempting to demonstrate that an acute exposure to a non-nutritive disaccharide (lactulose) promotes microbial metabolism that feeds back onto the host to impact circadian networks. The premise of the study is interesting and the authors have performed several thoughtful experiments to dissect these relationships, providing valuable insights for the field. However, the work presented does not necessarily support some of the conclusions that are drawn. For instance, lactulose is administered during the fasting period to mimic the impact of a feeding bout on the gut microbiota, but it would be important to perform this treatment during the fed state as well to show that the effects on food intake, etc. do not occur.

      This is a good point, and we will include an experiment addressing this in a revised version of the manuscript.

      To truly draw the conclusion that the current outcomes are directly connected to and mediated via an impact on the host circadian clock, it would be ideal to perform these studies in a circadian gene knock-out animal (i.e., Cry1 or Cry2 KO mice, or perhaps Bmal-VilCre tissue-specific KO mice). If the effects are lost in these animals, this would more concretely connect the current findings to the circadian clock gene network.

      We agree that these would be interesting experiments to follow up on the question of how the observed effects are actuated by host functions. However, because they would require a large amount of preparatory work (including rederiving the KO mice to get them germ-free in our gnotobiotic facility), we argue that they are beyond the scope of this study.

      Despite these reservations, the work is promising.

      We thank the reviewer for their encouraging assessment.

      Strengths:

      Attempting to disentangle nutrient acquisition from microbial fermentation and its impact on diurnal dynamics of gut microbes on host circadian rhythms is an important step for providing insights into these host-microbe interactions.

      The authors utilize a novel approach in leveraging lactulose coupled with germ-free animals and metabolic cages fitted with detectors that can measure microbial byproducts of fermentation, particularly hydrogen, in real-time.

      The authors consider several interesting aspects of lactulose delivery, including how it shifts osmotic balance as well as provides calculations that attempt to explain the caloric contribution of fermentation to the animal in the context of reduced food intake. This provides interesting fundamental insights into the role of microbial outputs on host metabolism.

      Thank you!

      Weaknesses:

      While the authors have done a large amount of work to examine the osmotic vs. metabolic influence of lactulose delivery, the authors have not accounted for the enlarged cecum and increased cecal surface area in germ-free mice. The authors could consider an additional control of cecectomy in germ-free mice.

      We thank the reviewer for pointing out the potential effect of the anatomical differences between germ-free and conventionally colonized mice. We agree that when comparing germ-free mice to SPF mice, the enlarged cecum area in germ-free animals could lead to differences in water release or uptake. However, this is not the case in the gnotobiotic mice colonized with our minimal microbiota, which have cecum sizes comparable to those of germ-free mice, and thus water transport over the cecum wall can be compared between those groups without correcting for cecal surface area. We will add information on cecum sizes in the different experimental groups to a revised version of the manuscript.

      The authors have examined GI hormones as one possible mechanism for how food intake is altered by microbial fermentation of lactulose. However, the authors measure PYY and GLP-1 only at a single time point, stating that there are no differences between groups. Given the goal of the studies is to tie these findings back into circadian rhythms, it would be important to show if the diurnal patterns of these GI hormones are altered.

      We fully agree that a deeper investigation of the diurnal fluctuations of hormone levels would be an interesting next step in studying whether perturbations in food intake can disturb these rhythms. Doing this for the whole rhythm would really require a full second study. For a revised version of this manuscript, we will add a second time-point of hormone measurements (during the fed phase) to this study. In addition, we will soften the statements made around these data to point out just that hormone level fluctuations could not be detected during specific time points after lactulose treatment, and therefore do not seem to explain the immediate behavioral changes.

      Considerations of other factors, such as conjugated vs. deconjugated bile acids, microbial bile salt hydrolase activity, and bile acid resorption, might be an important consideration for how lactulose elicits more influence on ileal circadian clock genes relative to cecum and colon.

      We absolutely agree that investigation of microbial bile acid modification and their metabolism by the host would be an interesting topic for a follow-up study.

      Measurements of GI transit time (both whole gut and regional) would be an important consideration for how lactulose might be impacting the ileum vs. cecum vs. colon.

      This is also an interesting point, and we will add an assessment of transit time to a revised version of the manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      We thank all reviewers for their constructive criticism and suggestions. We have addressed all the points as detailed below. We also added an experiment that strengthens the connection between replication stress and GSF2 and suggests a role of GSF2 in recovery from the DNA replication checkpoint arrest (Fig. 4g).

      Reviewer #1 (Evidence, reproducibility and clarity)

      Summary

      The manuscript by the Khmelinskii group reports that they have successfully constructed two conditional degron libraries of budding yeast for almost all proteins. For this purpose, the authors employed an improved auxin-inducible degron (AID2). Initially, they constructed yeast libraries by fusing HaloTag to the N- or C-terminus of proteins and found that C-terminal tagging is less likely to affect the location and function of proteins (Fig. 1). Based on this finding, the authors fused mNG-AID*-3Myc or AID*-3Myc (AID-v1 or AID-v2 library, respectively) to more than 5600 proteins and found that 4079 proteins were significantly depleted when cells were treated with 5-Ph-IAA (Fig. 2). A fitness defect was observed for over 60% of essential proteins, indicating the target depletion showed the expected phenotype in many cases (Fig. 3). Finally, the authors screened proteins required for maintaining viability in the presence of MMS, CPT and HU, and identified common proteins involved in DNA repair (such as RAD52 epistasis proteins) and other proteins specific for MMS, CPT or HU resistance (Fig. 4). Furthermore, the authors revealed that an ER membrane protein, Gsf2, is required for HU resistance, which was not found in previous studies with the YKO library because gsf2∆ cells in the YKO library had acquired a suppressor mutation (Fig. 4e).

      Major comments

      1 - In Figure S2a, the authors initially checked the growth of yeast cells expressing OsTIR1(F74G) under the GAL1 promoter, saying that "expression of OsTir1(F74G) from the strong galactose-inducible GAL1 promoter had a negligible impact on yeast fitness (page 3)". To me, the OsTIR1(F74G) expressing cells showed slightly slower growth compared to the control cells. Moreover, the cells expressing it under the very strong GPD promoter showed apparent slow growth, suggesting that OsTIR1(F74G) overexpression caused a side effect. The authors should carefully evaluate the cells with GAL1-OsTIR1(F74G).

      Indeed, high levels of OsTir1(F74G) impaired growth, at least in the strain background used in our experiments. Expression from the strongest promoter we tested (GPD) resulted in an obvious fitness defect, whereas conditional expression from the strong GAL1 promoter had a small impact on fitness and expression from the weaker CYC1 and ADH1 promoters did not affect fitness (Fig. S2a). Despite the small fitness impairment, we decided to use the GAL1-OsTIR1(F74G) construct for the AID libraries for two reasons: the conditional nature of this promoter is likely to limit adaptation to expression of OsTir1(F74G) and the high expression levels of OsTir1(F74G) are less likely to limit degradation of AID-tagged proteins. We added this explanation to the Results section.

      As suggested by the reviewer, we quantitatively evaluated the fitness impact of the GAL1-OsTIR1(F74G) construct. Using the colony size data of the AID-v1 library (grown on galactose medium with 1 µM 5-Ph-IAA, Fig. 2c), we compared colony sizes of OsTIR1– and OsTIR1+ strains for non-essential ORFs. As degradation of non-essential proteins is not expected to affect fitness, the difference in colony size between OsTIR1– and OsTIR1+ strains can be attributed to OsTir1 expression. On average, the presence of the OsTIR1 construct reduced colony size by 7% (median fitness of OsTIR1+ strains relative to OsTIR1– strains of 0.93 ± 0.06, n = 4698 non-essential ORFs). We performed the same comparison for strains that did not exhibit OsTIR1-dependent protein degradation. In this set of strains, the presence of the OsTIR1 construct also reduced colony size by 7% (median fitness of OsTIR1+ strains relative to OsTIR1– strains of 0.93 ± 0.05, n = 624 ORFs in the “not affected” group in Fig. 2d). We added this information to Fig. S3a.

      2 - Given the possibility that OsTIR1(F74G) overexpression might cause a growth problem, it is not appropriate to compare OsTIR1+ and OsTIR1- conditions for evaluating growth fitness (Fig. 2). As shown in Fig. S4b, it is more appropriate to compare the +/- 5-Ph-IAA conditions. Additionally, the 5-Ph-IAA concentration used in this study was not clearly mentioned in the method section and figure legends.

      The two approaches, comparison of OsTIR1– and OsTIR1+ strains grown on galactose with 5-Ph-IAA (as was done for the AID-v1 library) and comparison of galactose ± 5-Ph-IAA conditions (as was done for the AID-v2 library), have advantages and disadvantages but should yield similar results. The technical noise (due to spatial effects on the screen plates) is lower for the comparison of OsTIR1– and OsTIR1+ strains, as the two strains for each ORF can be grown next to each other on the same plate (Fig. 2c). Furthermore, corrections of spatial effects are more precise with this layout as the frequency of fitness defects per plate is lower. On the other hand, comparison of galactose ± 5-Ph-IAA conditions implicitly corrects for the fitness impact of the GAL1-OsTIR1(F74G) construct, as the fitness distribution of each condition is normalized to the median of that condition, but this fitness impact of OsTir1 cannot be determined from the screen results.

      We now explicitly corrected the colony size data of the AID-v1 library for the fitness impact of OsTir1 expression (quantified in the previous point) and updated all the analyses and results shown in Fig. 3, Fig. S3b-e and Fig. S4a. The correction was performed using the multiplicative model, whereby the fitness impacts of OsTir1 expression and degradation of the AID-tagged protein are independent. Overall, our observations and conclusions stand unchanged with the corrected data.
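
      As an illustration of the multiplicative correction described above, the following is a minimal sketch and not the analysis code used for the manuscript. It assumes that the effect of OsTir1 expression is estimated as the median OsTIR1+/OsTIR1– colony size ratio over non-essential ORFs, and that each strain's relative fitness is then divided by this factor. The function names and the example colony sizes are hypothetical.

```python
from statistics import median


def tir1_fitness_effect(pairs):
    """Estimate the fitness effect of OsTir1 expression.

    pairs: iterable of (size_without_tir1, size_with_tir1) colony sizes
    for non-essential ORFs (hypothetical input format).
    Returns the median ratio of OsTIR1+ to OsTIR1- colony size.
    """
    return median(plus / minus for minus, plus in pairs)


def correct_fitness(observed_relative_fitness, tir1_effect):
    """Remove the OsTir1 effect under a multiplicative model.

    The model assumes observed = tir1_effect * degradation_effect,
    so the degradation effect is recovered by division.
    """
    return observed_relative_fitness / tir1_effect


# Hypothetical colony sizes, chosen so that every ratio is 0.93.
example_pairs = [(100.0, 93.0), (200.0, 186.0), (50.0, 46.5)]
effect = tir1_fitness_effect(example_pairs)
corrected = correct_fitness(0.465, effect)
```

      Under this model, a strain whose only defect stems from OsTir1 expression has a corrected fitness close to 1.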

      Finally, the 5-Ph-IAA concentration (1 µM) used in all experiments is now indicated in the figure legends and the Methods section.

      3 - The authors found that fitness defects were observed for over 60% of essential proteins (Fig. 3). In other words, depletion of the remaining 40% was not enough to induce growth defects. The authors should discuss how the current AID library can be improved to achieve better target depletion. Previous literature reported various possibilities, such as using a tandem degron tag and combining AID with the Tet promoter system (PMID 25181302, 26081484). Although optional, it would be wonderful if the authors would generate an improved library.

      Following the reviewer’s suggestion, we added the following statement to the discussion:

      “In the future, the libraries could be potentially improved with N-terminal tagging of ORFs that currently exhibit incomplete or no degradation of AID-tagged proteins or using multiple copies of the AID* tag to enhance protein degradation (Kubota et al, 2013; Nishimura & Kanemaki, 2014).”

      Minor comments

      4 - 5-Ph-IAA is not auxin because it does not induce the auxin responses in plants (PMID 29355850). Therefore, the authors should be careful when they refer to 5-Ph-IAA and should not call it auxin.

      We corrected this and now refer to 5-Ph-IAA explicitly throughout the manuscript.

      5 - The availability of the HaloTag and AID libraries should be indicated.

      We added the following statement to the Methods section: “All strains, plasmids and libraries are available upon request.”

      6 - Page 3: "Finally, the extent of AID-dependent degradation varied with protein abundance, in that highly expressed proteins were more likely to be only partially degraded compared to lowly expressed ones (Fig. 2e, Fig. S2e)". Fig. S2e should be Fig. S2d, shouldn't it?

      We corrected this mistake.

      Reviewer #1 (Significance):

      This paper is technically robust and well-conducted. It presents a comprehensive study showcasing the effectiveness of the conditional degron library. The HaloTag libraries will also be useful. The yeast libraries presented in this study will be invaluable for future screenings and studies across all aspects of yeast biology.

      Reviewer #2 (Evidence, reproducibility and clarity):

      In this study, the authors reported the development and characterization of two AID-tagged strain libraries for the model organism S. cerevisiae. The libraries are based on the latest AID technology, AID2. One library contains a fluorescent protein fused to the AID, whereas the other library does not have the fluorescent protein, thus offering better compatibility with imaging-based screens. The authors show that AID-dependent protein degradation can be achieved for most of the library strains, and a growth phenotype was induced for a high fraction of essential genes. Genetic screens for DNA damage-sensitive mutants showcased the applicability of the libraries.

      I only have the following minor comments and suggestions for the authors to consider.

      Point 1, Page 3

      "Optimized tagging of proteins with these N-terminal localization signals likely also contributes to the lack of correlation between differential fitness defects and occurrence of terminal localization signals (Fig. S1f, Table S2)."

      Is this because the genes that cannot tolerate C-terminal addition are already depleted in the C-SWAT library? In the C-SWAT library, a 15-amino-acid linker L3 is added to the C-terminus.

      That is certainly a possibility. During construction of the SWAT libraries, tagging with N-SWAT and C-SWAT acceptor modules failed for 251 and 353 ORFs, respectively (Weill et al. 2018, Meurer et al. 2018). However, these ORFs are not enriched in N- or C-terminal localization signals, respectively (4.6% ORFs with C-terminal signals in C-SWAT library vs 3.3% among failed C-SWAT strains; 12.3% ORFs with N-terminal signals in N-SWAT library vs 2.0% among failed N-SWAT strains).

      The most significant trend in the data is enrichment of ribosomal subunits in both sets of failed strains: 3.9% and 16.3% of the genes mapped to the GO term “ribosome” in the N-SWAT library and the set of failed N-SWAT strains, respectively; 3.6% and 15.9% of the genes in the C-SWAT library and the set of failed C-SWAT strains, respectively. This is consistent with what was reported by Weill et al. for failed N-SWAT strains.

      Point 2, Page 3

      "Expression of OsTir1(F74G) from the strong galactose-inducible GAL1 promoter had a negligible impact on yeast fitness (Fig. S2a)."

      I wonder why the authors chose to use an inducible promoter to express OsTir1(F74G). In other studies, for example Snyder et al. 2019, OsTir1 has been expressed from a constitutive promoter.

      Despite the small fitness impairment, we decided to use the GAL1-OsTIR1(F74G) construct for the AID libraries for two reasons: the conditional nature of this promoter is likely to limit adaptation to expression of OsTir1(F74G) and the high expression levels of OsTir1(F74G) are less likely to limit degradation of AID-tagged proteins. We added this explanation to the Results section.

      Please see our response to reviewer 1, points 1 and 2.

      Point 3, Page 3

      "A similar frequency was previously observed with a set of AID alleles constructed for 758 essential ORFs using the original AID system (Snyder et al, 2019). However, over a third of these alleles exhibited fitness defects even in the absence of auxin, which were further compounded by off-target effects of auxin, highlighting the advantages of the AID2 system."

      Snyder et al. 2019 used a TAP-AID-6FLAG tag. The fitness defect in the absence of auxin may not necessarily be due to the AID part of the tag, as TAP tagging is known to compromise the functions of some genes.

      We corrected our statement as follows:

      “A similar frequency was previously observed with a set of AID alleles constructed for 758 essential ORFs using the original AID system (Snyder et al, 2019). However, over a third of these alleles exhibited fitness defects even in the absence of auxin, which were further compounded by off-target effects of auxin.”

      Point 4, Page 3

      "Interestingly, complete degradation of 33% of essential proteins did not result in a fitness defect. It is possible that in some cases partial degradation results in low protein levels that are below the detection limit of our assay but are sufficient for viability."

      Are these "33% of essential proteins" enriched with genes with low expression levels? I guess genes with low expression levels are more likely to fall below the detection limit even when partially depleted. Are there extreme examples where a highly expressed essential gene does not exhibit a fitness defect when the protein product is no longer detectable?

      We performed the analysis suggested by the reviewer and observed no difference in pre-degradation protein levels between essential and degraded proteins with and without a fitness defect (now shown in Fig. S3b). This also showed that there are indeed several essential proteins with high pre-degradation protein levels and without a fitness defect upon degradation to below our detection limit: Pgi1, Nhp2, Smt3, Gus1, Dys1, Sis1, Fas2 and Rpo26 (in the abundance bin 4 in Fig. S2f).

      In addition, we considered the nature of the essential genes in these two groups. Namely, we compared the frequency of core essential genes, which are always required for viability, and conditional essential genes, which vary in essentiality depending on the genetic background or environment (Bosch-Guiteras & van Leeuwen, 2022). Interestingly, the set of essential and degraded proteins without an accompanying fitness defect was enriched in conditional essential genes defined by two independent measures: essentiality across S. cerevisiae natural isolates (Peter et al, 2018) or with bypass suppression interactions in a laboratory strain (van Leeuwen et al, 2020) (Fig. S3c, odds ratio = 1.6, p-value = 0.04 in a Fisher’s exact test and odds ratio = 1.7, p-value = 0.02, respectively). This suggests that conditional essentiality could explain the observed lack of fitness defects upon degradation of some essential proteins.
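
      For readers who want to run this kind of enrichment test themselves, the following is a minimal sketch and not the authors' analysis code. It computes an odds ratio and a two-sided Fisher's exact p-value for a 2×2 contingency table using only the Python standard library. The function names are hypothetical, and the example counts are a textbook-style table that does not come from the manuscript.

```python
from math import comb


def odds_ratio(a, b, c, d):
    """Sample odds ratio for the table [[a, b], [c, d]].

    Assumes b and c are non-zero.
    """
    return (a * d) / (b * c)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same
    margins that are no more likely than the observed table.
    """
    row1 = a + b
    col1 = a + c
    n = a + b + c + d
    denom = comb(n, col1)

    def pmf(x):
        # Probability of x in the top-left cell given fixed margins.
        return comb(row1, x) * comb(n - row1, col1 - x) / denom

    lowest = max(0, col1 - (n - row1))
    highest = min(row1, col1)
    p_obs = pmf(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(
        pmf(x)
        for x in range(lowest, highest + 1)
        if pmf(x) <= p_obs * (1 + 1e-7)
    )


# Illustrative counts only (not from the manuscript).
example_or = odds_ratio(3, 1, 1, 3)
example_p = fisher_exact_two_sided(3, 1, 1, 3)
```

      An odds ratio above 1 with a small p-value would indicate enrichment of the first category in the first group, which is the pattern reported above for conditional essential genes.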

      We added this analysis to the Results section.

      Reviewer #2 (Significance):

      This study generated highly valuable resources for functional genomic studies.

      Reviewer #3 (Evidence, reproducibility and clarity):

      Summary: In this manuscript, the authors construct and analyze a genome-wide collection of AID-tagged S. cerevisiae strains. The manuscript is clearly written and the analysis appears to be thorough. This collection will be quite useful to the yeast community. There are some issues to address, listed below.

      1. page 1, abstract - "...with protein abundance and tag accessibility as limiting factors." It's not clear what the authors mean by protein abundance as a limiting factor. Are they referring to the protein level pre-depletion? Please clarify.

      That is correct. We clarified this statement as follows:

      “Almost 90% of AID-tagged proteins were degraded in the presence of the auxin analog 5-Ph-IAA, with initial protein abundance and tag accessibility as limiting factors.”

      1. page 1, second paragraph of the Intro, end of the paragraph - There are publications prior to Van Leeuwen et al. 2016 that describe suppressors lurking in the deletion set. Here are two that should be cited: Hughes et al. 2000, DOI: 10.1038/77116 and Teng et al. 2013, DOI: 10.1016/j.molcel.2013.09.026.

      We added the references pointed out by the reviewer.

      1. The goal of this work, stated in the first sentence of results, is to construct genome-wide AID libraries. Yet, to test whether N-terminal or C-terminal tagging is better, the authors used a Halo tag. Those results showed that, for the Halo tag, C-terminal tagging was less likely to impair function. Why weren't these tests done with the same AID tag used to build the libraries in the next section? What is the evidence that the results for a Halo tag will be the same as for an AID tag? While hard to find it documented in publications, there is a lot of anecdotal evidence that the type of tag can make a big difference, as well as its location. While this section will be of interest to those using Halo tags, it's not clear how it relates to the rest of the paper, especially given the careful characterization of the AID library in the next section.

      We chose the Halo tag due to its size (33 kDa), similar to many commonly used fluorescent protein tags and to the mNG-AID*-3myc tag in the AID-v1 library, and the lack of evidence for a dominant negative effect on the tagged proteins. This is now stated in the Results section.

      We agree that further work is needed to understand how the type of tag, its size and biophysical properties, and the linker between the tag and the protein of interest affect protein localization and function across the proteome. This is now stated in the Results section.

      1. Throughout this manuscript, including the Tables, in cases where the protein is no longer detected, please do not describe this as "complete degradation." Instead, please use "not detectable." This is clearly the case for essential proteins that are no longer detected but that still grow, so it is very likely the case for many or all of the others. If the authors have any understanding of the sensitivity of their fluorescence assay, then that would be helpful to know. For example, they could add a control, taking a known amount of a fluorescent protein and analyzing known dilutions to assay the level of detection.

      We appreciate the reviewer’s suggestion. We decided against replacing “complete degradation” with “not detectable” to avoid confusion with proteins that are not detectable pre-degradation. Nevertheless, we replaced “complete degradation” with “degradation” and added the following explanation to the Results section:

      “Out of 5079 proteins detected in OsTIR1– strains, 4455 (~88%) were significantly depleted in OsTIR1+ strains (Fig. 2d, Table S3). 3981 proteins could not be detected specifically in the OsTIR1+ background. Hereafter, we will refer to these proteins as degraded, although it is likely that at least in some cases degradation is not complete but the remainder is below the detection limit of our plate reader assay. Nevertheless, 474 proteins were unequivocally degraded only partially, as they were detectable in the OsTIR1+ background but at reduced levels compared to the OsTIR1– background (Fig. 2d).”

      To estimate the detection limit of the colony fluorescence assay, we correlated the background-corrected mNG intensities in OsTIR1– strains with absolute levels (in molecules per cell) of 1167 proteins determined by Lawless et al. (PMID 26750110). Based on a linear fit, the threshold above which proteins are considered “detected” in our analysis, mNG/bkg(OsTIR1–) > 1.2, corresponds to 200 molecules per cell (95% confidence interval 18 to 2187 molecules per cell). We added this information to the Results section and Fig. S2c.

      This detection limit is in line with our results, where low abundance proteins such as the centromeric histone Cse4/CENP-A (with two Cse4 molecules per centromere adding to 64 molecules per cell, Aravamudhan et al. PMID: 23623551 and several times that amount elsewhere in the cell, Collins et al. PMID: 15530401) can be detected in the colony assay (Table S3).

      1. Fig. S2a and page 3, first full paragraph - The authors wrote that expression of OsTir1(F74G) from the strong GAL1 promoter had a negligible impact on growth. However, the figure shows that there is an obvious effect on growth after 48 hours of incubation, with much smaller colonies. This defect is much less obvious after 72 hours. This difference suggests that the growth effect would have been even more obvious at 24 hours. I think that the text should be modified to indicate this effect.

      We now quantified the fitness impact of the GAL1-OsTIR1(F74G) construct and rephrased this part of the manuscript. In addition, we corrected the AID-v1 library screen results for the fitness impact of the GAL1-OsTIR1(F74G) construct and updated all figures and tables. Please see our response to reviewer 1, points 1 and 2.

      1. One of the main justifications for the construction of the AID library is to allow assays for essential genes. Yet that was not a feature of the screen for DNA damage response factors. Were any essential genes identified in those screens? It would be of great interest to identify lower levels of 5-Ph-IAA that only mildly affect growth of essential genes and then to repeat the screens.

      58 out of the combined 165 potential resistance factors identified in the three screens are essential genes. We added this information to the Results section and essential genes are now indicated in Fig. S5c.

      We now show that chemical-genetic interactions for both essential and non-essential genes can be reproduced in spot tests using the MMS screen as an example (Fig. S5d). We also show that additional essential hits can be identified at lower concentrations of 5-Ph-IAA, which allow determining chemical-genetic interactions for strains that otherwise exhibit no growth in 1 μM 5-Ph-IAA (Fig. S5e). As the screens serve as a demonstration of possible uses of the AID libraries, we consider additional exhaustive screening for DNA damage response factors beyond the scope of this manuscript.

      1. A big advantage of AID depletion over deletions is the ability to look at strains very shortly after loss of the protein of interest. In many cases in the literature, experiments are done after one or two hours after depletion. Yet in this work, there are no data presented on how effective depletion is in the short term versus after a long period of growth (24 hours). It would be a strong addition to the manuscript to include a time course for at least a subset of the proteins to look at the loss of signal over time, either by fluorescence or by Western blots.

      We performed time courses of protein depletion with immunoblotting for 12 strains (4 proteins from the “degraded”, “partially degraded” and “not affected” groups each). The results in Fig. S2e show that “degraded” proteins are depleted to below the detection limit within 60 min of 5-Ph-IAA addition, “partially degraded” proteins are depleted less or exhibit a degradation-resistant pool, and the levels of “not affected” proteins remain stable over time, consistent with their classification based on mNG fluorescence in the colony assay. We added this information to the Results section.

      Reviewer #3 (Significance):

      The library will be of use to the yeast community.

    1. What is a global history of architecture? There is, of course, no single answer, just as there is no single way to define words like global, history, and architecture. Nonetheless, these words are not completely open-ended, and they serve here as the vectors that have helped us construct the narratives of this volume. With this book, we hope to provoke discussion about these terms and at the same time furnish a framework students can use to begin discussion in the classroom. This book transcends the necessary restrictions of the classroom, where in a semester or even two, the teacher has to limit what is taught based on any number of factors. The reader should understand that there is always something over the horizon. Whereas any such book must inevitably be selective about what it can include, we have attempted to represent a wide swath of the globe, in all its diversity. At the same time, however, the book does not aspire to be an encyclopedia of everything that has been built; nor does it assume a universal principle that governs everything architectural. The buildings included are for us more than just monuments of achievement; we see them as set pieces allowing us to better appreciate the complex intertwining of social, political, religious, and economic contexts in which they are positioned. As much as possible, we emphasize urban contexts as well as materials and surfaces. We have also tried to emphasize quality as much as quantity. From that point of view, the word global in the title is not so much a geographic construct as an eruditional horizon. In that sense, this book is not about the sum of all local histories. Its mission is bound to the discipline of architecture, which requires us to see connections, tensions, and associations that transcend so-called local perspectives. In that respect, ours is only one of many possible narratives. Synchrony has served as a powerful frame for our discussion. 
For instance, as much as Seoul’s Gyeongbok Palace is today heralded in Korea as an example of traditional Korean architecture, we note that it also belongs to a Eurasian building campaign that stretched from Japan (the Katsura Imperial Villa), through China (Beijing and the Ming Tombs), to Persia (Isfahan), India (the Taj Mahal), Turkey (the Suleymaniye Complex), Italy (St. Peter’s Basilica and the Villa Rotonda), France (Chambord), and Russia (Cathedral of the Assumption). In some cases, one can assume that information flowed from place to place, but such movement is not itself a requirement for the architecture to qualify as “global.” It is enough for us to know, first, that these structures are contemporaneous and that each has a specific history. If there are additional connections that come as a result of trade, war, or other forms of contact, these are for us subsidiary to contemporaneity. This is not to say that our story is exclusively the story of individual buildings and sites, only that there is a give and take between explaining how a building works and how it is positioned in the world of its influences and connections. We have, therefore, tried to be faithful to the specificities of each individual building while acknowledging that every architectural project is always embedded in a larger world—and even a worldview—that affects it directly and indirectly. Our post-19th-century penchant for seeing history through the lens of the nation-state often makes it difficult to apprehend such global pictures. Furthermore, in the face of today’s increasingly hegemonic global economy, the tendency by historians, and often architects, to nationalize, localize, regionalize, and even micro-regionalize history—perhaps as meaningful acts of resistance—can blind us to the historical synchronicity and interconnectivity of global realities that existed long before our present moment of globalization. What would the Turks be today if they had stayed in East Asia? 
The movement of people, ideas, food, and wealth has bound us to each other since the beginning of history. And so without denying the reality of nation-states and their claims to unique histories and identities, we have resisted the temptation to streamline our narratives to fit nationalistic parameters. Indian architecture, for instance, may have some consistent traits from its beginnings to the present day, but there is less certainty about what those traits might be than one may think. The flow of Indian Buddhism to China, the opening of trade to Southeast Asia, the settling of Mongolians in the north, the arrival of Islam from the east, and the colonization by the English are just some of the more obvious links that bind India, for better or worse, to global events. It is these links, and the resultant architecture, more than the presumed “Indianness” of Indian architecture, that interests us. Furthermore, India has historically been divided into numerous kingdoms that, like Europe, could easily have evolved (and in some cases did evolve) into their own nations. The 10th-century Chola dynasty of peninsular India, for example, was not only an empire but possessed a unique worldview of its own. In writing its history, we have attempted to preserve its distinct identity while marking the ways in which it maps its own global imagination. Broadly speaking, our goal is to help students of architecture develop an understanding of the manner in which architectural production is always triangulated by the exigencies of time and location. More specifically, we have narrated these interdependencies to underscore what we consider to be the inevitable modernity of each period. We often think of the distant past as moving slowly from age to age, dynasty to
Ching, F. D. K., Jarzombek, M. M., & Prakash, V. (2017). A global history of architecture. John Wiley & Sons, Incorporated. Copyright © 2017. John Wiley & Sons, Incorporated. All rights reserved.

      Buildings are more than just structures—they tell stories about the cultures and times they were built in

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Weakness 1. Enhancing Reproducibility and Robustness: To enhance the reproducibility and robustness of the findings, it would be valuable for the authors to provide specific numbers of animals used in each experiment. Explicitly stating the penetrance of the rod-like neurocranial shape in dact1/2-/- animals would provide a clearer understanding of the consistency of this phenotype. 

      In Fig. 3 and Fig. 4 animal numbers were added to the figure and figure legend (line 1111). In Fig. 5 animal numbers were added to the figure. We now state that dact1/2-/- animals exhibit the rod-like neurocranial shape that is completely penetrant (Line 260). 

      Weakness 2. Strengthening Single-Cell Data Interpretation: To further validate the single-cell data and strengthen the interpretation of the gene expression patterns, I recommend the following: 

      -Provide a more thorough explanation of the rationale for comparing dact1/2 double mutants with gpc4 mutants.

      -Employ genotyping techniques after embryo collection to ensure the accuracy of animal selection based on phenotype and address the potential for contamination of wild-type "delayed" animals.

      -Supplement the single-cell data with secondary validation using RNA in situ or immunohistochemistry techniques. 

      An explanation of our rationale was added to the results section (Lines 391-403) and a summary schematic was added to Figure 6 (panel A).

      Genotyping of the embryos was not possible but quality control analysis by considering the top 2000 most variable genes across the dataset showed good clustering by genotype, indicating the reproducibility of individuals in each group (See Supplemental Fig. 4).

      The gene expression profiles obtained in our single-cell data analysis for gpc4, dact1, and dact2 correlate closely with our in situ hybridization analyses. Further, our data is consistent with published zebrafish single-cell data. We validated our finding of increased capn8 expression in dact1/2 mutants by in situ hybridization. Therefore we are confident in the robustness of our single-cell data.  

      Weakness 3. Directly Investigating Non-Cell-Autonomous Effects: To directly assess the proposed non-cell-autonomous role of dact1/2, I suggest conducting transplantation experiments to examine the ability of ectodermal/neural crest cells from dact1/2 double mutants to form wild-type-like neurocranium.  

      The reviewer’s suggestion is an excellent experiment and something to consider for future work. Cell transplant experiments between animals of specific genotypes are challenging and require large numbers. It is not possible to determine the genotype of the donor and recipient embryos at the early timepoint of the 1,000-cell stage, when the transplants would have to be done in zebrafish. Each transplant would therefore have to be carried out blind to genotype from a dact1+/-; dact2+/- or dact1-/-; dact2+/- intercross, after which both animals would have to be genotyped at a subsequent time point and the phenotype of the transplant recipient analyzed. While possible, this is a monumental undertaking and beyond the scope of the current study.

      Weakness 4. Further Elucidating Calpain 8's Role: To strengthen the evidence supporting the critical role of Calpain 8, I recommend conducting overexpression experiments using a sensitized background to enhance the statistical significance of the findings. 

      We thank the reviewer for their suggestion and have now performed capn8 overexpression experiments in embryos generated from dact1/2 double heterozygous breeding. We found a statistically significant effect of capn8 overexpression in the dact1+/-,dact2+/- fish (Lines 462-464 and Fig. 8C,D). 

      Minor Comments:  

      Comment: Creating the manuscript without numbered pages, lines, or figures makes orientation and referencing harder.  

      Revised

      Comment: Authors are inconsistent in the use of font and adverbs, which requires extra effort from the reader. ("wntIIf2 vs wnt11f2 vs wnt11f2l"; "dact1/2-/- vs dact1/dact2 -/-"; "whole-mount vs wholemount vs whole mount").  

      Revised throughout.

      Comment: Multiple sentences in the "Results" belong to the "Materials and Methods" or the "Discussion" section. 

      We have worked to ensure that sentences are within the appropriate sections of the manuscript.

      Comment: Abstract:

      "wnt11f2l" should be "wnt11f2"  

      Revised (Line 24).

      Comment: Main text:

      Page 5 - citation Waxman, Hocking et al. 2004 is used 3x without interruption by any other citation. 

      Revised (Line 112).

      Page 9 - "dsh" mutant is mentioned once in the whole manuscript - is this a mistake?

      Revised, Rewritten (Line 196).

      Page 10 - Fig 2B does not show ISH.

      Revised (Line 229).

      Page 11 - "kyn" mutant is mentioned here for the first time but defined on page 15.

      Revised (Line 245). Now first described on page 4.

      Page 14 - "cranial CNN" should be CNCC.

      Revised. (Line 334)

      Page 16 - dact1/dact2/gpc4: Fig. 5C is used but it should be Fig 5E.

      Revised. (Line 381)

      Page 18 - dact1/2-/- or dact1-/-, dact2-/-. 

      Revised. (Line 428)

      Comment: Methods:

      Page 24 - ZIRC () "dot" is missing. ChopChop ")" is missing. "located near the 5' end of the gene" - In the Supplementary Figure 1 looks like in the middle of the gene.

      Revised. (Lines 600, 609, 611, respectively).

      Page 25 - WISH -not used in the main text.

      Revised. (Line 346).

      Page 26 - 4% (v/v) formaldehyde; at 4C - 4°C; 50% (v/v) ethanol; 3% (w/v) methylcellulose.

      Revised. (Lines 659, 660, 662).

      Page 27 - 0.1% (w/v) BSA. 

      Revised. (Line 668).

      Comment: Discussion:

      The overall discussion requires more references and additional hypotheses. On page 20, when mentioning 'as single mutants develop normally,' does this refer to the entire animals or solely the craniofacial domain? Are these mutants viable? If they are, it's crucial to discuss this phenomenon in relation to prior morpholino studies and genetic compensation.

      Observing how the authors interpret previously documented changes in nodal and shh signaling would be beneficial. While Smad1 is discussed, what about other downstream genes? Is shh signaling altered in the dact1/2 double mutants? 

      We have revised the Discussion to include more references (Lines 473, 476, 483, 488, 491, 499, 501, 502, 510, 515, 529, 557, 558) and additional hypotheses (Lines 503-505, 511-519, 522-525). We have added more specific information regarding the single mutants (Lines 270-275, 480-493, Fig. S3). We have added discussion of other downstream genes, including smad1 (Lines 561-572) and shh (Lines 572-580).

      Comment: Figures:

      Appreciating differences between specimens when eyes were or were not removed is quite hard.

      Yes, this was an unfortunate oversight; however, the key phenotype is the EP shown in the dissections.

      Fig 1. - wntIIf2 vs wnt11f2? C - Thisse 2001 - correct is Thisse et al. 2001.

      Revised typo in Fig 1. (And Line 1083).

      Fig 1E: These plots are hard to understand without previous and detailed knowledge. Authors should include at least some demarcations for the cephalic mesoderm, neural ectoderm, mesenchyme, and muscle. Missing color code.

      We have moved this data to supplementary figure S1 and have added labels of the relevant cell types and have added the color code.

      Fig 2 - In the legend for C - "wildtype and dact2-/- mutant" and "dact1/2 mutant"; in the picture is dact1-/-, dact2-/-.

      Revised (Line 1105).

      Fig 2 - B - it is a mistake in 6th condition dact1: 2x +/+, heterozygote (+/-) is missing.

      Revised Figure 2B.

      Fig 4. - Typo in the legend: dact1/"t"2-/- .

      Revised. (Line 1127).

      Fig 8C - In my view, when the condition gfp mRNA says "0/197, " none of the animals show this phenotype. I assume the authors wanted to say that all the animals show this phenotype; therefore, "197/197" should be used.

      We have removed this data from the figure as there were concerns by the reviewers regarding reproducibility. 

      Fig S1 - Missing legend for the 28 + 250, 380 + 387 peaks? RT-qPCR - is not mentioned in the Materials and Methods. In D - ratio of 25% (legend), but 35% (graph).

      Revised. (Line 1203, Line 625, Line 1213, respectively).

      Fig S2 - The word "identified" - 2x in one sentence. 

      Revised. (Line 1230).

      Reviewer #2 (Public Review):

      Weakness (1) While the qualitative data show altered morphologies in each mutant, quantifications of these phenotypes are lacking in several instances, making it difficult to gauge reproducibility and penetrance, as well as to assess the novel ANC forms described in certain mutants.  

      In Fig. 3 and Fig. 4, animal numbers were added to the figure legends. In Fig. 5, animal numbers were added to the figure to demonstrate reproducibility. We now state that the rod-like neurocranial shape of dact1/2-/- animals is completely penetrant (Line 260). As the altered morphologies that we report are qualitatively distinct from wildtype, we did not find it necessary to make quantitative measurements. For experiments in which it was necessary to in-cross triple heterozygotes (Fig. 3, Fig. 5), we dissected and visually analyzed the ANC of at least 3 compound mutant individuals. At least one individual was dissected for the previously published or described genotypes/phenotypes (i.e. wt, wnt11f2-/-, dact1/2-/-, gpc4-/-, wls-/-). We realize that quantitative measurements may identify subtle differences between genotypes. However, the sheer number of embryos needed to generate these relatively rare combinatorial genotypes and the amount of genotyping required prevented quantitative analyses. 

      Weakness (2) Germline mutations limit the authors' ability to study a gene's spatiotemporal functional requirement. They therefore cannot concretely attribute nor separate early-stage phenotypes (during gastrulation) to/from late-stage phenotypes (ANC morphological changes). 

      We agree that we cannot concretely attribute nor separate early and late-stage phenotypes. Conditional mutants to provide temporal or cell-specific analysis are beyond the scope of this work. Here we speculate based on evidence obtained by comparing and contrasting embryos with grossly similar early phenotypes and divergent late-stage phenotypes. We believe our findings contribute to the existing body of literature on zebrafish mutants with both early convergent extension defects and craniofacial abnormalities.   

      Weakness (3) Given that dact1/2 can regulate both canonical and non-canonical wnt signaling, this study did not specifically test which of these pathways is altered in the dact1/2 mutants, and it is currently unclear whether disrupted canonical wnt signaling contributes to the craniofacial phenotypes, even though these phenotypes are typical non-canonical wnt phenotypes. 

      Previous literature has attributed canonical wnt, non-canonical wnt, and non-wnt functions to dact, and each of these likely contributes to the dact mutant phenotype (Lines 87-89). We performed cursory analyses of tcf/lef:gfp expression in the dact mutants and did not find evidence to support further analysis of canonical wnt signaling in these fish. Single-cell RNAseq did not identify differential expression of any canonical or non-canonical wnt genes in the dact1/2 mutants.

      Further research is needed to parse out the intracellular roles of dact1 and dact2 in response to wnt and tgf-beta signaling. Here we find that dact may also have a role in calcium signaling, and further experiments are needed to elaborate this role.      

      Weakness (4) The use of single-cell RNA sequencing unveiled genes and processes that are uniquely altered in the dact1/2 mutants, but not in the gpc4 mutants during gastrulation. However, how these changes lead to the manifested ANC phenotype later during craniofacial development remains unclear. The authors showed that calpain 8 is significantly upregulated in the mutant, but the fact that only 1 out of 142 calpain-overexpressing animals phenocopied dact1/2 mutants indicates the complexity of the system. 

      To further test whether capn8 overexpression may contribute to the ANC phenotype, we performed overexpression experiments in embryos from a dact1/dact2 double heterozygote incross. We found that the addition of capn8 caused a small but statistically significant occurrence of the mutant phenotype in dact1/2 double heterozygotes (Fig. 8D). We agree with the reviewer that our results indicate a complex system of dysregulation that leads to the mutant phenotype. We hypothesize that a combination of gene dysregulation may be required to recapitulate the mutant ANC phenotype. Further, as capn8 activity is regulated by calcium levels, overexpression of the mRNA alone likely has a small effect on the manifestation of the phenotype. 

      Weakness (5) Craniofacial phenotypes observed in this study are attributed to convergent extension defects but convergent extension cell movement itself was not directly examined, leaving open if changes in other cellular processes, such as cell differentiation, proliferation, or oriented division, could cause distinct phenotypes between different mutants. 

      Although convergent extension cell movements were not directly examined, our phenotypic analyses of the dact1/2 mutant are consistent with previous literature where axis extension anomalies were attributed to defects in convergent extension (Waxman 2004, Xing 2018, Topczewski 2001). We do not attribute the axis defect to differentiation differences, as in situ analyses of established cell type markers show that these cells exist and are only displaced relative to wildtype (Figure 1). We agree that we cannot rule out a role for differences in apoptosis or proliferation; however, we did not detect transcriptional differences in dact1/2 mutants that would indicate this in the single-cell RNAseq dataset. Defects in directed division are possible, but alone would not explain the dact1/2 mutant phenotype, particularly the widened dorsal axis (Figure 1).

      Major comments:  

      Comment (1) The author examined and showed convergent extension phenotype (CE) during body axis elongation in dact1/dact2-/- homozygous mutants. Given that dact2-/- single mutants also displayed shortened axis, the authors should either explain why they didn't analyze CE in dact2-/- (perhaps because that has been looked at in previously published dact2 morphants?) or additionally show whether CE phenotypes are present in dact1 and dact2 single mutants.  

      The authors should quantify the CE phenotype in both dact2-/- single mutants and dact1/dact2-/- double mutants, and examine whether the CE phenotypes are exacerbated in the double mutants, which may lend support to the authors' idea that dact1 can contribute to CE. The authors stated in the discussion that they "posit that dact1 expression in the mesoderm is required for dorsal CE during gastrulation through its role in noncanonical Wnt/PCP signaling". However, no evidence was presented in the paper to show that dact1 influences CE during body axis elongation.  

      Because any axis shortening in dact2-/- single mutants was overcome during the course of development, and at 5 dpf there was no noticeable phenotype, we did not analyze the single mutants further.  

      We have added data to demonstrate the resulting phenotype of each combinatorial genotype to provide a more clear and detailed description of the single and compound mutants (Fig. S3). 

      Our hypothesis that dact1 may contribute to convergent extension is based on its apparent ability to compensate (either directly or indirectly) for dact2 loss in the dact2-/- single mutant. 

      Comment (2) Except in Fig. 2, I could not find n numbers given in other experiments. It is therefore unclear if these mutant phenotypes were fully or partially penetrant. In general, there is also a lack of quantifications to help support the qualitative results. For example, in Fig. 4, n numbers should be given and cell movements and/or contributions to the ANC should be quantified to statistically demonstrate that the second stream of CNCC failed to contribute to the ANC.  

      Similarly, while the fan-shaped and the rod-shaped ANCs are very distinct, the various rod-shaped ANCs need to be quantified (e.g. morphometry or measurements of morphological features) in order for the authors to claim that these are "novel ANC forms", such as in the dact1/2-/-, gpc4/dact1/2-/-, and wls/dact1/2-/- mutants (Fig. 5).  

      We have added n numbers for each experiment and stated that the rod-like phenotype of the dact1/2-/- mutant was fully penetrant. 

      Regarding CNCC experiments, we repeated the analysis on 3 individual controls and mutants and did not find evidence that CNCC migration was directly affected in the dact1/2 mutant. Rather, differences in ANC development are likely secondary to defects in floor plate and eye field morphometry. Therefore we did not do any further analyses of the CNCCs.

      Regarding Fig. 5, we have added n numbers. We dissected and analyzed a minimum of three triple mutants (dact1/2-/-,gpc4-/- and dact1/2-/-,wls-/-) and numerous dact1/2 double mutants and found that the triple mutant ANC phenotype was consistent and recognizably different enough from the dact1/2-/-, gpc4, or wls mutants that morphometry measurements were not needed. Further, the triple mutant phenotype (narrow and shortened) appears to be a simple combination of the dact1/2 (narrow) and gpc4/wls (shortened) phenotypes. As we did not find evidence of genetic epistasis, we did not analyze the novel ANC forms further.

      Comment (3): The authors have attributed the ANC phenotypes in dact1/2-/- to CE defects and altered noncanonical wnt signaling. However, no evidence was presented to support either. The authors can perhaps utilize DiI labelling, photoconversion-mediated lineage tracing, or live imaging to study cell movement in the ANC and compare that with the cell movement change in the gpc4-/- , and gpc4/dact1/2-/- mutants in order to first establish that dact1/2 affect CE and then examine how dact1/2 mutations can modulate the CE phenotypes in gpc4-/- mutants.  

      Concurrently, given that dact1 and dact2 can affect (perhaps differentially) both canonical and non-canonical wnt signaling, the authors are encouraged to also test whether canonical wnt signaling is affected in the ANC or surrounding tissues, or at minimum, discuss the potential role/contribution of canonical wnt signaling in this context.  

      Given the substantial body of research on the role of noncanonical wnt signaling and planar cell polarity pathway on convergent extension during axis formation (reviewed by Yang and Mlodzik 2015, Roszko et al., 2009) and the resulting phenotypes of various zebrafish mutants (i.e. Xing 2018, Topczewski 2001), including previous research on dact1 and 2 morphants (Waxman 2004), we did not find it necessary to analyze CE cell movements directly.  

      Our finding that CNCC migration was not defective in the dact1/2 mutants and the knowledge that various zebrafish mutants with anterior patterning defects (slb, smo, cyc) have a similar craniofacial abnormality led us to conclude that the rod-like ANC in the dact1/2 mutant was secondary to an early patterning defect (abnormal eye field morphology). Therefore, testing dact1/2 and convergent extension or wnt signaling in the ANC itself was not an aim of this paper.  

      Comment (4) The authors also have not ruled out other possibilities that could cause the dact1/2-/- ANC phenotype. For example, increased cell death or reduced proliferation in the ANC may result in the phenotype, and changes in cell fate specification or differentiation in the second CNCC stream may also result in their inability to contribute to the ANC. 

      We agree that we cannot rule out whether cell death or proliferation is different in the dact1/2 mutant ANC. However, because we do not find the second CNCC stream within the ANC, this is the most likely explanation for the abnormal ANC shape. Because the first stream of CNCC are able to populate the ANC and differentiate normally, it is most likely that the inability of the second stream to populate the ANC is due to steric hindrance imposed by the abnormal cranial/eye field morphology. These hypotheses would need to be tested, ideally with an inducible dact1/2 mutant, however, this is beyond the scope of this paper.     

      Comment (5) The last paragraph of the section "Genetic interaction of dact1/2 with Wnt regulators..." misuses terms and conflates phenotypes observed. For instance, the authors wrote "dact2 haploinsufficiency in the context of dact1-/-; gpc4-/- double mutant produced ANC in the opposite phenotypic spectrum of ANC morphology, appearing similar to the gpc4-/- mutant phenotype". However, if heterozygous dact2 is not modulating phenotypes in this genetic background, its function is not "haploinsufficient". The authors then said, "These results show that dact1 and dact2 do not have redundant function during craniofacial morphogenesis, and that dact2 function is more indispensable than dact1". However this statement should be confined to the context of modulating gpc4 phenotypes, which is not clearly stated. 

      Revised (Lines 380, 382).   

      Comment (6) For the scRNA-seq analysis, the authors should show the population distribution in the UMAP for the 3 genotypes, even if there are no obvious changes. The authors are encouraged, although not required, to perform pseudotime or RNA velocity analysis to determine if differentiation trajectories are changed in the NC populations, in light of what they found in Fig. 4. The authors can also check the expression of reporter genes downstream of certain pathways, e.g. axin2 in canonical wnt signaling, to query if these signaling activities are changed (also related to point #3 above). 

      We have added population distribution data for the 3 genotypes to Supplemental Figure 4. Although RNA velocity analysis would be an interesting additional analysis, we would hypothesize that the NC population is not driving the differences in phenotype. Rather these are likely changes in the anterior neural plate and mesoderm. 

      Comment (7) While the phenotypic difference between gpc4-/- and dact1/2-/- are in the ANC at a later stage, scRNA-seq was performed using younger embryos. The authors should better explain the rationale and discuss how transcriptomic differences in these younger embryos can explain later phenotypes. Importantly, dact1, dact2, and capn8 expression were not shown in and around the ANC during its development and this information is crucial for interpreting some of the results shown in this paper. For example, if dact1 and dact2 are expressed during ANC development, they may have specific functions during that stage. Alternatively, if dact1 and dact2 are not expressed when the second stream CNCCs are found to be outside the ANC, then the ANC phenotype may be due to dact1/2's functions at an earlier time point. The authors' statement in the discussion that "embryonic fields determined during gastrulation effect the CNCC ability to contribute to the craniofacial skeleton" is currently speculative. 

      We have reworded our rationale and hypothesis to increase clarity (Lines 391-405). We believe that the ANC phenotype of the dact1/2 mutants is secondary to defective CE and anterior axis lengthening, as has been reported for the slb mutant (Heisenberg 1997, 2000). We utilized the gpc4 mutant as a foil to the dact1/2 mutant, as the gpc4 mutant has defective CE and axis extension without the same craniofacial phenotype.

      We have added dact1 and dact2 WISH at 24 and 48 hpf (Fig. 1D,E) to show expression during ANC development. 

      Comment (8) The functional testing of capn8 did not yield a result that would suggest a strong effect, as only 1 in 142 animals phenocopied dact1/2. Therefore, while the result is interesting, the authors should tone down its importance. Alternatively, the authors can try knocking down capn8 in the dact1/2 mutants to test how that affects the CE phenotype during axis elongation, as well as ANC morphogenesis. 

      As overexpression of capn8 in wildtype animals did not result in a significant phenotype, we tested capn8 overexpression in compound dact1/2 mutants, as these provide a sensitized background. We found a small but statistically significant effect of exogenous capn8 in dact1+/-,dact2+/- animals. While the effect is smaller than one would expect by comparison with Mendelian genetic ratios, the rod-like ANC phenotype is an extreme craniofacial dysmorphology not observed in wildtype or mRNA-injected embryos, and its occurrence is hence significant. The experiment is limited by the available technology, which over-expresses mRNA broadly without temporal or cell-specific control. It is possible that if capn8 over-expression were restricted to specific cells (floor plate, notochord or mesoderm) and to the optimal time period during gastrulation/segmentation, the aberrant ANC phenotype would be more robust. We agree with the reviewer that although the finding of a new role for capn8 during development is interesting, its importance in the context of dact should be toned down, and we have altered the manuscript accordingly (Lines 455-467).  

      Comment (9) A difference between the two images in Fig. 8B is hard to distinguish. Consider showing flat-mount images. 

      We have added flat-mount images to Fig. 8B.

      Minor comments:

      Comment (1) wnt11f2 is spelled incorrectly in a couple of places, e.g. "wnt11f2l" in the abstract and "wntllf2" in the discussion. 

      Revised throughout.

      Comment (2) For Fig. 1D, the white dact1 and yellow dact2 are hard to distinguish in the merged image. Consider changing one of their colors to a different one and only merge dact1 and dact2 without irf6 to better show their complementarity.  

      We agree with the reviewer that the expression patterns of dact1 and dact2 are difficult to distinguish in the merged image. We have added outlines of the cartilage elements to the images to facilitate comparisons of dact1 and dact2 expression (Fig 1F). 

      Comment (3) For Fig. 1E, please label the clusters mentioned in the text so readers can better compare expressions in these cell populations.  

      We have moved this data to supplementary figure S1 and have added labels.

      Comment (4) The citing and labelling of certain figures can be more specific. For example, Fig. S1A, B, and Fig. S1C should be used instead of just Fig. S1 (under the section titled "dact1 and dact2 contribute to axis extension..."). Similarly, Fig. 4 can be better labeled with alphabets and cited at the relevant places in the text.  

      We have modified the labeling of the figures according to the reviewer’s suggestion (Fig. S2 (previously S1), Fig. 4) and have added reference to these labels in the text (Lines 202, 204, 212, 328, 334, 336). 

      Comment (5) For Fig. 2B, the (+/+,-/-) on x-axis should be (+/-,-/-).  

      Revised in Figure 2B.

      Comment (6) Several figures are incorrectly cited. Fig. 2C is not cited, and the "Fig. 2C" and "Fig. 2D" cited in the text should be "Fig. 2D" and "Fig. 2E" respectively. Similarly, Fig. 5C and D are not cited in the text and the cited Fig. 5C should be 5E. The VC images in Fig. 5 are not talked about in the text. Finally, Fig. 7C was also not mentioned in the text.  

      We have corrected the labeling and have added descriptions of each panel in the Results (Fig.2 Line 231, 237, 242, Fig 5 Line 373, 381, Fig 7 line 431). 

      Comment (7) In the main text, it is indicated that zebrafish at 3ss were used for scRNA-seq, but in the figure legend, it says 4ss. 

      Revised (Line 682)

      Comment (8) No error bars in Fig. S1B and the difference between the black and grey shades in Fig. S1D is not explained.  

      Error bars are not included in the graphs of qPCR results (now Fig S2C) as these are results of a pool of 8 embryos performed one time. We have added a legend to explain the gray vs. black bars (now Fig S2E). 

      Reviewer #3 (Public Review):  

      Weaknesses: The hypotheses are very poorly defined and misinterpret key previous findings surrounding the roles of wnt11 and gpc4, which results in a very confusing manuscript. Many of the results are not novel and focus on secondary defects. The most novel result of overexpressing calpain8 in dact1/2 mutants is preliminary and not convincing.  

      We apologize for not presenting the question more clearly. The Introduction was revised with particular attention to distinguish this work using genetic germline mutants from prior morpholino studies. Please refer to pages 4-5, lines 106-121.

      Weakness 1) One major problem throughout the paper is that the authors misrepresent the fact that wnt11f2 and gpc4 act in different cell populations at different times. Gastrulation defects in these mutants are not similar: wnt11 is required for anterior mesoderm CE during gastrulation but not during subsequent craniofacial development while gpc4 is required for posterior mesoderm CE and later craniofacial cartilage morphogenesis (LeClair et al., 2009). Overall, the non-overlapping functions of wnt11 and gpc4, both temporally and spatially, suggest that they are not part of the same pathway.  

      We have reworded the text to add clarity. While the loss of wnt11 versus the loss of gpc4 may affect different cell populations, the overall effect is a shortened body axis. We stressed that it is the combination of a similar impaired axis elongation phenotype with discrepant ANC morphologies, at opposite ends of the ANC morphologic spectrum, that is very interesting and led us to investigate dact1/2 in the genetic contexts of wnt11f2 and gpc4. Please refer to page 4, lines 73-84. Further, the reviewer’s comment that wnt11 and gpc4 are spatially and temporally distinct is untested. We think the reviewer’s claim of gpc4 acting in the posterior mesoderm refers to its requirement in the tailbud (Marlow 2004). However, this does not exclude gpc4 from acting elsewhere as well. Further experiments would be necessary. Both wnt11f2 and gpc4 regulate non-canonical wnt signaling and are coexpressed during some points of gastrulation and craniofacial development (Gupta et al., 2013; Sisson 2015). These data support the possibility of overlapping roles. 

      Weakness 2) There are also serious problems surrounding attempts to relate single-cell data with the other data in the manuscript and many claims that lack validation. For example, in Fig 1 it is entirely unclear how the Daniocell scRNA-seq data have been used to compare dact1/2 with wnt11f2 or gpc4. With no labeling in panel 1E of this figure these comparisons are impossible to follow. Similarly, the comparisons between dact1/2 and gpc4 in scRNA-seq data in Fig. 6 as well as the choices of DEGs in dact1/2 or gpc4 mutants in Fig. 7 seem arbitrary and do not make a convincing case for any specific developmental hypothesis. Are dact1 and gpc4 or dact2 and wnt11 coexpressed in individual cells? Eyeballing similarity is not acceptable.  

      We have moved the previously published Daniocell data to Figure S1 and have added labeling. These data are meant to complement and support the WISH results and demonstrate the utility of using available public Daniocell data. Please recommend, with specific comments, how we can do this better or remediate this work. 

      Regarding our own scRNA-seq data, we have added rationale (Lines 391-403) and details of the results to increase clarity (Lines 419-436). We have added a panel to Figure 6 (panel A) to help illustrate our rationale for comparing dact1/2 and gpc4 mutants to wt. The DEGs displayed in Fig. 7A are the top 50 most differentially expressed genes between dact1/2 mutants and WT (Figure 7 legend, Lines 422-424).   

      We have looked at our scRNA-seq gene expression results for our clusters of interest (lateral plate mesoderm, paraxial mesoderm, and ectoderm). We find dact1, dact2, and gpc4 co-expression within these clusters. Knowing whether these genes are coexpressed within the same individual cell would require going back and analyzing the raw expression data. We do not find this to be necessary to support our conclusions. The expression pattern of wnt11f2 is not relevant to these conclusions.   

      Weakness 3) Many of the results in the paper are not novel and either confirm previous findings, particularly Waxman et al (2004), or even contradict them without good evidence. The authors should make sure that dact2 loss-of-function is not compensated for by an increase in dact1 transcription or vice versa. Testing genetic interactions, including investigating the expression of wnt11f2 in dact1/2 mutants, dact1/2 expression in wnt11f2 mutants, or the ability of dact1/2 to rescue wnt11f2 loss of function would give this work a more novel, mechanistic angle.

      We have clarified that the prior work carried out by Waxman using morpholinos, while acceptable at the time in 2004, does not meet the rigor of developmental studies today, which is to generate germline mutants. The reviewer’s acceptance of the prior work at face value fails to take the limitations of that work into account. Further, the prior paper from Waxman et al. did not analyze craniofacial morphology other than eyeballing the shape of the head and eyes. If the Waxman paper and this work are compared figure for figure, the additional detail of this study should be clear. Again, this is by no means a criticism of prior work, as the prior study suffered from the technological limitations of 2004, just as this study is the best we can do using the tools we have today. Any discrepancies in results are likely due to differences between morpholino and genetic disruption, and most reviewers would favor the phenotype analysis from the germline genetic context. We have addressed these concerns as objectively as we can in the text (Lines 482-493). The fact that dact1/2 double mutants display a craniofacial phenotype while the single mutants do not suggests compensation (Lines 503-505), but not necessarily at the mRNA expression level (Fig. S2C). 

      This paper tests genetic interaction through phenotyping the wnt11/dact1/dact2 mutant.

      Our results support the previous literature that dact1/2 act downstream of wnt11 signaling. There is no evidence of cross-regulation of gene expression. We do not expect that changes in wnt11 or dact would result in expression changes in the others.

      RNA-seq of the dact1/2 mutants did not show changes in wnt11 gene expression. Unless dact1 and/or dact2 mRNA are underexpressed in the wnt11 mutant, we would not expect a rescue experiment to be informative. And as wnt11 is not a focus of this paper, we have not performed the experiment.  

      Weakness 4) The identification of calpain 8 overexpression in Dact1/2 mutants is interesting, but getting 1/142 phenotypes from mRNA injections does not meet reproducibility standards.

      As the occurrence of the mutant phenotype in wildtype animals with exogenous capn8 expression was below what would meet reproducibility standards, we performed an additional experiment in which capn8 was overexpressed in embryos resulting from a dact1/dact2 double heterozygote incross (Fig. 8). We reasoned that an effect of capn8 overexpression may be more robust on a sensitized background. We found a statistically significant effect of capn8 in dact1/2 double heterozygotes, though the occurrence was still relatively rare (6/80). These data suggest that dysregulation of capn8 contributes to the mutant ANC phenotype, though there are likely other factors involved. 
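
      For readers who want to see how a rare-phenotype count such as 6/80 can be compared with a control group, a minimal sketch follows. It rests on several assumptions: the response does not state which statistical test was used, so Fisher's exact test is shown only as one common choice for small counts; the control counts in the usage line are hypothetical placeholders, not data from the study; and the function name `fisher_exact_two_sided` is introduced here for illustration.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are groups (e.g. injected vs. control); columns are
    phenotype present vs. absent. Returns the p-value.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of x affected animals in the first group.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Sum all tables that are at most as probable as the observed one;
    # the small tolerance guards against floating-point ties.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# 6 affected of 80 is taken from the response above; the control group
# (0 affected of 80) is a HYPOTHETICAL placeholder for illustration only.
p_value = fisher_exact_two_sided(6, 74, 0, 80)
```

      An exact test is used in this sketch because, with so few affected animals, approximations such as the chi-squared test are unreliable.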

      Comment: The manuscript title is not representative of the findings of this study.  

      We revised the title to describe strictly what was done: we generated loss-of-function compound mutants and carried out genetic analysis in them (Genetic requirement), and we found that capn8 is an important modifier of this requirement.

      Introduction: p.4:

      Comment: Anterior neurocranium (ANC) - it has to be stated that this refers to the combined ethmoid plate and trabecular cartilages. 

      Thank you. We agree that the ANC and ethmoid plate terminology has been confusing in the literature, and that we should describe more clearly that the phenotypes in question are all in the ethmoid plate and that the trabeculae are not affected. ANC has been replaced with ethmoid plate (EP) throughout the manuscript and figures. We also describe that all the observed phenotypes affect the ethmoid plate and not the trabeculae (page 13, Lines 265-267).

      Comment: Transverse dimension is incorrect terminology - replace with medio-lateral.

      Revised (Lines 69, 74).

      Comment: Improper way of explaining the relationship between mutant and gene..."Another mutant knypek, later identified as gpc4..." a better  way to explain this would be that the knypek mutation was found to be a non-sense mutation in the gpc4 gene.  

      Revised (Line 71)

      Comment: "...the gpc4 mutant formed an ANC that is wider in the transverse dimension than the wildtype, in the opposite end of the ANC phenotypic spectrum compared to wnt11f2...These observations beg the question how defects in early patterning and convergent extension of the embryo may be associated with later craniofacial morphogenesis."

      This statement is broadly representative of the general failure to distinguish primary from secondary defects in this manuscript. Focusing on secondary defects may be useful to understand the etiology of a human disease, but it is misleading to focus on secondary defects when studying gene function. The rod-like ethmoid of slb mutant results from a CE defect of anterior mesoderm during gastrulation(Heisenberg et al. 1997, 2000), while the wide ethmoid plate of kny mutants results from CE defects of cartilage precursors (Rochard et al., 2016). Based on this evidence, wnt11f2 and gpc4 act in different cell populations at different times.  

      It is true that the slb mutant craniofacial phenotype has been stated as secondary to the CE defect during gastrulation and the kny phenotype as primary to chondrocyte CE defects in the ethmoid, however the direct experimental evidence to conclude only primary or only secondary effects does not yet exist. There is no experiment to our knowledge where wnt11f2 was found to not affect ethmoid chondrocytes directly. Likewise, there is no experiment having demonstrated that dysregulated CE in gpc4 mutants does not contribute to a secondary abnormality in the ethmoid. 

      Here, we are analyzing the CE and craniofacial phenotypes of the dact1/2 mutants without any assumptions about primary or secondary effects and without drawing any conclusions about wnt11f2 or gpc4 cellular mechanisms.     

      Comment: "The observation that wnt11f2 and gpc4 mutants share similar gastrulation and axis extension phenotypes but contrasting ANC morphologies supports a hypothesis that convergent extension mechanisms regulated by these Wnt pathway genes are specific to the temporal and spatial context during embryogenesis."

      This sentence is quite vague and potentially misleading. The gastrulation defects of these 2 mutants are not similar - wnt11 is required for anterior mesoderm CE during gastrulation and has not been shown to be active during subsequent craniofacial development while gpc4 is required for posterior mesoderm CE and craniofacial cartilage morphogenesis (LeClair et al., 2009). Here again, the non-spatially overlapping functions of wnt11 and gpc4 suggest that are not part of the same pathway.  

      Though the cells displaying defective CE in wnt11f2 and gpc4 mutants are different, the effects on the body axis are similar. The dact1/2 mutants showed a grossly similar axis extension defect to these mutants. Our aim with the scRNA-seq experiment was to determine which cells and gene programs are disrupted in dact1/2 mutants. We found that some cell types and programs were disrupted similarly in dact1/2 mutants and gpc4 mutants, while other cells and programs were specific to dact1/2 versus gpc4 mutants. We can speculate that those that were specific to dact1/2 versus gpc4 may be attributed to CE in the anterior mesoderm, as is the case for wnt11.

      p.5

      Comment: "We examined the connection between convergent extension governing gastrulation, body axis segmentation, and craniofacial morphogenesis." A statement focused on the mechanistic findings of this paper would be welcome here, instead of a claim for a "connection" that is vague and hard to find in the manuscript.  

      We have rewritten this statement (Line 125).

      p.7 Results:

      Comment: It is unclear why Farrel et al., 2018 and Lange et al., 2023 are appropriate references for WISH. Please justify or edit.  

      This was a mistake and has been edited (Page 9).

      Comment: " Further, dact gene expression was distinct from wnt11f2." This statement is inaccurate in light of the data shown in Fig1A and the following statements - please edit to reflect the partially overlapping expression patterns.  

      We have edited to clarify (Lines 142-143).

      p.8

      Comment: "...we examined dact1 and 2 expression in the developing orofacial tissues. We found that at 72hpf..." - expression at 72hpf is not relevant to craniofacial morphogenesis, which takes place between 48h-60hpf (Kimmel et al., 1998; Rochard et al., 2016; Le Pabic et al., 2014).  

      We have included images and discussion of dact1 and dact2 expression at earlier time points that are important to craniofacial development (Lines 160-171)(Fig 1D,E). 

      Comment: "This is in line with our prior finding of decreased dact2 expression in irf6 null embryos". - This statement is too vague. How are the two observations "in line"?

      We have removed this statement from the manuscript.

      Comment: Incomplete sentence (no verb) - "The differences in expression pattern between dact1 and dact2...".  

      Revised (Line 172).

      Comment: "During embryogenesis..." - Please label the named structures in Fig.1E.

      Please be more precise with the described expression time. Also, it would be useful to integrate the scRNAseq data with the WISH data to create an overall picture instead of treating each dataset separately.  

      We have moved the previously published Daniocell data to supplementary figure S1 and have labeled the key cell types. 

      p.9

      Comment: "The specificity of the gene disruption was demonstrated by phenotypic rescue with the injection of dact1 or dact2 mRNA (Fig. S1)." - please describe what is considered a phenotypic rescue.

      -The body axis reduction of dact mutants needs to be documented in a figure. Head pictures are not sufficient. Is the head alone affected, or both the head and trunk/tail? Fig.2E suggests that both head and trunk/tail are affected - please include a live embryos picture at a later stage.  

      We have added a description of how phenotypic rescue was determined (Line 208). We have added a figure with representative images of the whole body of dact1/2 mutants. Measurements of body length found a shortening in dact1/2 double mutants versus wildtype; however, the differences were not statistically significant by ANOVA (Fig. 3C, Fig. S3, Lines 270-275).

      p. 11

      Comment: "These dact1-/-;dact2-/- CE phenotypes were similar to findings in other Wnt mutants, such as slb and kny (Heisenberg, Tada et al., 2000; Topczewski, Sepich et al., 2001)." The similarity between slb and kny phenotypes should be mentioned with caution as CE defects affect different regions in these 2 mutants. It is misleading to combine them into one phenotype category as wnt11 and gpc4 are most likely not acting in the same pathway based on these spatially distinct phenotypes.  

      Here we are referring to the grossly similar axis extension defects in slb and kny mutants. We refer to these mutants to illustrate that dact1 and/or dact2 deficiency could affect axis extension through diverse mechanisms. We have added text for clarity (Lines 249-252).

      Comment: "No craniofacial phenotype was observed in dact1 or dact2 single mutants. However, in-crossing to generate [...] compound homozygotes resulted in dramatic craniofacial deformity."

      This result is intriguing in light of (1) the similar craniofacial phenotype previously reported by Waxman et al (2004) using morpholino- based knock-down of dact2, and the phenomenon of genetic compensation demonstrated by Jakutis and Stainier 2001 (https://doi.org/10.1146/annurev-genet-071719-020342). The authors should make sure that dact2 loss-of-function is not compensated for by an increase in dact1 transcription, as such compensation could lead to inaccurate conclusions if ignored.  

      We agree with the reviewer that genetic compensation of dact2 by dact1 likely explains the different result found in the dact2 morphant versus CRISPR mutant. We found increased dact1 mRNA expression in the dact2-/- mutant (Fig S2X) however a more thorough examination is required to draw a conclusion. Interestingly, we found that in wildtype embryos dact1 and dact2 expression patterns are distinct though with some overlap. It would be informative to investigate whether the dact1 expression pattern changes in dact2-/- mutants to account for dact2 loss.   

      Comment: "Lineage tracing of NCC movements in dact1/2 mutants reveals ANC composition" - the title is misleading - ANC composition was previously investigated by lineage tracing (Eberhardt et al., 2006; Wada et al., 2005).  

      This has been reworded (Line 292)

      p.13

      Comment: There is no frontonasal prominence in zebrafish.  

      This is true; the text has been changed to frontal prominence (Lines 293, 299, 320).

      Comment: The rationale for investigating NC migration in mutants where there is a gastrula-stage failure of head mesoderm convergent extension is unclear. The whole head is deformed even before neural crest cells migrate as the eye field does not get split in two (Heisenberg et al., 1997; 2000), suggesting that the rod-like ethmoid plate is a secondary defect of this gastrula-stage defect. In addition, neural crest migration and cartilage morphogenesis are different processes, with clear temporal and spatial distinctions.  

      We carried out the lineage tracing experiment to determine which NC streams contributed to the aberrantly shaped EP: the anteromost NC stream of the frontal prominence, the second NC stream of the maxillary prominence, or both. We found that the anteromost NCC did contribute to the rod-like EP, which is different from when hedgehog signaling is disrupted. So while it is possible that the gastrula-stage head mesoderm CE defect caused a secondary effect on NC migration, how the anterior NC stream and second NC stream are affected differently between dact1/2 and shh pathway mutants is interesting. We added discussion of this observation to the manuscript (page 23, Lines 514-520).

      p. 14-16

      Comment: Based on the heavy suspicion that the rod-like ethmoid plate of the dact1/2 mutant results from a gastrulation defect, not a primary defect in later craniofacial morphogenesis, the prospect of crossing dact1/2 mutants with other wnt-pathway mutants for which craniofacial defects result from craniofacial morphogenetic defects is at the very least unlikely to generate any useful mechanistic information, and at most very likely to generate lots of confusion. Both predictions seem to take form here.  

      However, the ethmoid plate phenotype observed in the gpc4-/-; dact1+/-; dact2-/- mutants (Fig. 5E) does suggest that gpc4 may interact with dact1/2 during gastrulation, but that is the case only if dact1+/-; dact2-/- mutants do not have an ethmoid cartilage defect, which I could not find in the manuscript. Please clarify.  

      The perspective that the rod-like EP of the dact1/2 mutant is due to a gastrulation defect is being examined here. Why would other mutants with gastrulation CE defects, such as wnt11f2 and gpc4, have very different EP morphology, whether as a primary or secondary NCC effect? Further, dact1 and dact2 were reported as modifiers of Wnt signaling, so it is logical to genetically test the relationship between dact1, dact2, wnt11f2, gpc4 and wls. The experiment had to be done to investigate how these genetic combinations impact EP morphology. The finding that combined loss of dact1, dact2 and wls or gpc4 yielded new EP morphologies, different from those previously observed in dact1/2, wls, gpc4, or any other mutant, is important: it suggests that each of these genes has a distinct role in facial morphology that is not explained by CE defects alone.

      Comment: I encourage the authors to explore ways to test whether the rod-like ethmoid of dact1/2 mutants is more than a secondary effect of the CE failure of the head mesoderm during gastrulation. Without this evidence, the phenotypes of dact1/2 -gpc4 or - wls are not going to convince us that these factors actually interact.  

      Actually, we find that our results support the hypothesis that the ethmoid phenotype of the dact1/2 mutants is a secondary effect of defective gastrulation and anterior extension of the body axis. However, our findings suggest (by contrast with another mutant with impaired CE during gastrulation) that this CE defect alone cannot explain the dysmorphic ethmoid plate. Our single-cell RNA-seq results and the discovery of dysregulated capn8 expression and proteolytic processes present new wnt-regulated mechanisms for axis extension.

      p. 20 Discussion

      Comment: "Here we show that dact1 and dact2 are required for axis extension during gastrulation and show a new example of CE defects during gastrulation associated with craniofacial defects."

      Waxman et al. (2004) previously showed that dact2 is involved in CE during gastrulation.

      Heisenberg et al. (1997, 2000), previously showed with the slb mutant how a CE defect during gastrulation causes a craniofacial defect.  

      The Waxman paper, which used morpholinos to disrupt dact2, produced limited analysis of CE and no analysis of craniofacial morphogenesis. We generated genetic mutants here to validate the earlier morpholino results and to analyze the craniofacial phenotype in detail. We have removed the word “new” to make the statement clearer (Line 475).

      Comment: "Our data supports the hypothesis that CE gastrulation defects are not causal to the craniofacial defect of medially displaced eyes and midfacial hypoplasia and that an additional morphological process is disrupted."

      It is unclear to me how the authors reached this conclusion. I find the view that medially displaced eyes and midfacial hypoplasia are secondary to the CE gastrulation defects unchallenged by the data presented. 

      This statement was removed and the discussion was reworded.

      Comment: The discussion should include a detailed comparison of this study's findings with those of zebrafish morpholino studies.  

      We have added more discussion to compare ours to the previous morpholino findings (Lines 476-484).

      Comment: The discussion should try to reconcile the different expression patterns of dact1 and dact2, and the functional redundancy suggested by the absence of phenotype of single mutants. Genetic compensation should be considered (and perhaps tested).  

      The different expression patterns of dact1 and dact2, along with our finding that dact1 and dact2 genetic deficiency differently affect the gpc4 mutant phenotype, suggest that dact1 and dact2 are not functionally redundant during normal development. This is in line with the previously published data showing different phenotypes of dact1 or dact2 knockdown. However, our result that genetic ablation of both dact1 and dact2 is required for a mutant phenotype suggests that these genes can compensate upon loss of the other. This would suggest that the expression pattern of dact1 would be changed in the dact2 mutant and vice versa. This line of investigation would be interesting for future studies. We have addressed this in the Discussion (Lines 485-498).

      Comment: "Based on the data...Conversely, we propose...ascribed to wnt11f2 "

      Functional data always prevail over expression data for inferring functional requirements.

      This is true.

      p.21

      Comment: "Our results underscore the crucial roles of dact1 and dact2 in embryonic development, specifically in the connection between CE during gastrulation and ultimate craniofacial development."

      How is this novel in light of previous studies, especially by Waxman et al. (2004) and Heisenberg et al. (1997, 2000)? In this study, the authors fail to present compelling evidence that craniofacial defects are not secondary to the early gastrulation defects resulting from dact1/2 mutations.

      We have not claimed that the craniofacial defects are not secondary to the gastrulation defects. In fact, we state that there is a “connection”. Further, we do not claim that this is the first or only such finding. We believe our findings have validated the previous dact morpholino experiments and have contributed to the body of literature concerning wnt signaling during embryogenesis.

      p. 22

      Comment: The section on Smad1 discusses a result not reported in the results section. Any data discussed in the discussion section needs to be reported first in the results section.  

      We have added a comment on the differential expression of smad1 to the results section (Lines 446-448).

    1. Reliability refers to the consistency of the measurement

      I think that this is a very important point. When conducting assessments, we are acting as scientists conducting an experiment. Like any experiment, there are many different variables at play. Variables include the number of distractions in the room during instruction and during the assessment, the number of distractions in students' individual lives, any learning disabilities the student may have, diagnosed or not, and any number of other factors. How can teachers go about assessing the reliability of an assessment, and at what point do instructors throw out results altogether?

    2. Rather than being used for grading, formative assessment is used to inform instructional planning and to provide students with valuable feedback on their progress

      I love the idea behind this. Rather than assignments being for a grade, I feel as though the majority of the work for a class should count for a completion-type score, as in the end the student should not be punished for lack of understanding. Usually when a student does not understand something, it is not necessarily their fault. Sometimes there are underlying conditions the student is dealing with, such as ADHD, depression, or autism, which can make learning hard for some students. Why should students be punished for something that is not their fault? Many may argue that it is the student's fault because they simply refused to do the work or refused to put effort into the assignment, but I argue that humans, especially children, are naturally curious about the world. If they do not wish to take an opportunity to learn, then usually there is more to it than them simply not wanting to do it. On the other hand, many older students have priorities other than completing assignments they may feel are optional, such as one graded only for completion. They may decide it is acceptable to turn in sub-par work instead of actually trying their best on the assignment. So how can we have an assignment that assesses students' understanding of a topic without punishing students who do not understand the material? The only possible solution I can think of would be to have the assignment be for a grade that reflects the correctness of the assignment, but allow the students to revise the work until it meets their expectations. This would allow students to prioritize other work over said assignment, if necessary, but also still give students some urgency to do their best on the assignment.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Response to Reviewer #1

      Major comments:

      1. The connection between vB12 and MMA is weak, and the attempt to connect these pieces to PPI seems somehow forced. For instance, the authors do not convincingly demonstrate that MMA causes the PPI deficit. Furthermore, vB12 may rescue PPI independently of MMA. The authors should be more transparent about the lack of connection or causality between changes in metabolism and behavior.

      We appreciate the reviewer's comment and acknowledge that we have not demonstrated causal relationships between increased MMA, PPI deficits in Tbx1+/- mice and their rescue by vB12. They are associations.

      In the revised manuscript, we have clarified this in the Discussion, para.4, by adding the following phrase. “The results of our study do not prove a causal relationship between elevated brain MMA and PPI impairment, nor do they tell us whether rescue of the PPI impairment by vB12 occurs by reducing MMA”.

      Regarding the comment of a weak connection between vitamin B12 and MMA, we respectfully disagree.

      The biochemistry underlying the connection is outlined clearly in the Introduction, page 4, para. 2.

      Patients with vitamin B12 deficiency typically exhibit elevated levels of MMA and administration of vitamin B12 (cobalamin) helps to normalize MMA values (Robert & Brown, 2003). Furthermore, several animal models with genetic alterations in the vitamin B12 pathway exhibit high levels of MMA. For instance, mice lacking the cobalamin transporter have increased MMA (Bernard et al., 2018). Additionally, mice lacking the mutase (Mut), which requires vitamin B12 as a cofactor for the conversion of methylmalonyl CoA to succinyl-CoA for entry into the Krebs cycle, demonstrate elevated levels of MMA and are unresponsive to vitamin B12 treatment (Peters et al. 2006). In the revised manuscript, we have cited these references (Introduction, page 4, para. 2).

      Throughout the manuscript, an important control is missing: WT+ vB12 group. Data from this group should be added to Figures 1, 3, 4, 6, and Supplemental Figures 3 and 4 to show the effect of vB12 on WT mice.

      All of the experiments reported in the original manuscript included this control group, although it was not always included in the data analysis and therefore in the figures, as observed by the reviewer. In the revised manuscript all of the relevant figures and tables now include these data.

      In Figures 1C and 3, data from the respective WT controls in the Df1 and Tbx1 cohorts should be shown.

      The wild type (WT) animals serve as the control group for both Tbx1+/- and Df1/+ mice because they were littermates, obtained from matings between Df1/+ and Tbx1+/- mice. This has been clarified in the Materials and Methods, and in a new cartoon which has been added to Supplementary Figure 1 (1C) showing all of the animal groups used for the various studies (NMR, transcriptomics, behavior).

      The Supplemental Table S1 includes 17 WT controls for Tbx1+/- and 6 WT controls for Df1/+, but Figure 1C includes only one group of 11 WT controls. For which group were those 11 WT controls?

      Here are several examples of inconsistency in the data: "For this, we first performed a preliminary metabolome analysis using isolated whole brains of male and female Tbx1+/- (n= 18) and WT (n= 10) mice between one and two months of age. A set of metabolites was quantified in brain extracts by liquid chromatography tandem mass spectrometry (LC-MS/MS) (Supplementary Table 1)." Again, the number of mice the authors note in the manuscript does not match that shown in Supplementary Table 1 (Tbx1+/- (n=14) and WT (n=17)). "We analyzed whole brain tissue isolated from Df1/+ and WT (control) mice (n = 5 per genotype)." Again, the numbers of mice do not match those in Supplementary Table 1, which notes 6 Df1/+ mice and 6 WT mice.

      We apologize for these errors and inconsistencies in the text and tables, all of which have been corrected in the revised manuscript. In addition, we have added the aforementioned cartoon (Supplementary Figure 1C) and we have improved the presentation of the data (genotypes and treatments) in Supplementary Tables 1 and 2. We hope that these changes provide the expected clarity to the data.

      MMA is the only metabolite that is similarly changed between Tbx1+/- and Df1/+ brains. This is an interesting observation. However, there is no other overlap in metabolic changes between these two mutants. This is a concern that requires clarification.

      We appreciate the reviewer's comment. The observation is not altogether surprising considering that the Df1 deletion includes at least 9 genes involved in metabolic pathways (cited refs: Maynard, 2008; Meechan, 2011; Devaraju and Zakharenko, 2017), any of which might counteract or compensate for changes caused by Tbx1 mutation alone. In addition, heterozygosity for other genes in the deleted region (Df1 encompasses over 20 genes) might affect metabolic processes indirectly. In the revised manuscript we have added the following phrase to the Discussion, para.1: “Thus, even though the two mutants are genetically and metabolically very different, in Df1/+ mice, the MMA phenotype is not affected by heterozygosity of other genes in the deletion.”

      The authors mention that MMA is not changed in pre-term Tbx1+/- embryos, but no data are provided. What about MMA levels in Df1/+ embryos?

      In pre-term Tbx1+/- and WT embryos MMA was undetectable. This is now stated in the final paragraph of the first section of the Results.

      We did not measure MMA in Df1/+ embryos. It was not a goal of the study to compare the metabolome of these two genetically very different mutants. The MMA data on Df1/+ mice are presented because they show the potential relevance of this phenotype for the human disease, and they justify the use of the single gene (Tbx1+/-) mutants for studies into metabolism-related disease mechanisms. See also response to point 12.

      In some cases, the differences in metabolites (e.g., glutamine, glutamate, phosphoethanolamine, taurine, leucine, myo-inositol, and niacinamide) between the WT and Tbx1+/- mice are very minimal (Supplemental Figure 2). The y-axis scale should start at 0.

      We have changed the y-axis settings where necessary.

      The vB12 is administered via two different regimens: 1) every 3 days for 28 days, at 4-8 weeks of age for metabolic measurements, and 2) twice a week for 2 months for PPI behavioral testing. Is there any reason the authors chose different protocols?

      We apologize for the confusion, which was due to an oversight in the Materials and Methods section that has been corrected. The weekly injection regimen was the same for mice used in the behavioral and metabolic studies, but the treatment time was shorter for the metabolic studies, for practical reasons beyond our control; mice received vB12 injections twice a week, beginning at 4 wks of age and continuing until 8 wks or 12 wks of age for metabolic and behavioral studies respectively.

      The authors should add the following references to the study: Long et al., Neurogenetics (2006), which shows no change in PPI in Tbx1+/- mice. This discrepancy compared with the current study results and those of Paylor et al., Proc Natl Acad Sci U S A (2006) should be discussed.

      We have not cited the study by Long et al. because there are no obvious reasons for the discrepancy (age, mouse strain, sex) that could be discussed. Beyond this of course we cannot comment on data generated by another research group. Nevertheless, the presence of the PPI deficit in Tbx1+/- mice has been confirmed in two different Tbx1 alleles Tbx1 lacZ/+ and Tbx1ΔE5/+, by two different investigators, Dr. Richard Paylor using Tbx1 lacZ/+ mice (Paylor et al. 2001) and Dr. Elvira De Leonibus (co-author of this manuscript) using Tbx1ΔE5/+ mice, in two different countries (USA and Italy) in a rederived colony of mice.

      Figure 6B is a concern. The PPI decrease in the Tbx1+/- group appears to be driven by results from 3-4 mice. First, are those data statistical outliers?

      With all due respect, this is not the case. Eight Tbx1+/- mice, i.e., >50% of those tested, had PPI values below the minimum observed in WT mice. The behavioural data were checked for the presence of outliers in each group using the Grubbs test, which yielded negative results. Our findings of PPI deficits in Tbx1+/- mice are in line with previously published data in Tbx1+/- and other animal models of 22q11.2 microdeletion (Paylor et al., 2006; Paylor and Lindsay, 2006; Stark et al., 2008), as well as in humans (Sobin et al., 2005).
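
For readers unfamiliar with the Grubbs test mentioned above, the statistic it relies on can be sketched as follows. This is an illustration only, not the analysis code used in the study: the function name `grubbs_statistic` and the example values are hypothetical, and the sketch stops at the test statistic G. Deciding whether the flagged value is an outlier additionally requires comparing G with a critical value derived from the t distribution (from tables or a statistics package) for the chosen significance level and sample size.

```python
import statistics


def grubbs_statistic(values):
    """Return (G, index) for the two-sided Grubbs test.

    G is the largest absolute deviation from the sample mean, expressed in
    units of the sample standard deviation; index points at that value.
    """
    if len(values) < 3:
        raise ValueError("the Grubbs test needs at least three observations")
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1 in the denominator)
    if sd == 0:
        raise ValueError("all values are identical; G is undefined")
    deviations = [abs(v - mean) for v in values]
    idx = max(range(len(values)), key=deviations.__getitem__)
    return deviations[idx] / sd, idx


# Hypothetical percent-PPI values for one group (not data from the study).
example_ppi = [42.0, 38.5, 45.1, 40.2, 12.3, 44.0, 39.7]
g, suspect = grubbs_statistic(example_ppi)
print(f"G = {g:.3f} for value {example_ppi[suspect]}")
```

A group passes the check when G stays below the critical value, which is what a negative Grubbs result means.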

      Second, experiments in the same mice would be more informative. Do PBS-treated mutants recover PPI if they are treated with vB12 and vice versa? If the authors are concerned about the age difference, they also should include age-dependent effects on PPI.

      We decline to perform the proposed experiment for the reasons described in section 4 of the Revision Plan.

      Because vB12 treatment completely rescued the MMA level in Df1/+ mice (Figure 3), the authors should include a figure showing PPI test results in Df1/+ mice.

      Vitamin B12 treatment fully rescued the MMA phenotype in both mutants (Figure 3). Whether it rescues the PPI defect in Df1/+ mice is not important for this study. We used Df1/+ mice as an entry point, in order to give validity to the pursuit of the MMA phenotype in the single gene mutant (Tbx1+/-), in which we expect that it will be easier to find disease mechanisms. For this reason, we focused our attention on identifying metabolic alterations in adult Tbx1+/- mice.

      See also response to point 7.

      Figure 1A and B table: Did the authors mean Log2FC instead of FC? The authors also should present the source data by adding supplemental tables that include raw data and normalized conversion, etc., as described in the multivariate statistical data analysis of the LC-MS/MS data.

      The new Figure 1A, Figure 3 and the accompanying tables now state Log2FC. New Supplementary Table 1 presents the raw data that were normalized on the basis of the amount of protein in the samples, as described and referenced in the Materials and Methods.
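
To make the distinction between FC and Log2FC concrete, the following minimal sketch normalizes a raw signal to the amount of protein in the sample and then computes a log2 fold change between two groups. It is an illustration under stated assumptions, not the pipeline described in the Materials and Methods: the function names, the use of group means, and all numbers are hypothetical.

```python
import math


def normalize_to_protein(raw_signal, protein_mg):
    """Express a raw metabolite signal per mg of protein in the sample."""
    if protein_mg <= 0:
        raise ValueError("protein amount must be positive")
    return raw_signal / protein_mg


def log2_fold_change(mutant_values, control_values):
    """Log2 of the ratio of group means (mutant over control).

    A value of 1 means a doubling, -1 a halving, and 0 no change.
    """
    mean_mutant = sum(mutant_values) / len(mutant_values)
    mean_control = sum(control_values) / len(control_values)
    return math.log2(mean_mutant / mean_control)


# Hypothetical (raw signal, mg protein) pairs; not data from the study.
mutant = [normalize_to_protein(s, p) for s, p in [(8.4, 2.1), (7.6, 1.9)]]
control = [normalize_to_protein(s, p) for s, p in [(4.0, 2.0), (4.4, 2.2)]]
print(f"Log2FC = {log2_fold_change(mutant, control):.2f}")
```

Reporting Log2FC rather than FC makes increases and decreases symmetric around zero, which is why the label matters for reading the figures.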

      "...we identified a new metabolic phenotype that was associated with reduced sensorimotor gating deficits in Tbx1+/- mice". Although the authors showed the PPI rescue by treating Tbx1+/- mice with vB12, that result alone does not prove the association of metabolic phenotype with sensorimotor-gating deficit; other supporting data are needed.

      This is perhaps a question of semantics; by "associated" we mean that the two phenotypes, metabolic alterations and reduced PPI, were observed together.

      The authors stated, "Results showed that there were very few differentially expressed genes in Tbx1+/- vs WT brains, (n=22 out of 14535 expressed genes (Fig. 5 and Supplementary Tab. 2)". However, they described how 3 differentially expressed genes are involved in mitochondrial activity in the Discussion. The authors should describe those 3 genes and their relation to the metabolic change.

      The results that the reviewer refers to have changed in the revised manuscript due to the inclusion of the control group WT +vB12 in the data analysis. The transcriptome analysis revealed that vB12 had a stronger impact than genotype, and as a consequence, the statistical analysis of all groups did not highlight minor differences between the two genotypes.

      Figure 5B: The authors claimed that they detected similar transcription profiles between WT+vB12 vs. Tbx1+/-+vB12, comparing Tbx1+/-+PBS vs. Tbx1+/-+vB12. This is based on 947 genes being downregulated and 834 being upregulated, which is not appropriate. The authors should normalize those data to the numbers of genes upregulated and downregulated in WT+PBS vs. WT+vB12 respective groups.

      We said that we detected similar transcription profiles in PBS-treated WT and Tbx1+/- brains; a WT+vB12 group was not present. The latter group is included in the revised manuscript, and the data were reanalyzed comparing all groups.

      See also response to points 2 and 15.

      Minor comments

      1. Supplementary Table S1 shows the identical MMA concentration "0.2" for 6 controls. Is this correct? This was an error that has been corrected; the value is 0.00 (not detectable).

      * Remove the callout for Figure 1C at the end of the second paragraph in Results*.

      This figure is no longer present

      *There are multiple typos throughout the manuscript. *

      Here are several examples:

      1. * Fig1B graph- Df/+ => Df1/+* Figure changed in revised manuscript

      2. "Together, the hydrophilic and lipophilic results revealed a group of 6 compounds that characterized the brain metabolic differences between Tbx1+/- and WT mice (Figure 2B, 2C)". Figure 2A should be included also. Corrected

      3. "In support of this notion, is the finding that...(remove) Removed

      4. Remove double periods: "The pathways found are depicted in Figure 2C' which reports the impact of each pathway versus p values.." Corrected

      5. Panel labels in all figures are misplaced. Panel labels are aligned correctly

      We have performed a spelling and grammar check on the text

      "In support of this notion... at least nine orthologs are involved in mitochondrial metabolism". What are those 9 mitochondrial genes? Kolar et al., Schizophr Bull (2023) indicates that there are 8 mitochondrial genes within the 22q11.2 locus. The authors need to list these genes.

      This reference, which is a review, has been cited in the Introduction, para.3 along with the genes.

      The review presents nine mitochondrial genes which the authors divide into two groups, 1) Genes expressed in mitochondria (SLC25A1, TXNRD2, MRPL40, PRODH, and COMT) and 2) Genes that have been shown to have an impact on mitochondrial function (TANGO2, ZDHHC8, UFD1L, and DGCR8). In the abstract they mention only eight genes, the ninth gene COMT is mentioned in the text.

      Reviewer #2 (Significance (Required)):

      *The manuscript titled "Tbx1 haploinsufficiency causes brain metabolic and behavioral anomalies in adult mice which are corrected by vitamin B12 treatment" by Caterini et al. presents a comparative metabolomics study in the brains of mouse models carrying a heterozygous mutation in the transcription factor Tbx1. This mutation is contrasted with a chromosomal deficiency encompassing Tbx1, among other gene loci, known as Df1/+, which serves as a mouse model for 22q11 microdeletion syndrome. The primary and most significant finding of the study is that Tbx1 heterozygosity alone induces broad metabolomic alterations in the entire brain parenchyma, despite Tbx1 expression being confined to vascular endothelial cells. The authors leverage this observation to investigate the effects of dietary supplementation with vitamin B12, which alters the metabolome in a manner interpreted by the authors as rescuing or reversing the Tbx1 heterozygosity phenotype. This study holds promise for understanding the individual gene contributions to the penetrant behavioral phenotypes observed in Df1/+ and 22q11 affected subjects. This potential arises from the clear and consequential metabolic phenotypes described, notably the accumulation of methylmalonic acid.

      However, despite the intriguing metabolic phenotypes, there are significant issues hindering incontrovertible conclusions. *

      Response to Reviewer #2

      Major comments

      1. Despite the intriguing metabolic phenotypes, there are significant issues hindering incontrovertible conclusions. Chief among these problems is the experimental design's nature, where the effects of genotype and a pharmacological intervention, vitamin B12, are assessed. The current design overlooks the effects of vitamin B12 on wild-type animals in metabolic and behavioral measures, thus precluding the attribution of the effects of vitamin B12 to a rescue.

      See response to Reviewer 1 (point 2), who made the same criticism. The vB12-treated WT group is now included in the data analysis of the relevant experiments.

      *An alternative explanation, consistent with the measurements, is that vitamin B12 modifies metabolites and transcripts irrespective of genotype. A suggestion of this possibility is the observed effect of B12 lowering glutamate levels in Tbx1 mutant tissue below those in wild-type brain tissue (Fig. 4C). *

      This might be true for some metabolites. Indeed, we found 5 metabolites that responded similarly to vB12 in both WT and Tbx1 +/- mice. In contrast, three metabolites responded to vB12 in both WT and Tbx1+/- mice, but the response was more pronounced in Tbx1+/- mice. Finally, a group of eight metabolites was altered exclusively in Tbx1+/- mice after vB12 treatment, including inosine, glutamate and short-chain fatty acids (SCFAs), Figure 4 and Supplementary Figure 6. Thus, overall, our data suggest that with only a few exceptions, the metabolic response to vB12 treatment is genotype-dependent.

      This experimental design issue is exacerbated by the multitude of analytes measured by metabolomics, all collectively assumed to change as part of a common genotype-B12 interaction mechanism. This interpretation is feasible only if none of the analytes were to respond to B12 in wild-type animals.

      As specified above, the response to vB12 was genotype-dependent. The inclusion of the vB12-treated WT dataset should address this point.

      A second major issue arises from the assertion that Tbx1 is exclusively expressed in mouse brain endothelial cells and not in brain parenchyma. A significant unresolved question is how a gene expressed solely in endothelial cells can alter the brain parenchyma metabolome and transcriptome. This issue remains unaddressed and is not sufficiently discussed. If this assertion holds true, then the observations bear great importance in understanding how Df1/+ causes brain phenotypes and, by extension, in human 22q11.

      There are quite a lot of published data from the mouse demonstrating the brain endothelial-specific expression of Tbx1 and the lack of expression in other brain cell types. These include studies using reporter genes (Paylor, 2006; Cioffi, 2014), Tbx1Cre based cell fate mapping (Cioffi, 2014, Cioffi, 2022) and single cell whole genome transcriptions (Ximerakis et al., 2019); (https://portals.broadinstitute.org/single_cell/study/aging-mouse-brain). All are cited in the manuscript.

      How the mutation of Tbx1 alters the brain metabolome and transcriptome will be the object of future studies. Currently, we do not have any data. At the reviewer’s request, we have extended the discussion of this point in the revised manuscript (Discussion, para. 4).

      In this vein, the authors should consider that Tbx1 is not expressed in brain endothelial cells in humans and is minimally expressed in fetal astrocytes (see https://brainrnaseq.org/).

      https://brainrnaseq.org provides a tool to evaluate the gene expression in the fetal brain. The sequencing was performed on fetal human brain tissue after elective pregnancy termination (4wks-9wks, it is not clear). Our analysis focuses on adult mice, which may contribute to observed differences.

      Moreover, Yi et al. (2010) generated a gene expression atlas of human embryogenesis spanning from 4 to 9 weeks of gestational age, revealing downregulation of TBX1 during this timeframe. Conversely, in the normal adult human brain, TBX1 expression is identified in endothelial cells, as indicated in "The Human Brain Cell Atlas v1.0" presented for visualization and data mining through the Chan Zuckerberg Initiative’s CellxGene application, referring to the atlas ontology in Ding et al. (2016).

      *3. A third major concern pertains to the general poor quality of the figures. Many figures appear to be directly exported from the software used for data analysis without proper curation. They are inadequately labeled, lack color codes to clarify differences (e.g., volcano plots), feature lettering fonts that are difficult to discern, and have lettering panels placed in awkward positions. Fig. 1 would benefit by the addition of a pathway diagram showing which metabolites are changing. Figure tables/spreadsheets either have sheets labeled in Italian or are empty. Collectively, the manuscript needs more careful data curation and presentation.*

      Many of the figures and tables have been modified with respect to the original manuscript and issues of clarity and quality have been improved where necessary.

      Other points for consideration are listed below.

      • The abstract results section does not mention the Df1 mutants at all, and overall the description of the results should be improved.

      Corrected.

      • The abstract would benefit from defining vB12 before using the abbreviation.

      Corrected.

      • The section of the Results describing MMA accumulation in the brain would benefit from 1) explaining the choice of 1 month of age for terminal experiments and the choice to use whole brains (are there particular brain regions suspected to be affected?),

      The majority of animals were 2 months of age at sacrifice (age and sex of individual animals are indicated in Supplementary Table 1). Young adult mice were the object of the study for the reason described in the first paragraph of the Results section, namely “Human studies of brain metabolism have mainly been conducted on children and adolescent patients. Therefore, in order to determine whether similar anomalies were present in the mouse models, we performed our studies on young mice between 1 and 2 months of age (Dutta et al., 2016)”.

      This is also the age at which the behavioural phenotype has been demonstrated (Paylor et al., 2006), and which could therefore potentially be rescued by vB12 treatment. We do not have regional information pertaining to the adult brain.

      2) describing any sex effects for Tbx1 mutants (and clarifying what data points for Tbx1 animals correspond to which sex), and 3) including what sex was used for Df1 experiments.

      In preliminary experiments, we analyzed male and female mice before electing to use only males. To obtain reliable information about the impact of gender on metabolism and transcription, we would have to use much larger numbers of animals. In Supplementary Table 1, the data pertaining to males and females are now indicated.

      • The authors demonstrate that vB12 rescues PPI but use no other behavioral paradigms. It is possible that these mutations and/or vB12 could be impacting anxiety-like behaviors or other behavioral phenotypes. By only including PPI, the authors limit the interpretation of the "rescue" of this phenotype by vB12.

      Reduced PPI was the only behavioral anomaly identified in Tbx1+/- mice that were subjected to a standard battery of behavioral tests (Paylor et al., 2006).

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      This is a terrific paper looking at influences of Tbx1 heterozygosity on metabolic phenotypes in mice. A weakness is that the locus on effects of B12 is totally unclear--could be neurovascular or even peripheral, but correcting this weakness might include study of Tbx1 conditional mutants, beyond the scope of this study.

      Reviewer #3 (Significance (Required)): good significance

      Only two minor suggestions. "We selected to study primarily Tbx1 single gene mutants because it is the primary candidate disease gene". What is the basis for this statement? Mouse +/- mutants in Mrpl40, Txnrd2, Prodh, and probably others have shown brain phenotypes.


      The basis for TBX1 being considered as the primary candidate disease gene is the finding of TBX1 point mutations in patients who have the full spectrum of clinical phenotypes associated with 22q11.2 deletion syndrome without the chromosomal deletion, namely, congenital heart defects, immune defects, facial dysmorphism, learning defects and developmental delay. Similarly, in the mouse, Tbx1 mutation recapitulates the phenotype observed in multigene deletion mutants, such as Df1/+ mice.

      We do not say (or think) that heterozygosity of other genes from del22q11.2 does not contribute to the disease, but mutations of other genes have not been found in individuals with a 22q11.2DS phenotype but without the chromosomal deletion.

      In the discussion, the authors could close the loop on how low glutamine could result in lower GABA in inhibitory interneurons, and how its correction with B12 could restore GABA levels.

      Discussion, para. 3. We thank the reviewer for the comment. However, the GABA concentration is not altered in Tbx1 haploinsufficient brains; it is only upregulated by vitamin B12. Therefore, this hypothesis would be very speculative. Due to differences in the release and reabsorption rates of the three compounds (glutamine, glutamate, and GABA), correctly evaluating the glutamine-glutamate cycle requires separating astrocytes from neurons. We have only discussed the upregulation of glutamate and the GABA response to vitamin B12, which may counteract the excess of glutamate.

      4. Description of analyses that authors prefer not to carry out

      Please include a point-by-point response explaining why some of the requested data or additional analyses might not be necessary or cannot be provided within the scope of a revision. This can be due to time or resource limitations or in case of disagreement about the necessity of such additional data given the scope of the study. Please leave empty if not applicable.

      Reviewer 1, point 11. The PPI decrease in the Tbx1+/- group appears to be driven by results from 3-4 mice. First, are those data statistical outliers?

      Second, experiments in the same mice would be more informative. Do PBS-treated mutants recover PPI if they are treated with vB12 and vice versa? If the authors are concerned about the age difference, they also should include age-dependent effects on PPI.

      We are unable to perform this experiment because, as stated in the manuscript, the mice were sacrificed at the end of the experiment and the brains preserved for histological analysis (not part of this study). The generation of mice for new experiments would take about one year. With all due respect, we do not believe that the data that would be obtained are sufficiently important to justify, ethically and economically, this work.

    1. Author response:

      We want to thank the reviewers for their constructive feedback.

      General

      The recall values of our method range from 78.6% for all urine cases to 83.3% for feces (and not between 70-80%, as stated by reviewer #2), with a mean precision of 85.6%. This is similar to other machine learning-based methods commonly used for the analysis of complicated behavioral readouts. For example, in the paper presenting DeepSqueak for analysis of mouse ultrasonic vocalizations (Coffey et al. DeepSqueak: a deep learning-based system for detection and analysis of ultrasonic vocalizations. Neuropsychopharmacol. 44, 859–868 (2019). https://doi.org/10.1038/s41386-018-0303-6), the recall values reported for DeepSqueak, MUPET, and Ultravox (Fig. 2c, f) are very similar to those of our method.

      We have analyzed and reported all the types of errors made by our method, which are mostly technical. For example, depositions that remain overlapped by the mouse blob until they have cooled will be associated with the mouse and therefore will not be detected (“miss” events). These technical errors are not expected to create a bias for a specific biological condition and, hence, should not interfere with the use of our method. A video showing all of the mistakes made by our algorithm on the test set was submitted (Figure 2-video 1).

      Below, we address specific points and describe our plan to revise the manuscript accordingly.

      Detection accuracy

      a. It should be noted that when large urine spots are considered, our algorithm achieved 100% correct classification (Figure 2, supplement 1, panel b). However, small urine deposits are very similar to feces in their appearance in the thermal picture. In fact, if the feces are not shifted, discrimination can be quite challenging even for human annotators. To demonstrate the accuracy of the proposed method relative to human annotators, we plan to compare its results with the accuracy of a second human annotator.

      b. As part of the revision, we plan to test general machine learning-based object detectors such as Faster R-CNN or YOLO (as suggested by Reviewer 2) and compare them with our method.

      c. To check if our method may introduce bias to the results, we plan to check if the errors are distributed evenly across time, space, and genders.

      Design choices

      (A) The preliminary detection algorithm has several significant parameters. These are:

      a. Minimal temperature rise for detection: 1.1°C rise during 5 sec.

      b. Size limits of the detection: 2 - 900 pixels.

      c. Minimal cooldown during 40 sec: 1.1°C and at least half the rise.

      d. Minimal time between detections in the same location: 30 sec.

      We chose to use low thresholds for the preliminary detection to allow detection of very small urinations and to minimize the number of “miss” events, relying on the classifier to robustly reject false alarms. Indeed, we achieved a low rate of miss events: 5 miss events for the entire test set (1 miss event per ~90 minutes of video). We attribute these 5 “miss” events to partial occlusion of the detection by the mouse.

      To adjust the preliminary detection parameters to a new environment, one will need to calibrate these parameters in their own setup. Mainly, the size of the detection depends on the resolution of the video, and the cooldown rate might be affected by the material of the floor, as well as the room temperature.
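
      The parameters listed under (A) could be collected in a single configuration object so that they are easy to recalibrate for a new setup. The following is a minimal sketch and not the authors' implementation: the class name, the field names, and the `passes_preliminary_thresholds` helper are hypothetical, thresholds are assumed to be inclusive, and only the numeric defaults are taken from the list above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PreliminaryDetectionParams:
    """Hypothetical container; default values are those quoted in the response."""
    min_rise_c: float = 1.1             # minimal temperature rise (deg C)
    rise_window_s: float = 5.0          # window over which the rise is measured
    min_size_px: int = 2                # smallest accepted detection (pixels)
    max_size_px: int = 900              # largest accepted detection (pixels)
    min_cooldown_c: float = 1.1         # minimal cooldown (deg C)
    cooldown_window_s: float = 40.0     # window over which cooldown is measured
    min_cooldown_fraction: float = 0.5  # cooldown must be at least half the rise
    min_gap_s: float = 30.0             # minimal time between detections at one location


def passes_preliminary_thresholds(
    rise_c: float,
    size_px: int,
    cooldown_c: float,
    seconds_since_last: Optional[float],
    params: PreliminaryDetectionParams = PreliminaryDetectionParams(),
) -> bool:
    """Return True if a candidate satisfies all four threshold rules.

    rise_c and cooldown_c are assumed to have been measured over
    params.rise_window_s and params.cooldown_window_s respectively.
    """
    if rise_c < params.min_rise_c:
        return False
    if not (params.min_size_px <= size_px <= params.max_size_px):
        return False
    # Cooldown must reach the absolute minimum AND at least half of the rise.
    required_cooldown = max(params.min_cooldown_c,
                            params.min_cooldown_fraction * rise_c)
    if cooldown_c < required_cooldown:
        return False
    # None means no earlier detection at this location.
    if seconds_since_last is not None and seconds_since_last < params.min_gap_s:
        return False
    return True
```

      In a new arena, one would construct `PreliminaryDetectionParams` with different size limits (which depend on video resolution) or cooldown values (which depend on floor material and room temperature) and pass it as `params`.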

      We plan to explore the robustness of these parameters in our setup and report the influence on the accuracy of the preliminary algorithm.

      (B) We chose to feed the classifier with 71 seconds of videos (11 seconds before the event and 60 seconds after it) as we wanted the classifier to be able to capture the moment of the deposition, the cooldown process, as well as urine smearing or feces shifting which might give an additional clue for the classification. In the revised paper we plan to report accuracy when using a shorter video for classification.
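
      As an illustration of the 71-second window described in (B), the frame range handed to the classifier could be computed as follows. This is a sketch under stated assumptions: the function name is hypothetical, the frame rate `fps` is not given in the response, and the clip is simply clamped at the start of the video.

```python
def clip_frame_range(event_frame, fps, pre_s=11.0, post_s=60.0):
    """Return (start, stop) frame indices, half-open, around a detection.

    pre_s and post_s default to the 11 s before and 60 s after the event
    mentioned in the response; fps is an assumption supplied by the caller.
    """
    start = max(0, event_frame - int(round(pre_s * fps)))  # clamp at video start
    stop = event_frame + int(round(post_s * fps))          # caller clamps at video end
    return start, stop
```

      Testing a shorter clip, as planned for the revision, would then amount to passing smaller `pre_s` or `post_s` values.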

      Generalizability

      a. In the revised version, we plan to report the accuracy of the method used on a different strain of mice (C57), with a different arena color (white arena instead of black).

      Statistics

      a. In the revised paper, we will explain why we chose each time window for analysis. Also, we will report statistics for different time windows, as suggested by Reviewer 3.

      b. Unlike reviewer #2, we don’t think that the small difference in recall rate between urine and feces (78.6% vs. 83.3%, respectively) creates a bias between them. Moreover, we don’t compare the urine rate to the feces rate.

      c. In the revised manuscript we will explicitly report the precision scores, although they also appear in our manuscript in Fig. 2- Supplement 1b.

    1. 6.1.3 What Does “Big Data” Mean? One possible distinction between data science and statistics is the amount of data we’re working with. Technology coverage in the 2010s (and continuing to the present) made it hard to resist the idea that big data represents some kind of revolution that has turned the whole world of information and technology topsy-turvy. But is this really true? Does big data change everything? Business analyst Doug Laney suggested that three characteristics make big data different from what came before: volume, velocity, and variety. Volume refers to the sheer amount of data. Velocity focuses on how quickly data arrives as well as how quickly those data become “stale.” Finally, variety reflects the fact that there may be many different kinds of data. Together, these three characteristics are often referred to as the “three Vs” model of big data. Note, however, that even before the dawn of the computer age we’ve had a variety of data, some of which arrives quite quickly, and that can add up to quite a lot of total storage over time. Think, for example, of the large variety and volume of data that has arrived annually at Library of Congress since the 1800s! So, it is difficult to tell that big data is fundamentally a brand new thing. Furthermore, there are some concerns that we should exercise when it comes to big data. For example, when a data set gets to a certain size (into the range of thousands of rows), conventional tests of statistical significance are meaningless, because even the most tiny and trivial results are statistically significant. We’ll talk more about statistical significance later in the semester; for the time being, though, it suffices to say that statistical significance is how researchers have traditionally determined whether their results are important or not. 
If big data makes statistical significance more likely, then researchers who have access to more data will get more important results, whether or not that’s actually true in practical terms! Besides that, the quality and suitability of the data matters a lot: More data does not always mean better data.

      I find that this part of the section will help me better understand big data: Business analyst Doug Laney suggested that three characteristics make big data different from what came before: volume, velocity, and variety. Volume refers to the sheer amount of data. Velocity focuses on how quickly data arrives as well as how quickly those data become “stale.” Finally, variety reflects the fact that there may be many different kinds of data. Together, these three characteristics are often referred to as the “three Vs” model of big data. Note, however, that even before the dawn of the computer age we’ve had a variety of data, some of which arrives quite quickly, and that can add up to quite a lot of total storage over time.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Reviewer 1:

      • Although ROC AUC is a widely used metric, other metrics such as precision, recall, sensitivity, and specificity are not reported in this work. The last two metrics would help readers understand the model’s potential implications in the context of clinical research.

      In response to this comment and related ones by Reviewer 2, we have overhauled how we evaluate our models. In the revised version, we have removed Micro ROC-AUC, as this evaluation metric is hard to interpret in the recommender system setting. Instead, the updated version fully focuses on two metrics: ROC-AUC and Precision at 1 of the negative class, both computed per spectrum and then averaged (equivalent to the instance-wise metrics in the previous version of the manuscript). We believe these metrics best reflect the use-case of AMR recommenders. In addition, we have kept (drug-)macro ROC-AUC as a complementary evaluation metric. As the ROC-AUC can be decomposed into sensitivity and specificity (at different prediction probability thresholds), we have added a ROC curve where sensitivity and specificity are indicated in Figure 8 (Appendices).
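
      To make the two per-spectrum metrics concrete, the sketch below computes them with plain Python. It is an illustration under assumptions rather than the authors' evaluation code: labels are assumed to be 1 for resistant and 0 for susceptible, scores are assumed to be predicted resistance probabilities, "Precision at 1 of the negative class" is read as "is the drug with the lowest predicted resistance actually susceptible?", and spectra containing only one class are skipped for ROC-AUC. All function names are hypothetical.

```python
def roc_auc(labels, scores):
    """ROC-AUC for one spectrum via pairwise comparison (ties count 0.5).

    Returns None when only one class is present, since AUC is undefined.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def precision_at_1_negative(labels, scores):
    """1.0 if the drug with the lowest resistance score is susceptible, else 0.0."""
    best = min(range(len(scores)), key=scores.__getitem__)
    return 1.0 if labels[best] == 0 else 0.0


def spectrum_macro(metric, per_spectrum):
    """Average a per-spectrum metric over spectra, skipping undefined values."""
    values = [metric(labels, scores) for labels, scores in per_spectrum]
    values = [v for v in values if v is not None]
    return sum(values) / len(values) if values else None


# Two toy spectra, each with labels and scores for three drugs.
toy = [
    ([1, 0, 0], [0.9, 0.2, 0.4]),
    ([1, 1, 0], [0.3, 0.8, 0.5]),
]
mean_auc = spectrum_macro(roc_auc, toy)
mean_prec1 = spectrum_macro(precision_at_1_negative, toy)
```

      Computing the metric per spectrum first and averaging afterwards mirrors the recommender use-case, in which each spectrum (patient sample) yields its own ranking of drugs.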

      • The authors did not hypothesize or describe in any way what an acceptable performance of their recommender system should be in order to be adopted by clinicians.

      In Section 4.3, we have extended our experiments to include a baseline that represents a “simulated expert”. In short, given a species, an expert can already make some best guesses as to what drugs will be effective or not. To simulate this, we count resistance frequencies per species and per drug in the training set, and use this as predictions of a “simulated expert”.

      We now mention in our manuscript that any performance above this level results in a real-world information gain for clinical diagnostic labs.
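
      The "simulated expert" baseline described above amounts to a frequency table. A minimal sketch follows, assuming training records are `(species, drug, resistant)` tuples; the function names and the fallback value for unseen species-drug pairs are hypothetical and not taken from the manuscript.

```python
from collections import defaultdict


def fit_simulated_expert(train_records):
    """Return {(species, drug): resistance frequency} from training records."""
    counts = defaultdict(lambda: [0, 0])  # (species, drug) -> [n_resistant, n_total]
    for species, drug, resistant in train_records:
        entry = counts[(species, drug)]
        entry[0] += int(resistant)
        entry[1] += 1
    return {key: n_res / n_tot for key, (n_res, n_tot) in counts.items()}


def predict_simulated_expert(freqs, species, drug, default=0.5):
    """Predicted resistance probability; `default` for unseen pairs is an assumption."""
    return freqs.get((species, drug), default)


train = [
    ("E. coli", "Ceftriaxone", 1),
    ("E. coli", "Ceftriaxone", 0),
    ("E. coli", "Ceftriaxone", 0),
    ("S. aureus", "Oxacillin", 1),
]
freqs = fit_simulated_expert(train)
```

      Because this baseline uses only the species identity, any model that scores above it is extracting additional information from the spectrum itself.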

      • Related to the previous comment, this work would strongly benefit from the inclusion of 1-2 real-life applications of their method that could showcase the benefits of their strategy for designing antibiotic treatment in a clinical setting.

      While we think this would be valuable to try out, we are an in silico research lab, and the study we propose is an initial proof-of-concept focusing on the methodology. Because of this, we feel a real-life application of the model is out-of-scope for the present study.

      • The authors do not offer information about the model features associated with resistance. This information may offer insights about mechanisms of antimicrobial resistance and how conserved they are across species.

      In general, MALDI-TOF mass spectra are somewhat hard to interpret. Because of a limited body of work analyzing resistance mechanisms with MALDI-TOF MS, it is hard to link peaks back to specific pathways. For this reason, we have chosen to forego such an analysis. After all, as far as we know, typical MALDI-TOF MS manufacturers’ software for bacterial identification also does not provide interpretability results or insights into peaks, but merely gives an identification and confidence score.

      However, we do feel that the whole topic revolving around “the degree of biological insight a data modality might give versus actual performance and usability” merits further discussion. We have ultimately decided not to include a segment in our discussion section as it is hard to discuss this matter concisely.

      • Comparison of AUC values across models lacks information regarding statistical significance. Without this information it is hard for a reader to figure out which differences are marginal and which ones are meaningful (for example, it is unclear if a difference in average AUC of 0.02 is significant). This applies to Figure 2, Figure 3, and Table 2 (and the associated supplementary figures).

      To make trends a bit more clear and easier to discern, in our revised manuscript, all models are run for 5 replicates (as opposed to 3 in the previous version).

      There is an ongoing debate in the ML community about whether statistical tests are useful for comparing machine learning models. A simple argument against them is that model runs are typically not independent of each other, as they are all trained on the same data. The assumptions of traditional statistical tests (t-test, Wilcoxon test, etc.) are therefore violated. With such tests, statistical significance of even the smallest differences can be achieved simply by increasing the number of replicates (i.e., training the same models more times).
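
      The arithmetic behind the replicate argument can be sketched as follows. This is an illustration only, under the simplifying assumption that the mean and standard deviation of the per-replicate AUC differences stay fixed as replicates are added; the function name and the numbers are hypothetical.

```python
import math


def paired_t_statistic(mean_diff, sd_diff, n_replicates):
    """t statistic of a paired t-test: mean difference over its standard error."""
    return mean_diff / (sd_diff / math.sqrt(n_replicates))


# Same hypothetical AUC difference (0.02) and spread (0.03), more replicates.
t_small = paired_t_statistic(0.02, 0.03, 4)
t_large = paired_t_statistic(0.02, 0.03, 16)
```

      Since the statistic scales with the square root of the number of replicates, quadrupling the replicates doubles it, which is why a fixed small difference eventually crosses any significance threshold.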

      More complicated but more appropriate statistical tests also exist, such as the 5x2 cross-validated t-test of Dietterich ("Approximate statistical tests for comparing supervised classification learning algorithms", Neural Computation, 1998). However, these tests are typically not considered in deep learning, because each model can be trained on only half of the data, which is not desirable in practice. The Friedman test discussed by Demšar ("On the appropriateness of statistical tests in machine learning", Workshop on Evaluation Methods for Machine Learning in conjunction with ICML, 2008), in combination with post hoc pairwise tests, is still frequently used in machine learning, but that test is only applicable in studies where many datasets are tested.

      For those reasons, most deep learning papers that only analyse a few datasets typically do not consider any statistical tests. For the same reasons, we are also not convinced of the added value of statistical tests in our study.

      • One key claim of this work was that their single recommender system outperformed specialist (single species-antibiotic) models. However, in its current status, it is not possible to determine that in fact that is the case (see comment above). Moreover, comparisons to species-level models (that combine all data and antibiotic susceptibility profiles for a given species) would help to illustrate the putative advantages of the dual branch neural network model over species-based models. This analysis will also inform the species (and perhaps datasets) for which specialist models would be useful to consider.

      We thank the reviewer for this excellent suggestion. In our new manuscript, we have dedicated an entire section of experiments to testing such species-specific recommender models (Section 4.2). We find that species-specific recommender systems generally outperform the models trained globally across all species. As a result, our manuscript has been substantially reworked.

      • Taking into account that the clustering of spectra embeddings seemed to be species-driven (Figure 4), one may hypothesize that there is limited transfer of information between species, and therefore the neural network model may be working as an ensemble of species models. Thus, this work would deeply benefit from a comparison between the authors' general model and an ensemble model in which the species is first identified and then the relevant species recommender is applied. If authors had identified cases to illustrate how data from one species positively influence the results for another species, they should include some of those examples.

      See the answer to the remark above.

      • The authors should check that all abbreviations are properly introduced in the text so readers understand exactly what they mean. For example, the Prec@1 metric is a little confusing.

      See the answer to a remark above for how we have overhauled our evaluation metrics in the revised version. In addition, in the revised version, we have bundled our explanations on evaluation metrics together in Section 3.2. We feel that having these explanations in a separate section will improve overall comprehensibility of the manuscript.

      • The authors should include information about statistical significance in figures and tables that compare performance across models.

      See answer above.

      • An extra panel showing species labels would help readers understand Figure 11.

      We have tried to play around with including species labels in these plots, but could not make it work without overcrowding the figure. Instead, we have added a reminder in the caption that readers should refer back to an earlier figure for species labels.

      • The authors initially stated that molecular structure information is not informative. However, in a second analysis, the authors stated that molecular structures are useful for less common drugs. Please explain in more detail with specific examples what you mean.

      In the previous version of our manuscript, we found that one-hot embedding-based models were superior to structure-based drug embedders for general performance. The latter, however, delivered better transfer learning performance.

      In our new experiments however, we perform early stopping on “spectrum-macro” ROC-AUC (as opposed to micro ROC-AUC in the previous version). As a consequence, our results are different. In the new version of our manuscript, Morgan Fingerprints-based drug embedders generally outperform others both “in general” and for transfer learning. Hence, our previously conflicting statements are not applicable to our new results.

      • The authors may want to consider adding a few sentences that summarize the 'Related work' section into the introduction, and converting the 'Related work' section into an appendix.

      While we acknowledge that such a section is uncommon in biology, a “related work” section is very common in machine learning research. As this research lies at the intersection of the two, we have decided to keep the section as such.

      Reviewer 2:

      • Are the specialist models re-trained on the whole set of spectra? It was shown by Weis et al. that pooling spectra from different species hinders performance. It would then be better to compare directly to the models developed by Weis et al, using their splitting logic since it could be that the decay in performance from specialists comes from the pooling. See the section "Species-stratified learning yields superior predictions" in https://doi.org/10.1038/s41591-021-01619-9.

      We train our “specialist” models (now called “species-drug classifiers”) just as described in Weis et al.: all labels for a drug are taken and then subsetted for a single species. We have clarified this in our revised manuscript. The text now reads:

      “Previous studies have studied AMR prediction in specific species-drug combinations. For this reason, it is useful to compare how the dual-branch setup weighs up against training separate models for separate species and drugs. In Weis et al. (2020b), for example, binary AMR classifiers are trained for the following three combinations: (1) E. coli with Ceftriaxone, (2) K. pneumoniae with Ceftriaxone, and (3) S. aureus with Oxacillin. Here, such "species-drug-specific classifiers" are trained for the 200 most-common combinations of species and drugs in the training dataset.”
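
      The selection of the most common species-drug combinations described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' code: the record layout `(species, drug, label)` and the function names `most_common_combinations` and `labels_for` are hypothetical, and the toy records are invented for the example.

```python
from collections import Counter


def most_common_combinations(records, k=200):
    """Return the k most frequent (species, drug) pairs in the records.

    `records` is assumed to be an iterable of (species, drug, label) tuples;
    this layout is a hypothetical stand-in for the real training data.
    """
    counts = Counter((species, drug) for species, drug, _label in records)
    return [combo for combo, _count in counts.most_common(k)]


def labels_for(records, species, drug):
    """Collect the labels for one species-drug pair (one classifier's training set)."""
    return [label for s, d, label in records if s == species and d == drug]


# Toy records, invented purely for illustration (1 = resistant, 0 = susceptible).
toy_records = [
    ("E. coli", "Ceftriaxone", 1),
    ("E. coli", "Ceftriaxone", 0),
    ("E. coli", "Ceftriaxone", 0),
    ("K. pneumoniae", "Ceftriaxone", 1),
    ("K. pneumoniae", "Ceftriaxone", 0),
    ("S. aureus", "Oxacillin", 0),
]
top_two = most_common_combinations(toy_records, k=2)
```

      Each pair returned would then get its own binary classifier, trained only on the labels that `labels_for` collects for that pair.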

      • Going back to Weis et al. a high variance in performance between species/drug pairs was observed. The metrics in Table 2 do not offer any measurement of variance or statistical testing. Indeed, some values are quite close e.g. Macro AUROC of Specialist MLP-XL vs One-hot M.

      See our answer to a remark of Reviewer 1 for our viewpoint on statistical significance testing in machine learning.

      • Since this is a recommendation task, why were no recommendation system metrics used, e.g. mAP@K, mRR, and so (apart from precision@1 for the negative class)? Additionally, since there is a high label imbalance in this task (~80% negatives) a simple model would achieve a very high precision@1.

      See the answer to a remark above for how we have overhauled our evaluation metrics in the revised version. In addition, in choosing our metrics, we wanted metrics that are both (1) appropriate (i.e. recommender system metrics) and (2) easy for clinicians to interpret. For this reason, we have not included metrics such as mAP@K or mRR. We feel that “spectrum-macro” ROC-AUC and precision@1 together cover a sufficiently broad set of evaluation criteria while remaining easy to interpret.
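
      To make the two retained metrics concrete, here is a minimal sketch using only the Python standard library. It rests on assumptions that the response itself does not spell out: “spectrum-macro” ROC-AUC is taken to mean a ROC-AUC computed separately for each spectrum over its tested drugs and then averaged over spectra, and precision@1 for the negative class is taken to mean the fraction of spectra whose lowest-scored drug is truly a negative (susceptible) label. All function names and the toy data are hypothetical.

```python
from statistics import mean


def roc_auc(labels, scores):
    """ROC-AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        return None  # undefined when a spectrum has only one class
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def spectrum_macro_auc(per_spectrum):
    """Average the per-spectrum ROC-AUC over spectra where it is defined (assumed definition)."""
    aucs = [roc_auc(labels, scores) for labels, scores in per_spectrum]
    return mean([a for a in aucs if a is not None])


def negative_precision_at_1(per_spectrum):
    """Fraction of spectra whose lowest-scored drug is a true negative (assumed definition)."""
    hits = []
    for labels, scores in per_spectrum:
        best = min(range(len(scores)), key=scores.__getitem__)
        hits.append(1.0 if labels[best] == 0 else 0.0)
    return mean(hits)


# Toy data: one (labels, predicted resistance scores) pair per spectrum.
toy = [
    ([1, 0, 0], [0.9, 0.2, 0.4]),
    ([0, 1, 0, 1], [0.3, 0.1, 0.5, 0.8]),
    ([0, 0], [0.1, 0.2]),
]
macro_auc = spectrum_macro_auc(toy)
prec_at_1 = negative_precision_at_1(toy)
```

      Under these definitions, a model that ranks drugs at random would be expected to reach a negative-class precision@1 close to the fraction of negative labels, which is why the reviewer's point about label imbalance matters when reading this metric.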

      • A highly similar approach was recently published (https://doi.org/10.1093/bioinformatics/btad717). Since it is quite close to the publication date of this paper, it could be discussed as concurrent work.

      We thank the reviewer for bringing our attention to this study. We have added a paragraph in our revised version discussing this paper as concurrent work.

      • It is difficult to observe a general trend from Figure 2. A statistical test would be advised here.

      See our answer to a remark of Reviewer 1 for our viewpoint on statistical significance testing in machine learning.

      • Figure 5. UMAPs generally don't lead to robust quantitative conclusions. However, the analysis of the embedding space is indeed interesting. Here I would recommend some quantitative measures directly using embedding distances to accompany the UMAP visualizations. E.g. clustering coefficients, distribution of pairwise distances, etc.

      In accordance with this recommendation, we computed many statistics on the MALDI-TOF spectra embedding spaces. However, we could not find any statistic that was more informative than the visualization itself. For this reason, we have kept this section as is and let the figure speak for itself.

      • Weis et al. also perform a transfer learning analysis. How does the transfer learning capacity of the proposed models differ from those in Weis et al?

      Weis et al. perform experiments on “transferability”, not actual transfer learning. In essence, they apply a model trained on data from one diagnostic lab to data from another. However, unlike us, they do not investigate how much data is needed to fine-tune such a pre-trained classifier for adequate performance in the new diagnostic lab. The end of Section 4.4 discusses how our proposed models specifically shine in transfer learning. The paragraph reads:

      “Lowering the amount of data required is paramount to expedite the uptake of AMR models in clinical diagnostics. The transfer learning qualities of dual-branch models may be ascribed to multiple properties. First of all, since different hospitals use much of the same drugs, transferred drug embedders allow for expressively representing drugs out of the box. Secondly, owing to multi-task learning, even with a limited number of spectra, a considerable fine-tuning dataset may be obtained, as all available data is "thrown on one pile".”

    1. Our IEP team is careful to say to our students' parents that it is our recommendation that their student is in the correct placement for the upcoming year. Our unit classes include students who have severe intellectual disabilities. When we have our general ed teachers talk about what standards the general ed students are currently working on, the parents are sometimes shocked.

      At our school, we haven't had any issues with failing to assemble an appropriate IEP team. Our administrators do a great job of ensuring that all IEP team members know what is expected of them. We have sped meetings weekly to ensure that we stay on top of IEP due dates and re-evaluations.

      I believe that there is less oversight on the substantive side of IEP implementation accountability. In two years I've never had my progress monitoring checked by anybody. I've also never had parents inquire as to how their child is progressing towards their IEP goals.

      Regarding implementation errors, I don't think I've ever seen an administrator verify the services provided by different related service providers such as speech or OT/PT. It may be assumed that these related service providers are professionals and that they will do their duty to maintain fidelity to the IEP.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      Reviewer1

      I recommend combining Figures 1 and 3 either before or after the results shown in Figure 2, so the reader's expectation for quantification is immediately satisfied.

      Response:

      Thank you for your suggestion. In the revised manuscript, images of GFP::POP-1 in compound mutants have been moved to Figure 3. The schematic diagram of the gonad (previously Fig. 1A) and the GFP::POP-1 images in wild type are kept in Figure 1, as they are described in the Introduction.

      Major comments:

      Delving into the figure legend of Fig. 3 and the normalization procedure described in the Methods "Quantification of POP-1 asymmetry in the Z1 and Z4 division" raised concerns. The method therein described is overly complicated but also neglects background subtraction. My first question about this method: what range of distances between daughters is measured in Z? These distances are not discussed in absolute terms, and this is important for our understanding of how much correction for tissue depth might be necessary, as L1s are very thin.

      To check my understanding, the authors use as a control a nuclear-localized GFP driven in the somatic gonad precursors in otherwise wild-type worms by the sys-1 promoter. They observe that the regression on a log scale of anterior:posterior (and vice versa) Z1 and Z4 daughter fluorescence over the distance between the daughters in the Z plane is fit by y = −0.034x + 0.0148, which is practically a slope of 0 and an intercept of 0. This means that they observed an ~1:1 ratio (as log(1)=0) of fluorescence in the anterior and posterior daughters of otherwise wild-type worms, at least across the range of very small X values of relevant distances between daughters (again, the relevant range of distances really matters and should be presented), making the normalization seem unnecessary.

      Response:

      Normalization is essential to compare POP-1 signals between daughter cells since the signal intensities depend on the depth of cells. Depth differences between SGP daughter cells range from 0 to 7.5 micrometers. For example, when we input the maximum difference (7.5) into our correction equation y = −0.034x + 0.0148 (the logarithmically transformed linear regression equation), we get:

      y = −0.034 * 7.5 + 0.0148 = -0.2402

      To interpret this on the original scale, we apply the inverse logarithmic transformation:

      10^(-0.2402) ≈ 0.575

      This result indicates that even if GFP::POP-1 expression is the same in both cells, the depth difference alone can cause an approximately 1.74-fold (1/0.575) difference in fluorescence intensity.

      Similarly, if we use a median value of 3.5 micrometers as the depth difference, we get y = -0.1042. After the inverse logarithmic transformation, this corresponds to a 0.787-fold, or equivalently a 1.27-fold (1/0.787), difference in fluorescence intensity.

      Without normalization, we risk misinterpreting such differences in expression levels when in reality the expression is the same. Conversely, actual differences in GFP::POP-1 signal could be masked or overestimated due to the depth effect.
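
      The arithmetic above can be reproduced with a short sketch. The slope and intercept are the regression values quoted in this response; the function names are hypothetical, and the final correction step (subtracting the depth-only expectation from an observed log ratio) is an assumption based on the description of comparing observed values with the regression prediction, not the authors' exact procedure.

```python
SLOPE = -0.034      # regression slope quoted in the response (log10 ratio per micrometer)
INTERCEPT = 0.0148  # regression intercept quoted in the response


def expected_log_ratio(depth_diff_um):
    """Log10 fluorescence ratio expected from the depth difference alone."""
    return SLOPE * depth_diff_um + INTERCEPT


def expected_fold_change(depth_diff_um):
    """Back-transform the expected log10 ratio to a linear fold change."""
    return 10 ** expected_log_ratio(depth_diff_um)


def depth_corrected_log_ratio(observed_log_ratio, depth_diff_um):
    """Assumed correction: observed log ratio minus the depth-only expectation."""
    return observed_log_ratio - expected_log_ratio(depth_diff_um)


max_fold = expected_fold_change(7.5)     # largest depth difference mentioned
median_fold = expected_fold_change(3.5)  # median depth difference mentioned
```

      With these inputs, `max_fold` and `median_fold` come out near 0.575 and 0.787, matching the values in the response, so equal expression could appear as a roughly 1.74-fold or 1.27-fold difference if depth were ignored.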

      In the revised manuscript, examples of depth differences between SGP daughters are shown in Fig. S2, which was added in response to the comment of Reviewer 2 requesting images of lin-17 mom-5 animals.

      In the revised manuscript, we explained the depth effects in the legend of Fig. 3 as follows.

      "Since SGP daughter cells are often present at distinct focal planes, we normalized the depth effects on fluorescence intensities (see Materials and Methods for details) for the quantification shown in (B). The images in (A) and (C) are from animals with SGP daughters at similar depths."

      Then based on this regression and 95% CI, the authors predict values that reflect true equivalence of fluorescence of POP-1::GFP in the two SGP daughters, compare the observed values to these predictions, and ultimately display in violin plots these differences of observed and expected. Correct?

      Response:

      Yes, your understanding is correct.

      Is this complicated treatment the only way to detect differences in polarity of anterior and posterior daughters of Z1 and Z4? What happens if the authors measure GFP::POP-1 and calculate the following?

      Z1p(MGV - background control from same focal plane) / Z1a(MGV - background control from same focal plane)

      If this straightforward analysis shows asymmetric signal in the control that is made symmetrical or reversed in the mutants, the hypothesis would seem to be supported with a much more straightforward method. Samples could be analyzed separately in two bins by worm body position, which affects which cell is superficial in the sample. As it is, the Figure 3 Y axis label is hard to interpret without reading the methods at length, diminishing its impact.

      Response:

      Thank you for the suggestion. Your suggested calculation would be simple if we could assume that control signals (sys-1p::GFP::NLS or sys-1p::GFP::POP-1 in the same wild-type cell) on the same focal plane are the same among animals. However, since there are apparent variations in expression levels among individuals, your suggested method is not appropriate for evaluating differences in sys-1p::GFP::POP-1 intensities between the SGP daughter cells of the same animal.

      Missing control: The sys-1 promoter-driven NLS-tagged fluorescent protein as a control to compare to the GFP::POP-1 is analyzed only in the wild-type, and apparently not in the mutants under consideration. Phillips et al. (2007) show that sys-1p transcriptional activity is equivalent between the SGP daughters in wild-type worms, but neither those results nor the method of normalizing to a sys-1p::GFP::NLS signal in this paper address the question of whether sys-1 promoter activity is equivalent in these cells in mutants upstream in the Wnt pathway. If the current method of normalization is to be used, it seems important to normalize to the sys-1p::GFP::NLS regression in each mutant background.

      Response:

      Thank you for your suggestion. We used sys-1p::GFP::NLS as a control to normalize depth effects, which should be the same across all genotypes because the GFP molecules in SGPs should be equally distributed between SGP daughter cells, not because sys-1 promoter activities are similar among them. Since SGP daughters divide within a short time (about 2 hours), it is likely that the fluorescence of newly synthesized GFP (maturation time of about 1 hour) in SGP daughters is negligible compared to that of GFP inherited from the SGP cells. Similarly, sys-1p::GFP::POP-1 signals in SGP daughters reflect the distribution of GFP::POP-1 from SGPs rather than the transcriptional activities of the sys-1 promoter in the daughter cells. sys-1p::GFP::POP-1 or sys-1p::GFP::SYS-1 has been widely used to evaluate the polarity of asymmetric divisions in a number of studies, none of which consider transcriptional differences of the sys-1 promoter in the daughter cells.

      1. How was lin-17(mn589) generated? If this is the first report of this allele, full information on what the lesion is and how it was derived should be reported.

      Response:

      Thank you for your question regarding the lin-17(mn589) allele. We would like to point out that the information about this allele is provided in the Methods section of the original manuscript as follows.

      "lin-17(mn589) (gifted by Mike Herman) carries a mutation in the seventh cysteine residue of the CRD domain (C104Y). mn589 exhibits 47% Psa phenotype (indicating T cell polarity defects)."

      The methods section lacks a description of how the mes-1 experiments were done, in terms of timing, duration, and temperature; mes-1(bn7) is a temperature sensitive allele.

      Response:

      Thank you for pointing out the lack of detailed methodology for the mes-1 experiments. The germless phenotype of mes-1 mutants is partial even at high temperatures. We have not performed temperature shifts to observe the phenotype. As per your suggestion, we added the following text to the Strains section:

      "mes-1(bn7) is a temperature-sensitive allele with higher penetrance of the germless phenotype at 25°C than at 15°C, and was grown at 22.5°C. The germless phenotype of mes-1(bn7) was observed by the absence of the mex-5::GFP::PH signal through direct observation of epifluorescence."

      Minor comments

      1. The paper lacks a discussion of precedent in the literature for Wnt-independent Frizzled activity; this is a major finding that is being undersold in the current version of the manuscript.

      Response:

      Thank you very much for appreciating our finding. We have added the following paragraph to the Discussion section:

      "Wnt-independent functions of Frizzled receptors

      We have shown that lin-17/Fzd functions in a Wnt-independent manner to control SGP polarity, since the missing DTC phenotype of lin-17; cwn-2 and lin-17 mom-5 was completely rescued by ΔCRD-LIN-17. In addition, SGP polarity is normal in the quintuple Wnt mutant that has mutations in all the Wnt genes (Yamamoto et al., 2011). In seam cells, Wnt receptors including LIN-17/Fzd and MOM-5/Fzd appear to have Wnt-independent functions for cell polarization, as seam cells are still mostly polarized in the quintuple Wnt mutants, while they are strongly unpolarized in the triple receptor mutants (lin-17 mom-5 cam-1/Ror) (Yamamoto et al., 2011). In Drosophila, Fz/Fzd has been primarily considered to function Wnt-independently to coordinate planar cell polarity (PCP) between neighboring cells (Lawrence et al., 2007), though Fz function can still be regulated by Wnt, as PCP orientation can be directed by ectopically expressed Wnt proteins (Wu et al., 2013).

      In Drosophila, Fz regulates PCP by interacting with other PCP components including Van Gogh (Vang). In C. elegans, we found that vang-1/Vang does not appear to function with LIN-17/Fz, since most vang-1 single mutants and cwn-1 cwn-2 vang-1 triple mutants have two gonadal arms (215/216 and 58/58, respectively). As Fz interacts with Disheveled (DSH) in Drosophila PCP regulation, in C. elegans, the Disheveled homologs DSH-2 and MIG-5 regulate SGP polarity (Phillips et al., 2007). Therefore, LIN-17 might regulate the DSH homologs in a Wnt-independent manner."

      Added References:

      1. Lawrence, P.A., Struhl, G., & Casal, J. (2007). Planar cell polarity: one or two pathways? Nature Reviews Genetics, 8, 555-563.
      2. Wu, J., Roman, A.C., Carvajal-Gonzalez, J.M., & Mlodzik, M. (2013). Wg and Wnt4 provide long-range directional input to planar cell polarity orientation in Drosophila. Nature Cell Biology, 15(9), 1045-1055.

      Important: I think "Fig. 6 Germ cell independent migration of germ cells" title is a typo; should be "Germ cell independent migration of DTCs"

      Response:

      Thank you for pointing out the typo. We corrected it in the revised manuscript.

      This is a very important experiment! I think a greater description of the mes-1 phenotype would be helpful, since loss of germline was not 100% penetrant in mes-1(bn7) hermaphrodites in Strome et al., 1995. The legend says "Germless mes-1 phenotype was confirmed by the absence of the mex-5::GFP::PH signal in the gonad." Consider adding a few sentences to the results (or methods, from which the mes-1 experiments are currently missing) describing that only mes-1 animals that lacked germline fluorescence were analyzed for DTC migration.

      Response:

      Thank you for providing the context. To address the concerns, we made the following changes to our manuscript:

      1. In the Results section, we revised the sentence "We found that 84% of DTCs (n = 90) in germless mes-1 animals..." to "Among mes-1 animals that lack germ cells, we found 84% of DTCs (n = 90)...".
      2. We also modified the sentence "We noticed that some germless mes-1 animals..." to "We noticed that some mes-1 animals that lack germ cells...".

      Please correct "secreting the Notch ligand LAG-2" this is a membrane-bound, not secreted ligand

      Response:

      Thank you for your comment. In the revised manuscript, we modified the relevant sentence in the Introduction section as follows:

      "Firstly, DTCs function as niche cells for germline stem cells, inhibiting their entry into meiosis by expressing the Notch ligand LAG-2 (Henderson et al., 1994)."

      Fig 1. The qualitative loss of polarity would be better depicted with a grayscale image instead of green-on-black.

      Response:

      Thank you for your suggestion. The GFP::POP-1 images are raw images from the green channel of the confocal microscope. We believe that they depict SGP polarity clearly.

      Fig. 3 the presentation of these violin plots is confusing. The central text that reads "normal polarity, loss of polarity, reversed polarity" with arrows looks like a second Y axis label attached to the Z4 plot. I recommend rearranging. Consider shading the top, bottom, and central regions and explaining the meaning of the shading in the legend.

      Response:

      Thank you for your suggestions regarding the presentation of Figure 3. In response to your feedback, we have made the following modifications:

      First, we moved the text and arrows from the center to the right side of the figure, creating a clearer layout. As you recommended, we applied shading to the top, bottom, and central regions of the violin plots. Additionally, to explain the meaning of the shading, we added a new explanation to the figure legend. Specifically, we included the following text:

      "Values within the 95% CI (between the red lines; light green regions) indicate symmetric localization. Values below the lower red line (light blue regions) indicate reversed localization, while values above the upper red line (light red regions) indicate normal localization."

      We applied the same modification to Supplemental Fig. 1.
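
      The three shaded regions described in the revised legend amount to a simple threshold rule, sketched below. The function name and the numeric bounds in the example are hypothetical; the actual 95% CI limits come from the authors' regression and are not given in this response.

```python
def classify_localization(log_ratio, ci_lower, ci_upper):
    """Assign a depth-normalized log ratio to one of the three shaded regions.

    ci_lower and ci_upper are the 95% CI limits (the red lines in the figure);
    their values are not stated in the response, so callers must supply them.
    """
    if log_ratio > ci_upper:
        return "normal"     # above the upper red line (light red region)
    if log_ratio < ci_lower:
        return "reversed"   # below the lower red line (light blue region)
    return "symmetric"      # within the 95% CI (light green region)


# Hypothetical CI limits of -0.2 and 0.2, chosen only for illustration.
examples = [classify_localization(v, -0.2, 0.2) for v in (0.6, 0.05, -0.5)]
```

      Values exactly on a red line are treated here as lying within the CI; that boundary choice is also an assumption.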

      Reviewer 2

      Major comments

      1- Are the effects of combining the different Wnts with the lin-17 allele specific to the n3091 allele? It would be important to test another allele, for example the sy277 allele has a similar phenotype and is available at CGC. A null would be even better if it is viable. Alternatively, lin-17(RNAi) could instead be used if efficient enough. This is important since the n3091 allele could differentially alter the binding to the various Wnts, resulting in their distinct phenotypes in that background. However, these distinct phenotypes may not be relevant in a wild-type context.

      Response:

      Thank you for your insightful comment. The lin-17(n3091) allele contains a nonsense mutation at the 35th codon, located between the second and third cysteine residues in the CRD domain (Wnt binding domain) (Sawa et al., 1996). Therefore, it is highly unlikely that the N-terminal protein of 34 amino acids produced in lin-17(n3091) can bind to Wnts. In the revised manuscript, we added the missing-DTC phenotype of lin-17(n671) cwn-2 animals, which show a similar phenotype to lin-17(n3091) cwn-2. n671 is a reference allele in WormBase and has a nonsense mutation. Although sy277 has a deletion in the N-terminal region, its phenotype is weaker than that of n3091 and n671 (Sawa et al., 1996).

      In the revised manuscript, we described lin-17(n671) cwn-2 in Table 1 and Table S1, and added the following sentence.

      "We observed a similar phenotype in lin-17(n671); cwn-2 double mutants, confirming that this genetic interaction is not allele-specific."

      2- In the lin-17; mom-5 double mutant which lacks DTCs, are Z1 and Z4 there but they do not express DTC markers, or are they never born? A lineage analysis should be presented. Also, are Z2 and Z3 still there on their own? Please show images.

      Response:

      Thank you for your comments. We quantified sys-1p::GFP::POP-1 signals in Z1 and Z4 daughter cells of lin-17 mom-5 and have not observed any animals lacking Z1, Z4 or germ cells. In the revised manuscript, as Fig. S2, we added images of sys-1p::GFP::POP-1 localizations in SGP daughters, along with germ cells in lin-17 mom-5 as well as in lin-17 cwn-1 egl-20 cwn-2, both of which were not shown in the original manuscript. In response to Reviewer 1's comment, we also included examples of depth effects on fluorescence intensities in Fig. S2 with images of different focal planes.

      Fig. S2 is cited at the end of the following sentence.

      "Then, we quantified the ratios (on a logarithmic scale) of sys-1p::GFP::POP-1 signal intensities proximal to distal daughter cells in various genotypes (Fig. 3A and Fig. S2)."

      The loss of polarity phenotype of lin-17 mom-5 has been described in Phillips et al. (2007). We failed to cite this in the original manuscript and have added the citation in the revised manuscript.

      "These asymmetries were strongly disrupted and weakly affected in lin-17 mom-5 double and lin-17 single mutants, respectively, as described previously (Phillips et al., 2007; Siegfried et al., 2004)."

      Minor comments

      1- The term "mirror-symmetry" is redundant. Consider using "symmetry" or "symmetrical polarity".

      Response:

      As noted in the cross-comment by Reviewer 1, we believe that "mirror-symmetry" is the appropriate term.

      We think that "symmetry" implies identical lineages, whereas the Z1 and Z4 lineages are mirror images of each other rather than identical. "Mirror symmetry" was also used in Herman & Horvitz (1994) to describe the defect in the F lineage in lin-44/Wnt mutants as follows.

      "we observed division patterns that were mirror symmetric to those of the wild type (Fig. 2). One plausible explanation is that the polarity of the first asymmetric cell division was reversed, causing the polarities of all subsequent asymmetric cell divisions also to be reversed."

      2- "... they are permissively pushed distally by germ cells while proliferating" is confusing, as it is unclear which proliferating cell you are referring to: germ cells or the DTC? Replace with: "they are pushed distally by proliferating germ cells"

      Response:

      Thank you for your helpful comment. We agree with your suggestion and have modified the sentence as follows:

      Original: "... they are permissively pushed distally by germ cells while proliferating" Revised: "... they are pushed distally by proliferating germ cells"

      3- Fig. 2 is cited in the text before Fig. 1.

      Response:

      Thank you for pointing this out. In the original manuscript, Figure 1 is mentioned in the Introduction before Figure 2 is referenced in the Results section. We think the reviewer might have been confused because the POP-1 localization defect was shown in Figure 1. In response to Reviewer 1's comment, we moved the POP-1 localization images of the compound mutants to Figure 3. In addition, we noticed that in the original manuscript, Figure 1B was mentioned before Figure 1A in the Introduction. Therefore, we have modified the sentences in the Introduction.

      The original sentence was:

      "In the gonad, at the L1 stage, somatic gonadal precursor cells (SGPs), Z1 and Z4 have LH and HL polarity, respectively (Siegfried et al., 2004) (Fig. 1B). This mirror-symmetric polarity creates their mirror-symmetric lineages producing distal tip cells (DTCs) from the distal daughters (Z1.a and Z4.p) (Fig. 1B)."

      The revised sentence now reads:

      "In the gonad, at the L1 stage, somatic gonadal precursor cells (SGPs), Z1 and Z4 have LH and HL polarity, respectively, creating their mirror-symmetric lineages producing distal tip cells (DTCs) from the distal daughters (Z1.a and Z4.p) (Siegfried et al., 2004) (Fig. 1A and 1B)."

      4- "The results also suggest that MOM-5/Frizzled might be the receptor for Wnts regulating DTC production, as lin-17 mom-5 double mutants completely lack DTCs." The Table 1 results rather suggest that lin-17 and mom-5 are the two Frizzled receptors involved in DTC specification and that they are largely redundant.

      Response:

      As the reviewer noted, lin-17 and mom-5 function redundantly in DTC specification (SGP polarization). However, their functions are clearly different in terms of genetic interactions with Wnt genes (e.g. lin-17 cwn-2 but not mom-5 cwn-2 show the DTC-missing phenotype). We propose that MOM-5 but not LIN-17 functions as a receptor for Wnts.

  4. Aug 2024
    1. Second, one must perceive that an injustice has occurred. Many women find this difficult because they compare today to the past and can see that progress has been made (e.g., women can vote, women have more education than men, women frequently work outside the home).

      I've seen this a few times with people who do not want to identify as feminists. A lot of times their reasoning is that it is no longer necessary. They might say they believe men and women should be equal, but that they already are (at least in the United States), so they do not want to identify as a feminist. I think one reason this might occur is that people might not want to admit to themselves that any injustice is still occurring, as this may be distressing. Also, like the text says, I've heard people use progress as proof that feminism should be "done" and that men and women are now equal, ignoring the ways that we still aren't. Some of the achievements of feminism in the past might be more obvious as they are directly and clearly written into the law (like earning the right to vote), but there are still so many ways that gender inequality persists today.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this study, the authors used a stopped-flow method to investigate the kinetics of substrate translocation through the channel in hexameric ClpB, an ATP-dependent bacterial protein disaggregase. They engineered a series of polypeptides with the N-terminal RepA ClpB-targeting sequence followed by a variable number of folded titin domains. The authors detected translocation of the substrate polypeptides by observing the enhancement of fluorescence from a probe located at the substrate's C-terminus. The total time of the substrates' translocation correlated with their lengths, which allowed the authors to determine the number of residues translocated by ClpB per unit time.

      Strengths:

      This study confirms a previously proposed model of processive translocation of polypeptides through the channel in ClpB. The novelty of this work is in the clever design of a series of kinetic experiments with an engineered substrate that includes stably folded domains. This approach produced a quantitative description of the reaction rates and kinetic step sizes. Another valuable aspect is that the method can be used for other translocases from the AAA+ family to characterize their mechanism of substrate processing.

      Weaknesses:

      The main limitation of the study is in using a single non-physiological substrate of ClpB, which does not replicate the physical properties of the aggregated cellular proteins and includes a non-physiological ClpB-targeting sequence. Another limitation is in the use of ATPgammaS to stimulate the substrate processing. It is not clear how relevant the results are to the ClpB function in living cells with ATP as the source of energy, a multitude of various aggregated substrates without targeting sequences that need ClpB's assistance, and in the presence of the co-chaperones.

      Indeed, we agree that our RepA-Titinx substrates are not aggregates but are model, soluble substrates used to reveal information about enzyme-catalyzed protein unfolding and translocation. Our substrates are similar to RepA-GFP and GFP-SsrA used by multiple labs including Wickner, Horwich, Sauer, Baker, Shorter, and Bukau, to name only a few. The fact that “this is what everyone does” does not make the substrates physiological or the most ideal. However, this is the technology we currently have until we and others develop something better. In the meantime, we contend that the results presented here do advance our knowledge of enzyme-catalyzed protein unfolding.

      Part of what this manuscript seeks to accomplish is presenting the development of a single-turnover experiment that reports on processive protein unfolding by AAA+ molecular motors, in this case, ClpB. Importantly, we are treating translocation on an unfolded polypeptide chain and protein unfolding of stably folded proteins as two distinct reactions catalyzed by ClpB. Whether these functions are used to disrupt protein aggregates in vivo remains to be seen.

      We contend that processive ClpB-catalyzed protein unfolding has not been rigorously demonstrated prior to our results presented here. Avellaneda et al. mechanically unfolded their substrate before loading ClpB (Avellaneda, Franke, Sunderlikova et al. 2020). Thus, their experiment represents valuable observations reflecting polypeptide translocation on a pre-unfolded protein. Our previous work using single-turnover stopped-flow experiments employed unstructured synthetic polypeptides and therefore reflects polypeptide translocation and not protein unfolding (Li, Weaver, Lin et al. 2015). Weibezahn et al. used unstructured substrates in their study with ClpB (BAP/ClpP), and thus their results represent translocation of a pre-unfolded polypeptide and not enzyme-catalyzed protein unfolding (Weibezahn, Tessarz, Schlieker et al. 2004).

      Many studies have reported the use of GFP with tags or RepA-GFP and used the loss of GFP fluorescence to conclude protein unfolding. However, such results do not reveal whether ClpB processively and fully translocates the substrate through its axial channel. One cannot rule out, even when trapping with the “GroEL trap”, the possibility that ClpB only needs to disrupt some of the fold in GFP before cooperative unfolding occurs, leading to loss of fluorescence. Once the cooperative collapse of the structure occurs and fluorescence is lost, it has not been shown whether ClpB continues to translocate on the newly unfolded chain or dissociates. In fact, the Bukau group showed that folded YFP remained intact after luciferase was unfolded (Haslberger, Zdanowicz, Brand et al. 2008). Our approach, reported here, yields signal upon arrival of the motor at the C-terminus or within the PIFE distance; thus, we can be certain that the motor does arrive at the C-terminus after unfolding up to three tandem repeats of the Titin I27 domain.

      ATPgS is a non-physiological nucleotide analog.  However, ClpB has been shown to exhibit curious behavior in its presence that we and others, as the reviewer acknowledges, do not fully understand (Doyle, Shorter, Zolkiewski et al. 2007).  Some of the experiments reported here are seeking to better understand that fact.  Here we have shown that ATPgS alone will support processive protein unfolding. With this assay in hand, we are now seeking to go forward and address many of the points raised by this reviewer. 

      The authors do not attempt to correlate the kinetic step sizes detected during substrate translocation and unfolding with the substrate's structure, which should be possible, given how extensively the stability and unfolding of the titin I27 domain were studied before. Also, since the substrate contains up to three I27 domains separated with unstructured linkers, it is not clear why all the translocation steps are assumed to occur with the same rate constant.

      We assume that all protein unfolding steps occur with the same rate constant, ku.  We conclude that we are not detecting the translocation rate constant, kt, as our results support a model where kt is much faster than ku.  We do think it makes sense that the same slow step occurs between each cycle of protein unfolding.

      We have added a discussion relating our observations to mechanical unfolding of tandem repeats of Titin I27 from AFM experiments (Oberhauser, Hansma, Carrion-Vazquez and Fernandez 2001). Most interestingly, they report unfolding of Titin I27 in 22 nm steps. Using 0.34 nm per amino acid, this yields ~65 amino acids per unfolding step, which is comparable to our kinetic step-size of 57 – 58 amino acids per step.
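
      As a rough sketch of the arithmetic above, the conversion from an AFM step length to a residue count can be written out explicitly. The variable and function names are illustrative only, and the 0.34 nm per-residue contour length is simply the value quoted in the text.

```python
# Convert the AFM unfolding step length into an approximate residue count.
# Values are those quoted in the text; names are illustrative only.
AFM_STEP_NM = 22.0          # step length reported for Titin I27 unfolding by AFM
CONTOUR_NM_PER_AA = 0.34    # contour length per amino acid used in the text


def residues_per_step(step_nm: float, nm_per_aa: float) -> float:
    """Return the number of amino acids spanned by one unfolding step."""
    return step_nm / nm_per_aa


aa_per_step = residues_per_step(AFM_STEP_NM, CONTOUR_NM_PER_AA)
print(f"{aa_per_step:.1f} amino acids per AFM unfolding step")
```

      The result rounds to the ~65 amino acids per step stated above.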

      Some conclusions presented in the manuscript are speculative:

      The notion that the emission from Alexa Fluor 555 is enhanced when ClpB approaches the substrate's C-terminus needs to be supported experimentally. Also, evidence that ATPgammaS without ATP can provide sufficient energy for substrate translocation and unfolding is missing in the paper.

      In our previous work we used fluorescently labeled 50 amino acid peptides as substrates to examine ClpB binding (Li, Lin and Lucius 2015, Li, Weaver, Lin et al. 2015). In that work we used fluorescein, which exhibits quenching upon ClpB binding. We have added a control experiment in which we attached Alexa Fluor 555 to the 50 amino acid substrate so that we can be assured that ClpB binds close to the fluorophore. As seen in supplemental Fig. 1 A, upon titration with ClpB in the presence of ATPγS, we observe an increase in fluorescence from AF555, consistent with PIFE. Supplemental Fig. 1 B shows that the relative fluorescence enhancement at the peak maximum increases up to ~0.2, or a 20% increase in fluorescence, due to PIFE upon ClpB binding.

      Further, peak time is our hypothesized measure of ClpB’s arrival at the dye. Our results indicate that the peak time linearly increases as a function of an increase in the number of folded TitinI27 repeats in the substrates which also supports the PIFE hypothesis. Finally, others have shown that AF555 exhibits PIFE and we have added those references.

      The evidence that ATPγS alone can support translocation is shown in Fig. 2 and supplemental Figure 1.  Fig. 2 and supplemental Figure 1 are two different mixing strategies where we use only ATPgS and no ATP at all.  In both cases the time courses are consistent with processive protein unfolding by ClpB with only ATPγS.

      Reviewer #2 (Public Review):

      Summary:

      The current work by Banwait et al. reports a fluorescence-based single-turnover method based on protein-induced fluorescence enhancement (PIFE) to show that ClpB is a processive motor. The paper is a crucial finding as there has been ambiguity on whether ClpB is a processive or non-processive motor. Optical tweezers-based single-molecule studies have shown that ClpB is a processive motor, whereas previous studies from the same group hypothesized it to be a non-processive motor. As co-chaperones are needed for the motor activity of ClpB, to isolate the activity of ClpB, they have used a 1:1 ratio of ATP and ATPgS, where the enzyme is active even in the absence of its co-chaperones, as previously observed. A sequential mixing stopped-flow protocol was developed, and the unfolding and translocation of RepA-TitinX (X = 1, 2, 3 repeats) was monitored by measuring, over time, the fluorescence intensity of Alexa Fluor 555, which was attached at the C-terminal cysteine. The observations were a lag time, followed by a gradual increase in fluorescence due to PIFE, and then a decrease in fluorescence plausibly due to the dissociation from the substrate allowing it to refold. The authors observed that the peak time depends on the substrate length, indicating the processive nature of ClpB. In addition, the lag and peak times depend on the pre-incubation time with ATPgS, indicating that the enzyme translocates on the substrates even with just ATPgS without the addition of ATP, plausibly due to the slow hydrolysis of ATPgS. From the plot of substrate length vs peak time, the authors calculated the rate of unfolding and translocation to be ~0.1 aa s⁻¹ in the presence of ~1 mM ATPgS, increasing to 1 aa s⁻¹ in the presence of 1:1 ATP and ATPgS.

      The authors have further performed experiments at 3:1 ATP and ATPgS concentrations and observed a ~5-fold increase in the translocation rates, as expected due to faster hydrolysis of ATP by ClpB, reconfirming that processivity is largely ATP-driven. Further, the authors model their results as multiple sequential unfolding steps, determining the rate of unfolding and the number of amino acids unfolded during each step. Overall, the study uses a novel method to reconfirm the processive nature of ClpB.

      Strengths:

      (1) Previous studies on understanding the processivity of ClpB have primarily focused on unfolded or disordered proteins; this study provides new insights into the processing of folded proteins by ClpB. They have cleverly used RepA as a recognition sequence to understand the unfolding of titin-I27 folded domains.

      (2) The method developed can be applied to many disaggregating enzymes and has broader significance.

      (3) The data from various experiments are consistent with each other, indicating the reproducibility of the data. For example, the rates of translocation in the presence of ATPgS, ~0.1 aa s⁻¹, from the single mixing experiment and the double mixing experiment are very similar.

      (4) The study convincingly shows that ClpB is a processive motor, which has long been debated, describing its activity in the presence of only ATPgS and a mixture of ATP and ATPgS.

      (5) The discussion part has been written in a way that describes many previous experiments from various groups supporting the processive nature of the enzyme and supports their current study.

      Weaknesses:

      (1) The authors model that the enzyme unfolds the protein sequentially, around 60 aa each time, through multiple steps and translocates rapidly. This contradicts our knowledge of protein unfolding, which is generally cooperative, particularly for titinI27, which is reported to unfold cooperatively or at most through one intermediate during enzymatic unfolding by ClpX and ClpA.

      We do not think this represents a contradiction. In fact, our observations are in good agreement with mechanical unfolding of tandem repeats of Titin I27 using AFM experiments (Oberhauser, Hansma, Carrion-Vazquez and Fernandez 2001). They showed that tandem repeats of TitinI27 unfolded in steps of ~22 nm. Dividing 22 nm by 0.34 nm per amino acid gives ~65 amino acids per unfolding event. This implies that, under force, ~65 amino acids of folded structure unfold in a single step. This number is in excellent agreement with our kinetic step-size of 65 AA/step.

      Importantly, the experiments cited by the reviewer on ClpA and ClpX are actually with ClpAP and ClpXP. We assert that this is an important distinction, as we have shown that ClpA employs a different mechanism than ClpAP (Rajendar and Lucius 2010, Miller, Lin, Li and Lucius 2013, Miller and Lucius 2014). Thus, ClpA and ClpAP should be treated as different enzymes and, without question, ClpB and ClpA are different enzymes.

      (2) It is also important to note that the unfolding of titinI27 from the N-terminus (as done in this study) has been reported to be very fast and cannot be the rate-limiting step, as reported earlier (Olivares et al, PNAS, 2017). This contradicts the current model where unfolding is the rate-limiting step, and the translocation is assumed to be many orders faster than unfolding.

      Most importantly, the Olivares paper examines ClpXP and ClpAP catalyzed protein unfolding and translocation and not ClpB. These are different enzymes. Additionally, we have shown that ClpAP and ClpA translocate unfolded polypeptides with different rates, rate constants, and kinetic step-sizes, indicating that ClpP allosterically impacts the mechanism employed by ClpA to the extent that even ClpA and ClpAP should be considered different enzymes (Rajendar and Lucius 2010, Miller, Lin, Li and Lucius 2013). We would further assert that there is no reason to assume that ClpAP and ClpXP catalyze protein unfolding by the same mechanism as ClpB, just as it should not be assumed that ClpA and ClpX use the same mechanisms as ClpAP and ClpXP, respectively.

      The Olivares et al paper reports a dwell time preceding protein unfolding of ~0.9 and ~0.8 s for ClpXP and ClpAP, respectively. The inverse of this can be taken as the rate constant for protein unfolding and would yield a rate constant of ~1.2 s⁻¹, which is in good agreement with our observed rate constant of 0.9 – 4.3 s⁻¹ depending on the ATP:ATPγS mixing ratio. For ClpB, we propose that the slow unfolding is then followed by rapid translocation on the unfolded chain, where translocation by ClpB must be much faster than for ClpAP and ClpXP. We think this is a reasonable interpretation of our results and not a contradiction of the results in Olivares et al. Moreover, this is completely consistent with the mechanistic differences that we have reported, using the same single-turnover stopped-flow approach on the same unfolded polypeptide chains with ClpB, ClpA, and ClpAP (Rajendar and Lucius 2010, Miller, Lin, Li and Lucius 2013, Miller and Lucius 2014, Li, Weaver, Lin et al. 2015).
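
      The comparison above can be checked with a short calculation. This is only a sketch: it assumes each dwell reflects a single exponentially distributed step, so that the apparent rate constant is the reciprocal of the mean dwell time, and the names used are illustrative.

```python
# Invert the dwell times quoted from Olivares et al. to get apparent rate constants.
DWELL_S = {"ClpXP": 0.9, "ClpAP": 0.8}   # dwell before unfolding, in seconds (from the text)
CLPB_KU_RANGE = (0.9, 4.3)               # observed ClpB unfolding rate constants, s^-1 (from the text)


def apparent_rate(dwell_s: float) -> float:
    """Apparent rate constant (s^-1), assuming a single exponential step."""
    return 1.0 / dwell_s


rates = {name: apparent_rate(dwell) for name, dwell in DWELL_S.items()}
for name, k in rates.items():
    in_range = CLPB_KU_RANGE[0] <= k <= CLPB_KU_RANGE[1]
    print(f"{name}: ~{k:.2f} s^-1, within reported ClpB range: {in_range}")
```

      Both reciprocals fall near 1.1 to 1.25 s⁻¹, inside the 0.9 – 4.3 s⁻¹ range quoted for ClpB.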

      (3) The model assumes the same time constant for all the unfolding steps irrespective of the secondary structural interactions.

      Yes, we contend that this is a good assumption because it represents repetition of protein unfolding catalyzed by ClpB upon encountering the same repeating structural elements, i.e., β-sheets.
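
      To illustrate why a single shared rate constant gives timing that scales linearly with the number of steps, here is a minimal sketch. It assumes n identical, irreversible steps with rate constant ku and ignores dissociation and the fluorescence signal model, so it is a simplification for intuition and not the fitting model used in the manuscript. The function names and the example value of ku are hypothetical.

```python
import math


def erlang_pdf(t: float, n: int, ku: float) -> float:
    """Density of the time needed to finish n identical irreversible steps."""
    return (ku ** n) * (t ** (n - 1)) * math.exp(-ku * t) / math.factorial(n - 1)


def mean_completion_time(n: int, ku: float) -> float:
    """Mean time to finish n steps; grows linearly with n."""
    return n / ku


def mode_completion_time(n: int, ku: float) -> float:
    """Most probable completion time; also linear in n."""
    return (n - 1) / ku


KU_EXAMPLE = 1.0  # hypothetical rate constant in s^-1, chosen only for illustration
for n_steps in (2, 4, 6):
    print(
        n_steps,
        mean_completion_time(n_steps, KU_EXAMPLE),
        mode_completion_time(n_steps, KU_EXAMPLE),
    )
```

      In this simplified picture, each added step shifts both the mean and the most probable arrival time by 1/ku, which is the kind of linear dependence on substrate length described for the peak time.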

      (4) Unlike other single-molecule optical tweezer-based assays, the study cannot distinguish the unfolding and translocation events and assumes that unfolding is the rate-limiting step.

      Although we cannot directly distinguish between protein unfolding and translocation, we have logically concluded that protein unfolding is likely rate limiting. This is because the large kinetic step-size represents the collapse of ~60 amino acids of structure between two rate-limiting steps, which we interpret to represent cooperative protein unfolding induced by ClpB. It is not an assumption; it is our current best interpretation of the observations, which we are now seeking to test further.

      Reviewer #3 (Public Review):

      Summary:

      The authors have devised an elegant stopped-flow fluorescence approach to probe the mechanism of action of the Hsp100 protein unfoldase ClpB on an unfolded substrate (RepA) coupled to 1-3 repeats of a folded titin domain. They provide useful new insight into the kinetics of ClpB action. The results support their conclusions for the model setup used.

      Strengths:

      The stopped-flow fluorescence method with a variable delay after mixing the reactants is informative, as is the use of variable numbers of folded domains to probe the unfolding steps.

      Weaknesses:

      The setup does not reflect the physiological setting for ClpB action. A mixture of ATP and ATPgammaS is used to activate ClpB without the need for its co-chaperones, Hsp70, Hsp40, and an Hsp70 nucleotide exchange factor. This nucleotide strategy was discovered by Doyle et al (2007) but the mechanism of action is not fully understood. Other authors have used different approaches. As mentioned by the authors, Weibezahn et al used a construct coupled to the ClpP protease to demonstrate translocation. Avellaneda et al used a mutant (Y503D) in the coiled-coil regulatory domain to bypass the Hsp70 system. These differences complicate comparisons of rates and step sizes with previous work. It is unclear which results, if any, reflect the in vivo action of ClpB on the disassembly of aggregates.

      We agree with the reviewer that several strategies have been employed to bypass the need for Hsp70/40 or KJE to simplify in vitro experiments. Here we have developed a first-of-its-kind transient state kinetics approach that can be used to examine processive protein unfolding. We now seek to go forward with examining the mechanisms of hyperactive mutants, like Y503D, and to add the co-chaperones so that we can address the limitations articulated by the reviewer. In fact, we have already begun adding DnaK to the reaction and found that DnaK induced ClpB to release the polypeptide chain (Durie, Duran and Lucius 2018). However, the sequential mixing strategy developed here was needed to go forward with examining the impact of co-chaperones.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Line 1: I recommend changing the title of the paper to remove the terms that are not clearly defined in the text: "robust" and "processive". What are the Authors' criteria for describing a molecular machine as "robust" vs. "not robust"? A definition of processivity is given in equation 2, but its value for ClpB is not reported in the text, and the criteria for classifying a machine as "processive" vs. "non-processive" are not included. Besides, the Authors have previously reported that ClpB is non-processive (Biochem. J., 2015), so it is now clear that a more nuanced terminology should be applied to this protein. Also, Escherichia coli should be fully spelled out in the title.

      The title has been changed. We have removed “robust” as we agree with the reviewer that there is no way to quantify “robust”. However, we have kept “processive” and have added to the discussion a calculation of processivity, since we can quantify processivity. Importantly, the unstructured substrates used in our previous studies represent translocation and not protein unfolding. Here, on folded substrates, we detect rate-limiting protein unfolding followed by rapid translocation. Thus, we report a lower bound on protein unfolding processivity of 362 amino acids.

      Line 20: The comment about mitochondrial SKD3 should be removed. SKD3, like ClpB, belongs to the AAA+ family, and it is simply a coincidence that the original study that discovered SKD3 termed it an Hsp100 homolog. The similarity between SKD3 and ClpB is limited to the AAA+ module, so there are many other metazoan ATPases, besides SKD3, that could be called homologs of ClpB, including mitochondrial ClpX, ER-localized torsins, p97, etc.

      Removed.

      Lines 133-139. Contrary to what the authors state, it is not clear that the "lag-phase" becomes significantly shorter for subsequent mixing experiments (Figure 1E) perhaps except for the last one (2070 s). It is clear, however, that the emission enhancement becomes stronger for later mixes. This effect should be discussed and explained, as it suggests that the pre-equilibrations shorter than ~2000 sec do not produce saturation of ClpB binding to the substrate.

      We have added supplemental figure 2, which represents a zoom into the lag region. This better illustrates what we were seeing but did not clearly show to the reader. In addition, we address all three changes in the time courses, i.e., extent of lag, change in peak position, and change in peak height.

      Line 175. The hydrolysis rate of ATPgammaS in the presence of ClpB should be measured and compared to the hydrolysis rate with ATP/ATPgammaS to check if the ratio of those rates agrees with the ratio of the translocation rates. These experiments should be performed with and without the RepA-titin substrate, which could reveal an important linkage between the ATPase engine and substrate translocation. These experiments are essential to support the claim of substrate translocation and unfolding with ATPgammaS as the sole energy source.

      The time courses shown in figure 2 and supplemental Figure 1 are collected with only ATPγS and no ATP. The time courses show a clear increase in lag and appearance of a peak with increasing number of tandem repeats of titin domains. We do not see an alternative explanation for this observation other than that ATPγS supports ClpB-catalyzed protein unfolding and translocation. What is the reviewer's alternative explanation for these observations?

      We agree with the reviewer that the linkage of ATP hydrolysis to protein unfolding and translocation is essential and we are seeking to acquire this knowledge.  However, a simple comparison of the ratio of rates is not adequate. We contend that a complete mechanistic study of ATP turnover by ClpB is required to properly address this linkage and such a study is too substantial to be included here but is currently underway. 

      All that said, the statement on line 175 was removed since we do not report any ATPase measurements in this paper.

      Line 199: It is an over-simplification to state that "1:1 mix of ATP to ATPgammaS replaces the need for co-chaperones". This sentence should be corrected or removed. The ClpB co-chaperones (DnaK, DnaJ, GrpE) play a major role in targeting ClpB to its aggregated substrates in cells and in regulating the ClpB activity through interactions with its middle domain. ATPgammaS does not replace the co-chaperones; it is a chemical probe that modifies the mechanism of ClpB in a way that is not entirely understood.

      We agree with the reviewer.  The sentence has been modified to point out that the mix of ATP and ATPγS activates ClpB.

      Figure 3B, Supplementary Figure 5A. The solid lines from the model fit cannot be distinguished from the data points. Please modify the figures' format to clearly show the fits and the data points.

      Done.

      Lines 326, 329. It is not clear why the authors mention a lack of covalent modification of substrates by ClpB. AAA+ ATPases do not produce covalent modifications of their substrates.

      The issue of covalent modification was presented in the introduction lines 55 – 60 pointing out that much of what we have learned about protein unfolding and translocation catalyzed by ClpA and ClpX is from the observations of proteolytic degradation catalyzed by the associated protease ClpP.  However, this approach is not possible for ClpB/Hsp104 as these motors do not associate with a protease unless they have been artificially engineered to do so. 

      Lines 396-399. I am puzzled why the authors try to correlate the size of the detected kinetic step with the length of the ClpB channel instead of the size characteristics of the substrate.

      We are attempting to discuss/rationalize the observed large kinetic step-size which, in part, is defined by the structural properties of the enzyme as well as the size characteristics of the substrate.  We have attempted to clarify this and better discuss the properties of the substrate as well as ClpB.

      As I mentioned in the Public Review, it is essential to demonstrate that the emission increase used as the only readout of the ClpB position along the substrate is indeed caused by the proximity of ClpB to the fluorophore. One way to accomplish that would be to place the fluorophore upstream from the first I27 domain and determine if the "lag phase" in the emission enhancement disappears.

      Alexa Fluor 555 is well established to exhibit PIFE.  However, as in the response to the public review, we have included an appropriate control showing this in supplemental Fig. 1.

      Finally, the authors repetitively place their results in opposition to the study of Weibezahn et al. published in 2004 which first demonstrated substrate translocation by engineering a peptidase-associated variant of ClpB. It should be noted that the field of protein disaggregases has moved since the time of that publication from the initial "from-start-to-end" translocation model to a more nuanced picture of partial translocation of polypeptide loops with possible substrate slipping through the ClpB channel and a dynamic assembly of ClpB hexamers with possible subunit exchange, all of which may affect the kinetics in a complex way. However, the present study confirmed the "start-to-end" translocation model, albeit for a non-physiological ClpB substrate, and that is the take-home message, which should be included in the text.

      It is not clear to us that the field has “moved on” since Weibezahn et al 2004. Their engineered construct, which they term “BAP”, with ClpP is still used in the field despite our report that proteolytic degradation is observed in the absence of ATP with that system (Li, Weaver, Lin et al. 2015); it should, therefore, not be used to conclude processive energy-driven translocation. The “partial translocation” by ClpB is also grounded in observations of partial degradation catalyzed by ClpP with BAP from the same group (Haslberger, Zdanowicz, Brand et al. 2008). It is not clear to us that the idea of subunit exchange leading to the possibility of assembly around internal sequences is being considered. We do agree that this is an important mechanistic possibility that needs further interrogation. We agree with the reviewer that all these factors are confounding and lead to a more nuanced view of the mechanism.

      All that said, we have removed some of the opposition in the discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) It is assumed that the lag phase will be much longer than the phase in which we see a gradual increase in fluorescence, as the effect of PIFE is significant only when the enzyme is very close to the fluorophore. Particularly for RepA-titin3, the enzyme has to translocate many tens of nm before it is closer to the C-terminus fluorophore. However, in all cases, the lag time is lower or similar to the gradual increase phase (for example, Figure 3B). Could the authors explain this?

      The extent of the lag, or time zero until the signal starts to increase, is interpreted to indicate the time the motor takes to move from its initial binding site until it gets close enough to the fluorophore that PIFE starts to occur. In our analysis we apply signal change to the last intermediate and dissociation or release of unfolded RepA-TitinX. The increase in PIFE is not “all or nothing”. Rather, it starts to increase gradually. Further, because these are ensemble measurements and each molecule will exhibit variability in rate, there is increased breadth of the peak due to ensemble averaging.

      (2) Although the reason for differences in the peak position (for example, Figure 1E, 2B) is apparent, the reason for variations in the relative intensities has to be given or speculated.

      We have addressed the reason for the different peak heights in the revised manuscript. It is a consequence of each substrate having a slightly different fluorescent labeling efficiency. Thus, each sample contains a mix of labeled and unlabeled substrates, both of which bind ClpB; the unlabeled substrates do not contribute to the fluorescence signal but act as binding competitors. Consequently, at low labeling efficiency there is a lower concentration of ClpB bound to fluorescent RepA-Titinx, and at higher labeling efficiency there is a higher concentration of ClpB bound to fluorescent RepA-Titinx, leading to an increased peak height. RepA-Titin2 has the highest labeling efficiency and thus the largest peak height.

      Reviewer #3 (Recommendations For The Authors):

      The authors should make it clear that they and previous authors have used different constructs or conditions to bypass the physiological regulation of ClpB action by Hsp70 and its co-factors as mentioned above. In particular, the construct used by Avellaneda et al should be explained when they challenge the findings of those authors.

      Minor points:

      The lines fitting the experimental points are difficult or impossible to see in Figures 2B, 3B, and s5B.

      Fixed

      Typo bottom of p6 - "averge"

      Fixed

      Avellaneda, M. J., K. B. Franke, V. Sunderlikova, B. Bukau, A. Mogk and S. J. Tans (2020). "Processive extrusion of polypeptide loops by a Hsp100 disaggregase." Nature.

      Doyle, S. M., J. Shorter, M. Zolkiewski, J. R. Hoskins, S. Lindquist and S. Wickner (2007). "Asymmetric deceleration of ClpB or Hsp104 ATPase activity unleashes protein-remodeling activity." Nat Struct Mol Biol 14(2): 114-122.

      Durie, C. L., E. C. Duran and A. L. Lucius (2018). "Escherichia coli DnaK Allosterically Modulates ClpB between High- and Low-Peptide Affinity States." Biochemistry 57(26): 3665-3675.

      Haslberger, T., A. Zdanowicz, I. Brand, J. Kirstein, K. Turgay, A. Mogk and B. Bukau (2008). "Protein disaggregation by the AAA+ chaperone ClpB involves partial threading of looped polypeptide segments." Nat Struct Mol Biol 15(6): 641-650.

      Li, T., J. Lin and A. L. Lucius (2015). "Examination of polypeptide substrate specificity for Escherichia coli ClpB." Proteins 83(1): 117-134.

      Li, T., C. L. Weaver, J. Lin, E. C. Duran, J. M. Miller and A. L. Lucius (2015). "Escherichia coli ClpB is a non-processive polypeptide translocase." Biochem J 470(1): 39-52.

      Miller, J. M., J. Lin, T. Li and A. L. Lucius (2013). "E. coli ClpA Catalyzed Polypeptide Translocation is Allosterically Controlled by the Protease ClpP." J Mol Biol 425(15): 2795-2812.

      Miller, J. M. and A. L. Lucius (2014). "ATP-gamma-S Competes with ATP for Binding at Domain 1 but not Domain 2 during ClpA Catalyzed Polypeptide Translocation." Biophys Chem 185: 58-69.

      Oberhauser, A. F., P. K. Hansma, M. Carrion-Vazquez and J. M. Fernandez (2001). "Stepwise unfolding of titin under force-clamp atomic force microscopy." Proc Natl Acad Sci U S A 98(2): 468-472.

      Rajendar, B. and A. L. Lucius (2010). "Molecular mechanism of polypeptide translocation catalyzed by the Escherichia coli ClpA protein translocase." J Mol Biol 399(5): 665-679.

      Weibezahn, J., P. Tessarz, C. Schlieker, R. Zahn, Z. Maglica, S. Lee, H. Zentgraf, E. U. Weber-Ban, D. A. Dougan, F. T. Tsai, A. Mogk and B. Bukau (2004). "Thermotolerance requires refolding of aggregated proteins by substrate translocation through the central pore of ClpB." Cell 119(5): 653-665.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This study provides a useful strategy for treating mouse cutaneous squamous cell carcinoma (mCSCC) with serum derived from mCSCC-exposed mice. The exploration of serum-derived antibodies as a potential therapy for curing cancer is particularly promising, but the study provides inadequate evidence for specific effects of mCSCC-binding serum antibodies. This study will be of interest to scientists seeking a novel immunotherapeutic strategy in cancer therapy.

      Joint Public Review:

      Summary:

      This study presents an immunotherapeutic strategy for treating mouse cutaneous squamous cell carcinoma (mCSCC) using serum from mice inoculated with mCSCC. The author hypothesizes that antibodies in the generated serum could aid the immune system in tumor volume reduction. The study results showed a reduction in tumor volume and altered expression of several cancer markers (p53, Bcl-xL, NF-κB, Bax) suggesting the potential effectiveness of this approach.

      Strengths:

      The approach shows potential effect on preventing tumor progression, from both the tumor size and the cancer biomarker expression levels bringing attention to the potential role of antibodies and B cell responses in cancer therapy.

      We greatly appreciate your positive feedback on our study.

      Weaknesses:

      These are some of the specific things that the author could consider to strengthen the evidence supporting the claims in their study.

      (1) The study fails to provide evidence of the specific effect of mCSCC-antibodies on mCSCC. The study utilized serum which also contains many immune response factors like cytokines that could contribute to tumor reduction. There is no information on serum centrifugation conditions, which makes it unclear whether immune components like antigen-specific T cells, activated NK cells, or other immune cells were removed from the serum. The study does not provide evidence of neutralizing antibodies through isolation, analysis of B cell responses, or efficacy testing against specific cancer epitopes. To affirm the specific antibodies' role in the observed immune response, isolating antibodies rather than employing whole serum could provide more conclusive evidence. Purifying the serum to isolate mCSCC-binding antibodies, such as through protein A purification, and ELISA would have been more useful to quantify the immune response. It would be interesting to investigate the types of epitopes targeted following direct tumor cell injection. A more thorough characterization of the antibodies, including B cell isolation and/or hybridoma techniques, would strengthen the claim.

      I am deeply appreciative of the reviewer's highly professional comments. Tumor development involves the coexistence of cancer cells at different developmental stages, each harboring a variety of known and unknown mutated proteins. These mutated proteins expose multiple known and unknown epitopes, each capable of stimulating the production of corresponding antibodies in healthy mice. Identifying all these antibodies presents a significant challenge. Current research methodologies, such as ELISA, WB, and ChIP, can only identify known antibodies based on existing antigens. A prerequisite for using these techniques is that both antigens and antibodies are identified. At present, there is no technology available to identify antibodies produced by an unknown mutated protein and epitope. However, I find the reviewer's comments insightful. Perhaps we can initially identify some known mCSCC-antibodies on mCSCC. However, studying the specific effect of these known mCSCC-antibodies on mCSCC is uncertain because we believe that tumor shrinkage results from the combined action of both known and unknown antibodies.

      We concur with the reviewer's observations regarding the use of serum, which is rich in immune response factors such as cytokines that could potentially contribute to tumor reduction. In our future research, we plan to systematically analyze the individual roles of these antibodies and cytokines in tumor reduction. In 1973, Nature published a report indicating that serum demonstrated promising results in tumor treatment (Immunotherapy of Cancer with Antibody in Rats. Nature 243, 492 (1973). https://doi.org/10.1038/243492b0). Since then, there have been scarcely any reports on serum therapy for tumors. The primary focus of our study is to evaluate the efficacy of serum therapy in treating tumors. We hypothesize that antibodies and cytokines form a complex interactive network, working in synergy to reduce tumors. Consequently, we believe that studying these antibodies and cytokines in isolation may not yield effective results.

      In this study, the methodology section outlines the process of serum preparation. It is important to note that serum is devoid of blood cells. I hypothesized that whole blood might have superior therapeutic effects compared to serum. This is because antibodies could potentially synergize with immune cells (including T cells, B cells, and NK cells), thereby enhancing the effectiveness of the treatment. As previously discussed, these antibodies, cytokines, and immune cells form a complex interactive network aimed at tumor reduction. Consequently, there are numerous factors that could influence the experimental outcomes, which presents a challenge for analyzing the results. Furthermore, the implementation of whole blood transfusion therapy introduces additional considerations, such as potential side effects and reactions associated with blood transfusions.

      We thank the reviewers for their suggestion to purify the serum in order to isolate mCSCC-binding antibodies. As we previously mentioned, separating a large number of both known and unknown serum antibodies presents a significant technical challenge. We are eager to discuss and consider suggestions from the reviewers regarding methods to identify a large variety and number of unknown antibodies on cells. Perhaps, as the reviewer suggested, we could begin with known antibodies and employ Protein A purification technology to purify these antibodies and subsequently detect immune responses. We could also categorize the types of epitopes targeted following direct tumor cell injection and examine these epitope types in further studies. The suggestion to study the response of B cells is valuable, and we plan to conduct comprehensive research on the response and status of B cells in our future studies.

      The purification of antibodies to enhance the specificity of their effectiveness against tumors is a critical aspect of our study. However, we would like to address some concerns raised. (1) The separation of all antibodies and cytokines presents a significant technical challenge. In particular, there is a risk of overlooking antibodies that are present in low concentrations but play crucial roles. (2) Our primary concern is that studying these components in isolation could compromise the holistic understanding of the study. This is akin to current research on traditional medicine, where the separation and individual study of compounds often result in a loss of overall therapeutic efficacy. For instance, consider a scenario where 100 antibodies collectively work to shrink a tumor. These antibodies interact with 20 cytokines, forming a complex network that enhances the cytokines' activity against tumor cells. Furthermore, many important antibodies and cytokines are currently unknown. Studying these antibodies in isolation could potentially result in the loss of this therapeutic effect. Therefore, in the discussion section, we have emphasized that our study considers a tumor mass, including tumor cells at various stages of development, as a single entity. As a practicing clinician, my primary focus is on the therapeutic outcomes in tumor treatments, even though the mechanisms of serum therapy remain largely elusive, like a black box.

      (2) In the study design, the control group does not account for the potential immunostimulatory effects of serum injection itself. A better control would be tumor-bearing mice receiving serum from healthy non-mCSCC-exposed mice. Additionally, employing a completely random process for allocating the treatment groups would be preferable. Also, the study does not explain why intravenous injection of tumor cells would produce superior antibodies compared to those naturally generated in mCSCC-bearing mice.

      I concur with the reviewer's perspective that using serum from healthy, non-mCSCC exposed mice as a control could potentially improve our study. Initially, our primary concern was to minimize harm to the mice and avoid excessive blood reactions, which led us to exclude the use of serum from healthy, non-mCSCC exposed mice in our control group. The main objective of our study was to investigate tumor shrinkage through serum treatment, specifically serum-derived antibodies. We anticipated that tumor-bearing mice receiving serum from healthy, non-mCSCC exposed mice would exhibit a response to the injected serum, which would manifest as a blood reaction. However, we did not expect this to result in a tumor treatment effect. If it turns out that normal serum (from healthy, non-mCSCC-exposed mice) possesses tumor-reducing properties, it would indeed be a novel discovery. We appreciate the reviewer's insightful suggestion and will consider incorporating it into our future research.

      We concur with the reviewer's observation that a completely random process for assigning treatment groups would be preferable. Indeed, complete randomization of the entire process would further underscore the efficacy and universality of serum therapy. In this study, we utilized paired mice to mitigate the risk of cross-infection and adverse reactions associated with blood transfusions. We deeply value the reviewer's expert feedback.

      Lastly, intravenously injected tumor cells produce antibodies superior to those naturally generated in mCSCC-bearing mice for the following reasons. As tumor cells grow, they produce a variety of mutated proteins to adapt to the immune microenvironment and evade the immune system of mCSCC-bearing mice. However, these tumor cells with mutated proteins are readily recognized by the immune system of healthy mice. This recognition triggers an immune response in healthy mice, leading to the production of specific therapeutic antibodies. This simultaneous production of diverse and abundant antibodies is only achievable by living organisms.

      (3) In Figure 2B, it would be more helpful if the author could provide raw data/figures of the tumor than just the bar graph. Similarly in Figure 3, the author should show individual data points in addition to the error bar to visualize the actual distribution.

      Raw data (numerical values) have been incorporated into Figures 2B and 3, placed in a table below each graph. Placing the values above the error bars would require a small font that may not be legible.

      (4) The author mentioned that different stages of tumor cells have different surface biomarkers. Therefore, experimenting with injecting tumor cells at various stages could reveal the most immunogenic stage. Such an approach would allow for a comparative analysis of immune responses elicited by tumor cells at different stages of development.

      Yes, throughout the course of tumor development, tumor cells at various stages will exhibit distinct markers or possess different mutated proteins. The concept of segregating tumor cells from different stages and independently comparing their immune responses is indeed commendable. Future research could involve isolating cells that express identical biomarkers at each stage for a comparative analysis of the immune responses triggered by the tumor cells. However, this approach diverges from the original intent of this study.

      Most tumor cells exist within the same developmental stage. However, this does not imply that all tumor cells within the tumor mass are at the same stage. For instance, a stage III liver cancer tumor may contain both stage I and stage IV tumor cells. Moreover, due to the complexity of tumor development, not all tumor cell surface markers are identical, even for tumors at the same stage. For instance, 20 major proteins and 100 minor proteins are implicated in tumor formation. In fact, random mutations in just 5 of these major proteins and 10 minor proteins can instigate the development of tumors. This implies that the protein pattern (tumor cell surface markers) associated with each individual's tumor is unique. While studying tumor cells at different stages separately allows for the observation of the immune response to tumor cells at each stage, it does not capture the tumor as a whole, either for research or for treatment. For this reason, the design of this study treats a tumor mass as a whole, encompassing both the primary stage tumor cells and those not in that stage. These tumor cells are then injected to produce corresponding therapeutic antibodies. Furthermore, if tumor cells from only one stage are isolated and specific antibodies are produced against these cells, it could lead to immune escape of tumor cells at other stages, preventing the tumor from shrinking. Therefore, our approach aims to address this issue by considering the tumor mass as a whole.

      (5) In the abstract the author mentioned that using mCSCC is a proof-of-concept for this potential cancer treatment strategy. The discussion session should extend to how this strategy might apply to other cancer types beyond carcinoma.

      We have incorporated an additional paragraph in the discussion section where we delve into the concepts and experimental principles underpinning this study. This, we believe, addresses the reviewer's query regarding the applicability of our study's methodology to other types of tumors. The process for other tumors also involves isolating cells from the tumor, stimulating therapeutic antibody production in healthy mice using these cells, and ultimately reintroducing these antibodies into mice with tumors to facilitate tumor elimination.

      Recommendations For The Authors:

      The author is encouraged to refine the study's design in future studies considering the weaknesses highlighted above, summarize the results more effectively, and seek opportunities to expand on this promising idea and enhance the research's impact and applicability.

      We greatly appreciate the valuable suggestions provided by the editor and reviewers. These insights will certainly be addressed in our future research endeavors.

      Suggestions for title modification:

      Following the scope of the study, the term 'specific homologous neutralizing-antibodies' may be misleading as neutralizing antibodies typically refer to antibodies preventing viral cell entry. In cancer therapy, 'neutralization' is not a relevant concept, as cancer cells do not infect host cells. Using whole tumor cells as immunogens diverges from the specificity of traditional vaccination approaches that utilize well-defined proteins or antigens. Furthermore, the term "homologous" suggests a precision in targeting that is not demonstrated by reintroducing serum without isolating its specific components. Therapeutic effects should not be attributed to "neutralizing antibodies" without isolating or characterizing the antibody response or verifying their efficacy against specific cancer epitopes. Additionally, it is suggested that you indicate the biological system that your study utilised in the title. More so, this approach is not entirely novel, as seen with the use of adjuvants in some flu vaccines, or in Moderna's cancer vaccine mRNA-4157, which encodes up to 34 patient-specific tumor neoantigens. You can consider the title below or a variant of the same.

      Suggested title: Generating serum-based antibodies from tumor-exposed mice: a potential strategy in cutaneous squamous cell carcinoma treatment

      I concur with your suggestion and have modified the title to "Generating serum-based antibodies from tumor-exposed mice: a new potential strategy for cutaneous squamous cell carcinoma treatment". I believe this research remains somewhat new, hence the addition of the word "new". Furthermore, the term "novel" in the paper has been either removed or substituted.

      Moreover, I propose that this study shares similarities with Moderna's cancer vaccine mRNA-4157, albeit with certain differences. Moderna's cancer vaccine mRNA-4157 encodes up to 34 recognized neoantigens to stimulate an immune response by eliciting specific T cell responses. This is similar to the strategy of some companies developing a protein set for diagnosing lung cancer, liver cancer, among others. Without a doubt, these methods have improved the effectiveness of tumor diagnosis and treatment. However, I think that these methods currently face challenges in completely eradicating tumors because they treat tumors as static and tumor cells as expressing a fixed set of mutated proteins. I believe that small molecule antibodies, cytokines, and immune cells present in serum that are difficult to detect, have low concentrations, or are unknown are essential for maintaining the expression of important mutant proteins and the escape of tumor cells. This is also the primary reason why tumors are difficult to treat and prone to recurrence at present.

      From my perspective, different tumors, as well as different stages of the same tumor, express varying mutated proteins or surface markers. Targeting some may result in others escaping or even creating a more conducive growth environment for those that do escape. Our study adopts a comprehensive view of a tumor block, encompassing tumor cells at different stages and tumor cells at the same stage but expressing different biomarkers. This approach generates a multitude of known and unknown antibodies that work in concert with cytokines and immune cells. While our method may not be capable of generating all mutated proteins and epitope antibodies due to the weakness of some antigens (epitopes of mutated proteins), it can still be effective. As long as the number of tumor cells is reduced below a certain threshold following multiple rounds of treatment with various antibodies produced at different stages, these cancer cells can be eradicated by the body's immune system. This is a process that is real-time and dynamic. Undoubtedly, if it becomes evident that alterations in a set of proteins can bolster the immune system and eradicate tumor cells, then the implications are significant. The immunotherapy proteins, which have demonstrated positive therapeutic effects, developed by certain companies are also predicated on this very principle.

      Finally, I greatly appreciate your suggestions, which will be considered and gradually addressed in future research.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review): 

      By mapping H3K4me2 in mouse oocytes and pre-implantation embryos, the authors aim to elucidate how this histone modification is erased and re-established during the parental-to-zygotic transition, as well as how the reprogramming of H3K4me2 regulates gene expression and facilitates zygotic genome activation.

      Employing an improved CUT&RUN approach, the authors successfully generated H3K4me2 profiling data from a limited number of embryos. While the profiling experiments are very well executed, several weaknesses, particularly in data analysis, are apparent:

      (1) The study emphasizes H3K4me2, which often serves as a precursor to H3K4me3, a well-studied modification during early development. Analyzing the new H3K4me2 dataset alongside published H3K4me3 data is crucial for comprehensively understanding epigenetic reprogramming post-fertilization and the interplay between histone modifications. However, the current analysis is preliminary and lacks depth.

      Thank you very much for your valuable suggestions. H3K4me3 data in humans and mice have been published, and our previous data revealed the unique pattern of H3K4me3 in early human embryos and oocytes (Xia et al., 2019). This study therefore focuses mainly on the localization of H3K4me2 in mouse oocytes and preimplantation embryos, how it is erased and re-established during the mammalian parental-to-zygotic transition, and its function. The combined analysis of H3K4me2 and H3K4me3 is not our main focus, although we do not rule out that it could yield new discoveries about the relationship between these two histone modifications. Our data so far tend to show that H3K4me2 not only acts as a precursor of H3K4me3 but also plays an independent role.

      (2) Tranylcypromine (TCP) is known as an irreversible inhibitor of monoamine oxidase and LSD1. While the authors suggest TCP inhibits the expression of LSD2, this assertion is questionable. Given TCP's potential non-specific effects in cells, conclusions related to the experiments using TCP should be made with caution.

      Thank you for pointing this out, and we thank the reviewer again for the important suggestion. Previous studies indicated that TCP is an irreversible inhibitor of both LSD1 and LSD2 (Binda et al., 2010; Fang et al., 2010). According to our data, however, LSD1 levels are very low in early mouse embryos, so TCP mainly inhibits the function of LSD2 at these stages.

      (3) Some batches of H3K4me2 antibody are known to cross-react with H3K4me3. Has the H3K4me2 antibody used in CUT&RUN been tested for such cross-reactivity? Heatmaps in the figures indeed show similar distribution for H3K4me2 and H3K4me3, further raising concerns about antibody specificity.

      We thank the reviewer for the insightful comments. The H3K4me2 antibody was purchased from Millipore (cat. 07030). Figure 2A shows the specific enrichment of H3K4me2 in promoter and distal regions. Some batches of H3K4me2 antibody are known to cross-react with H3K4me3, but the H3K4me2 antibody we used in our CUT&RUN seems to have low cross-reactivity.

      (4) Certain statements lack supporting references or figures (examples on page 9 can be found on line 245, line 254, and line 258).

      Thank you for pointing this out, and we will add references to support the statement in the paper as suggested.

      (5) Extensive language editing is recommended to clarify ambiguous sentences. Additionally, caution should be taken to avoid overstatement - most analyses in this study only suggest correlation rather than causality.

      Thank you for your kind comments. We will revise the expression in the manuscript later.

      Reviewer #2 (Public Review):

      Chong Wang et al. investigated the role of H3K4me2 during the reprogramming processes in mouse preimplantation embryos. The authors show that H3K4me2 is erased from GV to MII oocytes and re-established in the late 2-cell stage by performing Cut & Run H3K4me2 and immunofluorescence staining. Erasure and re-establishment of H3K4me2 have not been studied well, and profiling of H3K4me2 in germ cells and preimplantation embryos is valuable to understanding the reprogramming process and epigenetic inheritance.

      (1) The authors claim that the Cut & Run worked for MII oocytes, zygotes, and the 2-cell embryos. However, it is unclear if H3K4me2 is erased during the stage or if the Cut & Run did not work for these samples. To support the hypothesis of the erasure of H3K4me2, the authors conducted immunofluorescence staining, and H3K4me2 was undetected in the MII oocyte, PN5, and 2-cell stage. However, the published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage (Ancelin et al., 2016; Shao et al., 2014). The authors need to cite these papers and discuss the contradictory findings.

      The authors used 165 MII oocytes and 190 GV oocytes for the Cut & Run. The amount of DNA in MII oocytes is halved because of the emission of the first polar body. Could this be a reason that there are fewer H3K4me2 peaks in MII oocytes than in GV oocytes?

      First of all, thank you for your valuable advice. The published papers showed strong staining of H3K4me2 at the zygote stage and 2-cell stage, which is interesting. We think different acquisition parameters may have been used during confocal laser imaging (Ancelin et al., 2016). We used the same parameters to image continuously from the GV stage through the blastocyst stage. Had we imaged only the fertilized egg and the 2-cell stage, we might also have seen weak fluorescence at the 2-cell stage under different parameters. We will refer to this reference and discuss it in the resubmitted version.

      Moreover, you suggested that MII oocytes may have fewer H3K4me2 peaks than GV oocytes because the MII oocyte has extruded the first polar body. This logic is sound. However, the first polar body extruded at the MII stage remains within the zona pellucida, and we also collected the polar body in the CUT&RUN experiment; therefore, compared to GV, the DNA content of MII samples is not halved. After further discussion, we believe that the reduction of H3K4me2 peaks in the MII stage compared with the GV stage may be closely related to oocyte maturation. Stage-specific histone modifications shape the chromatin structure appropriately across the different stages of meiosis. It has been confirmed that H3K4me3 gradually decreases from the GV to the MII stage during the maturation of human oocytes, whereas H3K27me3 does not change from the GV to the MII stage.

      In Figure 3C, 98% (13,183/13,428) of H3K4me2 marked genes in GV oocytes overlap with those in the 4-cell stage. Furthermore, 92% (14,049/15,112) of H3K4me2 marked genes in sperm overlap with those in the 4-cell stage. Therefore, most regions maintain germ line-derived H3K4me2 in the 4-cell stage. The authors need to clarify which regions of germ line-derived H3K4me2 are maintained or erased in preimplantation embryos. Additionally, it would be interesting to investigate which regions show the parental allele-specific H3K4me2 in preimplantation embryos since the authors used hybrid preimplantation embryos (B6 x DBA).

      Thank you very much for your suggestion. Further analysis of which regions show parental allele-specific H3K4me2 in preimplantation embryos will make the study more interesting. We will discuss this in depth in the resubmitted version.

      (2) The authors claim that Kdm1a is rarely expressed during mouse embryonic development (Figure 4A). However, the published paper showed that KDM1a is present in the zygote and 2-cell stage using immunostaining and western blotting (Ancelin et al., 2016). Additionally, this paper showed that depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage, and therefore, KDM1a is functionally important in early development. The authors should have cited the paper and described the role of KDM1a in early embryos.

      In analyzing this experiment, we consider that, in early mouse embryonic development, the expression of KDM1A is low relative to that of KDM1B. Similarly, the transcriptome data we cite also show that KDM1A is expressed at elevated levels during oocyte maturation and fertilization compared to immature oocytes. In addition, the effects of loss of maternal KDM1a on embryonic development were not discussed. We believe that the absence of maternal KDM1b blocks embryonic development, and we will cite and discuss the references later.

      (3) The authors used the published RNA data set and interpreted that KDM1B (LSD2) was highly expressed at the MII stage (Figure S3A). However, the heat map shows that KDM1B expression is high in growing oocytes but not at 8w_oocytes and MII oocytes. The authors need to interpret the data accurately.

      After re-checking the data, we found that there was a problem with the normalization method of our heat map, and we will re-make the heatmap and submit it in the modified version. With reference to Figure 4A, the content of Kdm1b is indeed higher than that of Kdm1a.

      (4) All embryos in the TCP group were arrested at the four-cell stage. Embryos generated from KDM1b KO females can survive until E10.5 (Ciccone et al., 2009); therefore, TCP-treated embryos show a more severe phenotype than oocyte-derived KDM1b deleted embryos. Depletion of maternal KDM1A protein results in developmental arrest at the two-cell stage (Ancelin et al., 2016). The authors need to examine whether TCP treatment affects KDM1a expression. Western blotting would be recommended to quantify the expression of KDM1A and KDM1B in the TCP-treated embryos.

      We will further mine the transcriptome data to confirm the specificity of TCP for KDM1b. In addition, TCP treatment of the whole fertilized egg in this study increased the H3K4me2 content, and its retarding effect on embryo development was more pronounced than that observed after depleting KDM1B in the mother and crossing with normal paternal lines.

      (5) H3K4me2 is increased dramatically in the TCP-treated embryos in Figure 4 (the intensity is 1,000 times more than the control). However, the Cut & Run H3K4me2 shows that the H3K4me2 signal is increased in 251 genes and decreased in 194 genes in the TCP-treated embryos (Fold changes > 2, P < 0.01). The authors need to explain why the gain of H3K4me2 is less evident in the Cut & Run data set than in the immunofluorescence result.

      Thanks a lot for your question. In the experimental group, the fluorescence value of H3K4me2 in immunofluorescence (IF) increased 1,000-fold (Figure 4E), while H3K4me2-related genes in the CUT&RUN data showed a total of 445 changes, up- and down-regulated combined (Figure 6A). In our opinion, immunofluorescence, as a semi-quantitative analysis, cannot be directly compared with the quantitative CUT&RUN analysis because of the different analysis models and threshold settings.

      References

      Ancelin, K., Syx, L., Borensztein, M., Ranisavljevic, N., Vassilev, I., Briseño-Roa, L., Liu, T., Metzger, E., Servant, N., Barillot, E., Chen, C.-J., Schüle, R., & Heard, E. (2016). Maternal LSD1/KDM1A is an essential regulator of chromatin and transcription landscapes during zygotic genome activation. eLife, 5, e08851. https://doi.org/10.7554/eLife.08851.001

      Ciccone, D. N., Su, H., Hevi, S., Gay, F., Lei, H., Bajko, J., Xu, G., Li, E., & Chen, T. (2009). KDM1B is a histone H3K4 demethylase required to establish maternal genomic imprints. Nature, 461(7262), 415-418. https://doi.org/10.1038/nature08315

      Shao, G. B., Chen, J. C., Zhang, L. P., Huang, P., Lu, H. Y., Jin, J., Gong, A. H., & Sang, J. R. (2014). Dynamic patterns of histone H3 lysine 4 methyltransferases and demethylases during mouse preimplantation development. In Vitro Cellular and Developmental Biology - Animal, 50(7), 603-613. https://doi.org/10.1007/s11626-014-9741-6

      References

      Xia W, Xu J, Yu G, Yao G, Xu K, Ma X, Zhang N, Liu B, Li T, Lin Z, Chen X, Li L, Wang Q, Shi D, Shi S, Zhang Y, Song W, Jin H, Hu L, Bu Z, Wang Y, Na J, Xie W, Sun YP. Resetting histone modifications during human parental-to-zygotic transition. Science. 2019 Jul 26;365(6451):353-360. doi: 10.1126/science.aaw5118. Epub 2019 Jul 4. PMID: 31273069.

      Binda C, Valente S, Romanenghi M, Pilotto S, Cirilli R, Karytinos A, Ciossani G, Botrugno OA, Forneris F, Tardugno M, Edmondson DE, Minucci S, Mattevi A, Mai A. Biochemical, structural, and biological evaluation of tranylcypromine derivatives as inhibitors of histone demethylases LSD1 and LSD2. J Am Chem Soc. 2010 May 19;132(19):6827-33.

      Fang R, Barbera AJ, Xu Y, Rutenberg M, Leonor T, Bi Q, Lan F, Mei P, Yuan GC, Lian C, Peng J, Cheng D, Sui G, Kaiser UB, Shi Y, Shi YG. Human LSD2/KDM1b/AOF1 regulates gene transcription by modulating intragenic H3K4me2 methylation. Mol Cell. 2010 Jul 30;39(2):222-33. doi: 10.1016/j.molcel.2010.07.008. PMID: 20670891; PMCID: PMC3518444.

      Reviewer #3 (Public Review):

      Summary:

      This study explores the dynamic reprogramming of histone modification H3K4me2 during the early stages of mammalian embryogenesis. Utilizing the advanced CUT&RUN technique coupled with high-throughput sequencing, the authors investigate the erasure and re-establishment of H3K4me2 in mouse germinal vesicle (GV) oocytes, metaphase II (MII) oocytes, and early embryos.

      Strengths:

      The findings provide valuable insights into the temporal and spatial dynamics of H3K4me2 and its potential role in zygotic genome activation (ZGA).

      Weaknesses:

      The study primarily remains descriptive at this point. It would be advantageous to conduct further comprehensive functional validation and mechanistic exploration.

      Key areas for improvement include enhancing the innovation and novelty of the study, providing robust functional validation, establishing a clear model for H3K4me2's role, and addressing technical and presentation issues. The text would benefit from the introduction of a novel conceptual framework or model that provides a clear explanation of the functional consequences and molecular mechanisms underlying H3K4me2 reprogramming in the transition from parental to early embryonic development.

      While the findings are significant, the current manuscript falls short in several critical areas. Addressing major and minor issues will significantly strengthen the study's contribution to the field of epigenetic reprogramming and embryonic development.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Summary of the changes

      Changes in the manuscript were made to clarify some ambiguities raised by the reviewers and to improve the report following their recommendations. A summary of the main changes is listed below:

      - The title was changed to better reflect the results of this study.

      - Re-training the model on log transformed FACS scores.

      - Testing the specificity of the FEPS to facial expression of pain within this experimental setup by comparing it to the activation maps obtained from the Warm stimulation condition.

      - Testing for sensitization/habituation of the behavioral measures (FACS scores and pain ratings).

      - Adding a section in the discussion to better address the limitations of this study and provide potential directions for future studies.

      Other changes target areas where the original manuscript may have been ambiguous or lacked precision. To address these concerns, additional details have been incorporated, and certain terms have been revised to ensure a more precise and transparent presentation of the information.

      Public Reviews:

      Reviewer #1 (Public Review):

      Picard et al. report a novel neural signature of facial expressions of pain. In other words, they provide evidence that a specific set of brain activations, as measured by means of functional magnetic resonance imaging (fMRI), can tell us when someone is expressing pain via a concerted activation of distinctive facial muscles. They demonstrate that this signature provides a better characterization of this pain behaviour when compared with other signatures of pain reported by past research. The Facial Expression of Pain Signature (FEPS) thus enriches this collection and, if further validated, may allow scientists to identify the neural structures subserving important non-verbal pain behaviour. I have, however, some reservations about the strength of the evidence, relating to insufficient characterization of the underlying processes involved.

      We are thankful for the summary of our work. We are hopeful that the modifications made in the latest version effectively address these concerns. The changes are outlined in the summary above, and detailed in the following point-by-point response.

      Strengths:

      The study relies on a robust machine-learning approach, able to capitalise on the multivariate nature of the fMRI data, an approach pioneered in the field of pain by one of the authors (Dr. Tor Wager). This paper extends Wager's and other colleagues' work attempting to identify specific combinations of brain structures subserving different aspects of the pain experience while examining the extent of similarity/dissimilarity with the other signatures. In doing so, the study provides further methodological insight into fine-grained network characterization that may inspire future work beyond this specific field.

      We are thankful for the positive comments.

      Weaknesses:

      The main weakness concerns the lack of a targeted experimental design aimed to dissect the shared variance explained by activations both specific to facial expressions and to pain reports. In particular, I believe that two elements would have significantly increased the robustness of the findings:

      (1) Control conditions for both the facial expressions and the sensory input. An efficient signature should not be predictive of neutral and emotional facial expressions (e.g., disgust) other than pain expressions, as well as it should not be predictive of sensations originating from innocuous warm stimulation or other unpleasant but non-painful stimulation.

      We do recognize the lack of specificity testing for the FEPS, especially towards negative emotional facial expressions. This would be relevant to test given the behavioural overlap between the facial expressions of pain and disgust, fear, anger, and sadness (Kunz et al., 2013; Williams, 2003). The experimental design used in this study did not include other negative states. However, we fully support the necessity of collecting data across those conditions, and we believe that the present study highlights the importance of such a demonstration. Future research should involve recording facial expressions while exposing participants to stimuli that elicit a range of negative emotions but, to our knowledge, such a combination of fMRI and behavioural data is currently unavailable. As raised by the reviewer, this approach would allow us to assess the specificity of the FEPS to the facial expression evoked by pain compared to different affective states. We would like to emphasise that specificity and generalizability testing is a massive amount of work, requiring multiple studies to address comprehensively. A Limitations paragraph addressing this research direction has been added to the Discussion. A conclusion was added to the abstract as follows: “Future studies should explore other pain-relevant manifestations and assess the specificity of the FEPS against other types of aversive or emotional states.”

      (2) Graded intensity of the sensory stimulation: different intensities of the thermal stimulation would have caused a graded facial expression (from neutral to pain) and graded verbal reports (from no pain to strong pain), thus offering a sensitive characterisation of the signal associated with this condition (and the warm control condition).

      However, these conditions are missing from the current design, and therefore we cannot make a strong conclusion about the generalisability of the signature (regardless of whether it can predict better than other signatures - which may/may not suffer from similar or other methodological issues - another potential interesting scientific question!). The authors seem to work on the assumption that the trials where warm stimulation was delivered are of no use. I beg to disagree. As per my previous comment, warm trials (and associated neutral expressions) could be incorporated into the statistical model to increase the classification sensitivity and precision of the FEPS decoding.

      The experience of pain can fluctuate for a fixed intensity or after controlling statistically for the intensity of the stimulation (Woo et al., 2017). Consistent with this, the current study focused on spontaneous facial expression in response to noxious thermal stimuli delivered at a constant intensity that produced moderate to strong pain in every participant. As the reviewer points out, this does not allow us to characterise and compare the stimulus-response function of facial expression and pain ratings. The advantage of the approach adopted is to maximise the number of trials where facial expression is more likely to occur, while ensuring that changes in facial expression and pain ratings are not confounded with changes in stimulus intensity. The manuscript has been revised to clarify that point. However, we do agree that it would be interesting to conduct more studies focusing on facial expression in response to a range of stimulus intensities. This discussion has been added to the Limitations paragraph.

      Furthermore, following the reviewer’s suggestion, we performed complementary analyses on the warm trials in the proposed revisions. The dot product (FEPS scores) between the FEPS and the activation maps associated with the warm condition was computed. A linear mixed model was conducted to investigate the association between FEPS scores and the experimental condition (warm vs pain). The trials in the pain condition were divided into two conditions: null FACS scores (painful trials with no facial response; FACS scores = 0) and non-null FACS scores (painful trials with a facial response; FACS > 0). The details of this analysis have been added to the manuscript (see Response of the FEPS to pain and warm section in the Methods; lines 427 to 439) as well as the corresponding results (see Results and Discussion; lines 138 to 158). The FEPS scores were larger in the pain condition where a facial response was expressed, compared to both the pain condition without facial expression and the warm condition. These results confirmed the sensitivity of the FEPS to facial expression of pain.
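      The FEPS score described above is a dot product between the signature weight map and a trial's activation map, with pain trials then split by whether a facial response occurred. A minimal sketch of that bookkeeping follows. The weights, activation values, and the names `pattern_expression` and `label_trial` are hypothetical and chosen only for illustration; this is not the authors' pipeline, and the linear mixed model itself is not shown.

```python
def pattern_expression(weights, activation):
    """Dot product between a signature weight map and one activation map."""
    if len(weights) != len(activation):
        raise ValueError("weight map and activation map must have the same number of voxels")
    return sum(w * a for w, a in zip(weights, activation))


def label_trial(condition, facs_score):
    """Assign a trial to one of the three groups compared in the response."""
    if condition == "warm":
        return "warm"
    # Pain trials are split on whether any facial response was coded.
    return "pain_with_expression" if facs_score > 0 else "pain_no_expression"


# Toy data (hypothetical): four "voxels" per map.
feps_weights = [0.5, -0.25, 0.0, 1.0]
trials = [
    {"condition": "warm", "facs": 0, "map": [0.0, 1.0, 2.0, 0.0]},
    {"condition": "pain", "facs": 0, "map": [1.0, 0.0, 3.0, 0.5]},
    {"condition": "pain", "facs": 4, "map": [2.0, -2.0, 1.0, 1.5]},
]

# Collect FEPS scores per group; these would be the dependent variable
# in a mixed model with participant as a random effect.
scores = {}
for trial in trials:
    group = label_trial(trial["condition"], trial["facs"])
    scores.setdefault(group, []).append(pattern_expression(feps_weights, trial["map"]))
```

      In a real analysis the grouped scores would feed the mixed model with condition as a fixed effect; the sketch only covers how scores and group labels could be derived.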

      Reviewer #2 (Public Review):

      Summary:

      The objective of this study was to further our understanding of the brain mechanisms associated with facial expressions of pain. To achieve this, participants' facial expressions and brain activity were recorded while they received noxious heat stimulation. The authors then used a decoding approach to predict facial expressions from functional magnetic resonance imaging (fMRI) data. They found a distinctive brain signature for pain facial expressions. This signature had minimal overlap with brain signatures reflecting other components of pain phenomenology, such as signatures reflecting subjective pain intensity or negative effects.

      We appreciate this concise and accurate summary of our study.

      Strength:

      The manuscript is clearly written. The authors used a rigorous approach involving multivariate brain decoding to predict the occurrence and intensity of pain facial expressions during noxious heat stimulation. The analyses seem solid and well-conducted. I think that this is an important study of fundamental and clinical relevance.

      Weaknesses:

      Despite those major strengths, I felt that the authors did not sufficiently explain their own interpretation of the significance of the findings. What does it mean, according to them, that the brain signature associated with facial expressions of pain shows a minimal overlap with other pain-related brain signatures?

      We express our sincere gratitude for the valuable insights and constructive comments on the strengths and weaknesses of the current study. We thank reviewer 2 for the encouragement to reinforce our interpretation of the significance of the findings, while acknowledging the limitations raised by the three reviewers.

      A few questions also arose during my reading.

      Question 1: Is the FEPS really specific to pain expressions? Is it possible that the signature includes a facial expression signal that would be shared with facial expressions of other emotions, especially since it involves socio-affective regulation processes? Perhaps this question should be discussed as a limit of the study?

      We acknowledge this limitation as outlined in response to Reviewer #1. We have incorporated a Limitations paragraph to provide a more in-depth discussion of this limitation and to explore potential future avenues (lines 225 to 268). Again, please note that the demonstration of specificity is an incremental process that requires a systematic comparison with other conditions where facial expressions are produced without pain. A concluding sentence was added to the abstract to encourage specificity testing in future studies, as indicated above.

      Question 2: All AUs are combined together in a composite score for the regression. Given that the authors have other work showing that different AUs may be associated with different components of pain (affective vs. sensory), is it possible that combining all AUs together has decreased the correlation with other pain signatures? Or that the FEPS actually reflects multiple independent signatures?

      The question raised is consistent with the work of Kunz, Lautenbacher, LeBlanc and Rainville (2012), and Kunz, Chen and Rainville (2020). In the current study, the pain-relevant action units were combined in order to increase the number of trials where a facial response to pain was expressed, thus enhancing the robustness of our analyses. Given the limited sample size, our current dataset is unfortunately insufficient to perform such analysis as there would not be enough trials to look at the action units separately or in subgroups. While the approach of combining the different AUs has proven to be valid and useful, we recognize the value of investigating potential independent signatures associated with the different AUs within the FEPS, and examining whether those signatures can lead to more similar patterns compared to previously developed pain signatures. This discussion has been included in the Limitations paragraph in the Discussion (lines 225 to 268).

      Question 3: Is facial expressivity constant throughout the experiment? Is it possible that the expressivity changes between the beginning and the end of the experiment? For instance, if there is a habituation, or if the participant is less surprised by the pain, or in contrast if they get tired by the end of the experiment and do not inhibit their expression as much as they did at the beginning. If facial expressivity changes, this could perhaps affect the correlation with the pain ratings and/or with the brain signatures; perhaps time (trial number) could be added as one of the variables in the model to address this question.

      The concern raised by the reviewer is legitimate. We conducted a mixed-effects model to assess the impact of successive trials and runs on facial expressivity. Results indicate that the FACS scores did not change significantly throughout the experiment, suggesting no notable effect of habituation or sensitization on the facial expressivity in our study. Details about the analysis and the results have been added to the Facial Expression section in the Methods (lines 335 to 346).

      Reviewer #3 (Public Review):

      In this manuscript, Picard et al. propose a Facial Expression Pain Signature (FEPS) as a distinctive marker of pain processing in the brain. Specifically, they attempt to use functional magnetic resonance imaging (fMRI) data to predict facial expressions associated with painful heat stimulation. The main strengths of the manuscript are that it is built on an extensive foundation of work from the research group, and that experience can be observed in the analysis of fMRI data and the development of the machine learning model. Additionally, it provides a comparative account of the similarities of the FEPS with other proposed pain signatures. The main weaknesses of the manuscript are the absence of a proper control condition to assess the specificity of the facial pain expressions, a few relevant omissions in the methodology regarding the original analysis of the data and its purpose, and a biased interpretation of the results.

      I believe that the authors partially succeed in their aims, as described in the introduction, which are to assess the association between pain facial expression and existing pain-relevant brain signatures, and to develop a predictive brain activation model of the facial responses to painful thermal stimulation. However, I believe that there is a clear difference between those aims and the claim of the title, and that the interpretation of the results needs to be more rigorous.

      We wish to express our appreciation for the insightful and constructive critique provided. The limitation pertaining to the absence of specificity testing has been addressed in response to Reviewer #1, and it has been incorporated into the manuscript (lines 251 to 258).

      The commentary made by Reviewer #3 has drawn our attention to a critical concern, namely the potential misalignment between the study findings and our original title. Consequently, we have changed the title to “A distributed brain response predicting the facial expression of acute nociceptive pain”. We also revised the interpretation of the results in the discussion section and we have added a section on limitations.

      Recommendations for the Authors:

      Reviewer #1 (Recommendations For The Authors):

      I hope the following comments will be useful to improve the manuscript.

      Abstract

      I felt the abstract could be more clear in terms of experimental or scientific questions, hypotheses/expectations, and findings. I also feel the abstract should briefly support the conclusive claim ("is better than...": how better? Or according to what criterion? This may be more relevant than the final conclusive general sentence that does not specifically address the significance of the findings).

      The abstract was revised to reinforce the functional perspective adopted to interpret brain activity produced by noxious stimuli and predicting various pain-relevant manifestations. We also mention explicitly the other pain-relevant signatures against which the FEPS is compared in this report, and we added a concluding sentence highlighting the importance of assessing the specificity of the FEPS in future studies.

      Introduction - background and rationale

      I would postpone the discussion around pain signature and anticipate the one about the brain mechanisms of facial expressions of pain. This will allow you to reinforce the logical flow of rationale, literature gap/question, why the problem is important, and study aims. Only then go for a review of relevant literature on signatures before providing a more specific final paragraph about the study-specific questions, expectations, and implementation. At the moment this is limited to a single very descriptive short paragraph at the end of the intro.

      The introduction was structured to guide the readers through a comprehensive understanding of different pain neurosignatures. The introduction aimed to establish a robust rationale for the subsequent analyses detailed in the results section. Indeed, the presentation of that literature ensured that the discussion around pain signatures is contextualised within a broader continuous framework. We acknowledge the reviewer’s comment on the limited description of the brain mechanisms of facial expression of pain. However, this was addressed in several previous reports of our laboratory (Kunz et al. 2011; Vachon-Presseau et al. 2016; Kunz, Chen, and Rainville 2020). We have added some more details about the brain mechanisms of facial expression, and highlighted those references in the first paragraph of the introduction.

      Methods and Results

      (1) Was there any indication of power based on the previous work or the other signature papers? If yes, how that would inform the present analysis?

      The NPS was trained on 20 participants that experienced 12 trials at each of four different intensities. The assessment of the effect sizes was performed on the Neurological Pain Signature in Han et al. (2022). That study revealed a moderate effect size for predicting between-subject pain reports, and a large one for predicting within-subject pain reports. We trained our model on 34 participants that underwent 16 trials. We expected our results to show a smaller effect size as the current experimental design only allowed us to examine spontaneous changes in the facial expression, as noted in the comments made by Reviewer #1. However, the best way to calculate the unbiased effect size of the results presented in the current study would be to test the unchanged model on new independent datasets (see Reddan, Lindquist, and Wager, 2017). Unfortunately, such datasets do not currently exist.

      (2) I would clarify to the reader what is meant by normal range of thermal pain and why is this relevant. Also, I did not find data about this assessment nor about the assessment of facial expressiveness (or reference to where it can be found).

      We changed this formulation to “All participants included in this study had normal thermal pain sensitivity” and we added a few references. By targeting a healthy population with normal thermal pain sensitivity, our study sought to identify a predictive brain pattern related to facial expression evoked by typical responses to pain that could eventually be generalised to other individuals from the same population. Details about the assessment of facial expressiveness have been added in the appropriate section in the Methods.

      (3) That pain ratings are only weakly associated with facial responses is, in its own right, an interesting finding, as a naïve reader would expect the two to be highly positively correlated. I'd suggest discussing this aspect (in reference to previous research) as it is interesting on both theoretical and empirical grounds.

      The likelihood and the strength of pain facial expression generally increase with pain ratings in response to acute noxious stimuli of increasing physical intensities, thereby leading to a positive association between the two responses that is driven by the stimulus. However, the poor correlation or the dissociation between facial pain expression and pain rating is a well-known phenomenon that can be demonstrated easily using experimental methods where the stimulus intensity is held constant and spontaneous fluctuations are observed in both facial expression and pain ratings. This result was not discussed in the current manuscript as it was already addressed in the work of Kunz et al. (2011) and Kunz, Karos and Vervoort (2018). We added the references to these studies in the revised manuscript (lines 330 to 334).

      (4) It may be worth having CIs throughout the whole set of analyses.

      Thanks for the suggestions, this was an oversight. The confidence intervals have been added in the manuscript where applicable.

      (5) I would clarify if there are two measures of the brain signature: dot-product and activation map. Relatedly, I cannot find where the authors explained what "FEPS pattern expression scores". Can the authors please clarify?

      The clarification has been added in the manuscript (lines 413 to 414).

      (6) There seems to be the assumption that the relationship between pain-relevant brain signatures and facial expressions of pain would be parametric and linear. However, this might not hold true. Did the authors test these assumptions?

      We indeed decided to use a linear regression technique (i.e. LASSO regression) to model the association between the brain activity and the facial expression of pain. The algorithm choice was mainly based on the simplicity and the interpretability of that approach, and our limited number of observations. The choice was also coherent with previous studies in the domain (e.g. Wager et al., 2011; Wager et al., 2013; Krishnan et al. 2016; Woo et al., 2017). Using a linear model, we were able to predict the facial expression evoked by pain from the fMRI activation at above-chance level. However, it is legitimate to think that more complex nonlinear models could better capture the brain patterns predictive of that behavioural manifestation of pain.
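      To illustrate why a LASSO model stays interpretable, the sketch below fits an L1-penalised linear regression by plain coordinate descent on toy data. It assumes centred predictors and outcome (no intercept) and minimises (1/2n)·||y − Xw||² + alpha·||w||₁. The function names, the data, and the value of `alpha` are hypothetical; the authors' actual implementation, cross-validation scheme, and hyperparameters are not described in this response.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold; exact zero inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso_coordinate_descent(X, y, alpha, n_iter=100):
    """Minimise (1/2n)||y - Xw||^2 + alpha*||w||_1 (no intercept)."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0  # correlation of feature j with the partial residual
            z = 0.0    # squared norm of feature j
            for i in range(n):
                pred_without_j = sum(X[i][k] * w[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                z += X[i][j] ** 2
            w[j] = soft_threshold(rho / n, alpha) / (z / n) if z > 0 else 0.0
    return w


# Toy data (hypothetical): two orthogonal features, outcome depends on the first only.
X = [[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]]
y = [2.0, -2.0, 2.0, -2.0]
weights = lasso_coordinate_descent(X, y, alpha=0.5)
```

      The irrelevant feature receives a weight of exactly zero and the relevant one is shrunk towards zero, which is the sparsity property that makes the resulting weight map easier to read than a dense one.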

      (7) Did the authors assess whether the FACS were better to be transformed/normalised? More generally, I would report any data assessment/transformation that has not been reported.

      Thank you for this highly relevant suggestion. FACS scores were indeed not normally distributed and the analyses were conducted again to predict the log-transformed FACS scores. This transformation was effective in normalizing the distribution (skewness = 0.75, kurtosis = -0.84). The predictive model was confirmed on transformed data.
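      A check of this kind can be sketched as follows. The exact form of the transformation is not stated in the response; because FACS scores include zeros, the sketch assumes log(1 + x), which is an assumption and not the authors' documented choice. The data are invented, and skewness and excess kurtosis are computed from population moments.

```python
import math


def log_transform(scores):
    """log(1 + x), assumed here so that zero scores remain defined."""
    return [math.log1p(s) for s in scores]


def central_moment(values, order):
    mean = sum(values) / len(values)
    return sum((v - mean) ** order for v in values) / len(values)


def skewness(values):
    """Third standardised moment (population form)."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def excess_kurtosis(values):
    """Fourth standardised moment minus 3 (0 for a normal distribution)."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3.0


# Toy right-skewed scores (hypothetical), with zeros for trials without expression.
raw_scores = [0, 0, 1, 2, 3, 5, 8, 20, 45]
transformed = log_transform(raw_scores)

raw_skew = skewness(raw_scores)
transformed_skew = skewness(transformed)
```

      On these toy values the transformed scores are markedly less skewed than the raw ones, which is the pattern the reported statistics describe for the actual data.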

      (8) Page 12: I am not clear on whether all the signatures are included in the same model (like a multiple regression) or if separate regressions are calculated per signature. The authors seem to imply that several regressions have been computed (possibly one per comparison with each signature?).

      The correlation between the FACS scores and the pain-related signatures was computed separately for each signature. This information has been clarified.
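      Computing the association separately per signature, as opposed to entering all signatures in one multiple regression, can be sketched as one Pearson correlation per signature. The signature names and values below are hypothetical placeholders, not the published signatures or the study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Toy per-trial values (hypothetical).
facs_scores = [0.0, 1.0, 2.0, 3.0, 4.0]
signature_scores = {
    "signature_a": [1.0, 3.0, 5.0, 7.0, 9.0],  # rises with FACS scores
    "signature_b": [2.0, 0.0, 1.0, 0.0, 2.0],  # unrelated to FACS scores
}

# One independent correlation per signature; no signature adjusts for another.
correlations = {
    name: pearson_r(facs_scores, scores)
    for name, scores in signature_scores.items()
}
```

      Because each correlation is estimated on its own, the coefficients describe marginal associations and do not partial out variance shared between signatures.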

      (9) MVPA: See my main comment about warm trials and experimental/statistical design. For example, the LASSO regression model for the pain trials could be compared with a model using warm trials besides (or instead of) the unfitted model. Otherwise, add the warm trials as another predictor or within the subject level in a dummy fixed factor comprising pain and warm trials.

      The inclusion of warm trials in the model training would be inconsistent with the goal of the main analysis to predict the facial expression of pain when a noxious pain stimulus is presented. Secondary analyses were conducted to compare the response of the FEPS to the warm trials compared to noxious pain trials. The dot product between the FEPS and the activation maps (FEPS scores) associated with the warm condition was computed. A linear mixed model was conducted to investigate the association between FEPS scores and the experimental condition (warm vs pain). Additional contrasts compared the warm trials with the pain trials with and without pain facial expression. The details of this analysis have been added to the manuscript (see Response of the FEPS to pain and warm in the Methods) as well as the corresponding results (see Results and Discussion).

      (10) I would clarify for the reader why the separate M1 analysis has been run. Although obvious, I feel the reader would benefit from the specific hypothesis about this control analysis being spelled out together with the other statistical hypotheses within the statistical design in a more streamlined manner.

      We extended the discussion on the rationale of that analysis and its interpretation taking into account the most recent results using the log transformed FACS scores (lines 125 to 133).

      (11) The mixed model aimed to assess the relationship between pain ratings FEPS scores and facial scores is a crucial finding. I believe it speaks to the importance of a more complete design, which I already highlighted. I have a couple of technical questions: did the authors assess random slopes too? And, what was the strategy used to determine the random effects structure?

      The linear mixed model treated participants as a random effect, with random intercepts, to account for the grouping structure in our data (i.e., each participant completed multiple trials). The results reported in the original manuscript assumed fixed slopes. However, following the reviewer’s comment, we re-computed the mixed linear models allowing the slopes to vary according to the intensity ratings. The results were changed in the manuscript to represent the output of those models.

      (12) The text from lines 63 to 67 could go in the methods.

      We decided to keep those lines within the Results and Discussion section to give the reader more detail about the FACS scores, as this term is referenced repeatedly in the rest of that section. We are concerned that putting this information only in the Methods section would disrupt the reading.

      Reviewer #2 (Recommendations For The Authors):

      p. 4-5. When you report the positive weight clusters, you follow up with a sentence specifying which cognitive processes those brain regions are typically associated with. However, when you report the negative weight clusters, you do not specify the cognitive processes typically associated with those brain areas. I think that providing that information would be helpful to the readers.

      Thanks for noticing this omission. The information has been added in the most recent version of the manuscript (lines 119 to 121).

      p. 9. You specify that the degree of expressiveness of participants was evaluated. How did you evaluate expressiveness? Did you use this variable in your analyses? Were participants excluded based on their degree of expressiveness?

      Details about the assessment of facial expressiveness have been added in the appropriate section in the Methods (lines 285 to 289).

      p. 10. You explain that two certified FACS-coders evaluated the video recordings to rate the frequency of AUs. Could you please provide more details about the frequency measure? I think that there are different ways in which this could have been done. For instance, were the videos decomposed into frames, and then the frequency measured by summing the number of frames in which the AU occurred? Or was it "expression-based", so one occurrence of an AU (frequency of 1) would correspond to the whole period between its activation onset and offset? Both ways have pros and cons. For example, if the frequency represents the number of frames, then it controls for the total duration of the AU activation within a trial (pro); but if there were multiple activations/deactivations of the AU within one trial, this will not be controlled for (con). And vice-versa with the second way of calculating frequency.

      Details about the frequency scores have been added to the manuscript (lines 315 to 319).

      p. 11. When you explained how you calculated the association between the facial expression of pain and pain-related brain signatures, I felt that there was some information missing. Did you use the thresholded maps (available in the published articles), or did you somehow have access to the complete, voxel-by-voxel, raw regression coefficient maps?

      The unthresholded maps were used. The information has been clarified in the latest version of the manuscript, as well as the details about the availability of the maps (see Data Availability section at the end of the manuscript).

      Reviewer #3 (Recommendations For The Authors):

      Format

      The authors will notice that many observations about the manuscript are related to missing information and a lack of graphical representations. I believe the topic and the content of the manuscript are too complex to condense into a short report.

      Title

      The claim of the title is simply not substantiated by the content of the manuscript. Demonstrating that the FEPS is a distinctive (i.e., specific) marker of pain processing requires a substantially different experimental design, with more rigorous controls and a broader set of painful stimulations. The manuscript would benefit from a more accurate title.

      We agree that the title could better align with our findings. We modified the title accordingly: “A distributed brain response predicting the facial expression of acute nociceptive pain”.

      Abstract

      I find it puzzling that the authors claim that there is limited knowledge of the neural correlates of facial expression of pain given what they describe in the first paragraph of the introduction. Besides, they propose to reanalyze a dataset that has been extensively described in Kunz et al. (2011), which is unlikely to provide any new significant information.

      We respectfully disagree with that comment. We considered that three articles (i.e., Kunz et al., 2011; Vachon-Presseau et al., 2016; Kunz, Chen and Rainville, 2020) on the topic do constitute limited knowledge, especially if we compare it to the very large body of literature on the neural correlates associated with pain ratings. Except for these three studies, all the other citations pertain to behavioral studies on facial expression of pain, and do not examine the brain activity related to it. Furthermore, we believe that the complementary nature of the analyses performed in Kunz et al. (2011) and in this manuscript offers new insights into our understanding of facial expression in the context of pain. Indeed, the multivariate approach used in this study addresses some limitations of the univariate analyses in Kunz et al. (2011), mainly in that it provides a quantifiable way to compare the similarity between different predictive patterns (Reddan and Wager, 2017). We submit that the assessment of the FEPS against several other pain-relevant signatures provides new and important information.

      Furthermore, the abstract does not clearly state the aim, and the first line of the results does not match what the authors claim in the preceding line. The take-home message (last sentence) introduces the concept of a biomarker, which, as stated before, cannot be validated with the current data/experimental design. To put it in plain words, a given facial expression (or a composite score derived from a combination of expressions) cannot be a specific biomarker for pain, because a person can always mimic the same expression without feeling pain. Whether a given facial expression can be predicted from brain activity is a different issue, and whether that prediction can differentiate between painful and non-painful origins of the facial expression is another different issue. Unfortunately, neither of those issues can be tested with the current data/experimental design. The abstract would improve if the authors would circumscribe to what they actually tested, which is accurately described in the last sentence of the Introduction.

      The abstract was revised accordingly. The term ‘biomarker’ was used in accordance with preceding studies in the field (see Reddan and Wager, 2017; Lee et al., 2021). Please note that we applied the same reasoning to fluctuations in pain expression as previous studies have applied to pain ratings. Of course, we cannot dismiss the possibility of someone mimicking facial expressions. Similar reasoning applies to subjective reports, as individuals can intentionally overstate their pain experience in verbal reports. This is another case of specificity testing that cannot be addressed in the present study (see new conclusion of the abstract and discussion of limitations). The challenge of pain assessment is a classical problem within both the scientific and the clinical literature. Here, we suggest that the consideration of multiple manifestations of pain is necessary to address this challenge and will provide a more comprehensive portrait of pain-related brain function.

      Introduction

      I believe that the Introduction would benefit from a strict definition of what is a marker/biomarker/neuromarkers (all those terms are used in the manuscript) and what are its desirable features (validity, reliability, specificity, etc.). I also believe that the Introduction (and the rest of the text) would benefit from a critical assessment of the term "signature". The Introduction describes four existing "signatures", all of them differing in the experimental condition in which acute nociceptive pain is studied, and proposes a fifth one. Keeping with the analogy, I'm wondering whether they should be called (pain) "signatures" if there is a different one for each experimental acute pain condition, and they are so dissimilar between them when they are tested on the same condition (this dataset).

      The last part of that comment raises potential fundamental methodological limitations that should be addressed in more depth in a separate article, as the point goes beyond the scope of a single research article. Regarding the stability of the signatures, most have not been studied extensively, so it is currently difficult to assess their reliability. However, Han et al. (2022) showed high within-individual test-retest reliability for the NPS across eight different studies. Given that pain is a multidimensional experience, it is not surprising to find different patterns of activation predictive of different aspects or dimensions of the pain experience (see Čeko et al., 2022 for a similar discussion applied to negative affect).

      The authors state that "As an automatic behavioral manifestation, pain facial expression might be an indicator of activity in nociceptive systems, perceptual and evaluative processes, or general negative affect." Doesn't it reflect all three of them? (and instead of or?) Why "might"?

      The original sentence has been modified as follows: “As an automatic behavioral manifestation, pain facial expression is considered to be an indicator of activity in nociceptive systems, and to reflect perceptual and affective-evaluative processes” (lines 65 to 67).

      Methods

      The pain scale should be described. Kunz et al. used a 0-100 scale, where 50 was the pain threshold. This is crucial to interpret the 75-80/100 score for the painful thermal intensity.

      The description of the pain scale has been added to the manuscript (lines 299 to 300).

      Ratings for warm and painful temperatures should be reported (ideally plotted with individual-trial/subject data). In the same line of reasoning, FACS scores should be reported as well (ideally plotted with individual-trial/subject data). It would be interesting to explore the across-trial variability of pain ratings and FACS scores. That is, do people keep giving the same ratings and making the same facial expression after 16 trials? How much variability is between trials and between subjects?

      The point raised in that comment was already addressed in response to a comment made by Reviewer #1 (also see the new Figures S2 and S4; see also lines 335 to 346).

      How come only painful trials are analyzed? What if the FEPS signature was the same for warm and painful stimulation, thus reflecting the settings (fMRI experiment, stimulation, etc.) rather than the brain response to the stimuli?

      The point raised in that comment was already addressed in response to a comment made by Reviewer #1. There was no pain expression in the warm trials and the FEPS shows no response to warm trials. This is now illustrated in the new Figure S4B (see also lines 138 to 158).

      The authors propose to predict the trial-by-trial FACS composite score from the pain ratings using a LMM. However, it is interesting that they aim for an almost constant within- and between-subject pain score (75-80/100) as stated in the Methods. This should theoretically render the linear model invalid since its first (and main) assumption would be that FACS should vary linearly with the pain score. Even if patients were not aware that the temperatures were constant across trials, the variation in pain scores should be explained by random noise for a constant stimulation intensity.

      Reviewer #3 raises an important point that we need to clarify. Contrary to the expectation that FACS responses should be strongly correlated to pain ratings, we posited that these response channels depend at least in part on separate brain networks that may be differentially sensitive to a variety of modulatory mechanisms (attention, emotion, expectancy, motor priming, social context, etc.). This implies that part of the variance in FACS is independent from pain ratings. We, therefore, consider what Reviewer #3 refers to as random noise to be relevant and meaningful fluctuations reflecting endogenous processes influencing one’s experience of pain and differentially affecting various output responses.

      I noticed that fMRI data was analyzed with SPM5 in the original paper (Kunz et al., 2011) and with SPM8 in this manuscript. Was fMRI data re-processed for this manuscript? Were there any differences between the original analysis and this one that might induce changes in the interpretation of results?

      The data were indeed re-processed using SPM8, which was the most recent version available when we started the analyses reported here. We used trial-by-trial activation maps for MVPA, which differs from what was used in the previous study (contrast maps at the level of the conditions, not the trials). We have no reason to believe that the different versions will change the message of this manuscript since those versions do not differ significantly in terms of the fMRI preprocessing pipeline (see SPM8 release notes; https://www.fil.ion.ucl.ac.uk/spm/software/spm8/). Furthermore, the aim of this present study is not to compare the different analysis parameters implemented in SPM5 vs SPM8.

      What is the rationale for including PVP in the comparison among signatures? The experimental settings in which it was devised are distant from those described here.

      The inclusion of the PVP was aimed at enhancing our comparative analysis with the FEPS, as we sought to investigate the potential functional meaning of the FEPS. The PVP was developed to capture the aversive value of pain, a dimension that is conceptually proximal to the interpretation of the facial expression as a manifestation of the affective response to nociceptive pain.

      The LASSO-PCR approach is, in my opinion, not a procedure for (brain) decoding in this context. It is accurately described in the section title as a method for multivariate pattern analysis, or as a variable selection and regularization method for a prediction model. Here, brain activity in specific areas related to pain processing can hardly be described as "encoded", and the method just helps select those activations relevant for explaining a certain outcome (in this case, facial expressions).

      We understand the point made by Reviewer #3. The term “brain decoding” was replaced with “multivariate pattern analysis” in the latest version of the manuscript.

      Details are missing with regards to the dataset split into training, validation, and testing.

      Details about the training and testing procedure were added in the manuscript (lines 383 to 385).

      This might just be ignorance from me, so I apologize in advance, but what are "contrast" fMRI images? They are mentioned three times in the text but not really described. Are they the "Pain > Warm" contrasts from the original paper?

      We apologize for any confusion caused by the use of the term “contrast images” which suggests a direct comparison between two experimental conditions. We have replaced “contrast images” with “activation maps” to provide a more accurate description of the nature of the data used in the multivariate pattern analysis (lines 388 to 389).

      In the "Facial expression" section, the authors run an LMM to test the association between pain ratings (response variable) and facial responses (explanatory variable). If I understand correctly, in the "Multivariate pattern analysis" section they test the association between facial composite scores (response variable) and pain ratings (explanatory variable), but they obtain different results.

      The analyses were recomputed on the log transformed data, as mentioned previously in the response to reviewers 1-2. The first model (in the “Facial expression” section) used the log transformed FACS scores as a dependent variable, the pain ratings as the fixed effect, and the participants as the random effect. The results of that analysis suggested that the transformed facial expression scores were not significantly associated with the pain ratings (p = .07). The second model uses both the FEPS pattern expression scores and pain ratings as fixed effects to predict facial responses. This analysis showed the significant contribution of the FEPS to the prediction of FACS scores (p < .001) and no significant effect of the pain ratings. However, a significant interaction was found (p = .03) suggesting that the prediction of the pain facial expression by the FEPS may vary with pain ratings (i.e. moderator effect). Those results have been clarified in the “Multivariate pattern analysis” section in the Methods (lines 416 to 426).

      In this same section, what are "FEPS pattern expression scores"? They are used three times in the text, but I could not find their description.

      The FEPS pattern expression scores correspond to the dot product between the trial-by-trial activation maps and the unthresholded FEPS signature. This information has been added to the manuscript (lines 413 to 414).
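
      The dot product described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name pattern_expression, the four-voxel maps, and all numeric values are hypothetical, and real activation maps and signatures would contain many thousands of voxels.

```python
# Hypothetical toy data: each activation map and the signature are flattened
# to equal-length lists of voxel values (illustrative numbers only).
def pattern_expression(activation_map, signature):
    """Return the dot product of one trial's activation map with the signature weights."""
    if len(activation_map) != len(signature):
        raise ValueError("map and signature must cover the same voxels")
    return sum(a * w for a, w in zip(activation_map, signature))


signature = [0.5, -1.0, 2.0, 0.0]   # unthresholded weights, one per voxel
trial_maps = [
    [1.0, 0.0, 1.0, 3.0],           # trial 1
    [0.0, 2.0, 0.5, 1.0],           # trial 2
]

# One scalar score per trial: larger values mean the trial's activation
# pattern resembles the signature more strongly.
scores = [pattern_expression(m, signature) for m in trial_maps]
print(scores)
```

      Each trial is reduced to a single number, which is what allows the scores to enter a regression model as an ordinary predictor.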

      It would not be far-fetched to hypothesize that FACS scores could be predicted using solely activity from the motor cortex. The authors attempted to do this, but only with information from M1. Why did they not use the entire motor cortex, or better, regions of the motor cortex directly linked with the AUs described in the manuscript?

      The selection of the primary motor area (M1) was based on the results found in Kunz et al. (2011). In this study, M1 showed the strongest correlation with facial expression of pain. There are numerous possibilities of combinations of multiple brain regions considering a variety of criteria based on distributed networks involved in motor, affective, or pain-related processes. We limited our exploration to the region with the strongest hypothesis due to practical feasibility concerns.

      Results and Discussion

      As a general recommendation, results should present individual data whenever possible. For example, the association between signatures and facial expression should be plotted using scatterplots.

      We have added figures showing individual data where applicable (Figure S2; Figure S4).

      The authors state that the LASSO-PCR model accounts for the facial responses to pain. I believe this is an overstatement, considering:

      - A Pearson's r of 0.49 is usually considered low/weak correlation (moderate at best). In the same line, an R2 of 0.17 means that only 17% of the variance is explained by the model.

      A more nuanced interpretation of the results has been added to the discussion. A section has also been added to highlight the limitations of the study.

      - Figure 1 needs to display individual subject data and the ideal regression line.

      The model was trained using a k-fold cross-validation procedure. The regression lines thus represent the model’s prediction for each one of the 10 folds (i.e. each fold is trained and tested on a different subset of the data). A scatter plot including the ideal regression line computed across all trials and subjects was added in supplementary material to illustrate the relation between the FACS scores and the FEPS pattern expression scores (Figure S4).
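
      The 10-fold procedure described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name k_fold_indices, the sample count of 25, and the contiguous (unshuffled) split are assumptions made for the example, and the source does not say whether folds were formed over trials or over participants.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs so that each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical sample count, chosen only for illustration.
folds = list(k_fold_indices(n_samples=25, k=10))
for fold_number, (train_idx, test_idx) in enumerate(folds, start=1):
    print(fold_number, len(train_idx), len(test_idx))
```

      Because every fold is fitted on a different training subset, each fold yields its own model and therefore its own regression line.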

      - Looking at Figure 1, it is clear that the model has an intercept different from zero. This means that when the FACS score was zero (i.e., volunteers did not make any distinguishable facial expression), the model predicted a score larger than zero. This is not discussed in the manuscript, and in simple terms, it means that there are brain activation patterns when no discernible facial expression is being made by the volunteers. In the original paper by Kunz et al., two groups of subjects were categorized, and one of them was a facially low- or non-expressive group (n=13). This fact is not even mentioned in the manuscript.

      The categorization in the previous report (Kunz et al., 2012) was based on a pre-experimental session. All subjects were included in the current analysis. This is now indicated in the Methods (lines 287 to 289).

      - On the other end of the range in Figure 1, differences between the FACS scores near the maximum range (40) are underestimated by 23 to 33 points! I guess that the RMSE is smaller (6-7 points), because many FACS scores are concentrated on the low end of the scale.

      This is a very interesting comment. A section discussing the limits of the model to predict the lower and higher FACS scores has been added in the manuscript (lines 232 to 250).
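
      The reviewer's points about the non-zero intercept and the underestimation of high scores can be illustrated with a toy calculation. The numbers below are invented and are not the study's data; the sketch also assumes that R² is computed as 1 − SSE/SST, which the source does not specify. Under that assumption, predictions can correlate strongly with the observed scores and still explain far less variance when they are shifted and compressed.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SSE/SST (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - sse / sst


# Invented scores: mostly low values with one high value.
observed = [0, 0, 10, 20, 40]
# Invented predictions: a positive intercept and a compressed range,
# so low scores are overestimated and high scores are underestimated.
predicted = [5 + 0.3 * o for o in observed]

r = pearson_r(observed, predicted)
r2 = r_squared(observed, predicted)
print(r, r2)
```

      The correlation is unaffected by the shift and compression, whereas the 1 − SSE/SST measure penalizes both.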

      It is of course acceptable to interpret the low similarity between signatures as a sign that each signature describes a different mechanism related to pain processing. However, I believe that a complete discussion should contemplate other competing hypotheses. Considering that all signatures were developed using a similar painful thermal stimulation protocol, it is reasonable to expect larger similarities between signatures. The fact that they are so dissimilar could be a reflection of model overfit, i.e., all these signatures are just fitted to these particular experimental protocols and data, and do not generalize to brain mechanisms of pain processing.

      We appreciate the pertinent observation. We have included a limitations section in which we discussed, among other considerations, the possible overfitting of models and the necessity of pursuing generalizability studies (lines 225 to 268).

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      This is an important study on the regulation of chlorophyll biosynthesis in rice embryos. It provides insights into the genetic and molecular interactions that underlie chlorophyll accumulation, highlighting the inhibition of OsGLK1 by OsNF-YB7 and the broader implications for understanding chloroplast development and seed maturation in angiosperms. The results presented, including mutation analysis, gene expression profiles, and protein interaction studies, provide convincing evidence for the function of OsNF-YB7 as a repressor in the chlorophyll biosynthesis pathway.

      Thank you very much for your positive assessment of our manuscript. We have carefully revised the manuscript according to the reviewers’ valuable suggestions and comments. For more details, please see the point-to-point response to the reviewers below.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This manuscript investigates the regulation of chlorophyll biosynthesis in rice embryos, focusing on the role of OsNF-YB7. The rigorous experimental approach, combining genetic, biochemical, and molecular analyses, provides a robust foundation for these findings. The research achieves its objectives, offering new insights into chlorophyll biosynthesis regulation, with the results convincingly supporting the authors' conclusions.

      Strengths:

      The major strengths include the detailed experimental design and the findings regarding OsNF-YB7's inhibitory role.

      Weaknesses:

      However, the manuscript's discussion on the practical implications for agriculture and the evolutionary analysis of regulatory mechanisms could be expanded.

      Thank you for your insightful comments and suggestions. In the revised manuscript, we discussed the potential applications of the chlorophyllous embryo (please see lines 270-274). The presence of chlorophyll in the embryo facilitates photosynthesis at early developmental stages, potentially leading to improved seedling growth and vigor (Smolikova and Medvedev, 2016). In crops such as soybean and canola, the green embryo is considered a valuable trait because of its association with enhanced photosynthetic capacity, which in turn promotes fatty acid biosynthesis (Ruuska et al., 2004). However, chlorophyll degradation must be carefully managed during seed maturation to avoid negative effects on seed viability and meal quality (Chung et al., 2006). Interestingly, the green embryo of lotus (Nelumbo nucifera) is widely used as a food ingredient in Asia, Australia, and North America. It is also employed in herbal medicine to treat nervous disorders, insomnia, and other conditions (Zhu et al., 2017; Ha et al., 2022), highlighting the significant potential value of the green embryo.

      In many chloroembryophytes, such as Arabidopsis, the embryo occupies a large proportion of the seed. From an evolutionary perspective, the presence of chlorophyll in the embryo may promote adaptation in such chloroembryophytes because more reserves can be accumulated in the seed through active photosynthesis, better supporting embryo development and subsequent seedling growth (Sela et al., 2020). On the other hand, some leucoembryophytes, such as rice, have a persistent endosperm rich in storage reserves to nourish embryo development (Liu et al., 2022). Gaining the ability to accumulate chlorophyll in the embryo is unnecessary for such species. In agreement with this hypothesis, chlorophyllous embryos are more prevalent in non-endospermous seeds (Dahlgren, 1980). However, we would like to emphasize that the evolutionary force driving the divergence of chloroembryophytes and leucoembryophytes is currently almost completely unknown and deserves in-depth investigation in the future. We discussed the possible evolution of the ability to accumulate chlorophyll in the embryo; please find the details in lines 276-295.

      Reviewer #2 (Public Review):

      Summary:

      The authors set out to establish the role of the rice LEC1 homolog OsNF-YB7 in embryo development, especially as it pertains to the development of photosynthetic capacity, with chlorophyll production as a primary focus.

      Strengths:

      The results are well-supported and each approach used complements each other. There are no major questions left unanswered and the central hypothesis is addressed in every figure.

      Weaknesses:

      There are a handful of sections that could use clarifying for readers, but overall this is a solidly composed manuscript.

      The authors clearly achieved their aims; the results compellingly establish a disparity between how this system operates in rice and Arabidopsis. Conclusions are thoroughly supported by the provided data and interpretations. This work will force a reconsideration of the value of Arabidopsis as a model organism for embryo chlorophyll biosynthesis and possibly photosynthesis during embryo maturation more broadly, as rice is a major crop organism and it very clearly does not follow the Arabidopsis model. It will thus be useful to carry out similar tests in other organisms rather than relying on Arabidopsis and attempting to more fully establish the regulatory mechanism in rice.

      Thank you very much for your positive comments. We have carefully revised the manuscript according to your and the other reviewers’ comments and suggestions. In particular, we emphasized the necessity of carrying out similar tests in other organisms, rather than relying on Arabidopsis, to better understand the regulatory mechanism in rice.

      Reviewer #3 (Public Review):

      Summary:

      In this study, the authors set out to understand the mechanisms behind chlorophyll biosynthesis in rice, focusing in particular on the role of OsNF-YB7, an ortholog of Arabidopsis LEC1, which is a positive regulator of chlorophyll (Chl) biosynthesis in Arabidopsis. They showed that OsNF-YB7 loss-of-function mutants in rice have chlorophyll-rich embryos, in contrast to Arabidopsis LEC1 loss-of-function mutants. This contrasting phenotype led the authors to carry out extensive molecular studies on OsNF-YB7, including in vitro and in vivo protein interaction studies, gene expression profiling, and protein-DNA interaction assays. The evidence provided well supported the core arguments of the authors, emphasising that OsNF-YB7 is a negative regulator of Chl biosynthesis in rice embryos by mediating the expression of OsGLK1, a transcription factor that regulates downstream Chl biosynthesis genes. In addition, they showed that OsNF-YB7 interacts with OsGLK1 to negatively regulate the expression of OsGLK1, demonstrating the broad involvement of OsNF-YB7 in rice Chl biosynthetic pathways.

      Strengths:

      This study clearly demonstrated how OsNF-YB7 regulates its downstream pathways using several in vitro and in vivo approaches. For example, gene expression analysis of OsNF-YB7 loss-of-function and gain-of-function mutants revealed the expression of selected downstream chl biosynthetic genes. This was further validated by EMSA on the gel. The authors also confirmed this using luciferase assays in rice protoplasts. These approaches were used again to show how the interaction of OsNF-YB7 and OsGLK1 regulates downstream genes. The main idea of this study is very well supported by the results and data.

      Weaknesses:

      From an evolutionary perspective, it is interesting to see how two similar genes have come to play opposite roles in Arabidopsis and rice. It would have been more interesting if the authors had carried out a cross-species analysis of AtLEC1 and OsNF-YB7. For example, overexpressing AtLEC1 in an osnf-yb7 mutant to see if the phenotype is restored or enhanced. Such an approach would help us understand how two similar proteins can play opposite roles in the same mechanism within their respective plant species.

      We appreciate your insightful comments and suggestions. Whether AtLEC1 can fully restore osnf-yb7 is a very interesting question, given the possible functional divergence between the genes in the regulation of chlorophyll biosynthesis in the embryo. We have previously expressed OsNF-YB7 in the lec1-1 background in Arabidopsis, driven by the native promoter of LEC1 (Niu et al., 2021). We found that OsNF-YB7 could almost completely rescue the embryo defects in Arabidopsis, indicating that OsNF-YB7 plays a role in rice similar to that of LEC1 in Arabidopsis (Niu et al., 2021). We sought to determine whether AtLEC1 can complement the chlorophyll defect in osnf-yb7. However, osnf-yb7 shows a severe callus induction defect, which is not surprising because many studies have shown that LEC1 is indispensable for somatic embryo development in various plant species, and we are therefore struggling to obtain the genetic materials for analysis. We have to transform OsNF-YB7pro::AtLEC1 into the WT background first, and then cross the transformant with the osnf-yb7 mutant. This is a time-consuming process in rice, but hopefully we will be able to isolate a line expressing OsNF-YB7pro::AtLEC1 in the osnf-yb7 background from the resulting segregating population.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      A minor comment regarding the chlorophyll contents quantification in the study. Line 87: "The results showed that WT had an achlorophyllous embryo throughout embryonic development,...." In the TEM result, chloroplast was not observed in the WT embryo sections, indicating a lack of chlorophyll-containing structures, contrary to what was found in the osnf-yb7 embryos where chloroplasts were observed.

      The authors stated that the embryo morphologies and Chl autofluorescence data showed that WT had an achlorophyllous embryo throughout embryonic development. However, the quantification of Chl levels in Figure 1D and Figure 4C showed that WT does produce some chlorophylls, albeit at lower levels than osnf-yb7 or OSGLK-OX embryos (WT values in the two figures are slightly different). This discrepancy warrants clarification to ensure consistency and accuracy in the manuscript's findings.

      We re-evaluated the Chl content in the embryos of WT and OsGLK1-OX mature seeds. The result confirmed our previous finding that WT embryos produce a small amount of chlorophyll (please see the updated Fig. 4C). Notably, we observed that the dark-grown etiolated plants still have measurable chlorophyll content as reported in many studies (for example, Wang et al., 2017; Yoo et al., 2019), suggesting that there is potential bias in measuring chlorophyll content using an absorbance-based approach. We assume this possibly explains the concern you have raised.

      Reviewer #2 (Recommendations For The Authors):

      Mild editing for grammar is needed throughout, e.g. line 73, "It is still a mysterious why plant species".

      We have carefully edited the grammar.

      As a minor point, the placement of figure panels, such as in Figure 1, is not always intuitive.

      Thank you for your suggestion. This figure has been revised as suggested. Please see the updated Fig. 1.

      What is the significance of the two GFP mutants in Figures 2C and 2D? Is one of those the mislabeled Flag mutant?

      The lines shown in Fig. 2C and D were not mislabeled. They were two independent transgenic events, both of which showed that OsNF-YB7 inhibited the expression of OsPORA and OsLHCB4 in rice. Transgenic lines overexpressing OsNF-YB7 tagged with 3× Flag (NF-YB7-Flag) were also used for this experiment. In agreement, OsPORA and OsLHCB4 were significantly downregulated in the three independent NF-YB7-Flag lines (Fig. S4C), confirming the results shown in Fig. 2C and D.

      In Figures 2G and 2H, what is that enormous band at the bottom of the gel?

      The bands at the bottom of the gel were free probes. We indicated this in the revised figure.

      Not until the Materials and Methods section did I realize that any of this study was being done in tobacco; the Introduction implies it's rice vs. Arabidopsis and it might be a good idea to mention the organism of study somewhere before Figure 6.

      We apologize for any confusion caused by our previous writing. While the majority of this study was performed with rice plants or protoplasts, the split complementary LUC assays and BiFC assays were performed with tobacco. We have specified these in the revised manuscript as suggested.

      Reviewer #3 (Recommendations For The Authors):

      It would be nice if the author could show what the phenotype is in AtLEC1 OX in osnf-yb7 and also OsNF-YB7 OX in atlec1 mutants.

      Thank you for your suggestion. We have previously expressed OsNF-YB7 in the lec1-1 background of Arabidopsis, driven by the native promoter of Arabidopsis LEC1 (Niu et al., 2021). Since OsNF-YB7 could rescue the embryo morphogenesis defects in Arabidopsis (Niu et al., 2021), we assumed that OsNF-YB7 plays a role in rice similar to that of LEC1 in Arabidopsis. However, it remains unknown whether expression of LEC1 in osnf-yb7 may restore the chlorophyllous embryo phenotype in rice. As the generation of genetic material is time-consuming, and especially given that osnf-yb7 has a severe callus induction defect, we are struggling to obtain the complementation line for analysis. We have to transform OsNF-YB7pro::AtLEC1 into a WT background first, and then cross the transformant with the osnf-yb7 mutant. Hopefully, we will be able to isolate a line expressing OsNF-YB7pro::AtLEC1 in the osnf-yb7 background from the derived segregating population. We discussed the reviewer’s concern in the revised manuscript; please see lines 369-376.

      Line 46, I think it is vague to mention that 'Like most plant species'. Some species might have different copy numbers, for example, a single GLK in liverwort M. polymorpha.

      The statement has been revised. Please see Line 46.

      Figures 2F and 5B, why was only one promoter region used for OsLHCB4? It would be better to have more regions like OsPORA.

      Thank you for your comments. We have examined more promoter regions (P1, P2 and P3) in the revised manuscript as suggested. Among these, the previously selected promoter region (P3) contains both the G-box and CCAATC motifs that can potentially be recognized by GLK1. Consistent with our previous report, the results showed that OsNF-YB7 (left) and OsGLK1 (right) were associated with the P3 region, whereas no significant differences were observed for the other probes. Please see the results in Fig. 2F and Fig. 5B of the revised manuscript.

      Legend of Figures 2G, H, OsPORA (I), and OsLHCB (J) should be (G) and (H) respectively.

      Corrected.

      References

      Chung, D.W., Pruzinska, A., Hortensteiner, S., and Ort, D.R. (2006). The role of pheophorbide a oxygenase expression and activity in the canola green seed problem. Plant Physiol 142, 88-97.

      Ha, T., Kim, M.S., Kang, B., Kim, K., Hong, S.S., Kang, T., Woo, J., Han, K., Oh, U., Choi, C.W., and Hong, G.S. (2022). Lotus Seed Green Embryo Extract and a Purified Glycosyloxyflavone Constituent, Narcissoside, Activate TRPV1 Channels in Dorsal Root Ganglion Sensory Neurons. J Agric Food Chem 70, 3969-3978.

      Liu, J., Wu, M.W., and Liu, C.M. (2022). Cereal Endosperms: Development and Storage Product Accumulation. Annu Rev Plant Biol 73, 255-291.

      Niu, B., Zhang, Z., Zhang, J., Zhou, Y., and Chen, C. (2021). The rice LEC1-like transcription factor OsNF-YB9 interacts with SPK, an endosperm-specific sucrose synthase protein kinase, and functions in seed development. Plant J 106, 1233-1246.

      Ruuska, S.A., Schwender, J., and Ohlrogge, J.B. (2004). The capacity of green oilseeds to utilize photosynthesis to drive biosynthetic processes. Plant Physiol 136, 2700-2709.

      Sela, A., Piskurewicz, U., Megies, C., Mene-Saffrane, L., Finazzi, G., and Lopez-Molina, L. (2020). Embryonic Photosynthesis Affects Post-Germination Plant Growth. Plant Physiol 182, 2166-2181.

      Smolikova, G.N., and Medvedev, S.S. (2016). Photosynthesis in the seeds of chloroembryophytes. Russ J Plant Physiol 63, 1-12.

      Wang, Z., Hong, X., Hu, K., Wang, Y., Wang, X., Du, S., Li, Y., Hu, D., Cheng, K., An, B., and Li, Y. (2017). Impaired Magnesium Protoporphyrin IX Methyltransferase (ChlM) Impedes Chlorophyll Synthesis and Plant Growth in Rice. Front Plant Sci 8, 1694.

      Yoo, C.Y., Pasoreck, E.K., Wang, H., Cao, J., Blaha, G.M., Weigel, D., and Chen, M. (2019). Phytochrome activates the plastid-encoded RNA polymerase for chloroplast biogenesis via nucleus-to-plastid signaling. Nat Commun 10, 2629.

      Zhu, M., Liu, T., Zhang, C., and Guo, M. (2017). Flavonoids of Lotus (Nelumbo nucifera) Seed Embryos and Their Antioxidant Potential. J Food Sci 82, 1834-1841.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Reviews):

      Summary: 

      The authors use a combination of biochemistry and cryo-EM studies to explore a complex between the cap-binding complex and an RNA binding protein, ALYREF, that coordinates mRNA processing and export.

      Strengths: 

      The biochemistry and structural biology are supported by mutagenesis which tests the model in vitro. The structure provides new insight into how key events in RNA processing and export are likely to be coordinated.

      Weaknesses: 

      The authors provide biochemical studies to confirm the interactions that they identify; however, they do not perform any studies to test these models in cells or explore the consequences of mRNA export from the nucleus. In fact, several of the amino acids that they identified in ALYREF that are critical for the interaction, as determined by their own biochemical studies, are conserved in budding yeast Yra1 (residues E124/E128 are E/Q in budding yeast and residues Y135/V138/P139 are F/S/P), where the impact on poly(A) RNA export from the nucleus could be readily evaluated. The authors could at least mention this point as part of the implications and the need for future studies. No one seems to have yet targeted any of these conserved residues, so this would be a logical extension of the current work.

      We thank the reviewer for the feedback on our work. ALYREF coordinates pre-mRNA processing and export through interactions with a plethora of mRNA biogenesis factors including the DDX39B subunit of the TREX complex, CBC, EJC, and 3’ processing factors. ALYREF mediates the recruitment of the TREX complex on nascent transcripts which depends on its interactions with both CBC and EJC. Our work and studies by others indicate that ALYREF uses overlapping interfaces including both the N-terminal WxHD motif and the RRM domain to bind CBC and EJC. Thus, ALYREF mutants deficient in CBC interaction will also disrupt the ALYREF-EJC interaction and are not ideal for functional studies. In addition, the CBC plays important roles in multiple steps of mRNA metabolism through interactions with a plethora of factors, which often interact competitively with CBC. Identification of separation-of-function mutations on CBC or ALYREF that specifically disrupt their interaction but not other cellular complexes containing CBC or ALYREF would be an important future area to test the model in cells. 

      We appreciate the reviewer’s insightful comments regarding yeast Yra1. Thus far, the physical and functional connection between Yra1 and CBC in yeast has not been demonstrated. There are major differences between yeast Yra1 and human ALYREF. Given the lack of an EJC in S. cerevisiae, it is unclear whether Yra1 acts in a similar manner as human ALYREF. In addition, Yra1 does not contain a WxHD motif in its N-terminal unstructured region, which is involved in CBC and EJC interactions in ALYREF. Characterization of the Yra1-CBC interaction will be an interesting future direction. We now include a discussion about yeast Yra1 in the newly added “Conclusion and perspectives” section. 

      Specific suggestions:

      The authors could put their work in context by speculating how some of the amino acids that they identify as being critical for the interactions could contribute to cancer. For example, they mention that mutations of interacting residues in NCBP2 are associated with human cancers, pointing out that the NCBP2 R105C amino acid substitution has been reported in colorectal cancer and the NCBP2 I110M mutation has been found in head and neck cancer. Do the authors speculate that these changes would decrease the interaction between NCBP2 and ALYREF and, if so, how would this contribute to cancer? They also mention a K330N mutation in NCBP1 found in human uterine corpus endometrial carcinoma, where Y135 on the α2 helix of mALYREF2 makes a hydrogen bond with K330 of NCBP1. How do they speculate loss of this interaction would contribute to cancer?

      In the revised manuscript, we include a discussion about these CBC mutants found in human cancers in the “Conclusion and perspectives” section. We think some of these CBC mutants, such as NCBP1 K330N, could reduce interaction with ALYREF. A compromised CBC-ALYREF interaction would affect the recruitment of the TREX complex to nascent transcripts and cause dysregulation of mRNA export. In addition, it could change the partitioning of CBC and ALYREF among different cellular complexes and perturb the various steps in mRNA biogenesis that are regulated by CBC and ALYREF. Thus far, it is unclear whether and how loss of the CBC-ALYREF interaction directly contributes to cancer. Our work and that of others provide molecular insights to test in future studies.

      Reviewer #2 (Public Reviews):

      Summary: 

      In this manuscript, Bradley and his colleagues presented the cryo-EM structure of the nuclear cap-binding complex (CBC) in complex with an mRNA export factor, ALYREF, providing a structural basis for understanding how the CBC regulates gene expression.

      Strengths: 

      The authors successfully modeled the N-terminal region and the RRM domain of ALYREF (residues 1-183) within the CBC-ALYREF structure, which revealed that both the NCBP1 and NCBP2 subunits of the CBC interact with the RRM domain of ALYREF. Further mutagenesis and pull-down studies provided additional evidence for the observed CBC-ALYREF interface. Additionally, the authors engaged in a comprehensive discussion regarding other cellular complexes containing CBC and/or ALYREF components. They proposed potential models that elucidated coordinated events during mRNA maturation. This study provided good evidence to show how CBC effectively recruits mRNA export factor machinery, enhancing our understanding of how the CBC regulates gene expression during mRNA transcription, splicing, and export.

      Weaknesses: 

      There are no in vivo or in vitro functional data to validate and support the structural observations and the proposed models in this study. Cryo-EM data processing and structural representation need to be strengthened.

      We appreciate the reviewer’s comments and suggestions. The fact that ALYREF uses highly overlapping binding interfaces for CBC and EJC interactions prevents us from clearly dissecting the function of the ALYREF-CBC interaction using in vitro assays or in cells at the current stage. Please also see our response to Reviewer 1.

      In this revised manuscript, we have reprocessed the cryo-EM data using a different strategy which yields significantly improved maps. We have made improvements to the presentation of the structural work based on the reviewer’s specific comments. 

      Reviewer #3 (Public Reviews):

      Summary: 

      The authors carried out structural and biochemical studies to investigate the multiple functions of CBC and ALYREF in RNA metabolism.

      Strengths: 

      For the structural study part, the authors successfully revealed how NCBP1 and NCBP2 subunits interact with mALYREF (residues 1-155). Their binding interface was then confirmed by biochemical assays (mutagenesis and pull-down assays) presented in this study. 

      Weaknesses: 

      The authors did not provide functional data to support their proposed models. The authors should include more details regarding the workflow of their cryo-EM data processing in the figure. 

      We thank the reviewer for the comments. We completely agree that testing the proposed models in cells would be ideal. However, as we also noted in our responses to the other reviewers, functional studies are premature at the current stage because both ALYREF and CBC are components of many cellular complexes that regulate mRNA metabolism. Separation-of-function mutations in CBC or ALYREF first need to be identified in future studies for further investigation. Please also see our response to Reviewer 1.

      As suggested by the reviewer, we have included more details of the cryo-EM workflow in this revised manuscript. We have also included various validation measures including 3DFSC analyses, map vs model FSC curves, and representative density maps at various protein-protein binding interfaces. 

      Recommendations for the Authors:

      Reviewer #1 (Recommendations for the Authors):

      Major points:

      The authors should take advantage of Figure 1, which shows the domain structures of NCBP1, NCBP2, and ALYREF to indicate for the reader specifically which protein domains are included in the biochemical and structural analyses. In the current version of the manuscript, there is plenty of space to indicate below each domain structure precisely what regions are included.

      In this revised manuscript, we have revised Figure 1A to indicate the protein constructs used in this work. 

      Although it is fine to combine the Results and Discussion, the authors should really offer a concluding paragraph to highlight the novel results from this study and put the results in context.

      We thank the reviewer for the recommendation. We now include a “Conclusion and perspectives” section in this revised manuscript.  

      Minor comments:

      Page 5, last sentence (and others) starts with the word "Since" when "As", which does not imply a time element, is likely the correct word.

      "Since the ALYREF/mALYREF2 interaction with the CBC is conserved and mALYREF2 exhibits better solubility, we focused on mALYREF2 in the cryo-EM investigations."

      Would be more correct as: "As the ALYREF/mALYREF2 interaction with the CBC is conserved and mALYREF2 exhibits better solubility, we focused on mALYREF2 in the cryo-EM investigations."

      We thank the reviewer for the comments. We have made the corrections. 

      The word 'data' is plural so the sentence at the bottom of p.9 that includes the phrase "...in vivo data shows.." should read "..in vivo data show.." 

      Corrected in the revised manuscript.

      Reviewer #2 (Recommendations for the Authors):

      Major points:

      (1) The authors claimed the improved solubility of mouse ALYREF2 (mALYREF2, residues 1-155) compared to the previously employed ALYREF construct. However, human ALYREF has already been purified successfully for the pull-down assay, indicating that soluble human ALYREF was obtained. Why not use human ALYREF directly? Please clarify.

      Pull-down studies were performed with GST-tagged ALYREF. For cryo-EM studies, untagged ALYREF is preferred to avoid potential issues that may arise from the expression tag. However, untagged ALYREF is less soluble than GST-tagged ALYREF and is not amenable to structural studies. We have revised the text to clarify this point.

      (2) The authors confirmed CBC-ALYREF interfaces through mutagenesis and pull-down assays in vitro. However, it would be more informative and interesting to include functional assays in vitro or/and in vivo with mutagenesis. 

      We completely concur with the reviewer that testing the proposed models in vitro and in vivo would be important. However, as we pointed out in our response to the public reviews, the highly overlapping binding interfaces on ALYREF for CBC and EJC interactions pose a great challenge for functional studies. Furthermore, both ALYREF and CBC are multifunctional factors and interact with a number of partners. Ideally, separation-of-function mutants that specifically disrupt the CBC-ALYREF interaction but not others need to be identified in future studies in order to perform functional studies.

      (3) About cryo-EM data processing and structural representation:

      (1) In the description of the cryo-EM data processing, the authors claimed they did heterogeneous refinement, homogeneous refinement, and then local refinement. This reviewer is puzzled by this process because the normal procedure should be non-uniform refinement following homogeneous refinement. If the authors did not perform non-uniform refinement, they should do it because it would significantly improve the quality and resolution of cryo-EM maps. In addition, the right local refinement should include mask files and only show the density/map of the local region.

      We thank the reviewer for the suggestions. In response to the reviewer’s comment on the preferred orientation issue (point 5 below), we reprocessed the cryo-EM data and obtained significantly improved cryo-EM maps. In this revised manuscript, the CBC-mALYREF2 map was refined using homogeneous refinement; the CBC map was refined using homogeneous refinement followed by non-uniform refinement. Refinement masks are included in Figure 2-figure supplement 1.

      (2) Further local refinements with signal subtraction should be performed to improve the density and resolution of mALYREF2. 

      We tested local refinement with or without signal subtraction using masks covering mALYREF2 and various regions of CBC. Unfortunately, this approach did not improve the density of mALYREF2. We suspect that the small size of mALYREF2 (77 residues for the RRM domain) and the intrinsic flexibility of CBC are the limiting factors in these attempts. 

      (3) Figures with the cryo-EM map showing the side chains of the residues on the CBC-mALYREF2 interface should be included to strengthen the claims. Authors could add the map to Figure 3b/c or present it as a supplementary figure.

      We include new supplementary figures (Figure 3-figure supplement 1) to show the electron densities corresponding to the views in Figure 3B and 3C. Residues labeled in Figure 3B and 3C are shown in sticks in these supplementary figures.

      (4) For cryo-EM data processing, the authors have omitted lots of important details. Could the authors elaborate on the data processing with more details in the corresponding Figure and Methods sections? Only one ab-initio model from the picked good particles was displayed in the figure. Are there any other different conformations of 3D classes for the dataset? In addition, too few classes have been considered in 3D classification; more classes may give a class with better density and resolution.

      We thank the reviewer for the comments. We have reprocessed the cryo-EM data. A major change is the use of Topaz for particle picking. We now include more details of the data processing in Figure 2-figure supplement 1 and the Methods section. The cryo-EM sample is relatively uniform. Ab-initio reconstruction and heterogeneous refinement yielded only one good class; the other classes are “junk” classes (omitted in the workflow figure). No major conformational changes were observed throughout the multiple rounds of heterogeneous refinement for both CBC and CBC-mALYREF2. In this revised manuscript, we have been able to obtain significantly improved maps through the new data processing strategy employing Topaz, as illustrated in Figure 2-figure supplements 1 to 5.

      (5) Angular distribution plots should be included to show if there is a preferred orientation issue. Based on the presented maps in validation reports, there may exist a preferred orientation issue for the reported two cryo-EM maps. Detailed 3D-Histogram and directional FSC plots for all the cryo-EM maps using 3DFSC web server should be presented to show the overall qualities (https://www.nature.com/articles/nmeth.4347 and https://3dfsc.salk.edu/).

      We thank the reviewer for the recommendations. In response to the reviewer’s comment on the preferred orientation issue, we reprocessed the cryo-EM data. Topaz was used for particle picking instead of template picking. 3DFSC analyses indicate that the new CBC-mALYREF2 map has a sphericity of 0.946, a significant improvement over the previous map, which had a sphericity of 0.815. Consistently, the maps presented in this revised manuscript show significantly improved densities. We now include angular distribution and 3DFSC analyses of the EM maps (Figure 2-figure supplement 2 and 4).

      (6) Figures of model-to-map FSCs need to be presented to demonstrate the quality of the models, and the corresponding values (model resolution when FSC=0.5) should also be included in Table 1. The accuracy of the model is important for structural explanations and description.

      The model-to-map FSCs are now included in Figure 2-figure supplement 3A and 5A. The model resolutions of CBC-mALYREF2 and CBC are estimated to be 3.5 Å and 3.6 Å at an FSC of 0.5. These numbers are now included in Table 1. 
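
      For readers unfamiliar with how a model resolution is read off a model-to-map FSC curve, the following is a minimal sketch, assuming the curve is available as paired arrays of spatial frequency (in 1/Å) and FSC values. The function name, the linear interpolation, and the example numbers are illustrative assumptions; they are not the authors' procedure or data.

```python
import numpy as np

def resolution_at_threshold(freqs, fsc, threshold=0.5):
    """Return the resolution (in Å) where an FSC curve first drops below `threshold`.

    freqs: spatial frequencies in 1/Å, strictly increasing (hypothetical input).
    fsc:   FSC values at those frequencies.
    Returns None if the curve never drops below the threshold.
    """
    freqs = np.asarray(freqs, dtype=float)
    fsc = np.asarray(fsc, dtype=float)
    if freqs.shape != fsc.shape:
        raise ValueError("freqs and fsc must have the same shape")

    below = np.nonzero(fsc < threshold)[0]
    if below.size == 0:
        return None
    i = int(below[0])
    if i == 0:
        raise ValueError("curve is already below the threshold at the first point")

    # Linearly interpolate between the last point above and the first point below.
    f0, f1 = freqs[i - 1], freqs[i]
    c0, c1 = fsc[i - 1], fsc[i]
    f_cross = f0 + (threshold - c0) * (f1 - f0) / (c1 - c0)
    return 1.0 / f_cross

# Made-up curve for illustration only.
example = resolution_at_threshold([0.1, 0.2, 0.3, 0.4], [1.0, 0.9, 0.4, 0.1])
```

      The reported resolution is simply the reciprocal of the spatial frequency at which the curve crosses the chosen threshold.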

      (7) In addition, figures of local density maps with different regions of the models, showing side chains, are necessary and important to justify the claimed resolutions. 

      We now include density maps overlaid with residue side chains at various regions. For the CBC-mALYREF2 map, density maps are shown at the mALYREF2-NCBP1 interfaces (Figure 3-figure supplement 1A and 1B), mALYREF2-NCBP2 interface (Figure 3-figure supplement 1C), NCBP1-NCBP2 interface (Figure 2-figure supplement 5B), and the region near m7G (Figure 2-figure supplement 5C). For the CBC map, density maps are shown at the NCBP1-NCBP2 interface (Figure 2-figure supplement 3B) and the region near m7G (Figure 2-figure supplement 3C).

      Minor points:

      (1) A figure superimposing the models from the CBC-mALYREF2 map and mALYREF2 alone map is necessary to show that there are no other binding-induced conformational changes in CBC except those claimed by the authors. In addition, a figure showing the density of m7GpppG should be included as well.

      An overlay of the CBC and CBC-mALYREF2 models is now presented in Figure 2-figure supplement 3D. Comparing CBC and CBC-mALYREF2, NCBP1 and NCBP2 have RMSDs of 0.32 Å and 0.30 Å, respectively. The density maps near the m7G cap analog are shown in Figure 2-figure supplement 3C for CBC and Figure 2-figure supplement 5C for CBC-mALYREF2.
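
      As a reminder of what an RMSD value such as those quoted above measures, here is a minimal sketch, assuming two already-superimposed sets of matched atom coordinates (for example, Cα atoms) supplied as N×3 arrays. The function name and the toy coordinates are hypothetical, and no structural alignment is performed here.

```python
import numpy as np

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation (same units as the input, e.g. Å)
    between two matched, pre-aligned coordinate sets of shape (N, 3)."""
    a = np.asarray(coords_a, dtype=float)
    b = np.asarray(coords_b, dtype=float)
    if a.shape != b.shape or a.ndim != 2 or a.shape[1] != 3:
        raise ValueError("both inputs must have shape (N, 3)")
    # Squared distance per atom pair, averaged over atoms, then square-rooted.
    return float(np.sqrt(np.mean(np.sum((a - b) ** 2, axis=1))))

# Toy example: every atom displaced by 0.3 Å along z.
model_a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
model_b = model_a + np.array([0.0, 0.0, 0.3])
value = rmsd(model_a, model_b)
```

      In practice the two models are first superimposed (for example, by a least-squares fit), and the RMSD is then computed over the matched atoms as above.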

      (2) Authors obtained the two maps from one dataset, so "we first determined" and "we next determined" (page 6) should be replaced with something like "One class of 3D cryo-EM map revealed" and "Another class of 3D cryo-EM map defined".

      We have revised the text as suggested by the reviewer.  

      (3) In 'Abstract', 'a mRNA export factor' should be 'an mRNA export factor'. 

      Corrected in the revised manuscript.

      (4) In 'Abstract', the final sentence 'Comparison of CBC- ALYREF to other CBC and ALYREF containing cellular complexes provides insights into the coordinated events during mRNA transcription, splicing, and export' doesn't read smoothly, I would suggest revising it to 'Comparing CBC-ALYREF with other cellular complexes containing CBC and/or ALYREF components provides insight into the coordinated events during mRNA transcription, splicing, and export.' 

      We thank the reviewer for the recommendation and have revised accordingly. 

      (5) In paragraph 'CBC-ALYREF and viral hijacking of host mRNA export pathway', line 6, the sentences preceding and following the term 'However' indicate a progressive or parallel relationship, rather than a transitional one. To enhance the coherence, I would suggest replacing 'However' with 'Furthermore' or 'In addition'. 

      Corrected in the revised manuscript.

      (6) In both Figure 5 and Figure 6, the depicted models are proposed and constructed exclusively through the comparison of the CBC-partial ALYREF with other cellular complexes containing components of CBC and/or ALYREF, which need to be confirmed by more studies. To prevent potential confusion and misunderstandings, it is recommended to replace the term 'model' with 'proposed model'. 

      Corrected in the revised manuscript.

      Reviewer #3 (Recommendations for the Authors):

      Major points:

      (1) In the Results and Discussion section, the authors mentioned "Recombinant human ALYREF protein was shown to interact with the CBC in RNase-treated nuclear extracts." However, they used mouse ALYREF for cryo-EM investigations. Can the authors include an explanation for this choice during the revision?  

      In our work, we used a mixture of glutamic acid and arginine to increase the solubility of GST-ALYREF. For cryo-EM studies, we used untagged ALYREF to avoid potential issues that may arise from the expression tag. However, untagged ALYREF is less soluble than GST-tagged ALYREF and is not suitable for structural studies in standard buffers. We have further clarified this point in this revised manuscript.

      (2) In the paragraph on "CBC-ALYREF interfaces", the authors stated "For example, E97 forms salt bridges with K330 and K381 of NCBP1. Y135 on the α2 helix of mALYREF2 makes a hydrogen bond with K330 of NCBP1. The importance of this interface between ALYREF and NCBP1 is highlighted by a K330N mutation found in human uterine corpus endometrial carcinoma." I fail to see a strong connection between their structural observations and previous findings regarding the role of a K330N mutation found in human uterine corpus endometrial carcinoma. The authors should add text to connect these two parts.

      In response to the reviewer’s comment, we now move the discussion of these CBC mutants to the newly added “Conclusion and perspectives” section. 

      (3) The authors should include side chains of the residues in their figure of Local resolution estimation and FSC curves, especially when they are presenting the binding interface between two components. 

      We have now included density maps that are overlaid with structural models showing side chains of critical residues. These maps include the NCBP1-mALYREF2 interfaces (Figure 3-figure supplement 1A and 1B), NCBP2-mALYREF2 interface (Figure 3-figure supplement 1C), NCBP1-NCBP2 interface (Figure 2-figure supplement 3B and 5B), and the m7G cap region (Figure 2-figure supplement 3C and 5C).

      Minor points: 

      (1) Some grammatical mistakes need to be corrected. For example, it is "an mRNA" instead of "a mRNA".  

      Corrected in the revised manuscript.

      (2) The authors can provide more information for the audience to know better about ALYREF when it first appears in the 5th line in the Abstract section. For example, "It promotes mRNA export through direct interaction with ALYREF, a key mRNA export factor, ...". 

      We have revised the sentence based on the reviewer’s comment.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      Some of the data is problematic and does not always support the authors' conclusions:

      (1) Fig. 1K and H are identical.

      Thank you for pointing out this problem in the manuscript. We apologize for this unintentional mistake and have replaced Fig. 1K.

      (2) The graph in Figure 2B contradicts the text. It is not obvious how the image was quantified to produce the histological score graph.

      We thank the reviewer for pointing out this problem in the manuscript. As the reviewer suggested, we have replaced Figure 2B.

      (3) In Figures 2C and D, there is no clear pattern of changes in pro-inflammatory or anti-inflammatory cytokines, despite the authors' claims in the text.

      We appreciate the comment. We think the reason is that the level of cytokines in the tissue is low, so the pattern of changes is not obvious.

      (4) It is unclear why the anti-dsDNA antibody does not stain the nucleus in Figure 4B. The staining with anti-dsDNA and DAPI does not match well. Figure 5H shows there is still lots of cytosolic DNA in OGT-/- HCF-1-C, measured by DAPI. These data do not support the authors' conclusion that HCFC600 eliminates cytosolic DNA accumulation (line 229). There is no support for the authors' claim that HCF-1 restrains the cGAS-STING pathway (line 330).

      We thank the reviewer for these insightful comments. The most critical step in staining cytosolic DNA is to use mild permeabilization so that the antibody can cross the cell membrane but not the nuclear membrane; that is why the anti-dsDNA antibody does not stain the nucleus. In Figure 5H, we think that we used a high concentration of DAPI, so the nuclear DNA was strongly stained and appears to be cytosolic DNA.

      (5) In Figure 5B, there is no increase in HCF-1 cleavage after OGT over-expression.

      We thank the reviewer for this comment. We think the reason is that we used a cell line stably overexpressing OGT-GFP and may have missed the time point at which the increase in HCF-1 cleavage occurred, so no large increase was observed. However, there is a significant increase in Figure 5C.

      (6) In Figure 7, the TNF-α staining does not inspire confidence.

      We thank the reviewer for this comment. In both Figure 7K (MC38 tumor model) and Figure 7N (LLC tumor model), we observed a significant increase in TNF-α+ CD8+ T cells in the group treated with the combination of OSMI-1 and anti-PD-L1 compared to the control group, as evidenced by the clear clustering.

      The writing needs significant improvement:

      (1) There are multiple English grammar mistakes throughout the paper. It is recommended that the authors run the manuscript through an editing service.

      We thank the reviewer for his/her suggestion. We apologize for the poor language of our manuscript. We worked on the manuscript for a long time and the repeated addition and removal of sentences and sections obviously led to poor readability. We have now worked on both language and readability and have also involved native English speakers for language corrections. We really hope that the flow and language level have been substantially improved.

      (2) Some passages are misleading -- lines 161-162, line 217, lines 241-242, 263-264, 299-300. They need to be changed substantially.

      We apologize for these mistakes and have changed the passages.

      (3) Figure legends should be rewritten. Currently, they are too abbreviated to be understood.

      We apologize for that and have rewritten the figure legends.

      (4) Discussion should also be thoroughly reworked. Currently, it is merely restating the authors' findings. The authors should put their findings in the broader context of the field.

      We apologize for that. To make our study easier to understand, we have reworked the discussion.

      Reviewer #2 (Recommendations For The Authors):

      (1) Previous studies (DOI: 10.1093/nar/gkw663, 10.1016/j.jgg.2015.07.002, 10.1016/j.dnarep.2022.103394) have suggested that OGT deficiency triggers DNA damage, connecting it to DNA repair and maintenance through various mechanisms. This should be acknowledged in the manuscript. Conversely, the role of HCF1 and its cleaved products in maintaining genomic integrity hasn't been previously shown. The authors investigate HCF1's role solely in the context of OGT inhibition. It is unclear whether this is also true under other stimuli that trigger DNA damage, whether fragments of HCF1 specifically reduce DNA damage, or if HCF1 is involved in the basal machinery that would be defective only in the absence of OGT.

      We have acknowledged the studies mentioned above in the manuscript. In this paper we focused on the function of OGT as it relates to HCF1. The role of HCF1 and its cleaved products in maintaining genomic integrity is an interesting topic, and we may focus on it in our next project.

      (2) In villin-CRE-deficient mice, the authors observe generic inflammation in the intestine unrelated to tumor development. It's unclear if this also occurs in the presence of OGT inhibitors in mice, whether these inhibitors induce a systemic inflammatory (Type I interferon) response, or if certain tissues like the intestine or proliferating tumor cells are more susceptible to such a response.

      We thank the reviewer for the comment. Investigating whether OGT inhibitors induce an inflammatory response, either systemically or tissue-specifically, would indeed be a very interesting project. However, in our current paper, we use a genetic method to identify the role of OGT deficiency in intestinal inflammation-induced tumor development. This approach provides convincing evidence for our hypothesis. We may test the effect of OGT inhibitors on inflammation and tumor development in our next project.

      (3) Another critical observation is the magnitude of the interferon response triggered by DNA damage in the OGT-deficient models. While it's known that DNA damage can activate cGAS-STING, the response's extent in the absence of OGT prompts the question of whether additional OGT-specific features could explain this phenomenon. For example, Lamin A, essential for nuclear envelope integrity and shown to be O-glycosylated (DOI: 10.3390/cells7050044), and other components of the nuclear envelope or its repair might be affected by OGT. The impact of OGT inhibition on nuclear envelope integrity compared to other DNA-damaging agents could be explored.

      We appreciate the comment. In this project, we identified an OGT-binding protein, HCF1, through an LC–MS/MS assay; it was a top candidate in the binding profiles, so we focused on it. Lamin A and other components of the nuclear envelope remain good targets to examine, and we may explore them in our next project.

      (4) The authors also demonstrate a correlation between OGT expression in tumors compared to healthy tissues. However, the reason is unclear, raising questions about whether this is a consequence of proliferation or metabolic deregulation in the cancer. The authors should address this aspect.

      We appreciate the reviewer’s insightful point. These are very good questions and would make for very interesting research. However, in this paper we focused on how OGT influences its downstream molecules to promote tumors; we did not examine why OGT is increased in tumors, as this is outside the scope of the current work. We would love to investigate it in the future.

      Minor points

      Please add the legend to Figures S2, S3 and S5.

      We thank the reviewer for the comment. We have added the legends to Figures S2, S3 and S5.

      The sentence on line 137 should be clarified as OGT deficiency seems more related to increased inflammation in this model.

      We thank the reviewer for the comment. We have corrected the sentence on line 137.

      Line 732 has a ( typo before the number 34.

      We thank the reviewer for the comment. We have corrected the typo on line 732.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      In this important study, the authors manually assessed randomly selected images published in eLife between 2012 and 2020 to determine whether they were accessible for readers with deuteranopia, the most common form of color vision deficiency. They then developed an automated tool designed to classify figures and images as either "friendly" or "unfriendly" for people with deuteranopia. While such a tool could be used by publishers, editors or researchers to monitor accessibility in the research literature, the evidence supporting the tool's utility was incomplete. The tool would benefit from training on an expanded dataset that includes different image and figure types from many journals, and using more rigorous approaches when training the tool and assessing performance. The authors also provide code that readers can download and run to test their own images. This may be of most use for testing the tool, as there are already several free, user-friendly recoloring programs that allow users to see how images would look to a person with different forms of color vision deficiency. Automated classifications are of most use for assessing many images, when the user does not have the time or resources to assess each image individually.

      Thank you for this assessment. We have responded to the comments and suggestions in detail below. One minor correction to the above statement: the randomly selected images published in eLife were from articles published between 2012 and 2022 (not 2020).

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      The authors of this study developed a software application, which aims to identify images as either "friendly" or "unfriendly" for readers with deuteranopia, the most common color-vision deficiency. Using previously published algorithms that recolor images to approximate how they would appear to a deuteranope (someone with deuteranopia), authors first manually assessed a set of images from biology-oriented research articles published in eLife between 2012 and 2022. The researchers identified 636 out of 4964 images as difficult to interpret ("unfriendly") for deuteranopes. They claim that there was a decrease in "unfriendly" images over time and that articles from cell-oriented research fields were most likely to contain "unfriendly" images. The researchers used the manually classified images to develop, train, and validate an automated screening tool. They also created a user-friendly web application of the tool, where users can upload images and be informed about the status of each image as "friendly" or "unfriendly" for deuteranopes.

      Strengths:

      The authors have identified an important accessibility issue in the scientific literature: the use of color combinations that make figures difficult to interpret for people with color-vision deficiency. The metrics proposed and evaluated in the study are a valuable theoretical contribution. The automated screening tool they provide is well-documented, open source, and relatively easy to install and use. It has the potential to provide a useful service to the scientists who want to make their figures more accessible. The data are open and freely accessible, well documented, and a valuable resource for further research. The manuscript is well written, logically structured, and easy to follow.

      We thank the reviewer for these comments.

      Weaknesses:

      (1) The authors themselves acknowledge the limitations that arise from the way they defined what constitutes an "unfriendly" image. There is a missed chance here to have engaged deuteranopes as stakeholders earlier in the experimental design. This would have allowed [them] to determine to what extent spatial separation and labelling of problematic color combinations responds to their needs and whether setting the bar at a simulated severity of 80% is inclusive enough. A slightly lowered barrier is still a barrier to accessibility.

      We agree with this point in principle. However, different people experience deuteranopia in different ways, so it would require a large effort to characterize these differences and provide empirical evidence about many individuals' interpretations of problematic images in the "real world." In this study, we aimed to establish a starting point that would emphasize the need for greater accessibility, and we have provided tools to begin accomplishing that. We erred on the side of simulating relatively high severity (but not complete deuteranopia). Thus, our findings and tools should be relevant to some (but not all) people with deuteranopia. Furthermore, as noted in the paper, an advantage of our approach is that "by using simulations, the reviewers were capable of seeing two versions of each image: the original and a simulated version." We believe this step is important in assessing the extent to which deuteranopia could confound image interpretations. Conceivably, this could be done with deuteranopes after recoloration, but it is difficult to know whether deuteranopes would see the recolored images in the same way that non-deuteranopes see the original images. It is also true that images simulating deuteranopia may not perfectly reflect how deuteranopes see those images. It is a tradeoff either way. We have added comments along these lines to the paper.

      (2) The use of images from a single journal strongly limits the generalizability of the empirical findings as well as of the automated screening tool itself. Machine-learning algorithms are highly configurable but also notorious for their lack of transparency and for being easily biased by the training data set. A quick and unsystematic test of the web application shows that the classifier works well for electron microscopy images but fails at recognizing red-green scatter plots and even the classical diagnostic images for color-vision deficiency (Ishihara test images) as "unfriendly". A future iteration of the tool should be trained on a wider variety of images from different journals.

      Thank you for these comments. We have reviewed an additional 2,000 images, which were randomly selected from PubMed Central. We used our original model to make predictions for those images. The corresponding results are now included in the paper.

      We agree that many of the images identified as being "unfriendly" are microscope images, which often use red and green dyes. However, many other image types were identified as unfriendly, including heat maps, line charts, maps, three-dimensional structural representations of proteins, photographs, network diagrams, etc. We have uploaded these figures to our Open Science Framework repository so it's easier for readers to review these examples. We have added a comment along these lines to the paper.

      The reviewer mentioned uploading red/green scatter plots and Ishihara test images to our Web application and that it reported they were friendly. Firstly, it depends on the scatter plot. Even though some such plots include green and red, the image's scientific meaning may be clear. Secondly, although the Ishihara images were created as informal tests for humans, these images (and ones similar to them) are not in eLife journal articles (to our knowledge) and thus are not included in our training set. Thus, it is unsurprising that our machine-learning models would not classify such images correctly as unfriendly.

      (3) Focusing the statistical analyses on individual images rather than articles (e.g. in figures 1 and 2) leads to pseudoreplication. Multiple images from the same article should not be treated as statistically independent measures, because they are produced by the same authors. A simple alternative is to instead use articles as the unit of analysis and score an article as "unfriendly" when it contains at least one "unfriendly" image. In addition, collapsing the counts of "unfriendly" images to proportions loses important information about the sample size. For example, the current analysis presented in Fig. 1 gives undue weight to the three images from 2012, two of which came from the same article. If we perform a logistic regression on articles coded as "friendly" and "unfriendly" (rather than the reported linear regression on the proportion of "unfriendly" images), there is still evidence for a decrease in the frequency of "unfriendly" eLife articles over time.

      Thank you for taking the time to provide these careful insights. We have adjusted these statistical analyses to focus on articles rather than individual images. For Figure 1, we treat an article as "Definitely problematic" if any image in the article was categorized as "Definitely problematic." Additionally, we no longer collapse the counts to proportions, and we use logistic regression to summarize the trend over time. The overall conclusions remain the same.
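
      As an illustration only (this is not the authors' code), the article-level analysis described above can be sketched as follows. Image labels are collapsed to one row per article, an article counts as unfriendly if any of its images is, and a one-predictor logistic regression on publication year is fitted by Newton's method. The function names and the toy records are hypothetical.

```python
import math
from collections import defaultdict


def articles_from_images(images):
    """Collapse (article_id, year, is_unfriendly) image rows to one row per article.

    An article is flagged as unfriendly if at least one of its images is.
    Returns a list of (year, flag) tuples, one per article.
    """
    flagged = defaultdict(bool)
    year_of = {}
    for article_id, year, is_unfriendly in images:
        flagged[article_id] = flagged[article_id] or bool(is_unfriendly)
        year_of[article_id] = year
    return [(year_of[a], int(flagged[a])) for a in sorted(flagged)]


def fit_logistic(xs, ys, iterations=25):
    """Fit logit(P(y=1)) = intercept + slope * x by Newton's method."""
    mean_x = sum(xs) / len(xs)
    cx = [x - mean_x for x in xs]  # centre the predictor for numerical stability
    b0 = b1 = 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(cx, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # entries of X'WX
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0 - b1 * mean_x, b1    # intercept on the original (uncentred) scale


# Hypothetical toy records: (article_id, publication_year, image_is_unfriendly)
images = [
    ("a01", 2013, 1), ("a01", 2013, 0), ("a02", 2013, 1), ("a03", 2013, 0),
    ("a04", 2016, 1), ("a05", 2016, 0), ("a05", 2016, 0), ("a06", 2016, 0),
    ("a07", 2019, 0), ("a08", 2019, 0), ("a09", 2019, 1), ("a10", 2019, 0),
]
rows = articles_from_images(images)
intercept, slope = fit_logistic([year for year, _ in rows], [flag for _, flag in rows])
print(f"{len(rows)} articles; change in log-odds per year: {slope:.3f}")
```

      Because each article contributes exactly one observation, several images from the same article cannot inflate the apparent sample size, which is the pseudoreplication concern raised by the reviewer.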

      Another issue concerns the large number of articles (>40%) that are classified as belonging to two subdisciplines, which further compounds the image pseudoreplication. Two alternatives are to either group articles with two subdisciplines into a "multidisciplinary" group or recode them to include both disciplines in the category name.

      Thank you for this insight. We have modified Figure 2 so that it puts all articles that have been assigned two subdisciplines into a "Multidisciplinary" category. The overall conclusions remain the same.

      (4) The low frequency of "unfriendly" images in the data (under 15%) calls for a different performance measure than the AUROC used by the authors. In such imbalanced classification cases the recommended performance measure is precision-recall area under the curve (PR AUC: https://doi.org/10.1371%2Fjournal.pone.0118432) that gives more weight to the classification of the rare class ("unfriendly" images).

      We now calculate the area under the precision-recall curve and provide these numbers (and figures) alongside the AUROC values (and figures). We agree that these numbers are informative; both metrics lead to the same overall conclusions.
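
      To make the difference between the two metrics concrete, here is a minimal sketch (not the authors' evaluation code) that computes AUROC and a step-wise area under the precision-recall curve (average precision) for a small, imbalanced toy set. The labels and scores are invented for illustration.

```python
def auroc(labels, scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Step-wise area under the precision-recall curve; assumes no tied scores."""
    ranked = sorted(zip(scores, labels), reverse=True)
    n_pos = sum(labels)
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank   # precision at each recalled positive
    return total / n_pos


# Hypothetical example: 2 "unfriendly" images (label 1) among 20, ranked 3rd and 6th.
labels = [0, 0, 1, 0, 0, 1] + [0] * 14
scores = [20 - i for i in range(20)]   # strictly decreasing scores
print(f"AUROC = {auroc(labels, scores):.3f}")
print(f"Average precision = {average_precision(labels, scores):.3f}")
```

      In this toy case the AUROC is about 0.83 while the average precision is only about 0.33, which shows why a metric focused on the rare class can be more informative when positives are scarce.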

      Reviewer #2 (Public Review):

      Summary:

      An analysis of images in the biology literature that are problematic for people with a color-vision deficiency (CVD) is presented, along with a machine learning-based model to identify such images and a web application that uses the model to flag problematic images. The analysis reveals that about 13% of the images could be problematic for people with CVD and that the frequency of such images decreased over time. The model yields an AUC score of 0.89. It is proposed that this approach could help make the biology literature accessible to diverse audiences.

      Strengths:

      The manuscript focuses on an important yet mostly overlooked problem, and makes contributions both in expanding our understanding of the extent of the problem and in developing solutions to mitigate the problem. The paper is generally well-written and clearly organized. Their CVD simulation combines five different metrics. The dataset has been assessed by two researchers and is likely to be of high quality. The machine learning algorithm used (convolutional neural network, CNN) is an appropriate choice for the problem. The evaluation of various hyperparameters for the CNN model is extensive.

      We thank the reviewer for these comments.

      Weaknesses:

      The focus seems to be on one type of CVD (deuteranopia) and it is unclear whether this would generalize to other types.

      We agree that it would be interesting to perform similar analyses for protanopia and other color-vision deficiencies. But we leave that work for future studies.

      The dataset consists of images from eLife articles. While this is a reasonable starting point, whether this can generalize to other biology/biomedical articles is not assessed.

      This is an important point. We have reviewed an additional 2,000 images, which were randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      "Probably problematic" and "probably okay" classes are excluded from the analysis and classification, and the effect of this exclusion is not discussed.

      We now address this in the Discussion section.

      Machine learning aspects can be explained better, in a more standard way.

      Thank you. We address this comment in our responses to your comments below.

      The evaluation metrics used for validating the machine learning models seem lacking (e.g., precision, recall, F1 are not reported).

      We now provide these metrics (in a supplementary file).

      The web application is not discussed in any depth.

      The paper includes a paragraph about how the Web application works and which technologies we used to create it. We are unsure which additional aspects should be addressed.

      Reviewer #3 (Public Review):

      Summary:

      This work focuses on accessibility of scientific images for individuals with color vision deficiencies, particularly deuteranopia. The research involved an analysis of images from eLife published in 2012-2022. The authors manually reviewed nearly 5,000 images, comparing them with simulated versions representing the perspective of individuals with deuteranopia, and also evaluated several methods to automatically detect such images including training a machine-learning algorithm to do so, which performed the best. The authors found that nearly 13% of the images could be challenging for people with deuteranopia to interpret. There was a trend toward a decrease in problematic images over time, which is encouraging.

      Strengths:

      The manuscript is well organized and written. It addresses inclusivity and accessibility in scientific communication, and reinforces that there is a problem and that in part technological solutions have potential to assist with this problem.

      The number of manually assessed images for evaluation and training an algorithm is, to my knowledge, much larger than any existing survey. This is a valuable open source dataset beyond the work herein.

      The sequential steps used to classify articles follow best practices for evaluation and training sets.

      We thank the reviewer for these comments.

      Weaknesses:

      I do not see any major issues with the methods. The authors were transparent with the limitations (the need to rely on simulations instead of what deuteranopes see), only capturing a subset of issues related to color vision deficiency, and the focus on one journal that may not be representative of images in other journals and disciplines.

      We thank the reviewer for these comments. Regarding the last point, we have reviewed an additional 2,000 images, which were randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      N/A

      Thank you.

      Reviewer #2 (Recommendations For The Authors):

      - The web application link can be provided in the Abstract for more visibility.

      We have added the URL to the Abstract.

      - They focus on deuteranopia in this paper. It seems that protanopia is not considered. Why? What are the challenges in considering this type of CVD?

      We agree that it would be interesting to perform similar analyses for protanopia and other color-vision deficiencies. But we leave that work for future studies. Deuteranopia is the most common color-vision deficiency, so we focused on the needs of these individuals as a starting point.

      - The dataset is limited to eLife articles. More discussion of this limitation is needed. Couldn't one also include some papers from PMC open access dataset for comparison?

      We have reviewed an additional 2,000 images, which we randomly selected from PubMed Central, and used our original model to make predictions for those images. The corresponding results are now included in the paper.

      - An analysis of the effect of selecting a severity value of 0.8 can be included.

      We agree that this would be interesting, but we leave it for future work.

      - "Probably problematic" and "probably okay" classes are excluded from analysis, which may oversimplify the findings and bias the models. It would have been interesting to study these classes as well.

      We agree that this would be interesting, but we leave it for future work. However, we have added a comment to the Discussion on this point.

      - Some machine learning aspects are discussed in a non-standard way. Class weighting or transfer learning would not typically be considered hyperparameters. "Corpus" is not a model. The description of how fine-tuning was performed could be clearer.

      We have updated this wording to use more appropriate terminology to describe these different "configurations." Additionally, we expanded and clarified our description of fine tuning.

      - Reporting performance on the training set is not very meaningful. Although I understand this is cross-validated, it is unclear what is gained by reporting two results. Maybe there should be more discussion of the difference.

      We used cross validation to compare different machine-learning models and configurations. Providing performance metrics helps to illustrate how we arrived at the final configurations that we used. We have updated the manuscript to clarify this point.

      - True positives, false positives, etc. are described as evaluation metrics. Typically, one would think of these as numbers that are used to calculate evaluation metrics, like precision (PPV), recall (sensitivity), etc. Furthermore, they say they measure precision, recall, precision-recall curves, but I don't see these reported in the manuscript. They should be (especially precision, recall, F1).

      We have clarified this wording in the manuscript.
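
      For readers less familiar with the distinction the reviewer draws, the following sketch shows how the evaluation metrics are derived from the raw confusion-matrix counts. The counts are hypothetical and do not come from the manuscript.

```python
def classification_metrics(tp, fp, fn, tn):
    """Derive precision (PPV), recall (sensitivity), F1, and accuracy from counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}


# Hypothetical counts for a held-out set of 100 images, 12 of them "unfriendly".
metrics = classification_metrics(tp=8, fp=2, fn=4, tn=86)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

      With these counts the accuracy is 0.94, yet recall is only about 0.67: a third of the problematic images are missed, a shortfall that accuracy alone hides when the positive class is rare.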

      - There are many figures in the supplementary material, but not much interpretation/insights provided. What should we learn from these figures?

      We have revised the captions and now provide more explanations about these figures in the manuscript.

      - CVD simulations are mentioned (line 312). It is unclear whether these methods could be used for this work and if so, why they were not used. How do the simulations in this work compare to other simulations?

      This part of the manuscript refers to recolorization techniques, which attempt to make images more friendly to people with color vision deficiencies. For our paper, we used a form of recolorization that simulates how a deuteranope would see a figure in its original form. Therefore, unless we misunderstand the reviewer's question, these two types of simulation have distinct purposes and thus are not comparable.

      - relu -> ReLU

      We have corrected this.

      Reviewer #3 (Recommendations For The Authors):

      The title can be more specific to denote that the survey was done in eLife papers in the years 2012-2022. Similarly, this should be clear in the abstract instead of only "images published in biology-oriented research articles".

      Thank you for this suggestion. Because we have expanded this work to include images from PubMed Central papers, we believe the title is acceptable as it stands. We updated the abstract to say, "images published in biology- and medicine-oriented research articles"

      Two mentions of existing work that I did not see are Jambor and colleagues' assessment of color accessibility in several fields (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041175/) and whether this work overlaps with the 'JetFighter' tool (https://elifesciences.org/labs/c2292989/jetfighter-towards-figure-accuracy-and-accessibility).

      Thank you for bringing these to our attention. We have added a citation to Jambor, et al.

      We also mention JetFighter and describe its uses.

      Similarly, on Line 301: Significant prior work has been done to address and improve accessibility for individuals with CVD. This work can be generally categorized into three types of studies: simulation methods, recolorization methods, and estimating the frequency of accessible images.

      - One might mention education as prior work as well, which might in part be contributing to a decrease in problematic images (e.g., https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8041175/)

      We now suggest that there are four categories and include education as one of these.

      Line 361, when discussing resources to make figures suitable, the authors may consider citing this paper about an R package for single-cell data: https://elifesciences.org/articles/82128

      Thank you. We now cite this paper.

      The web application is a good demonstration of how this can be applied, and all code is open so others can apply the CNN in their own use cases. Still, by itself, it is tedious to upload individual image files to screen them. Future work can implement this into a workflow more typical to researchers, but I understand that this will take additional resources beyond the scope of this project. The demonstration that these algorithms can be run with minimal resources in the browser with TensorFlow.js is novel.

      Thank you.

      General:

      It is encouraging that 'definitely problematic' images have been decreasing over time in eLife. Might this have to do with eLife policies? I could not quickly find if eLife has checks in place for this, but given that JetFighter was developed in association with eLife, I wonder if there is an enhanced awareness of this issue here vs. other journals.

      This is possible. We are not aware of a way to test this formally.

    1. Reviewer #1 (Public Review):

      Summary:

      A nice study trying to identify the relationship between E. coli O157 from cattle and humans in Alberta, Canada.

      Strengths:

      (1) The combined human and animal sampling is a great foundation for this kind of study.

      (2) Phylogenetic analyses seem to have been carried out in a high-quality fashion.

      Weaknesses:

      I think there may be a problem with the selection of the isolates for the primary analysis. This is what I'm thinking:

      (1) Transmission analyses are strongly influenced by the sampling frame.

      (2) While the authors have randomly selected from their isolate collections, which is fine, the collections themselves are not random.

      (3) The animal isolates are likely to represent a broad swathe of diversity, because of the structured sampling of animal reservoirs undertaken (as I understand it).

      (4) The human isolates are all from clinical cases. Clinical cases of the disease are likely to be closely related to other clinical cases, because of outbreaks (either detected, or undetected), and the high ascertainment rate for serious infections.

      (5) Therefore, taking an equivalent number of animal and clinical isolates, will underestimate the total diversity in the clinical isolates because the sampling of the clinical isolates is less "independent" (in the statistical sense) than sampling from the animal isolates.

      (6) This could lead to over-estimating of transmission from cattle to humans.

      (7) "We hypothesize that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence" - this seems a bit tautological. There is a lot of O157 because there's a lot of transmission. What part of the fact it is local means that it is a principal cause of high incidence? It seems that they've observed a high rate of local transmission, but the reasons for this are not apparent, and hence the cause of Alberta's incidence is not apparent. Would a better conclusion not be that "X% of STEC in Alberta is the result of transmission of local variants"? And then, this poses a question for future epi studies of what the transmission pathway is.

    2. Author response:

      Reviewer #1 (Public Review):

      Summary:

      A nice study trying to identify the relationship between E. coli O157 from cattle and humans in Alberta, Canada.

      Strengths:

      (1) The combined human and animal sampling is a great foundation for this kind of study.

      (2) Phylogenetic analyses seem to have been carried out in a high-quality fashion.

      Weaknesses:

      I think there may be a problem with the selection of the isolates for the primary analysis. This is what I'm thinking:

      (1) Transmission analyses are strongly influenced by the sampling frame.

      (2) While the authors have randomly selected from their isolate collections, which is fine, the collections themselves are not random.

      (3) The animal isolates are likely to represent a broad swathe of diversity, because of the structured sampling of animal reservoirs undertaken (as I understand it).

      (4) The human isolates are all from clinical cases. Clinical cases of the disease are likely to be closely related to other clinical cases, because of outbreaks (either detected, or undetected), and the high ascertainment rate for serious infections.

      (5) Therefore, taking an equivalent number of animal and clinical isolates, will underestimate the total diversity in the clinical isolates because the sampling of the clinical isolates is less "independent" (in the statistical sense) than sampling from the animal isolates.

      (6) This could lead to over-estimating of transmission from cattle to humans.

      We appreciate the reviewer’s careful thoughts about our sampling strategy. We agree with points (1) and (2), and we will provide additional details on the animal collections as requested.

      We agree with point (3) in theory but not in fact. As shown in Figure 3a, the cattle isolates were very closely related, despite the temporal and geographic breadth of sampling within Alberta. The median SNP distance between cattle sequences was 45 (IQR 36-56), compared to 54 (IQR 43-229) SNPs between human sequences from cases in Alberta during the same years. Additionally, as shown in Figure 2, only clade A and B isolates – clades that diverge substantially from the rest of the tree – were dominated by human cases in Alberta. We will better highlight this evidence in the revision.
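
      The kind of summary quoted above (median pairwise SNP distance with its interquartile range) can be sketched as follows. This is a generic illustration under the assumption that sequences are already aligned to equal length; it is not the authors' pipeline, and the short toy sequences are invented.

```python
from itertools import combinations
from statistics import median, quantiles


def snp_distance(a, b):
    """Count aligned positions where both sequences have an unambiguous, differing base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(1 for x, y in zip(a, b) if x != y and x in "ACGT" and y in "ACGT")


def summarize_pairwise(sequences):
    """Return (median, (Q1, Q3)) of all pairwise SNP distances; needs >= 3 sequences."""
    distances = [snp_distance(a, b) for a, b in combinations(sequences, 2)]
    q1, _, q3 = quantiles(distances, n=4, method="inclusive")
    return median(distances), (q1, q3)


# Hypothetical aligned core-genome fragments for two groups of isolates.
cattle = ["ACGTACGT", "ACGTACGA", "ACGTACGT", "ACGAACGT"]
human = ["ACGTACGT", "TTGTACGA", "ACGTCCCT", "ACGTACGA"]
print("cattle:", summarize_pairwise(cattle))
print("human: ", summarize_pairwise(human))
```

      Comparing the two summaries side by side is what allows a statement such as "clinical isolates were more variable than cattle isolates" to be checked directly.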

      We agree with the reviewer in point (4) that outbreaks can be an important confounder of phylogenetic inference. This is why we down-sampled outbreaks (based on genetic relatedness, not external designation) in our extended analyses (lines 192-194). We did not do this in the primary analysis, because there were no large clusters of identical isolates. Figure 3b shows a limited number of small clusters; however, clustered cattle isolates outnumbered clustered human isolates, suggesting that any bias would be in the opposite direction the reviewer suggests. Regarding severe cases being oversampled among the clinical isolates, this is absolutely true and a limitation of all studies utilizing public health reporting data. We will make this limitation to generalizability clearer in the discussion. However, as noted above, clinical isolates were more variable than cattle isolates, so it does not appear to have heavily biased the analysis.
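
      One simple way to down-sample clusters on the basis of genetic relatedness is to group isolates by single linkage at a SNP threshold and keep one representative per group. The sketch below assumes aligned sequences and a hypothetical threshold; the manuscript's actual procedure may differ.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def downsample_clusters(isolates, threshold=0):
    """Keep one isolate ID per single-linkage cluster of sequences within `threshold` SNPs.

    `isolates` maps isolate ID -> aligned sequence. The representative of each
    cluster is its alphabetically first ID.
    """
    ids = sorted(isolates)
    parent = {i: i for i in ids}

    def find(i):
        # Follow parent links to the cluster root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for n, a in enumerate(ids):
        for b in ids[n + 1:]:
            if hamming(isolates[a], isolates[b]) <= threshold:
                ra, rb = find(a), find(b)
                if ra != rb:
                    parent[max(ra, rb)] = min(ra, rb)  # smaller ID becomes the root
    return sorted({find(i) for i in ids})


# Hypothetical isolates: h1 and h2 are identical, h3 differs from them by one SNP.
isolates = {"h1": "ACGT", "h2": "ACGT", "h3": "ACGA", "c1": "TTTT"}
print(downsample_clusters(isolates, threshold=0))
print(downsample_clusters(isolates, threshold=1))
```

      Single linkage is a deliberately conservative choice here: any chain of closely related isolates collapses into one representative, so a large outbreak contributes a single sequence to downstream analyses.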

      We disagree with the reviewer on point (5). While the bias toward severe cases could make the human isolates less independent, the relative sampling proportions are likely to induce greater distance between clinical isolates than cattle isolates, which is exactly what we observe (see response to point (3) above). Cattle are E. coli O157:H7’s primary reservoir, and humans are incidental hosts not able to sustain infection chains long-term. Not only is the bacterium prevalent among cattle, but cattle are also highly prevalent in Alberta. Thus, even with 89 sampling points, we are still capturing a small proportion of the E. coli O157:H7 in the province. Being able to sample only a small proportion of cattle’s E. coli O157:H7 increases the likelihood of only sampling from the center of the distribution, making extreme cases, such as that shown at the very bottom of the tree in Figure 3b, rare and important. In comparison, sampling from human cases constitutes a higher proportion of human infections relative to cattle, and is therefore more representative of the underlying distribution, including extremes. We will add this point to the limitations. As with the clustering above, if anything, this outcome would have biased the study away from identifying cattle as the primary reservoir. Additionally, the relatively small proportion of cattle sampled makes our finding that 15.7% of clinical isolates were within 5 SNPs of a cattle isolate, the distance most commonly used to indicate transmission for E. coli O157:H7, all the more remarkable.
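
      A figure such as "the share of clinical isolates within 5 SNPs of a cattle isolate" can be computed along the following lines. This is an illustrative sketch with invented sequences, assuming aligned input; it is not the authors' code.

```python
def hamming(a, b):
    """Number of differing positions between two equal-length aligned sequences."""
    return sum(x != y for x, y in zip(a, b))


def share_near_reservoir(clinical, reservoir, max_snps=5):
    """Fraction of clinical sequences whose nearest reservoir sequence is within `max_snps`."""
    near = sum(
        1 for seq in clinical
        if min(hamming(seq, r) for r in reservoir) <= max_snps
    )
    return near / len(clinical)


# Hypothetical aligned fragments; real analyses would use core-genome alignments.
clinical = ["AAAAAAAA", "CCCCCCCC", "AAAAAACC"]
cattle = ["AAAAAAAT"]
print(f"{share_near_reservoir(clinical, cattle, max_snps=5):.1%} within 5 SNPs")
```

      Note that the result depends on how densely the reservoir is sampled: with fewer cattle sequences, fewer clinical isolates will have a near neighbour, so the proportion is best read as a lower bound.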

      Because of the aforementioned points, we disagree with the reviewer’s conclusion in point (6). We believe transmission from cattle to humans is likely underestimated for the reasons given above. Not only do all prior studies indicate ruminants as the primary reservoirs of E. coli O157:H7, and humans as only incidental hosts, but our specific data also do not support the reviewer’s individual contentions. That said, we will conduct a sensitivity analysis as recommended to determine the impact of sampling and inclusion of the small clusters on our primary findings.

      (7) "We hypothesize that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence" - this seems a bit tautological. There is a lot of O157 because there's a lot of transmission. What part of the fact it is local means that it is a principal cause of high incidence? It seems that they've observed a high rate of local transmission, but the reasons for this are not apparent, and hence the cause of Alberta's incidence is not apparent. Would a better conclusion not be that "X% of STEC in Alberta is the result of transmission of local variants"? And then, this poses a question for future epi studies of what the transmission pathway is.

      The reviewer is correct, and the suggestion for the direction of future studies was our intent with this statement. We will revise it.

      Reviewer #2 (Public Review):

      This study identified multiple locally evolving lineages transmitted between cattle and humans persistently associated with E. coli O157:H7 illnesses for up to 13 years. Furthermore, this study mentions a dramatic shift in the local persistent lineages toward strains with the more virulent stx2a-only profile. The authors hypothesized that the large proportion of disease associated with local transmission systems is a principal cause of Alberta's high E. coli O157:H7 incidence. These opinions more effectively explain the role of the cattle reservoir in the dynamics of E. coli O157:H7 human infections.

      (1) The authors acknowledge the possibility of intermediate hosts or environmental reservoirs playing a role in transmission. Further discussion on the potential roles of other animal species commonly found in Alberta (e.g., sheep, goats, swine) could enhance the understanding of the transmission dynamics. Were isolates from these species available for analysis? If not, the authors should clearly state this limitation.

      We will expand the discussion of other species in Alberta, as suggested, including other livestock, wildlife, and the potential role of birds and flies. Unfortunately, we did not have sequences available from other species, and we will add this to the limitations. Sequences from other species may be available in collections assembled by others, which, as we note in the limitations, do not have sufficient metadata to assign them to Alberta vs. the rest of Canada. While we have requested this data, we have been unsuccessful in obtaining it. We will continue to pursue it.

      (2) The focus on E. coli O157:H7 is understandable given its prominence in Alberta and the availability of historical data. However, a brief discussion on the potential applicability of the findings to non-O157 STEC serogroups, and the limitations therein, would be beneficial. Are there reasons to believe the transmission dynamics would be similar or different for other serogroups?

      We appreciate this comment and will expand our discussion of relevance to non-O157 STEC. Other authors have proposed that transmission dynamics differ, and studies of STEC risk factors, including our own, support this. However, there has been very little direct study of non-O157 transmission dynamics and there is even less cross-species genomic and metadata available for non-O157 isolates of concern.

      (3) The authors briefly mention the need for elucidating local transmission systems to inform management strategies. A more detailed discussion on specific public health interventions that could be targeted at the identified LPLs and their potential reservoirs would strengthen the paper's impact.

      We agree with the reviewer that this would be a good addition to the manuscript. The public health implications for control are several and extend to non-STEC reportable zoonotic enteric infections, such as Campylobacter and Salmonella. We will add a discussion of these.

      (4) Understanding the relationship between specific risk factors and E. coli O157:H7 infections is essential for developing effective prevention strategies. Have case-control or cohort studies been conducted to assess the correlation between identified risk factors and the incidence of E. coli O157:H7 infections? What methodologies were employed to control for potential confounders in these studies?

      Yes, there have been several case-control studies of reported cases. Many of these are referenced in the discussion in terms of the contribution of different sources to infection. However, we will add a more explicit discussion of risk factors.

      (5) The study's findings are noteworthy, particularly in the context of E. coli O157:H7 epidemiology. However, the extent to which these results can be replicated across different temporal and geographical settings remains an open question. It would be constructive for the authors to provide additional data that demonstrate the replication of their sampling and sequencing experiments under varied conditions. This would address concerns regarding the specificity of the observed patterns to the initial study's parameters.

      We appreciate the reviewer’s comment, as we are currently building on this analysis with an American dataset with different types of data available than were used in this study. We will add a discussion of this. We will also be adding a sensitivity analysis to the manuscript simulating a different sampling approach, which should also be informative to this question.

    1. Reviewer #2 (Public Review):

      Summary:

      The goal is to ask if common species, when studied across their range, tend to have larger ranges in total. To do this the authors examined a very large citizen science database which gives estimates of numbers, and correlated that with the total range size, available from BirdLife. The average correlation is positive but close to zero, and the distribution around zero is also narrow, leading to the conclusion that, even if applicable in some cases, there is no evidence for consistent trends in one or the other direction.

      Strengths:

      The study raises a dormant question, with a large dataset.

      Weaknesses:

      This study combines information from across the whole world, with many different habitats, taxa, and observations, which surely leads to a quite heterogeneous collection.

      First, scale. Many of the earlier analyses were within smaller areas, and for example, ranges are not obviously bounded by a physical barrier. I assume this study is only looking at breeding ranges; that should be stated, as 40% of all bird species migrate, and winter limitation of populations is important. Also are abundances only breeding abundances or are they measured through the year? Are alien distributions removed?

      Second, consider various reasons why abundance and range size may be correlated (sometimes positively and sometimes negatively) at large scales. Combining studies across such a large diversity of ecological situations seems to create many possibilities to miss interesting patterns. For example:

      (1) Islands are small and often show density release.

      (2) North temperate regions have large ranges (Rapoport's rule) and higher population sizes than the tropics.

      (3) Body size correlates with global range size (I am unsure if this has recently been tested but is present in older papers) and with density. For example, cosmopolitan species (barn owl, osprey, peregrine) are relatively large and relatively rare.

      (4) In the consideration of alien species, it certainly looks to me as if the law is followed, with pigeon, starling, and sparrow both common and widely distributed. I guess one needs to make some sort of statement about anthropogenic influences, given the dramatic changes in both populations and environments over the past 50 years.

      (5) Wing shape correlates with ecological niche and range size (e.g. White, American Naturalist). Aerial foraging species with pointed wings are likely to be easily detected, and several have large ranges reflecting dispersal (e.g. barn swallow).

      Third, biases. I am not conversant with eBird methodology, but the number appearing on checklists seems a very poor estimate of local abundance. As noted in the paper, common species may be underestimated in their abundance. Flocking species must generate large numbers, skulking species few. The survey is often likely to be in areas favorable to some species and not others. The alternative approach in the paper comes from an earlier study, based on eBird but then creating densities within grids, and surely comes with similar issues.

      Biases are present in range as well. Notably, tropical mountain-occupying species have range sizes over-estimated because holes in the range are not generally accounted for (Ocampo-Peñuela et al. Nature Communications). These species are often quite rare too.

      Fourth, random error. Random error in eBird assessments is likely to be large, with differences among observers, seasons, days, and weather (e.g. Callaghan et al. 2021, PNAS). Range sizes also come with many errors, which is why occupancy is usually seen as the more appropriate measure.

      If we consider both range and abundance measurements to be subject to random error in any one species list, then the removal of all these errors will surely increase the correlation for that list (the covariance shouldn't change but the variances will decrease). I think (but am not sure) that this will affect the mean correlation because more of the positive correlations appear 'real' given the overall mean is positive. It will definitely affect the variance of the correlations; the low variance is one of the main points in the paper. A high variance would point to the operation of multiple mechanisms, some perhaps producing negative correlations (Blackburn et al. 2006).
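
      The attenuation argument above can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the reviewer's or the authors' analysis: the true correlation (0.5), the error standard deviation (1.0), the sample size, and the helper names `pearson`, `simulate`, and `expected_attenuated_r` are all chosen here for illustration. With unit-variance true values and independent errors, the covariance is unchanged while each variance grows, so the observed correlation shrinks toward zero.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate(true_r=0.5, error_sd=1.0, n=20000, seed=1):
    """Return (correlation of true values, correlation of noisy measurements)."""
    rng = random.Random(seed)
    xs, ys, x_obs, y_obs = [], [], [], []
    for _ in range(n):
        # True values: unit variance, correlation true_r (assumed for illustration).
        x = rng.gauss(0, 1)
        y = true_r * x + math.sqrt(1 - true_r ** 2) * rng.gauss(0, 1)
        xs.append(x)
        ys.append(y)
        # Measurements: true value plus independent random error.
        x_obs.append(x + rng.gauss(0, error_sd))
        y_obs.append(y + rng.gauss(0, error_sd))
    return pearson(xs, ys), pearson(x_obs, y_obs)

def expected_attenuated_r(true_r, error_sd):
    """Covariance is unchanged; each variance grows from 1 to 1 + error_sd**2."""
    return true_r / (1 + error_sd ** 2)

if __name__ == "__main__":
    r_true, r_obs = simulate()
    print(f"true r = {r_true:.3f}, observed r = {r_obs:.3f}")
```

      Under these assumptions, removing the measurement error raises the correlation back toward its true value, which is the direction of effect the reviewer describes for each species list.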

      On P.80 it is stated: "Specifically, we can quantify how AOR will change in relation to increases in species richness and sampling duration, both of which are predicted to reduce the magnitude of AORs" I haven't checked the references that make this statement, but intuitively the opposite is expected? More species and longer durations should both increase the accuracy of the estimate, so removing them introduces more error? Perhaps dividing by an uncertain estimate introduces more error anyway. At any rate, the authors should explain the quoted statement in this paper.

      It would be of considerable interest to look at the extreme negative and extreme positive correlations: do they make any biological sense?

      Discussion:

      I can see how publication bias can affect meta-analyses (addressed in the Gaston et al. 2006 paper) but less easily see how confirmation bias can. It seems to me that some of the points made above must explain the difference between this study and Blackburn et al. 2006's strong result.

      Certainly, AOR really does seem to be present in at least some cases (e.g. British breeding birds) and a discussion of individual cases would be valuable. Previous studies have also noted that there are at least some negative and some non-significant associations, and understanding the underlying causes is of great interest (e.g. Kotiaho et al. Biology Letters).

    1. Textbook authors also never invite students to critique their own work. Again, our Mississippi textbook shows this can be done. For example, we noted that only four of our twenty-five mini-biographies were of women. “Has the book therefore been guilty of discrimination against women?” we then asked. Such a question implies that students can think for themselves, which then helps them learn to do so. When students are not asked to assess, but only to remember, they do not learn how to assess or how to think for themselves.

      It is not easy to critique your own work in the way these authors did. However, stating in their book that "only four of our twenty-five mini-biographies were of women" shows that it is okay to admit your faults. Nobody is perfect, and it is foolish to portray yourself as such. Another benefit of this particular group making these statements is that it draws the student to look closer at these types of things and to ask questions such as, "Out of these authors, how many are women? How many are of a different race?" While these questions may cause some backlash for "discrimination", they are valid questions in this instance. As long as you are not using gender or race in a hateful way, it is okay to observe these things. It is common sense that people of different genders and races might have different opinions, lifestyles, experiences, and so much more.

    1. Late work may be accepted with a request for extension which was submitted up to 48 hours before the due date.

      It is good to know that we are able to receive extensions. I know that with papers I tend to hit writer's block, or sometimes need extra time to re-read my paper and make sure it is up to my standards. I think that this is very beneficial to the student and to you as a professor. I think this policy allows writers to be more comfortable turning in work that is complete, and it doesn't waste your time on reading and grading a paper that could have used a little more work.

    2. In this course I need you to be brave. You will read things that may make you uncomfortable. You will discuss difficult topics. This will stretch the boundaries of what you may think you are capable of to new levels.

      I am looking forward to writing about topics that are more uncomfortable. I feel like, as students, we focus a lot on writing reports and more analytical projects. I hope that this class allows us to have a more vulnerable perspective on writing.

    1. Author response:

      The following is the authors’ response to the current reviews.

      The concerns raised during the review have been incorporated into the discussion of the results, and the need for further research is acknowledged in the paper. This is not possible in the present study, as the clinical project has been completed and further patients cannot be enrolled without starting a new project. We are confident that the results are scientifically valid and that the methodology was scientifically sound and up to date. They were obtained on a dataset that was obviously large enough to allow 20% of it to be set aside and a machine-learned classifier to be trained on the remaining 80%, which then assigned samples to neuropathy with an accuracy better than guessing.

      Furthermore, our results are at least tentatively replicated in a completely independent data set from another patient cohort. The strengths and limitations of the study design, in particular the latter, are discussed in the necessary depth. In summary, the machine-learned results provided major hits on one side and probably unimportant lipids on the other side of the variable importance scale. Both could be verified in vitro. We are therefore confident that we have contributed to the advancement of knowledge about cancer therapy-associated neuropathy and look forward to further developments in this area.


      The following is the authors’ response to the original reviews.

      Weaknesses Reviewer 1: 

      There are a number of weaknesses in the study. The small sample size is a significant limitation of the study. Out of 31 patients, only 17 patients were reported to develop neuropathy, with significant neuropathy (grade 2/3) in only 5 patients. The authors acknowledge this limitation in the results and discussion sections of the manuscript, but it limits the interpretation of the results. Also acknowledged is the limited method used to assess neuropathy. 

      We agree with the reviewer that the cohort size and the assessment of neuropathy are limitations of our study, as we already described in the corresponding section of the manuscript. However, the occurrence and grade of neuropathy are in line with results reported in previous studies. From these studies, the expected occurrence of neuropathy with our therapeutic regimen is around 50-70% (54.9% in our cohort), and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3). In these studies, neuropathy is assessed by using questionnaires or by grading via NCI-CTCAE, as in our study. In summary, the assessment and occurrence of neuropathy in our reported cohort are in line with previous reports.

      Potentially due to this small number of patients with neuropathy, the machine learning algorithms could not distinguish between samples with and without neuropathy. Only selected univariate analyses identified differences in lipid profiles potentially related to neuropathy.  

      The data analysis consistently followed a "mixture of experts" approach, as this seems to be the most successful way to deal with omics data. We have elaborated on this in the Methods section, including several supporting references. Regarding the quoted sentence from the results section, after rereading it, we realized that it was somewhat awkwardly worded. What we mean is now better worded in the results section, namely “Although the three algorithms detected neuropathy in new cases, unseen during training, at balanced accuracy of up to 0.75, while only the guess level of 0.5 was achieved when using permuted data for training, the 95% CI of the performance measures was not separated from guess level. Therefore, multivariate feature selection was not considered a valid approach, since it requires that the algorithms from which the feature importance is read can successfully perform their task of class assignment (4). Therefore, univariate methods (Cohen's d, FPR, FWE) were preferred, as well as a direct hypothesis transfer of the top hits from the abovementioned day1/2 assessments to neuropathy. Classical statistics consisting of direct group comparisons using Kruskal-Wallis tests (5) were performed.”
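
      The criterion quoted above (balanced accuracy with a 95% CI that must be separated from the guess level of 0.5) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the percentile bootstrap is one assumed way to obtain a CI, the eight labels and predictions are made up, and `balanced_accuracy`, `bootstrap_ci`, and `separated_from_chance` are hypothetical helper names.

```python
import random

def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        raise ValueError("both classes must be present")
    return 0.5 * (tp / pos + tn / neg)

def bootstrap_ci(y_true, y_pred, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI of balanced accuracy (an assumed CI method)."""
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    while len(scores) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yt = [y_true[i] for i in idx]
        yp = [y_pred[i] for i in idx]
        if 0 < sum(yt) < n:  # skip resamples that lost one of the classes
            scores.append(balanced_accuracy(yt, yp))
    scores.sort()
    lower = scores[int((alpha / 2) * n_boot)]
    upper = scores[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

def separated_from_chance(ci, chance=0.5):
    """True only if the whole interval lies above the guess level."""
    return ci[0] > chance

if __name__ == "__main__":
    # Made-up held-out labels (1 = neuropathy) and classifier predictions.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0]
    ci = bootstrap_ci(y_true, y_pred)
    print(balanced_accuracy(y_true, y_pred), ci, separated_from_chance(ci))
```

      With a held-out set this small, a point estimate well above 0.5 can still come with an interval that reaches the guess level, which is the situation the quoted sentence describes.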

      It was our approach to investigate the data set in an unbiased manner by different machine learning algorithms and select those lipids that the majority of the algorithms considered important for distinguishing the patient groups (majority voting). This way, the inconsistencies and limitations of a single evaluation method, such as regression analysis, that occur in some datasets, can be mitigated. 
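
      The majority-voting step described above can be sketched in a few lines. This is an illustrative sketch only: the function name `majority_vote`, the rule "selected by more than half of the methods", and the placeholder names `lipid_A` and `lipid_B` are assumptions made here, and only SA1P is taken from the text.

```python
from collections import Counter

def majority_vote(selections, min_votes=None):
    """Return the features selected by a majority of methods.

    selections: one collection of selected feature names per method.
    min_votes:  votes required; defaults to more than half of the methods.
    """
    n_methods = len(selections)
    if min_votes is None:
        min_votes = n_methods // 2 + 1
    # Count each feature at most once per method.
    counts = Counter(f for sel in selections for f in set(sel))
    return sorted(f for f, c in counts.items() if c >= min_votes)

if __name__ == "__main__":
    # Hypothetical selections from three methods.
    per_method = [
        {"SA1P", "lipid_A"},
        {"SA1P", "lipid_B"},
        {"SA1P", "lipid_A"},
    ]
    print(majority_vote(per_method))
```

      A feature that a single method ranks highly by chance is dropped, while a feature that most methods agree on is retained.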

      Three sphingolipid mediators including SA1P differed between patients with and without neuropathy at the end of treatment. These sphingolipids were elevated at the end of treatment in the cohort with neuropathy, relative to those without neuropathy. However, across all samples from pre to post-paclitaxel treatment, there was a significant reduction in SA1P levels. It is unclear from the data presented what the underlying mechanism for this result would be. 

      We agree with the reviewer that our study does not identify the mechanism by which paclitaxel treatment alters sphingolipid concentrations in the plasma of patients. It has been reported before that paclitaxel may increase expression and activity of serine palmitoyltransferase (SPT), which is the crucial enzyme and rate-limiting step in the de novo synthesis of sphingolipids. This may be associated with a shift towards increased synthesis of 1-deoxysphingolipids and a decrease of “classical” sphingolipids (6) and may explain the general reduction of SA1P and other sphingolipid levels after paclitaxel treatment in our study.

      It is also conceivable that paclitaxel reduces the release of sphingolipids into the plasma. Paclitaxel is a microtubule stabilizing agent (7) that may interfere with intracellular transport processes and release of paracrine mediators. 

      The mechanistic details of paclitaxel involvement in sphingolipid metabolism or transport are highly interesting but identifying them is beyond the scope of our manuscript.

      If elevated SA1P is associated with neuropathy development, it would be expected to increase in those who develop neuropathy from pre to post-treatment time points. 

      There is a general trend of reduced plasma SA1P concentrations following paclitaxel treatment. Nevertheless, patients experiencing neuropathy exhibit significantly elevated SA1P levels post-treatment. 

      A preclinical study has shown that paclitaxel-induced neuropathic pain requires activation of the S1P1 receptor (8). Moreover, a meta-analysis of genome-wide association studies (GWAS) from two clinical cohorts identified multiple regulatory elements and increased activity of S1PR1 associated with paclitaxel-induced neuropathy (9). These data imply that enhanced S1P receptor activity and signaling are key drivers of paclitaxel-induced neuropathy. It seems that increased levels of the sphingolipid ligands, in combination with enhanced expression and activity of S1P receptors, can potentiate paclitaxel-induced neuropathy in patients. This explains why even decreased SA1P concentrations after paclitaxel treatment can still enhance neuropathy via the S1PR-TRPV1 axis in sensory neurons.

      We added this paragraph to the discussions section of our manuscript.

      Primary sensory neuron cultures were used to examine the effects of SA1P application.

      SA1P application produced calcium transients in a small proportion of sensory neurons. It is not clear how this experimental model assists in validating the role of SA1P in neuropathy development as there is no assessment of sensory neuron damage or other hallmarks of peripheral neuropathy. These results demonstrate that some sensory neurons respond to SA1P and that this activity is linked to TRPV1 receptors. However, further studies will be required to determine if this is mechanistically related to neuropathy.

      As we detected elevated levels of SA1P in the plasma of PIPN patients, we can assume higher concentrations in the vicinity of sensory neurons. These neurons are the main drivers for neuropathy and neuropathic pain and are strongly affected by paclitaxel in their activity (10-15). Also, TRPV1 shows altered activity patterns in response to paclitaxel treatment (16). Because of its relevance for nociception and pathological pain, TRPV1 activity is a suitable and representative readout for pathological pain states in peripheral sensory neurons (17, 18), which is why we investigated them.

      We would like to point out the potency of SA1P to increase capsaicin-induced calcium-transients in sensory neurons at submicromolar concentrations.

      We also agree with the reviewer that further studies need to investigate the underlying mechanisms in more detail. We added this sentence to the final paragraph in the discussion section of our manuscript.

      Weaknesses Reviewer 2: 

      The article is poorly written, hindering a clear understanding of core results. While the study's goals are apparent, the interpretation of sphingolipids, particularly SA1P, as key mediators of paclitaxel-induced neuropathy lacks robust evidence. 

      We agree that the relevance of SA1P as key mediator of paclitaxel-induced neuropathy might be overstated and changed the wording throughout the manuscript accordingly. However, we would like to point out the potency of this lipid to increase capsaicin-induced calcium-transients in sensory neurons at submicromolar concentrations. 

      Also, the lipid signature in the plasma of PIPN patients shows a unique pattern and sphingolipids are the group that showed the strongest alterations when comparing the patient groups. We also measured eicosanoids, such as prostaglandins, linoleic acid metabolites, endocannabinoids and other lipid groups that have previously been associated with influences on pain perception or nociceptor sensitization. However, none of these lipids showed significant differences in their concentrations in patient plasma. This is why we consider sphingolipids as contributors to or markers of paclitaxel-induced neuropathy in patients.

      We also revised the entire article to improve its clarity.

      The introduction fails to establish the significance of general neuropathy or peripheral neuropathy in anticancer drug-treated patients, and crucial details, such as the percentage of patients developing general neuropathy or peripheral neuropathy, are omitted. This omission is particularly relevant given that only around 50% of patients developed neuropathy in this study, primarily of mild Grade 1 severity with negligible symptoms, contradicting the study's assertion of CIPN as a significant side effect. 

      As we already described in the introduction, CIPN is a serious dose- and therapy-limiting side effect, which affects up to 80% of treated patients. This depends on dose and combination of chemotherapeutic agents. For paclitaxel, therapeutic doses range from 80 – 225 mg/m². As CIPN symptoms are dose-dependent, the number of PIPN patients that receive a high paclitaxel dose is higher than the number of PIPN patients receiving a low dose.

      In our study, we mainly used a low dose paclitaxel, because this therapeutic regimen is the most widely used paclitaxel monotherapy. From previous studies, the expected occurrence of neuropathy with this therapeutic regimen is around 50-70%, and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3).

      Our results are within the range reported by these studies (54.9% patients with neuropathy). Also, as we highlight in Table S1, the neuropathy symptoms persist in most cases for several years after chemotherapy, affecting quality of life of these patients which makes it far from being a negligible symptom.

      We added some more information concerning PIPN in the introduction section in which we emphasize the clinical problem.

      The lack of clarity in distinguishing results obtained by lipidomics using machine learning methods and conventional methods adds to the confusion. The poorly written results section fails to specify SA1P's downregulation or upregulation, and the process of narrowing down to sphingolipids and SA1P is inadequately explained. 

      We have tried to keep the machine learning part in the main manuscript short and moved major parts of it to a supplement. However, as this has been claimed to have led to a lack of clarity, we have expanded the description of the data analysis and added extensive explanations and supporting references for the mixed expert approach that was used throughout the analysis. We hope this is now clear.

      Integrating a significant portion of the discussion section into the results section could enhance clarity. An explanation of the utility of machine learning in classifying patient groups over conventional methods and the citation of original research articles, rather than relying on review articles, may also add clarity to the usefulness of the study. 

      As suggested by the reviewer, we moved the relevant parts from the discussion to the results section in the revised version of our manuscript.

      Reviewer #1 (Recommendations For The Authors): 

      Figure 2 should be better explained or removed. In its current form, it does not add to the interpretation of the manuscript.  

      As mentioned above, we have expanded the description of the ESOM/U-matrix method in the Methods section and rewritten the figure legend. In addition, we have annotated the U-matrix in the figure. The method has been reported extensively in the computer science and biomedical literature, and a more detailed description in the referenced papers would go beyond the current focus on lipidomics. However, we believe that this discussion is sufficiently detailed for the readers of this report: "… a second unsupervised approach was used to verify the agreement between the lipidomics data structure and the prior classification, implemented as self-organizing maps (SOM) of artificial neurons (19). In the special form of an “emergent” SOM (ESOM (20)), the present map consisted of 4,000 neurons arranged on a two-dimensional toroidal grid with 50 rows and 80 columns (21, 22). ESOM was used because it has been repeatedly shown to correctly detect subgroup structures in biomedical data sets comparable to the present one (20, 22, 23). The core principle of SOM learning is to adjust the weights of neurons based on their proximity to input data points. In this process, the best matching unit (BMU) is identified as the neuron closest to a given data point. The adaptation of the weights is determined by a learning rate (η) and a neighborhood function (h), both of which gradually decrease during the learning process. Finally, the groups are projected onto separate regions of the map. On top of the trained ESOM, the distance structure in the high-dimensional feature space was visualized in the form of a so-called U-matrix (24) which is the canonical tool for displaying the distance structures of input data on ESOM (21). 

      The visual presentation facilitates data group separation by displaying the distances between BMUs in high-dimensional space in a color-coding that uses a geographical map analogy, where large "heights" represent large distances in feature space, while low "valleys" represent data subsets that are similar. "Mountain ranges" with "snow-covered" heights visually separate the clusters in the data. Further details about ESOM can be found in (24)."
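
      The SOM learning rule and the U-matrix heights described in the quoted passage can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' ESOM implementation or the Umatrix toolbox: the Gaussian neighborhood function, the linear decay schedule, the four-neighbor definition of the U-height, and all function names are choices made here for illustration. The toroidal grid, the best matching unit (BMU), the learning rate η, and the neighborhood function h come from the description above.

```python
import math
import random

def toroidal_distance(a, b, rows, cols):
    """Grid distance between neurons a=(row, col) and b on a torus (edges wrap)."""
    dr = abs(a[0] - b[0])
    dc = abs(a[1] - b[1])
    return math.hypot(min(dr, rows - dr), min(dc, cols - dc))

def best_matching_unit(weights, x):
    """Grid position of the neuron whose weight vector is closest to x."""
    return min(weights, key=lambda pos: math.dist(weights[pos], x))

def som_update(weights, x, rows, cols, eta, sigma):
    """One learning step: pull every neuron toward x, scaled by eta * h."""
    bmu = best_matching_unit(weights, x)
    for pos, w in weights.items():
        d = toroidal_distance(pos, bmu, rows, cols)
        h = math.exp(-(d * d) / (2 * sigma * sigma))  # assumed Gaussian neighborhood
        weights[pos] = [wi + eta * h * (xi - wi) for wi, xi in zip(w, x)]
    return bmu

def train(data, rows, cols, epochs=20, eta0=0.5, sigma0=2.0, seed=0):
    """Train a small SOM; eta and sigma shrink over time (assumed linear decay)."""
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    for epoch in range(epochs):
        frac = 1 - epoch / epochs
        eta, sigma = eta0 * frac, max(sigma0 * frac, 0.5)
        for x in data:
            som_update(weights, x, rows, cols, eta, sigma)
    return weights

def u_height(weights, pos, rows, cols):
    """U-matrix height: mean feature-space distance to the 4 grid neighbors."""
    r, c = pos
    neighbors = [((r - 1) % rows, c), ((r + 1) % rows, c),
                 (r, (c - 1) % cols), (r, (c + 1) % cols)]
    return sum(math.dist(weights[pos], weights[n]) for n in neighbors) / len(neighbors)

if __name__ == "__main__":
    # Two made-up clusters in a 2-D feature space, mapped onto a 4 x 4 torus.
    data = [[0.0, 0.1], [0.1, 0.0], [0.9, 1.0], [1.0, 0.9]]
    w = train(data, rows=4, cols=4)
    for r in range(4):
        print(" ".join(f"{u_height(w, (r, c), 4, 4):.2f}" for c in range(4)))
```

      In this picture, large heights correspond to the "mountain ranges" between clusters and small heights to the "valleys" within them. The map in the manuscript uses 50 rows and 80 columns; the 4 x 4 grid here only keeps the example small.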

      The second patient cohort is only included in the discussion - with cohort details in the supplementary material and figures included in the main text. Perhaps these data should be removed entirely. The findings are described as trends and not statistically significant and multiple issues with this second cohort are mentioned in the discussion. 

      We agree with the reviewer that including the second patient cohort in the discussion is inadequate. Of course, there are differences between the patient cohorts that do not allow direct comparison and that are highlighted in the section on limitations of the study. However, we still think it is interesting and relevant to show these data, because we used our algorithms trained on the first patient cohort to analyze the second cohort. And these data support the main results. 

      We therefore moved the entire paragraph to the results section to improve the coherence of our manuscript. The passage was introduced with the subheading: “Support of the main results in an independent second patient cohort”.

      The title does not reflect the content of the paper and should be changed to better reflect the content and its significance. 

      We changed the title to “Machine learning and biological validation identify sphingolipids as potential mediators of paclitaxel-induced neuropathy in cancer patients” to avoid overstating the results, as suggested by the reviewer.

      Further, the discussion should be modified to avoid overstating the results. 

      As the reviewer suggests, we changed the wording to avoid overstating the results. 

      Reviewer #2 (Recommendations For The Authors): 

      Please address the absence of clear neuropathy in the majority of patients after treatment with paclitaxel in your discussion. 

      As stated above, occurrence and grade of the neuropathy are in line with the results from previous studies. From these studies, the expected occurrence of neuropathy with our therapeutic regimen is around 50-70%, (the variability is due to differences in the assessment methods) and most patients (80-90%) are expected to experience Grade 1 neuropathy after 12 weeks (1-3). 

      We added this information in the discussion section of the revised manuscript.

      Line 65: Kindly replace review articles with original research articles for proper citation. 

      We replaced the review articles with original publications, focusing on clinical observations. We added the following publications: Jensen et al., Front Neurosci 2020; Chen et al., Neurobiol Aging 2018; Igarashi et al., J Alzheimers Dis. 2011; Kim et al., Oncotarget 2017 as references 17-20 in the revised version of our manuscript.

      Line 260: The mention of SA1P is introduced here without prior reference (do not use words like "again", or "see above", if it is not previously mentioned). Adjust the text for coherence.

      We agree with the reviewer that the introduction of SA1P in this passage is incoherent. We replaced the sentence in line 260 with:

      “The small set of lipid mediators emerging from all three methods as informative for neuropathy included the sphingolipid sphinganine-1-phosphate (SA1P), also known as dihydrosphingosine-1-phosphate (DH-S1P)…”

      Lines 301-315: Consider relocating several lines from this section to the results section for improved clarity. 

      We moved lines 309-312, which explain the algorithm selection and validation success, to the corresponding results section (Lipid mediators informative for assigning post-paclitaxel therapy samples to neuropathy).

      Lines 382-396: Move this content to the results section to enhance the organization and coherence of the manuscript. 

      We moved the entire paragraph to the results section of our manuscript to improve coherence. The passage was introduced with the subheading:  “Support of the main results in an independent second patient cohort”.

      References

      (1) Barginear M, Dueck AC, Allred JB, Bunnell C, Cohen HJ, Freedman RA, et al. Age and the Risk of Paclitaxel-Induced Neuropathy in Women with Early-Stage Breast Cancer (Alliance A151411): Results from 1,881 Patients from Cancer and Leukemia Group B (CALGB) 40101. Oncologist. 2019;24(5):617-23.

      (2) Mauri D, Kamposioras K, Tsali L, Bristianou M, Valachis A, Karathanasi I, et al. Overall survival benefit for weekly vs. three-weekly taxanes regimens in advanced breast cancer: A meta-analysis. Cancer Treat Rev. 2010;36(1):69-74.

      (3) Budd GT, Barlow WE, Moore HC, Hobday TJ, Stewart JA, Isaacs C, et al. SWOG S0221: a phase III trial comparing chemotherapy schedules in high-risk early-stage breast cancer. J Clin Oncol. 2015;33(1):58-64.

      (4) Lötsch J, and Ultsch A. Pitfalls of Using Multinomial Regression Analysis to Identify Class-Structure-Relevant Variables in Biomedical Data Sets: Why a Mixture of Experts (MOE) Approach Is Better. BioMedInformatics. 2023;3(4):869-84.

      (5) Kruskal WH, and Wallis WA. Use of Ranks in One-Criterion Variance Analysis. J Am Stat Assoc. 1952;47(260):583-621.

      (6) Kramer R, Bielawski J, Kistner-Griffin E, Othman A, Alecu I, Ernst D, et al. Neurotoxic 1-deoxysphingolipids and paclitaxel-induced peripheral neuropathy. FASEB J. 2015;29(11):4461-72.

      (7) Field JJ, Diaz JF, and Miller JH. The binding sites of microtubule-stabilizing agents. Chem Biol. 2013;20(3):301-15.

      (8) Janes K, Little JW, Li C, Bryant L, Chen C, Chen Z, et al. The development and maintenance of paclitaxel-induced neuropathic pain require activation of the sphingosine 1-phosphate receptor subtype 1. J Biol Chem. 2014;289(30):21082-97.

      (9) Chua KC, Xiong C, Ho C, Mushiroda T, Jiang C, Mulkey F, et al. Genomewide Meta-Analysis Validates a Role for S1PR1 in Microtubule Targeting Agent-Induced Sensory Peripheral Neuropathy. Clin Pharmacol Ther. 2020;108(3):625-34.

      (10) Kawakami K, Chiba T, Katagiri N, Saduka M, Abe K, Utsunomiya I, et al. Paclitaxel increases high voltage-dependent calcium channel current in dorsal root ganglion neurons of the rat. J Pharmacol Sci. 2012;120(3):187-95.

      (11) Pittman SK, Gracias NG, Vasko MR, and Fehrenbacher JC. Paclitaxel alters the evoked release of calcitonin gene-related peptide from rat sensory neurons in culture. Exp Neurol. 2013.

      (12) Luo H, Liu HZ, Zhang WW, Matsuda M, Lv N, Chen G, et al. Interleukin-17 Regulates Neuron-Glial Communications, Synaptic Transmission, and Neuropathic Pain after Chemotherapy. Cell Reports. 2019;29(8):2384-97 e5.

      (13) Pease-Raissi SE, Pazyra-Murphy MF, Li Y, Wachter F, Fukuda Y, Fenstermacher SJ, et al. Paclitaxel Reduces Axonal Bclw to Initiate IP3R1-Dependent Axon Degeneration. Neuron. 2017;96(2):373-86 e6.

      (14) Duggett NA, Griffiths LA, and Flatters SJL. Paclitaxel-induced painful neuropathy is associated with changes in mitochondrial bioenergetics, glycolysis, and an energy deficit in dorsal root ganglia neurons. Pain. 2017.

      (15) Li Y, Adamek P, Zhang H, Tatsui CE, Rhines LD, Mrozkova P, et al. The Cancer Chemotherapeutic Paclitaxel Increases Human and Rodent Sensory Neuron Responses to TRPV1 by Activation of TLR4. J Neurosci. 2015;35(39):13487-500.

      (16) Hara T, Chiba T, Abe K, Makabe A, Ikeno S, Kawakami K, et al. Effect of paclitaxel on transient receptor potential vanilloid 1 in rat dorsal root ganglion. Pain. 2013;154(6):882-9.

      (17) Jardin I, Lopez JJ, Diez R, Sanchez-Collado J, Cantonero C, Albarran L, et al. TRPs in Pain Sensation. Front Physiol. 2017;8:392.

      (18) Julius D. TRP Channels and Pain. Annual Review of Cell and Developmental Biology. 2013;29:355-84.

      (19) Kohonen T. Self-Organized Formation of Topologically Correct Feature Maps. Biol Cybern. 1982;43(1):59-69.

      (20) Lötsch J, Lerch F, Djaldetti R, Tegeder I, and Ultsch A. Identification of disease-distinct complex biomarker patterns by means of unsupervised machine-learning using an interactive R toolbox (Umatrix). Big Data Analytics. 2018;3(1):5.

      (21) Ultsch A. 2003.

      (22) Lötsch J, Geisslinger G, Heinemann S, Lerch F, Oertel BG, and Ultsch A. Quantitative sensory testing response patterns to capsaicin- and ultraviolet-B-induced local skin hypersensitization in healthy subjects: a machine-learned analysis. Pain. 2018;159(1):11-24.

      (23) Lötsch J, Thrun M, Lerch F, Brunkhorst R, Schiffmann S, Thomas D, et al. Machine-Learned Data Structures of Lipid Marker Serum Concentrations in Multiple Sclerosis Patients Differ from Those in Healthy Subjects. Int J Mol Sci. 2017;18(6).

      (24) Lötsch J, and Ultsch A. Cham: Springer International Publishing; 2014:249-57.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this manuscript, Wu et al. introduce a novel approach to reactivate the Müller glia cell cycle in the mouse retina by simultaneously reducing p27Kip1 and increasing cyclin D1 using a single AAV vector. The approach effectively promotes Müller glia proliferation and reprogramming without disrupting retinal structure or function. Interestingly, reactivation of the Müller glia cell cycle downregulates the IFN pathway, which may contribute to the induced retinal regeneration. The results presented in this manuscript may offer a promising approach for developing Müller glia cell-mediated regenerative therapies for retinal diseases.

      Strengths:

      The data are convincing and supported by appropriate, validated methodology. These results are both technically and scientifically exciting and are likely to appeal to retinal specialists and neuroscientists in general.

      Weaknesses:

      There are some data gaps that need to be addressed.

      (1) Please label the time points of AAV injection, EdU labeling, and harvest in Figure 1B.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We will label all experiment timelines in the figures where appropriate in the revised version.

      (2) What fraction of Müller cells were transduced by AAV under the experimental conditions?

      We apologize for not clearly conveying the transduction efficiency. The retinal region adjacent to the injection site, typically near the central retina, exhibits a transduction efficiency of nearly 100%. In contrast, the peripheral retina shows a lower transduction efficiency compared to the central region. We will include the quantification of AAV transduction efficiency in the revised manuscript.

      The quantification of EdU+ MG or other markers was conducted in the area with the highest transduction efficiency.

      (3) It seems unusually rapid for MG proliferation to begin as early as the third day after CCA injection. Can the authors provide evidence for cyclin D1 overexpression and p27 Kip1 knockdown three days after CCA injection?

      In our pilot study, we tested the onset time of GFP expression from AAV-GFAP-GFP following intravitreal injection. We observed GFP expression in MG as early as two days post-infection. These findings will be included in the revised manuscript. Additionally, we plan to perform qPCR or Western blot analysis to confirm cyclin D1 overexpression and p27kip1 knockdown at the onset of Müller glia proliferation, which will also be included in the revised manuscript.

      (4) The authors reported that MG proliferation largely ceased two weeks after CCA treatment. While this is an interesting finding, the explanation that it might be due to the dilution of AAV episomal genome copies in the dividing cells seems far-fetched.

      We believe that the lack of durability in high Cyclin D1 and low p27kip1 levels in MG contributes to the cessation of their proliferation. A potential reason for the loss of high Cyclin D1 overexpression and p27kip1 knockdown during MG proliferation could be the dilution of the AAV episomal genome. However, testing this hypothesis is challenging. Instead, we plan to provide direct evidence in the revised manuscript by examining the levels of Cyclin D1 and p27kip1 in the retina treated with CCA before and after the peak of MG proliferation.

      Reviewer #2 (Public Review):

      This manuscript by Wu, Liao et al. reports that simultaneous knockdown of P27Kip1 with overexpression of Cyclin D can stimulate Müller glia to re-enter the cell cycle in the mouse retina. There is intense interest in reprogramming mammalian Müller glia into a source of neurogenic progenitors, in the hope that these cells could be a source for neuronal replacement in neurodegenerative diseases. Previous work in the field has shown ways in which mouse Müller glia can be neurogenically reprogrammed, and these studies have shown cell cycle re-entry prior to neurogenesis. In other works, typically, the extent of glial proliferation is limited, and the authors of this study highlight the importance of stimulating large numbers of Müller glia to re-enter the cell cycle in the hope that they will differentiate into neurons. While the evidence for stimulating proliferation in this study is convincing, the evidence for neurogenesis is not convincing or robust, suggesting that stimulating cell cycle re-entry may not be associated with increasing regeneration without another proneural stimulus.

      Below are concerns and suggestions.

      Intro:

      (1) The authors cite past studies showing "direct conversion" of MG into neurons. However, these studies (PMID: 34686336; 36417510) show EdU+ MG-derived neurons suggesting cell cycle re-entry does occur in these strategies of proneural TF overexpression.

      We thank the reviewer for pointing this out. We will revise the statement to "MG neurogenesis," which encompasses both direct conversion and Müller glia proliferation followed by neuronal differentiation.

      (2) Multiple citations are incorrectly listed, using the authors' first names only (e.g. Yumi, et al.; Levi, et al.). Studies are also incompletely referenced in the references.

      We apologize for the mistakes in the references. We will fix them in the revised version.

      Figure 1:

      (3) When are these experiments ending? On Figure 1B it says "analysis" on the end of the paradigm without an actual day associated with this. This is the case for many later figures too. The authors should update the paradigms to accurately reflect experimental end points.

      We thank the reviewer for highlighting the lack of clarity in our experimental design. We will label all experiment timelines in the figures where appropriate in the revised version.

      (4) Are there better representative pictures for P27kd and CyclinD OE? The EdU+ counts say there is a 3-fold increase between Figure 1D and E; however, the pictures do not reflect this. In fact, most of the EdU+ cells in Figure 1E don't seem to be Sox9+ MG but rather horizontally oriented nuclei in the OPL that are likely microglia.

      Thanks to the reviewer for pointing this out. We will replace the Cyclin D1 image with a more representative one.

      (5) Is the infection efficacy of these viruses different between different combinations (i.e. CyclinD OE vs. P27kd vs. control vs. CCA combo)? As the counts are shown in Figure 1G only Sox9+/Edu+ cells are shown not divided by virus efficacy. If these are absolute counts blind to where the virus is and how many cells the virus hits, if the virus efficacy varies in efficiency this could drive absolute differences that aren't actually biological.

      Because the AAV-GFAP-Cyclin D1 and AAV-GFAP-Cyclin D1-p27kip1 shRNA viruses do not carry a fluorescent reporter gene, we cannot easily measure viral efficacy in the same experiment. We believe that variations in viral efficacy cannot account for the significant differences in MG proliferation for two reasons: 1) We injected the same titer for all three viruses, and 2) Viral infection efficacy is very high, approaching 100% in the central retina. Nonetheless, to rule out the possibility that the differences in MG proliferation among the Cyclin D overexpression, p27kip1 knockdown, and CCA groups are due to variations in viral efficacy, we will include the p27kip1 knockdown and Cyclin D1 overexpression efficiencies for all four groups using qPCR and/or Western blot analysis in the revised manuscript.

      (6) According to the Jax laboratories, mice aren't considered aged until they are over 18 months old. While it is interesting that CCA treatment does not seem to lose efficacy over maturation, I would rephrase the findings, as the experiment does not test this virus in aged retinas.

      Thank you to the reviewer for bringing this to our attention. We will avoid using “aged mice” in our revised manuscript.

      (7) Supplemental Figure 2c-d. These viruses do not hit 100% of MG, however 100% of the P27Kip staining is gone in the P27sh1 treatment, even the P27+ cell in the GCL that is likely an astrocyte has no staining in the shRNA 1 picture. Why is this?

      For Supplementary Figure 2c-d, we focused on the central area where knockdown efficiency was high, approaching 100%. We will replace this image with one that includes both high and low Müller glia transduction efficiency regions, clearly demonstrating the complete loss of p27kip1 staining in the area of high transduction efficiency.

      Figure 2

      (8) Would you expect cells to go through two rounds of cell cycle in such a short time? The treatment of giving Edu then BrdU 24 hours later would have to catch a cell going through two rounds of division in a very short amount of time. Again the end point should be added graphically to this figure.

      We thank the reviewer for raising this important point. While the typical cell cycle time for human cells is approximately 24 hours, we hypothesized that 24 hours would be the most likely timepoint to capture cells continuously progressing through the cell cycle. However, we acknowledge that we cannot exclude the possibility of some cells entering a second cell cycle at much later timepoints.

      In the revised manuscript, we will carefully qualify our conclusion to state that the majority of MG do not immediately undergo another cell division, rather than making a definitive statement. This more cautious phrasing will better reflect the limitations of the 24-hour timepoint and allow for the potential of a small subset of cells proceeding through additional rounds of division at later stages.

      Figure 3

      (9) I am confused by the mixing of ratios of viruses to indicate infection success. I know mixtures of viruses containing CCA or control GFP or a control LacZ was injected. Was the idea to probe for GFP or LacZ in the single cell data to see which cells were infected but not treated? This is not shown anywhere?

      The virus infection was not uniform across the entire retina. To mark the infection hotspots, we added 10% GFP virus to the mixture. Regions of the retina with low infection efficiency were removed by dissection and excluded from the scRNA-seq analysis. We apologize for not clearly explaining this methodological detail in the original text, and will update the Methods section accordingly.

      (10) The majority of glia sorted from TdTomato are probably not infected with virus. Can you subset cells that were infected only for analysis? Otherwise it makes it very hard to make population judgements like Figure 3E-H if a large portion are basically WT glia.

      This question is related to the last one. Since the regions with high virus infection efficiency were selectively dissected and isolated for analysis, the percentage of CCA-infected MG should constitute the majority in the scRNA-seq data.

      (11) Figure 3C you can see Rho is expressed everywhere which is common in studies like this because the ambient RNA is so high. This makes it very hard to talk about "Rod-like" MG as this is probably an artifact from the technique. Most all scRNA-seq studies from MG-reprogramming have shown clusters of "rods" with MG hybrid gene expression and these had in the past just been considered an artifact.

      We agree that the low levels of Rho in other MG clusters (such as quiescent, reactivated, and proliferating MG) are likely due to RNA contamination. However, the level of Rho in the rod-like MG is significantly higher than in the other clusters, indicating that this is unlikely to be solely due to contamination.

      As shown in Supplementary Figure 7A-C, a cluster of MG-rod hybrid cells (cluster C4) was present in all three experimental groups at similar ratios, and this hybrid cluster was excluded from further analysis. In contrast, the rod-like Müller glia (cluster C3) were predominantly found in the CCA and CCANT groups, suggesting a genuine response to CCA treatment.

      Furthermore, we will conduct Rho and Gnat1 RNA in situ hybridization on the dissociated retinal cells to further support the conclusion that rod-specific genes are upregulated in a subset of MG in the revised manuscript.

      (12) It is mentioned that the "glial" signature is downregulated in response to CCA treatment. Where is this shown convincingly? Figure H has a feature plot of Glul, which does not clearly change between treatments. Otherwise, MG genes are shown as a function of cluster, not treatment.

      We will add box plots of several MG-specific genes to better illustrate the downregulation of the glial signature in the relevant cell cluster in the revised manuscript.

      Figure 4

      (13) The authors should be commended for being very careful in their interpretations. They employ the proper controls (Er-Cre lineage tracing/EdU-pulse chasing/scRNA-seq omics) and were very careful to attempt to see MG-derived rods. This makes the conclusion from the FISH perplexing. The few puncta dots of Rho and GNAT in MG are not convincing to this reviewer, Rho and GNAT dots are dense everywhere throughout the ONL and if you drew any random circle in the ONL it would be full of dots. The rigor of these counts also comes into question because some dots are picked up in MG in the INL even in the control case. This is confusing because baseline healthy MG do not express RNA-transcripts of these Rod genes so what is this picking up? Taken together, the conclusion that there are Rod-like MG are based off scRNA-seq data (which is likely ambient contamination) and these FISH images. I don't think this data warrants the conclusion that MG upregulate Rod genes in response to CCA.

      We performed RNA in situ hybridization on retinal sections because we aimed to correlate cell localization with rod gene expression. We understand the reviewer’s concern that the punctate signals of Rho and GNAT1 in the ONL MG may actually originate from neighboring rods. In the revised manuscript, we will conduct RNAscope on dissociated retinal cells to avoid this issue.

      Figure 5

      (14) Similar point to above but this Glul probe seems odd, why is it throughout the ONL but completely dark through the IPL, this should also be in astrocytes can you see it in the GCL? These retinas look cropped at the INL where below is completely black. The whole retinal section should be shown. Antibodies exist to GS that work in mouse along with many other MG genes, IHC or western blots could be done to better serve this point.

      Indeed, the GCL was cropped out in Figure 5 A-B. We have other images with all retinal layers, which we will use in the revised manuscript. Additionally, we will perform the GS antibody staining to demonstrate partial MG dedifferentiation following CCA treatment.

      Figure 6

      (15) Figure 6D is not a co-labeled OTX2+/ TdTomato+ cell, Otx2 will fill out the whole nucleus as can be seen with examples from other MG-reprogramming papers in the field (Hoang, et al. 2020; Todd, et al. 2020; Palazzo, et al. 2022). You can clearly see in the example in Figure 6D the nucleus extending way beyond Otx2 expression as it is probably overlapping in space. Other examples should be shown, however, considering less than 1% of cells were putatively Otx2+, the safer interpretation is that these cells are not differentiating into neurons. At least 99.5% are not.

      We have additional examples of Otx2+ Tdt+ Edu+ cells, which suggest that MG neurogenesis to Otx2+ cells does occur, despite the low efficiency. We will include these images in the revised manuscript.

      (16) Same as above: Figure 6I is not convincingly co-labeled. HuC/D is an RNA-binding protein and unfortunately is not always the clearest stain, but this looks like overlapping background haze in the INL. Other amacrine markers could be tested, but again, due to the very low numbers, I think no neurogenesis is occurring.

      We have additional examples of HuC/D+ Tdt+ Edu+ cells, which we will show in the revised manuscript.

      (17) In the text the authors are accidentally referring to Figure 6 as Figure 7.

      We thank the reviewer for pointing out the mistake. We will correct the mistake in the revised manuscript.

      Figure 7

      (18) I like this figure and the concept that you can have additional MG proliferating without destroying the retina or compromising vision. This is reminiscent of the chick MG reprogramming studies in which MG proliferate in large numbers and often do not differentiate into neurons yet still persist de-laminated for long time points.

      General:

      (19) The title should be changed, as I don't believe there is any convincing evidence of regeneration of neurons. Understanding the barriers to MG cell-cycle re-entry are important and I believe the authors did a good job in that respect, however it is an oversell to report regeneration of neurons from this data.

      We thank the reviewer for the suggestion. We will consider changing the title in the revised manuscript.

      (20) This paper uses multiple mouse lines and it is often confusing when the text and figures switch between models. I think it would be helpful to readers if the mouse strain was added to graphical paradigms in each figure when a different mouse line is employed.

      We will label the mouse lines used in each experiment in the figures where appropriate.

    1. Reviewer #1 (Public Review):

      Summary:

      In this paper the researchers aimed to address whether bees causally understand string-pulling through a series of experiments. I first briefly summarize what they did:

      - In experiment 1, the researchers trained bees without string and then presented them with flowers in the test phase that either had connected or disconnected strings, to determine what their preference was without any training. Bees did not show any preference.

      - In experiment 2, bees were trained to have experience with string and then tested on their choice between connected vs. disconnected string.

      - Experiment 3 was similar except that instead of having one option which was an attached string broken in the middle, the string was completely disconnected from the flower.

      - In experiment 4, bees were trained on green strings and tested on white strings to determine if they generalize across color.

      - In experiment 5, bees were trained on blue strings and tested on white strings.

      - In experiment 6, bees were trained where black tape covered the area between the string and the flower (i.e. so they would not be able to see/ learn whether it was connected or disconnected).

      - In experiments 2-6, bees chose the connected string in the test phase.

      - In experiment 7, bees were trained as in expt 3 and then tested where string was either disconnected or coiled i.e. still being 'functional' but appearing different.

      - In experiment 8, bees were trained as before and then tested on string that was in a different coiled orientation, either connected or disconnected.

      - In experiments 7 and 8 the bees showed no preference.

      Strengths:

      I appreciate the amount of work that has gone into these experiments and think they are a nice, thorough set of experiments. I enjoyed reading the paper and felt that it was overall well-written and clear. I think experiment 1 shows that bees do not have an untrained understanding of the function of the string in this context. The rest of the experiments indicate that with training, bees have a preference for unbroken over broken string and likely use visual cues learned during training to make this choice. They also show that as in other contexts, bees readily generalize across different colors.

      The 'weaknesses' that I previously listed were dealt with by the authors in the revised version of the manuscript. I think the only point that we disagreed on was relating to the ecological relevance of the task to the bees.

      Here is my previous comment:

      I think the paper would be made stronger by considering the natural context in which the bee performs this behavior. Bees manipulate flowers in all kinds of contexts, and scrabble with their legs to achieve nectar rewards. Rather than thinking that it is pulling a string, my guess would be that the bee learns that a particular motor pattern within their usual foraging repertoire (scrabbling with legs), leads to a reward. I don't think this makes the behavior any less interesting - in fact, I think considering the behavior through an ecological lens can help make better sense of it.

      The authors disagreed, writing the following:

      "Here we respectfully disagree. The solving of Rubik's cube by humans could be said to be a version of finger movements naturally required to open nuts or remove ticks from fur, but this is somewhat beside the point: it's not the motor sequences that are of interest, but the cognition involved. A general approach in work on animal intelligence and cognition is to deliberately choose paradigms that are outside the animals' daily routines; this is what we have done here, in asking whether there is means-end comprehension in bee problem solving. Like comparable studies on this question in other animals, the experiments are designed to probe this question, not one of ecological validity."

      I think the difference would be that humans know that they are doing a Rubik's cube, whereas I do not think that the bee knows that it is pulling string; I think the bee thinks that it is foraging on a flower. Therefore, I stand by my statement that I think it's worth considering what the bee is experiencing in this task and how it relates to what it would be doing while foraging. I think that as animal cognition researchers we can design tasks that are distinct from what the animal would naturally encounter to ask specific questions about what they are thinking, but that we can never remove the ecological context, since the animal will always be viewing the task through that lens. However, I think this may be a philosophical difference in opinion and I am happy with the manuscript as it stands.

    2. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      In this paper, the researchers aimed to address whether bees causally understand string-pulling through a series of experiments. I first briefly summarize what they did:

      - In experiment 1, the researchers trained bees without string and then presented them with flowers in the test phase that either had connected or disconnected strings, to determine what their preference was without any training. Bees did not show any preference.

      - In experiment 2, bees were trained to have experience with string and then tested on their choice between connected vs. disconnected string.

      - Experiment 3 was similar except that instead of having one option which was an attached string broken in the middle, the string was completely disconnected from the flower.

      - In experiment 4, bees were trained on green strings and tested on white strings to determine if they generalize across color.

      - In experiment 5, bees were trained on blue strings and tested on white strings.

      - In experiment 6, bees were trained where black tape covered the area between the string and the flower (i.e. so they would not be able to see/ learn whether it was connected or disconnected).

      - In experiments 2-6, bees chose the connected string in the test phase.

      - In experiment 7, bees were trained as in experiment 3 and then tested where the string was either disconnected or coiled i.e. still being 'functional' but appearing different.

      - In experiment 8, bees were trained as before and then tested on a string that was in a different coiled orientation, either connected or disconnected.

      - In experiments 7 and 8 the bees showed no preference.

      Strengths:

      I appreciate the amount of work that has gone into this study and think it contains a nice, thorough set of experiments. I enjoyed reading the paper and felt that overall it was well-written and clear. I think experiment 1 shows that bees do not have an untrained understanding of the function of the string in this context. The rest of the experiments indicate that with training, bees have a preference for unbroken over broken string and likely use visual cues learned during training to make this choice. They also show that as in other contexts, bees readily generalize across different colors.

      Weaknesses:

      (1) I think there are 2 key pieces of information that can be taken from the test phase - the bees' first choice and then their behavior across the whole test. I think the first choice is critical in terms of what the bee has learned from the training phase - then their behavior from this point is informed by the feedback they obtain during the test phase. I think both pieces of information are worth considering, but their behavior across the entire test phase is giving different information than their first choice, and this distinction could be made more explicit. In addition, while the bees' first choice is reported, no statistics are presented for their preferences.

      We agree with the reviewer that the first choice is critical in terms of what the bumblebees have learned from the training phase. We analyzed the bees’ first choice in Table 1, and we added the test videos. Both the connected and disconnected strings were glued along their entire length to the floor, so the bees were unable to move either string, which prevented learning during the tests. We added the data for each bee’s choices to the Supplementary table.

      (2) It seemed to me that the bees might not only be using visual feedback but also motor feedback. This would not explain their behavior in the first test choice, but could explain some of their subsequent behavior. For example, bees might learn during training that there is some friction/weight associated with pulling the string, but in cases where the string is separated from the flower, this would presumably feel different to the bee in terms of the physical feedback it is receiving. I'd be interested to see some of these test videos (perhaps these could be shared as supplementary material, in addition to the training videos already uploaded), to see what the bees' behavior looks like after they attempt to pull a disconnected string.

      We added supplementary videos of the testing phase. As noted in General Methods, both connected and disconnected strings were glued to the floor to prevent the air flow generated by flying bumblebees’ wings from changing the position of the string during the testing phase. The bees were unable to move either the connected or disconnected strings during the tests, and only attempted to pull them. Therefore, a difference in the friction or weight felt when pulling the two strings cannot be a factor in the test.

      (3) I think the statistics section needs to be made clearer (more in private comments).

      We changed the statistical analysis section as suggested by the reviewer.

      (4) I think the paper would be made stronger by considering the natural context in which the bee performs this behavior. Bees manipulate flowers in all kinds of contexts and scrabble with their legs to achieve nectar rewards. Rather than thinking that it is pulling a string, my guess would be that the bee learns that a particular motor pattern within their usual foraging repertoire (scrabbling with legs), leads to a reward. I don't think this makes the behavior any less interesting - in fact, I think considering the behavior through an ecological lens can help make better sense of it.

      Here we respectfully disagree. The solving of Rubik’s cube by humans could be said to be a version of finger movements naturally required to open nuts or remove ticks from fur, but this is somewhat beside the point: it’s not the motor sequences that are of interest, but the cognition involved. A general approach in work on animal intelligence and cognition is to deliberately choose paradigms that are outside the animals’ daily routines; this is what we have done here, in asking whether there is means-end comprehension in bee problem solving. Like comparable studies on this question in other animals, the experiments are designed to probe this question, not one of ecological validity.

      Reviewer #2 (Public Review):

      Summary:

      The authors wanted to see if bumblebees could succeed in the string-pulling paradigm with broken strings. They found that bumblebees can learn to pull strings and that they have a preference to pull on intact strings vs broken ones. The authors conclude that bumblebees use image matching to complete the string-pulling task.

      Strengths:

      The study has an excellent experimental design and contributes to our understanding of what information bumblebees use to solve a string-pulling task.

      Weaknesses:

      Overall, I think the manuscript is good, but it is missing some context. Why do bumblebees rely on image matching rather than causal reasoning? Could it have something to do with their ecology? And how is the task relevant for bumblebees in the wild? Does the test translate to any real-life situations? Is pulling a natural behaviour that bees do? Does image matching have adaptive significance?

      We appreciate the valuable comment from the reviewer. Our explanation, which we have now added to the manuscript, is as follows:

      “Different flower species offer varying profitability in terms of nectar and pollen to bumblebees; they need to make careful choices and learn to use floral cues to predict rewards (Chittka, 2017). Bumblebees can easily learn visual patterns and shapes of flowers (Meyer-Rochow, 2019); they can detect stimuli and discriminate between differently coloured stimuli when presented as briefly as 25 ms (Nityananda et al., 2014). In contrast, causal reasoning involves understanding and responding to causal relationships. Bumblebees might favor, or be limited to, a visual approach, likely due to the efficiency and simplicity of processing visual cues to solve the string-pulling task.”

      As above, it is worth noting that our work is not designed as an ecological study, but one about the question of whether causal reasoning can explain how bees solve a string-pulling puzzle. We have a cognitive focus, in line with comparable studies on other animals. We deliberately chose a paradigm that is to some extent outside of the daily challenges of the animal.

      Reviewer #3 (Public Review):

      Summary:

      This paper presents bees with varying levels of experience with a choice task where bees have to choose to pull either a connected or unconnected string, each attached to a yellow flower containing sugar water. Bees without experience of string pulling did not choose the connected string above chance (experiment 1), but with experience of horizontal string pulling (as in the right-hand panel of Figure 4) bees did choose the connected string above chance (experiments 2-3), even when the string colour changed between training and test (experiments 4-5). Bees that were not provided with perceptual-motor feedback (i.e. they could not observe that each pull of the string moved the flower) during training still learned to string pull and then chose the connected string option above chance (experiment 6). Bees with normal experience of string pulling then failed to discriminate between connected and unconnected strings when the strings were coiled or looped, rather than presented straight (experiments 7-8).

      Weaknesses:

      The authors have only provided video of some of the conditions where the bees succeeded. In general, I think a video explaining each condition and then showing a clip of a typical performance would make it much easier to follow the study designs for scholars. Videos of the conditions bees failed at would be highly useful in order to compare different hypotheses for how the bees are solving this problem. I also think it is highly important to code the videos for switching behaviours. When solving the connected vs unconnected string tasks, when bees were observed pulling the unconnected string, did they quickly switch to the other string? Or did they continue to pull the wrong string? This would help discriminate the use of perceptual-motor feedback from other hypotheses.

      We added the test videos as suggested by the reviewer, and we added the data for each bee's choice. However, both connected and disconnected strings were glued to the floor, so perceptual-motor feedback was the same for both choices during the test and therefore irrelevant.

      The experiments are also not described well, for my below comments I have assumed that different groups of bees were tested for experiments 1-8, and that experiment 6 was run as described in line 331, where bees were given string-pulling training without perceptual feedback rather than how it is described in Figure 4B, which describes bees as receiving string pulling training with feedback.

      We have now added figures of Experiments 6 and 7 to Figure 1B, and we now state that different groups of bees were tested in Experiments 1-9.

      The authors suggest the bees' performance is best explained by what they term 'image matching'. However, experiment 6 does not seem to support this without assuming retroactive image matching after the problem is solved. The logic of experiment 6 is described as "This was to ensure that the bees could not see the familiar "lollipop shape" while pulling strings....If the bees prefer to pull the connected strings, this would indicate that bees memorize the arrangement of strings-connected flowers in this task." I disagree with this second sentence, removing perceptual feedback during training would prevent bees memorising the lollipop shape, because, while solving the task, they don't actually see a string connected to a yellow flower, due to the black barrier. At the end of the task, the string is now behind the bee, so unless the bee is turning around and encoding this object retrospectively as the image to match, it seems hard to imagine how the bee learns the lollipop shape.

      We agree with the reviewer that while solving the task in the last step during training, the bees don't actually see a string connected to a yellow flower, due to the black barrier. The full shape is only visible after the pulling is completed, so the bee would need to “check back” on the entire display after feeding and, in effect, conclude “this is the shape that I need to be looking for later”.

      Another possibility is that the bumblebees remembered the image of the “lollipop shape” from the first step of training, in which the “lollipop shape” was presented to them directly.

      We added the experiment suggested by the reviewer, and the result showed that when a green table was placed behind the string to obscure the “lollipop shape” at any point during the training phase, the bees were unable to identify the connected string. The result further supports that bumblebees learn to choose the connected string through image matching.

      Despite this, the authors go on to describe image matching as one of their main findings. For this claim, I would suggest the authors run another experiment, identical to experiment 6 but with a black panel behind the bee, such that the string the bee pulls behind itself disappears from view. There is now no image to match at any point from the bee's perspective so it should now fail the connectivity task.

      Strengths:

      Despite these issues, this is a fascinating dataset. Experiments 1 and 2 show that the bees are not learning to discriminate between connected and unconnected stimuli rapidly in the first trials of the test. Instead, it is clear that experience in string pulling is needed to discriminate between connected and unconnected strings. What aspect of this experience is important? Experiment 6 suggests it is not image matching (when no image is provided during problem-solving, but only afterward, bees still attend to string connectivity) and casts doubt on perceptual-motor feedback (unless from the bee's perspective, they do actually get feedback that pulling the string moves the flower, video is needed here). Experiments 7 and 8 rule out means-end understanding because if the bees are capable of imagining the effect of their actions on the string and then planning out their actions (as hypotheses such as insight, means-end understanding and string connectivity suggest), they should solve these tasks. If the authors can compare the bees' performance in a more detailed way to other species, and run the experiment suggested, this will be a highly exciting paper.

      We appreciate the valuable comment from the reviewer. We compared the bees' performance to other species, and conducted the experiment as suggested by the reviewer.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Smaller comments:

      Line 64: is the word 'simple' needed here? It could also be explained by more complex forms of associative learning, no?

      We deleted “simple”.

      Methods:

      Line 230: was it checked that this was high-contrast for the bees?

      We added the relevant reference in the revised manuscript.

      Line 240: how much sucrose solution was present in the flowers?

      Each flower contained 25 microliters of sucrose solution. We added this information to the revised manuscript.

      Line 266: check grammar.

      We corrected the grammar as follows: “During tests, both strings were glued to the floor of the arena to prevent the air flow generated by flying bumblebees’ wings from changing the position of the string.”

      Statistical analysis:

      - What does it mean that "Bees identity and colony were analyzed with likelihood ratio tests"?

      Bee identity and colony were set as random variables. We changed the analysis methods in the revised manuscript, and the results of all the experiments did not change.

      - Line 359: do you mean proportion rather than percentage?

      We mean the percentage.

      - "the number of total choices as weights" - this should be explained further. This is the number of choices that each bee made? What was the variation and mean of this number? If bees varied a lot in this metric, it might make more sense to analyze their first choice (as I see you've done) and their first 10 choices or something like that - for consistency.

      This refers to the total number of choices made by each bumblebee. We added the mean and standard error of each bee’s number of choices in Table 1. Some bees pulled the string fewer than 10 times; we chose to include all choices made by each bee.

      - More generally I think the first test is more informative than the subsequent choices, since every choice after their first could be affected by feedback they are getting in that test phase. Or rather, they are telling you different things.

      All the bees were tested only once; however, you might be referring to the first choice. We used a chi-square test to analyze the bumblebees’ first choices in the test. It is worth noting that both connected and disconnected strings were glued to the floor. The bees were unable to move either the connected or disconnected strings during the tests, and only attempted to pull them. Therefore, the feedback from pulling either the connected or disconnected strings was the same.

      - Line 362: I think I know what you mean, but this should be re-phrased because the "number of" sounds more appropriate for a Poisson distribution. I think what you are testing is whether each individual bee chose the connected or the disconnected string - i.e. a 0 or 1 response for each bee?

      We agree with the reviewer that the response is whether each bee chose the connected or the disconnected string, i.e. a 0 or 1 response for each bee, and not a count. We clarified this as: “The total number of the choices made by each bee was set as weights.”
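
      To illustrate what using the number of choices as weights does, the following is a minimal sketch with hypothetical counts (not data from the study). It assumes the simplest case, an intercept-only binomial model without the random effects for bee identity and colony, in which weighting each bee's proportion by its number of choices gives the pooled proportion of connected-string choices, so that bees that made more choices contribute more.

```python
# Hypothetical data (illustration only): for each bee,
# (number of choices for the connected string, total number of choices).
bees = [(8, 10), (3, 6), (12, 15), (5, 9)]

# Per-bee proportion of choices for the connected string.
proportions = [connected / total for connected, total in bees]

# Weights: the total number of choices made by each bee.
weights = [total for _, total in bees]

# Unweighted mean treats every bee equally, regardless of how often it chose.
unweighted_mean = sum(proportions) / len(proportions)

# Weighted mean: each bee's proportion counts in proportion to its number of choices.
weighted_mean = sum(p * w for p, w in zip(proportions, weights)) / sum(weights)

# Pooled proportion across all choices; identical to the weighted mean.
pooled = sum(connected for connected, _ in bees) / sum(weights)

print(f"unweighted mean = {unweighted_mean:.4f}")
print(f"weighted mean   = {weighted_mean:.4f}")
print(f"pooled          = {pooled:.4f}")
```

      A mixed model with bee identity and colony as random factors will not reduce to this pooled value exactly; the sketch only shows why bees with few choices carry less weight.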

      - Line 364-365: here and elsewhere, every time you mention a model, make it clear what the dependent and independent variables are. i.e. for the mixed model, the 'bee' is the random factor? Or also the colony that the bee came from? Were these nested etc?

      We clarified this in the revised manuscript. Bee identity and colony are the random factors in the mixed model.

      - Line 368: "Latency to the first choice of each bee was recorded" - why? What were the hypotheses/ predictions here?

      The latency to the first choice was recorded to see whether the bumblebees were familiar with the test pattern. A shorter latency might indicate that the bumblebees were more familiar with the pattern.

      - Line 371: "Multiple comparisons among experiments were.." - do you mean 'within' experiments? It seems that treatments should not be compared between different experiments.

      We mean multiple comparisons among different experiments; we clarify this in the revised manuscript.

      Results

      Experiment 1: From the methods, it sounded like you both analyzed the bees' first choice and their total no. of choices, but in the results section (and Figure 1) I only see the data for all choices combined here.

      In table 1 and in the text you report the number of bees that chose each option on their first choice, but there are no statistical results associated with these results. At the very least, a chi square or binomial test could be run.

      Line 138: "Interestingly, ten out of fifteen bees pulled the connected string in their first choice" - this is presented like it is a significant majority of bees, but a chi-square test of 10 vs 5 has a p-value = 0.1967

      We used the chi-square test to analyze the bees’ first choices. We also added the results to Table 1.
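
      The reviewer's figure for the 10 vs 5 split can be reproduced with a short calculation. The sketch below is an illustration using only the Python standard library, not the analysis script used for the manuscript; the function names are hypothetical. It runs a chi-square goodness-of-fit test against a 50:50 expectation and, for comparison, the exact two-sided binomial test the reviewer also mentions.

```python
import math


def chi_square_two_categories(count_a, count_b):
    """Chi-square goodness-of-fit test against a 50:50 expectation (1 df)."""
    expected = (count_a + count_b) / 2.0
    statistic = ((count_a - expected) ** 2 + (count_b - expected) ** 2) / expected
    # With 1 degree of freedom, the chi-square upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def exact_binomial_two_sided(successes, n):
    """Exact two-sided binomial test against p = 0.5.

    Sums the probabilities of all outcomes that are no more likely
    than the observed one.
    """
    observed = math.comb(n, successes)
    extreme = sum(math.comb(n, k) for k in range(n + 1) if math.comb(n, k) <= observed)
    return extreme / 2 ** n


chi2_stat, chi2_p = chi_square_two_categories(10, 5)
binom_p = exact_binomial_two_sided(10, 15)

print(f"chi-square = {chi2_stat:.4f}, p = {chi2_p:.4f}")
print(f"exact binomial p = {binom_p:.4f}")
```

      Both tests give p > 0.05 for ten of fifteen bees, consistent with the reviewer's point that this split is not a significant majority.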

      Line 143: "It makes sense because the bees could see the "lollipop shape" once they pulled it out from the table." - this feels more like interpretation (i.e. Discussion) rather than results.

      We moved the sentence to the discussion.

      Line 162: again this feels more like interpretation/ conjecture than results.

      We removed the sentence from the Results.

      Line 184: check grammar.

      We checked the grammar. We changed “task” to “tasks”.

      Figures

      I really appreciated the overview in Figure 5 - though I think this should be Figure 1? Even if the methods come later in eLife, I think it would be nice to have that cited earlier on (e.g. at the start of the results) to draw the reader's attention to it quickly, since it's so helpful. It also then makes the images at the bottom of what is currently Figure 1 make more sense. I also think that the authors could make it clearer in Figure 5 which strings are connected vs disconnected in the figure (even if it means exaggerating the distance more than it was in real life). I had to zoom in quite a bit to see which were connected vs. not. Alternatively, you could have an arrow to the string with the words "connected" "disconnected" the first time you draw it - and similar labels for the other string conditions.

      We appreciate the valuable comment from the reviewer. We changed Figure 5 to Figure 2, and Figure 4 to Figure 1. We cited the Figures at the start of the results. We also changed the gap distance between the disconnected strings. Additionally, we added arrows to indicate “connected” and “disconnected” strings in the Figure.

      Figure 1 - I think you could make it clearer that the bars refer to experiments (e.g. have an x-axis with this as a label). Also, check the grammar of the y-axis.

      We added the experiment numbers to the figures. Additionally, we checked the grammar of the y-axis label and changed “percentages” to “percentage”.

      I also think it's really helpful to see the supplementary videos but I think it would be nice to see some examples of the test phase, and not just the training examples.

      We added Supplementary videos of the testing phase.

      Reviewer #2 (Recommendations For The Authors):

      Below are also some minor comments:

      L40: "approaches".

      We changed “approach” to “approaches”.

      L42: but likely mainly due to sampling bias of mammals and birds.

      We changed the sentence as follows: String pulling is one of the most extensively used approaches in comparative psychology to evaluate the understanding of causal relationships (Jacobs & Osvath, 2015), with most research focused on mammals and birds, where a food item is visible to the animal but accessible only by pulling on a string attached to the reward (Taylor, 2010; Range et al., 2012; Jacobs & Osvath, 2015; Wakonig et al., 2021).

      L64: remove "in this study"

      We removed “in this study”.

      L64: simple associative learning of what? Isn't your image matching associative too?

      We removed “simple”.

      L97: remove "a" before "connected".

      We removed “a” before “connected”.

      L136-138: but maybe they could still feel the weight of the flower when pulling?

      Because both strings were glued to the floor in the test phase, the feedback was the same and therefore irrelevant. This information is noted in the General Methods.

      L161: what are these numbers?

      We removed the latency in the revised manuscript.

      L167/ Table 1: I realise that the authors never tried slanted strings to check if bumblebees used proximity as a cue. Why?

      This was simply because we wanted to focus on whether bumblebees could recognize the connectivity of the string.

      Discussion: Why did you only control for colour of the string? What if you had used strings with different textures or smells? Unclear if the authors controlled for "bumblebee smell" on the strings, i.e., after a bee had used the string, was the string replaced by a new one or was the same one used multiple times?

      We used different colors to investigate featural generalization of the visual display of the string connected to the flower in this task. We controlled for color because it is a feature that bumblebees can easily distinguish.

      Both the flowers and the strings were used only once, to prevent the use of chemosensory cues. We clarify this in the revised manuscript.

      L182: since what?

      We deleted “since” in the revised manuscript.

      L182-188: might be worth mentioning that some crows and parrots known for complex cognition perform poorly on broken strings (e.g., https://doi.org/10.1098/rspb.2012.1998 ; https://doi.org/10.1163/1568539X-00003511 ; https://doi.org/10.1038/s41598-021-94879-x ) and Australian magpies use trial and error (https://doi.org/10.1007/s00265-023-03326-6).

      We added the following sentences as suggested by the reviewer: “It is worth noting that some crows and parrots known for complex cognition perform poorly on the broken string task without perceptual feedback or learning. For example, New Caledonian crows use perceptual feedback strategies to solve the broken string-pulling task, and no individual showed a significant preference for the connected string when perceptual feedback was restricted (Taylor et al., 2012). Some Australian magpies and African grey parrots can solve the broken string task, but they required a high number of trials, indicating that learning plays a crucial role in solving this task (Molina et al., 2019; Johnsson et al., 2023).”

      L193: maybe expand on this to put the task into a natural context?

      We added the following sentences as suggested by the reviewer:

      “Different flower species offer varying profitability in terms of nectar and pollen to bumblebees; they need to make careful choices and learn to use floral cues to predict rewards (Chittka, 2017). Bumblebees can easily learn visual patterns and shapes of flowers (Meyer-Rochow, 2019); they can detect stimuli and discriminate between differently coloured stimuli when presented as briefly as 25 ms (Nityananda et al., 2014). In contrast, causal reasoning involves understanding and responding to causal relationships. Bumblebees might favor, or be limited to, a visual approach, likely due to the efficiency and simplicity of processing visual cues to solve the string-pulling task.”

      L204: is causal understanding the same as means-end understanding?

      Means-end understanding is expressed as goal-directed behavior, which involves the deliberate and planned execution of a sequence of steps to achieve a goal. It includes some understanding of the causal relationship (Jacobs & Osvath, 2015; Ortiz et al., 2019).

      L235: this is a very big span of time. Why not control for motivation? Cognitive performance can vary significantly across the day (at least in humans).

      Bumblebee motivation is understood to be rather consistent, as those that were trained and tested came to the flight arena of their own volition and were foragers looking to fill their crop load each time to return it to the colony.

      L232: what is "(w/w)" ? This occurs throughout the manuscript.

      “w/w” represents the weight-to-weight percentage of sugar.

      L250: this sentence sounds odd. "containing in the central well.." ?? Perhaps rephrase? Unclear what central well refers to? Did the flowers have multiple wells?

      We rephrased the sentence as follows: For each experiment, bumblebees were trained to retrieve a flower with an inverted Eppendorf cap at the center, containing 25 microliters of 50% sucrose solution, from underneath a transparent acrylic table.

      L268: why euthanise?

      The reason for euthanizing the bees is that new foragers typically only become active after the current ones are removed from the hive.

      L270: chemosensory cues answer my concern above. Maybe make it clear earlier.

      We moved this sentence to an earlier point in the Results.

      L273: did different individuals use different pulling strategies? Do you have the data to analyse this? This has been done on birds and would offer a nice comparison.

      We analyzed the string-pulling strategies among different individuals, and provided Supplementary Table 1 to display the performances of each individual in different string-pulling experiments.

      L365: unclear why both models. Would be nice to see a GLM output table.

      The durations of pulling different kinds of strings were first tested with the Shapiro-Wilk test to assess normality. Duration data that conformed to a normal distribution were compared using linear mixed-effects models (LMM), while data that deviated from normality were examined with generalized linear mixed models (GLMM). We added a GLM and GLMM output table in the revised manuscript.

      L377: should be a space between the "." and "This".

      We added a space between the “.” and “This”.

      L383-390: some commas and semicolons are in the wrong places.

      We carefully checked the commas and semicolons in this sentence.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments

      Line 32: seems to be missing a word, suggest "the bumblebees' ability to distinguish".

      We added “the” in the revised manuscript.

      Line 47: it would be good to reference other scholars here, this is the central focus of all work in comparative psychology.

      We added the reference in the revised manuscript.

      Line 50-61: I think the string-pulling literature could be described in more detail here, with mention of perceptual-motor feedback loops as a competing hypothesis to means-end understanding (see Taylor et al 2010, 2012). It seems a stretch to suggest that "String-pulling studies have directly tested means-end comprehension in various species", when perceptual-motor feedback is a competing hypothesis that we have positive evidence for in several species.

      We mentioned perceptual-motor feedback in the introduction as follows:

      “Multiple mechanisms can be involved in the string-pulling task, including the proximity principle, perceptual feedback and means-end understanding (Taylor et al., 2012; Wasserman et al., 2013; Jacobs & Osvath, 2015; Wang et al., 2020). The principle of proximity refers to animals preferring to pull the reward that is closest to them (Jacobs & Osvath, 2015). Taylor et al. (2012) proposed that the success of New Caledonian crows in string-pulling tasks is based on a perceptual-motor feedback loop, where the reward gradually moves closer to the animal as they pull the strings. If the visual signal of the reward approaching is restricted, crows with no prior string-pulling experience are unable to solve the broken string task (Taylor et al., 2012).

      However, when a green table was placed behind the string to obscure the “lollipop” structure during the training, the bees could not see the “lollipop” during the initial training stage or after pulling the string from under the table. In this situation, the bees were unable to identify the connected string, further proving that bumblebees chose the connected string based on image matching.

      Line 68: suggest remove 'meticulously'.

      We removed “meticulously”.

      Line 99: This is an exciting finding, can the authors please provide a video of a bee solving this task on its first trial?

      We added videos in the supplementary materials.

      Line 133: perceptual-motor feedback loops should be introduced in the introduction.

      We introduced perceptual-motor feedback loops in the revised manuscript.

      Line 136: please clarify the prior experience of these bees, it is not clear from the text.

      We clarified the prior experience of these bees as follows: Bumblebees were initially attracted to feed on yellow artificial flowers, and then trained with transparent tables covered by black tape (S7 video) through a four-step process.

      Line 138: from the video it is not possible to see the bee's perspective of this occlusion. Do the authors have a video or image showing the feedback the bees received? I think this is highly important if they wish to argue that this condition prevents the use of both image matching and a perceptual-motor feedback loop.

      We prevented the use of image matching: the bees were unable to see the flower moving towards them above the table during the training phase in this condition. However, the bees may have received the visual image both after pulling the string out from under the table and in the initial stages of training in this condition.

      Line 147: please clarify what experience these bees had before this test.

      We added the prior experience of the bumblebees before the test as follows: We therefore designed further experiments based on Taylor et al. (2012) to test this hypothesis. Bumblebees were first trained to feed on yellow artificial flowers, and then trained with the same procedure as Experiment 2, but the connected strings were coiled in the test.

      Line 155: This is a highly similar test to that used in Taylor et al 2012, have the authors seen this study?

      We mentioned the reference in the revised manuscript as follows: We therefore designed further experiments based on Taylor et al. (2012) to test this hypothesis.

      Line 183: This sentence needs rewriting "Since the vast majority of animals, including dogs (Osthaus et al., 2005), cats (Whitt et al., 2009), western scrub-jays (Hofmann et al., 2016) and azure-winged magpies (Wang et al., 2019) are failing in such tasks spontaneously".

      We changed the sentence as suggested by the reviewer as follows: Some animals, including dogs (Osthaus et al., 2005), cats (Whitt et al., 2009), western scrub-jays (Hofmann et al., 2016) and azure-winged magpies (Wang et al., 2019), fail such tasks spontaneously.

      Line 186: "complete comprehension of the functionality of strings is rare" I am not sure the evidence in the current literature supports any animal showing full understanding, can the authors explain how they reach this conclusion?

      We wished to say that few animal species could distinguish between connected and disconnected strings without trial and error learning. We revised the sentence as follows:

      It is worth noting that some crows and parrots known for complex cognition perform poorly on the broken string task without perceptual feedback or learning. For example, New Caledonian crows use perceptual feedback strategies to solve the broken string-pulling task, and no individual showed a significant preference for the connected string when perceptual feedback was restricted (Taylor et al., 2012). Some Australian magpies and African grey parrots can solve the broken string task, but they required a high number of trials, indicating that learning plays a crucial role in solving this task (Molina et al., 2019; Johnsson et al., 2023).

      Line 190: the authors need to clarify which part of their study provides positive evidence for this conclusion.

      We added the evidence for this conclusion as follows: Our findings suggest that bumblebees with experience of string pulling prefer the connected strings, but they failed to identify the interrupted strings when the string was coiled in the test.

      Line 265: was the far end of the string glued only?

      The entire string was glued to the floor, not just the far ends of the string.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews: 

      Reviewer #1 (Public Review):

      Summary: 

      In this paper, the authors used target agnostic MBC sorting and activation methods to identify B cells and antibodies against sexual stages of Plasmodium falciparum. While they isolated some mAbs against Pfs48/45 and Pfs230, two well-known candidates for "transmission blocking" vaccines, these antibodies' efficacies, as measured by TRA, did not perform as well as other known antibodies. They also isolated one cross-reactive mAb to proteins containing glutamic acid-rich repetitive elements, that express at different stages of the parasite life cycle. They then determined the structure of the Fab with the highest protein binder they could determine through protein microarray, RESA, and observed homotypic interactions.

      Strengths: 

      -  Target agnostic B cell isolation (although not a novel methodology). 

      -  New cross-reactive antibody with some "efficacy" (TRA) and mechanism (homotypic interactions) as demonstrated by structural data and other biophysical data. 

      Weaknesses: 

      The paper lacks clarity at times and could benefit from more transparency (showing all the data) and explanations. 

      We have added the oocyst count data from the SMFA experiments as Supplementary Table 2, and ELISA binding curves underlying Figure 4B as Supplementary Figure 5.

      In particular: 

      - define SIFA 

      - define TRAbs 

      We have carefully gone through the manuscript and have introduced abbreviations at first use, removed unnecessary abbreviations and removed unnecessary jargon to increase readability.

      - it is not possible to read the Figure 6B and C panels. 

      We regret that the labels in Supplementary Figures 6 and 7 were of poor quality and have now included higher resolution images to solve this issue.

      Reviewer #2 (Public Review): 

      This manuscript by Amen, Yoo, Fabra-Garcia et al describes a human monoclonal antibody B1E11K, targeting EENV repeats which are present in parasite antigens such as Pfs230, RESAs, and 11.1. The authors isolated B1E11K using an initial target agnostic approach for antibodies that would bind gamete/gametocyte lysate, from which they made 14 mAbs. Following a suite of highly appropriate characterization methods from Western blotting of recombinant proteins to native parasite material, use of knockout lines to validate specificity, ITC, peptide mapping, SEC-MALS, negative stain EM, and crystallography, the authors have built a compelling case that B1E11K does indeed bind EENV repeats. In addition, using X-ray crystallography they show that two B1E11K Fabs bind to a 16 aa RESA repeat in a head-to-head conformation using homotypic interactions and provide a separate example from CSP, of affinity-matured homotypic interactions.

      There are some minor comments and considerations identified by this reviewer. These include that one of the main conclusions in the paper is the binding of B1E11K to RESAs which are blood stage antigens that are exported to the infected parasite surface. It would have been interesting if immunofluorescence assays with B1E11K mAb were performed with blood-stage parasites to understand its cellular localization in those stages.

      In the current manuscript, we provide multiple lines of evidence that B1E11K binds (with high affinity) to repeats that are present in RESAs, i.e. through micro-array studies, in vitro binding experiments such as Western blot, ELISA and BLI, and through X-ray crystallography studies on B1E11K – repeat peptide complexes. Taken together, we think we provide compelling evidence that B1E11K binds to repeats present in RESA proteins. We do agree that studies on the function of this mAb against other stages of the parasite could be of interest, but as our manuscript focuses on the sexual stage of the parasite, we feel that this is beyond the scope of the current work. However, this line of inquiry will be strongly considered in follow-up studies.

      Reviewer #3 (Public Review): 

      The manuscript from Amen et al reports the isolation and characterization of human antibodies that recognize proteins expressed at different sexual stages of Plasmodium falciparum. The isolation approach was antigen agnostic and based on the sorting, activation, and screening of memory B cells from a donor whose serum displays high transmission-reducing activity. From this effort, 14 antibodies were produced and further characterized. The antibodies displayed a range of transmission-reducing activities and recognized different Pf sexual stage proteins. However, none of these antibodies had substantially lower TRA than previously described antibodies. 

      The authors then performed further characterization of antibody B1E11K, which was unique in that it recognized multiple proteins expressed during sexual and asexual stages. Using protein microarrays, B1E11K was shown to recognize glutamate-rich repeats, following an EE-XX-EE pattern. An impressive set of biophysical experiments was performed to extensively characterize the interactions of B1E11K with various repeat motifs and lengths. Ultimately, the authors succeeded in determining a 2.6 Å resolution crystal structure of B1E11K bound to a 16AA repeat-containing peptide. Excitingly, the structure revealed that two Fabs bound simultaneously to the peptide and made homotypic antibody-antibody contacts. This had only previously been observed with antibodies directed against CSP repeats.

      Overall I found the manuscript to be very well written, although there are some sections that are heavy on field-specific jargon and abbreviations that make reading unnecessarily difficult. For instance, 'SIFA' is never defined. 

      We have carefully gone through the manuscript and have introduced abbreviations at first use, removed unnecessary abbreviations and removed unnecessary jargon to increase readability.

      Strengths of the manuscript include the target-agnostic screening approach and the thorough characterization of antibodies. The demonstration that B1E11K is cross-reactive to multiple proteins containing glutamate-rich repeats, and that the antibody recognizes the repeats via homotypic interactions, similar to what has been observed for CSP repeat-directed antibodies, should be of interest to many in the field. 

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors): 

      Figure 1 - why only a gamete ELISA, and not Spz or other stages?

      The volumes of the single B cell supernatants were too small to screen against multiple antigens/parasite stages. As we aimed to isolate antibodies against the sexual stages of the parasite, our assay focused on this stage and supernatants were not tested against other stages. Furthermore, we screened for reactivity against gametes as TRA mAbs likely target gametes rather than other forms of sexual stage parasites.

      Figure 2 A 

      (a) Wild type (WT) and Pfs48/45 knock-out (KO) gametes.

      (b) I am a bit confused about what GMT is vs Pfs48/45 

      We have changed the column titles in Figure 2A to “wild-type gametes” and “Pfs48/45 knockout gametes” to improve clarity.  

      (c) Binding is high % why is it red? 

      We chose to present the results in a heatmap format with a graded color scale, from strong binders in red to weak binders in green. It has now been clarified in the legend of the figure. 

      Please state acronyms clearly 

      TRA - transmission reducing activity 

      SMFA - standard membrane feeding assay 

      We have added the full terms to clarify the acronyms.

      1123- VRC01 (not O1)

      We have corrected this.

      Figure 2C, bottom panels: clarify which ones are TRAbs (assuming these are the mAbs with over 80% TRA at 500 µg/mL) (right gel) and which ones are not (left gel).

      In the Western blot in Figure 2c, we have marked the antibodies with >80% TRA with an asterisk.

      Furthermore, we have replaced ‘TRAbs’ by ‘mAbs with >80% TRA at 500 µg/mL’ in the figure legend.

      ITC shows the same affinity of the Fab for the two peptides, but the ELISA and BLI do not; BLI/SPR would be more appropriate. Any potential explanation?

      The way binding affinity is determined across various techniques can result in slight differences in determined values. For instance, ELISAs utilize long incubation times with extensive washing steps and involve a spectroscopic signal, isothermal titration calorimetry (ITC) uses a calorimetric signal at different concentration equilibria to extract a KD, and BLI determines kinetic parameters for KD determination. Discrepancies in binding affinities between orthogonal techniques have indeed been observed previously in the context of peptide-antibody binding (e.g. PMID: 34788599).

      Regardless of technique, the relative relationships in all three sets of data are the same: higher binding affinity is observed to the longer P2 peptide. This is the main takeaway of the section. As the reviewer suggests, BLI is likely the most appropriate readout here and is the only value explicitly mentioned in the main text. We primarily use ITC to support our proposed binding stoichiometry, which is important to substantiate the SEC-MALS and nsEM data in Figure 4H-I. We added the following sentences to help reinforce these points: “The determined binding affinity from our ITC experiments (Table 1) differed from our BLI experiments (Fig. 4D and 4E), which can occur when measuring antibody-peptide interactions. Regardless, our data across techniques all trend toward the same finding in which a stronger binding affinity is observed toward the longer RESA P2 (16AA) peptide.”

      Figure 5C - would be helpful to have the peptide sequence above referring to what is E1, E2 etc... 

      We added two panels (Figure 5C-D) showcasing the binding interface that shows the peptide numbering in the context of the overall complex. We hope that this will help better orient the reader. 

      Figure S4 - maybe highlight in different colors the EENVV, EEIEE, Etc, etc 

      Repeats found in the sequence of the various proteins in Figure S4 have now been highlighted with different colors.

      Line 163 - why 14 mabs if 11 wells? Isn't it 1 B cell per well? The authors should explain right away that some wells have more than 1 B cell and some have 1 HC, 1LC, and 1 KC. 

      We agree that this was somewhat confusing and have modified the text, which now reads: “We obtained and cloned heavy and light chain sequences for 11 out of 84 wells. For three wells we obtained a kappa light chain sequence and for five wells a lambda light chain sequence. For three wells we obtained both a lambda and kappa light chain sequence, suggesting that either both chains were present in a single B cell or that two B cells were present in the well. For all 11 wells we retrieved a single heavy chain sequence. Following amplification and cloning, 14 mAbs, from 11 wells, were expressed as full human IgG1s (Table S1) (Dataset S1).”
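      The well and antibody counts in the quoted passage can be checked with a few lines of arithmetic. The sketch below assumes that each well contributes one heavy chain and that a well yielding both a kappa and a lambda light chain gives rise to two mAbs; the variable names are illustrative and do not come from the manuscript.

```python
# Well counts as stated in the quoted passage; names are illustrative only.
kappa_only_wells = 3    # wells with a kappa light chain sequence
lambda_only_wells = 5   # wells with a lambda light chain sequence
both_chains_wells = 3   # wells with both a kappa and a lambda sequence

# Every well with recovered sequences counts once.
wells_with_sequences = kappa_only_wells + lambda_only_wells + both_chains_wells

# Assumption: one heavy chain per well, so a well with two light chains
# yields two heavy/light pairings and therefore two mAbs.
mabs_expressed = kappa_only_wells + lambda_only_wells + 2 * both_chains_wells

print(wells_with_sequences, mabs_expressed)
```

      Under these assumptions the totals agree with the statement that 14 mAbs were expressed from 11 wells.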

      Line 166-167 - were they multiple HC (different ones) as well when Lambda and kappa were present?

      This is not clear at first. 

      We clarified this point in the text, see also comment above.

      Line 177 - expressed Pfs48/45 and Pfs230, is it lacking both or just Pfs48/45 (as stated on line 172)? 

      Pfs48/45 binds to the gamete surface via a GPI anchor, while Pfs230 is retained on the surface through binding to Pfs48/45. Hence, the Pfs48/45 knockout parasite also lacks surface-bound Pfs230. We have added a sentence to the Results clarifying this: “The mAbs were also tested for binding to Pfs48/45 knock-out female gametes, which lack surface-bound Pfs48/45 and Pfs230”.

      Show the ELISA data used to calculate EC50 in Figure 3. 

      ELISA binding curves are now shown as Figure S5.

      Line 313-315 - what if you reverse, capture the Fab (peptide too small even if biotinylated?) 

      As anticipated by the Reviewer, immobilizing the Fab and dipping into peptide did not yield appreciable signal for kinetic analysis and thus the experiment from this setup is not reported. 

      Line 341 - add crystal structure 

      This has now been added.

      There is a bit too much speculation in the discussion. For example: "The B1C5L and B1C5K mAbs were shown to recognize Domain 2 of Pfs48/45 and exhibited moderate potency, as previously described for Abs with such specificity (27). These 2 mAbs were isolated from the same well and shared the same heavy chain; their three similar characteristics thus suggest that their binding is primarily mediated by the heavy chain". Actual data would reinforce this statement.

      As B1C5L and B1C5K recognized domain 2 of Pfs48/45 with similar affinity, this strongly suggests that binding is mediated through the heavy chain. Structural analysis could confirm this statement, but this is outside the scope of this study.

      Reviewer #2 (Recommendations For The Authors): 

      Figure 1: This figure provides a description of the workflow. To make it more relevant for the paper, the authors could add relevant numbers as the workflow proceeds. 

      (a) For example, how many memory B cells were sorted, how many supernatants were positive, and then how many mAbs were produced? These numbers can be attached to the relevant images in the workflow. 

      We modified the figure to include the numbers. 

      (b) For the "Supernatant screening via gamete extract ELISA", please change to "Supernatant screening via gamete/gametocyte extract ELISA". 

      We modified the statement as suggested. 

      Line 155: The manuscript states that 84 wells reacted with gamete/gametocyte lysate. The following sentence states that "Out of the 21 supernatants that were positive...". Can the authors provide the summary of data for all 84 wells or why focus on only 21 supernatants? 

      We screened all supernatants against gamete lysate, and only a subset against gametocyte lysate. In total, we found 84 positive supernatants that were reactive to at least one of the two lysates. 21 of those 84 positive were screened against both lysates. We have modified the text to clarify the numbers:

      “After activation, single cell culture supernatants potentially containing secreted IgGs were screened in a high-throughput 384-well ELISA for their reactivity against a crude Pf gamete lysate (Fig. S1B). A subset of supernatants was also screened against gametocyte lysate (S1C). In total, supernatants from 84 wells reacted with gamete and/or gametocyte lysate proteins, representing 5.6% of the total memory B cells. Of the 21 supernatants that were screened against both gamete and gametocyte lysates, six recognized both, while nine appeared to recognize exclusively gamete proteins, and six exclusively gametocyte proteins.”
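      The screening numbers in the revised passage can be cross-checked as follows. This is a minimal sketch: the three category counts are taken from the quoted text, while the total number of memory B cells is not stated there and is only back-calculated here from the rounded 5.6% figure, so it should be read as an approximation.

```python
# Counts quoted for the supernatants screened against both lysates.
both_lysates = 6      # recognized gamete and gametocyte lysate
gamete_only = 9       # recognized gamete lysate only
gametocyte_only = 6   # recognized gametocyte lysate only

screened_on_both = both_lysates + gamete_only + gametocyte_only

# 84 positive wells are stated to represent 5.6% of the memory B cells.
positive_wells = 84
positive_fraction = 0.056

# Back-calculation from a rounded percentage: an estimate, not a reported value.
implied_total_cells = positive_wells / positive_fraction

print(screened_on_both, round(implied_total_cells))
```

      The three categories sum to the 21 supernatants mentioned in the text, and the percentage implies a total on the order of 1,500 sorted memory B cells.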

      Please note that all 84 positive wells were taken forward for B cell sequencing and cloning. 

      Line 171: SIFA is introduced for the first time and should be completely spelled out.

      We have corrected this. 

      Figure 2: 

      (a) In Figure 2A, can you change the column title from "% pos KO GMT" to "% pos Pfs48/45 KO GMT"?

      We have changed the column titles.  

      (b) In Figure 2B, the SMFA results have been converted to %TRA. Can the authors please provide the raw data for the oocyst counts and number of mosquitoes infected in Supplementary Materials? 

      We have added oocyst count data in Table S2, to which we refer in the figure legend. 
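      For readers who want to relate raw oocyst counts to %TRA, the conversion is commonly expressed as the percent reduction in mean oocyst count relative to a control feed. The sketch below assumes that simple definition; the function name and the counts are hypothetical, they are not data from Table S2, and the statistical model actually used for Figure 2B may differ.

```python
from statistics import mean


def percent_tra(control_oocysts, test_oocysts):
    """Percent reduction in mean oocyst count relative to the control feed."""
    control_mean = mean(control_oocysts)
    if control_mean == 0:
        raise ValueError("control mosquitoes carry no oocysts; TRA is undefined")
    return 100.0 * (1.0 - mean(test_oocysts) / control_mean)


# Hypothetical oocyst counts per dissected mosquito (illustration only).
control = [20, 30, 25, 25]
test = [2, 3, 0, 5]

print(round(percent_tra(control, test), 1))  # prints 90.0
```

      With these illustrative numbers the control mean is 25 and the test mean is 2.5, giving a 90% reduction.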

      (c) For Figure 2F, the authors do have other domains to Pfs230 as described in Inklaar et al, NPJ Vaccines 2023. An ELISA/Western to the other domains could identify the binding site for B2C10L, though we appreciate this is not the central result of this manuscript. 

      We thank the reviewer for this suggestion. We are indeed planning to identify the target domain of B2C10L using the previously described fragments, but agree with the reviewer that this is not the focus of the current manuscript and have therefore decided not to include it in the current report.

      Line 116: The word sporozoites appears in subscript and should be corrected to be normal text. 

      We have corrected this.

      Line 216: Typo "B1E11K" 

      We have corrected this.

      Materials and Methods: 

      (a) PBMC sampling: Please add the ethics approval codes in this section. 

      Donor A visited the hospital with a clinical malaria infection and provided informed consent for collection of PBMCs. We have modified the method section to clarify this. 

      “Donor A had lived in Central Africa for approximately 30 years and reported multiple malaria infections during that period. At the time of sampling PBMCs, Donor A had recently returned to the Netherlands and visited the hospital with a clinical malaria infection. After providing informed consent, PBMCs were collected, but gametocyte prevalence and density were not recorded.”

      (b) Gamete/Gametocyte extract ELISA: Can the authors please provide the concentration of antibodies used for the positive and negative controls (TB31F, 2544, and 399) 

      We have added the concentrations for these mAbs in the methods section.

      Recombinant Pfs48/45 and Pfs230 ELISA: Please state the concentration or molarity used for the coating of recombinant Pfs48/45 and Pfs230CMB. 

      We have added the concentrations, i.e. 0.5 µg/mL, to the methods section.

      Western Blotting: The protocol states that DTT was added to gametocyte extracts (Line 594), but Western Blots in Figures 2 and 3 were performed in non-reducing conditions. Please confirm whether DTT was added or not. 

      Thank you for noting this. We did not use DTT for the western blots and have removed this line from the methods section.

      Reviewer #3 (Recommendations For The Authors): 

      Below are a few minor comments to help improve the manuscript. 

      (1) In Figure 4E, are the BLI data fit to a 1:1 binding model? The fits seem a bit off, and from ITC and X-ray studies it is known that 2 Fabs bind 1 peptide. The second Fab should presumably have higher affinity than the first Fab since the second Fab will make interactions with both the peptide and the first Fab. It may be better to fit the BLI data to a 2:1 binding model. 

      The 2:1 (heterogeneous ligand) model assumes that there are two different independent binding sites. However, the second binding event described is dependent on the first binding event and thus this model also does not accurately reflect the system. Given that there is not an ideal model to fit, we instead are careful about the language used in the main text to describe these results. Additionally, we also include a sentence to the results section to ensure that the proper findings/interpretations are highlighted: “…our data all trend toward the same finding in which a stronger binding affinity is observed toward the longer RESA P2 (16AA) peptide.”
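      For orientation, the simple 1:1 (Langmuir) interaction model mentioned by the reviewer can be sketched as below. This is not the fitting procedure used for Figure 4E; the function names and rate constants are hypothetical and chosen only to illustrate how KD follows from the kinetic parameters. As the response notes, neither this model nor the 2:1 heterogeneous ligand model captures a second binding event that depends on the first.

```python
import math


def kd_from_rates(k_on, k_off):
    """Equilibrium dissociation constant (M) from kinetic rate constants."""
    return k_off / k_on


def association_response(t, conc, k_on, k_off, r_max):
    """Response at time t (s) during association for a 1:1 interaction."""
    k_obs = k_on * conc + k_off                       # observed rate (1/s)
    r_eq = r_max * conc / (conc + k_off / k_on)       # plateau response
    return r_eq * (1.0 - math.exp(-k_obs * t))


# Hypothetical parameters (illustration only, not measured values).
k_on = 1e5     # 1/(M*s)
k_off = 1e-3   # 1/s
print(kd_from_rates(k_on, k_off))

# At an analyte concentration equal to KD, the plateau is half of r_max.
print(association_response(1e6, 1e-8, k_on, k_off, 1.0))
```

      In this model a single KD describes the whole sensorgram, which is why a cooperative two-Fab assembly on one peptide is expected to deviate from the fit.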

      (2) The sidechain interactions shown in Figures 5C and D could probably be improved. The individual residues are just 'floating' in space, causing them to lack context and orientation. 

      We added two panels (Fig. 5C-D) showcasing the binding interface that shows the peptide numbering in the context of the overall complex. We hope that this will help orient the reader.  

      (3) The percentage of Ramachandran outliers should be listed in Table 2. Presumably, the value is 0.2%, but this is omitted in the current table. 

      Table 2 has been modified to include the requested information explicitly.

    1. In fact, research shows that the way people learn is as unique as their fingerprints

      I think this illustrates why it could be important to, as Nick says, separate our students into a few boxes because it makes it easier to think about, and then we can think of obstacles and solutions that may come up in each group while lesson planning.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This interesting study explores the mechanism behind an increased susceptibility of daf-18/PTEN mutant nematodes to paralyzing drugs that exacerbate cholinergic transmission. The authors use state-of-the-art genetics and neurogenetics coupled with locomotor behavior monitoring and neuroanatomical observations using gene expression reporters to show that the susceptibility occurs due to low levels of DAF-18/PTEN in developing inhibitory GABAergic neurons early during larval development (specifically, during the larval L1 stage). DAF-18/PTEN is convincingly shown to act cell-autonomously in these cells upstream of the PI3K-PDK-1-AKT-DAF-16/FOXO pathway, consistent with its well-known role as an antagonist of this conserved signaling pathway. The authors exclude a role for the TOR pathway in this process and present evidence implicating selectivity towards developing GABAergic neurons. Finally, the authors show that a diet supplemented with a ketone body, β-hydroxybutyrate, which also counteracts the PI3K-PDK-1-AKT pathway, promoting DAF-16/FOXO activity, partially rescues the proper development (morphology and function) of GABAergic neurons in daf-18/PTEN mutants, but only if the diet is provided early during larval development. This strongly suggests that the critical function of DAF-18/PTEN in developing inhibitory GABAergic neurons is to prevent excessive PI3K-PDK-1-AKT activity during this critical and particularly sensitive period of their development in juvenile L1 stage worms. Whether or not the sensitivity of GABAergic neurons to DAF-18/PTEN function is a defining and widespread characteristic of this class of neurons in C. elegans and other animals, or rather a particularity of the unique early-stage GABAergic neurons investigated, remains to be determined.

      Strengths:

      The study reports interesting and important findings, advancing the knowledge of how daf-18/PTEN and the PI3K-PDK-1-AKT pathway can influence neurodevelopment, and providing a valuable paradigm to study the selectivity of gene activities towards certain neurons. It also defines a solid paradigm to study the potential of dietary interventions (such as ketogenic diets) or other drug treatments to counteract (prevent or revert?) neurodevelopment defects and stimulate DAF-16/FOXO activity.

      Weaknesses:

      (1) Insufficiently detailed methods and some inconsistencies between Figure 4 and the text undermine the full understanding of the work and its implications.

      The incomplete methods presented, the imprecise display of Figure 4, and the inconsistency between this figure and the text make it unclear what the precise timings of observations and treatments around the L1 stage are. What exactly do E-L1 and L1-L2 mean in the figure? The timing information is critical for the understanding of the implications of the findings because important changes take place with the whole inhibitory GABAergic neuronal system during the L1 stage into the L2 stage. The precise timing of events such as neuronal births and remodelling events is well-described (e.g., Figure 2 in Hallam and Jin, Nature 1998; Fig 7 in Mulcahy et al., Curr Biol, 2022). Likewise, for proper interpretation of the implication of the findings, it is important to describe the nature of the defects observed in L1 larvae reported in Figure 1E - at present, a representative figure is shown of a branched commissure. What other types of defects, if any, are observed in early L1 larvae? The nature of the defects will be informative. Are they similar or not to the defects observed in older larvae?

      We thank the reviewer for highlighting these areas for improvement. We have updated and clarified the timing of observation in the text, figures, and methodology section accordingly.

      All experiments were conducted using age-synchronized animals. Gravid worms were placed on NGM plates and removed after two hours. The assays were then carried out on animals that hatched from the eggs laid during this specific timeframe.

      Regarding the detailed timings outlined in the original Figure 4 (now Figure 5 in the revised version), we provided the following information in the revised version: For experiments involving continuous exposure to βHB throughout development, the gravid worms were placed on NGM plates containing the ketone body and removed after two hours. Therefore, this exposure covered the ex-utero embryonic development period up to the L4-Young adult stage when the experiments were conducted.

      In experiments involving exposure at different developmental stages as those depicted in Figure 4 of the original version, (now Figure 5, revised version), animals were transferred between plates with and without βHB as required. We exposed daf-18/PTEN mutant animals to βHB-supplemented diets for 18-hour periods at different developmental stages (Figure 5A, revised version). The earliest exposure occurred during the 18 hours following egg laying, covering ex-utero embryonic development and the first 8-9 hours of the L1 stage. The second exposure period encompassed the latter part of the L1 stage, the entire L2 stage, and most of the L3 stage. The third exposure spanned the latter part of the L3 stage (~1-2 hours), the entire L4 stage, and the first 6-7 hours of the adult stage.

      All this information has been included in Figure 5, the text (Page 13, lines 259-276), and the methodology (Page 4, lines 85-90, Revised Methods and Supplementary information) of the revised manuscript.

      In response to the reviewer's suggestion, we have also included photos of daf-18 worms at the L1 stage (30 min/1h post-hatching). Defects are already present at this early stage, such as handedness and abnormal branching commissures, which are also observed in adult worm neurons (see Supplementary Figure 4, revised version). 

      These defects manifest in DD neurons shortly after larval birth. The prevalence of animals with errors is higher in L4 worms (when both VDs and DDs are formed) compared to early L1s (Figures 3 C-E and Supplementary Figure 4, revised version). This suggests that defects in VD neurons also occur in daf-18 mutants. Indeed, when we analyzed the neuronal morphology of several wild-type and daf-18 mutant animals, we found defects in the commissures corresponding to both DD and VD neurons (Supplementary Figure 3, revised version). 

      These data are now included in the revised version (Results (Page 10, lines 177-196), Discussion (Pages 14-16), Main Figure 3, and Supplementary Figures 3, 4 and 7, revised version).

      (2) The claim of proof of concept for a reversal of neurodevelopment defects is not fully substantiated by data.

      The authors state that the work "constitutes a proof of concept of the ability to revert a neurodevelopmental defect with a dietary intervention" (Abstract, Line 56); however, the authors do not present sufficient evidence to distinguish between a "reversal" and a prevention of the neurodevelopment defect by the dietary intervention. This clarification is critical for therapeutic purposes and claims of proof-of-concept. To the best of my understanding, reversal formally means the defect was present at the time of therapy, which is then reverted to a "normal" state with the therapy. On the other hand, prevention would imply an intervention that does not allow the defect to develop to begin with, i.e., the altered or defective state never arises. In the context of this study, the authors do not convincingly show reversal. This would require showing "embryonic" GABAergic neuron defects or showing convincing data in newly hatched L1 (0-1h); it is unclear whether they do so, as I have failed to find this information in the manuscript. Again, the method description needs to be improved, and the implications can be very different if the data presented in Figure 2D-E regard newly born L1 animals (0-1h) or L1 animals at, say, 5-7h after hatching. This is critical because the development of the embryonically-born GABAergic DD neurons, for instance, is not finalized embryonically. Their neurites still undergo outgrowth (albeit limited) upon L1 birth (see DataS2 in Mulcahy et al., Curr Biol 2022), hence they are susceptible to both committing developmental errors and to responding to nutritional interventions to prevent them. In contrast to embryonic GABAergic neurons, embryonic cholinergic neurons (DA/DB) do not undergo neurite outgrowth post-embryonically (Mulcahy et al., Curr Biol 2022), a fact which could provide some mechanistic insight considering the data presented. However, neurites from other post-embryonically-born neurons also undergo outgrowth post-embryonically, but mostly during the second half of the L1 stage following their birth up to mid-L2, with significant growth occurring during the L1-L2 transition. These are the cholinergic (VA/VB and AS neurons) and GABAergic (VD) neurons. The fact that AS neurons undergo a similar amount of outgrowth as VD neurons is informative if VD neurons are or are not susceptible to daf-18/PTEN activity. Independently, DD neurons are still quite unique in other aspects (see below), which could also bring insight into their selective response.

      Finally, even adjusting the claim to "constitutes a proof-of-concept of the ability to prevent a neurodevelopmental defect with a dietary intervention" would not be completely precise, because it is unclear how much this work "constitutes a proof of concept". This is because, unless I misunderstood something, dietary interventions are already applied to prevent neurodevelopment defects, such as when folic acid supplementation is recommended to pregnant women to prevent neural tube defects in newborns.

      Thank you very much for pointing out this issue and highlighting the need to further investigate the ameliorative capacity of βHB on GABAergic defects in daf-18 mutants. In the revised version, we have included experiments to address this point.

      Our microscopy analyses strongly indicate that the development of DD neurons is affected, with errors observed as early as one hour post-hatching (Main Figure 3 and Supplementary Figures 4 and 7, revised version). Additionally, based on the position of the commissures in L4s, our results strongly suggest that VD neurons are also affected (Supplementary Figure 3, revised version). Both the frequency of animals with errors and the number of errors per animal are higher in L4s compared to L1 larvae (Main Figure 3 and Supplementary Figures 4 and 7, revised version). It is very likely that the errors in VD neurons, which are born in the late L1 stage, are responsible for the higher frequency of defects observed in L4 animals.

      As the reviewer noted, GABAergic DD neurons, which are born embryonically, do not complete their development during the embryonic stages. Some defects in DD neurons may arise during the postembryonic period. Following the reviewer's suggestion, we analyzed L1 larvae at different times before the appearance of VDs (1 hour post-hatching and 6 hours post-hatching). We did not observe an increase in error prevalence, suggesting that DD defects in daf-18 mutants are mostly embryonic (Supplementary Fig 4B, Revised Version). 

      Our findings suggest that the improvement produced by βHB is not due to preventive effects in DDs, as defects persist in newly hatched larvae regardless of βHB presence (Supplementary Figure 7, revised version), and post-embryonic DD growth does not introduce new errors (Supplementary Figure 4, revised version). This lack of preventive effect could be due to βHB's limited penetration into the embryonic environment. Unlike in early L1s, significant improvement occurs in L4s upon early βHB exposure (Supplementary Figure 7, revised version). This could be explained by a reversing effect on malformed DD neurons and/or a protective influence on VD neuron development. While we cannot rule out the first option, even if all errors in DDs in L1 were repaired (which is very unlikely), it would not explain the level of improvement in L4 (Supplementary Figure 7, revised version). Therefore, we speculate that VDs may be targeted by βHB. The notion that exposure to βHB during early L1 can ameliorate defects in neurons primarily emerging in late L1s (VDs) is intriguing. We hypothesize that residual βHB or a metabolite from prior exposure could forestall these defects in VD neurons. Notably, βHB has demonstrated a capacity for long-lasting effects through epigenetic modifications (Reviewed in He et al, 2023, https://doi.org/10.1016%2Fj.heliyon.2023.e21098). More work is needed to elucidate the fundamental mechanisms underlying the ameliorating effects of βHB supplementation. We now discuss these possibilities in the Discussion (Page 17, lines 369-383, revised version).

      We agree with the reviewer that the term "reversal" is not accurate, and we have avoided using this terminology throughout the text. Furthermore, in the title, we have decided to change the word "rescue" to "ameliorate," as our experiments support the latter term but not the former. Additionally, the reviewer is correct that folic acid administration to pregnant women is already a metabolic intervention to prevent neural tube defects. In light of this, we have avoided claiming this as proof of concept in the revised manuscript.

      (3) The data presented do not warrant the dismissal of DD remodeling as a contributing factor to the daf-18/PTEN defects.

      Inhibitory GABAergic DD neurons are quite unique cells. They are well-known for their very particular property of remodeling their synaptic polarity (DD neurons switch the nature of their pre- and postsynaptic targets without changing their wiring). This process is called DD remodeling. It starts in the second half of the L1 stage and finishes during the L2 stage. Unfortunately, the fact that the authors find a specific defect in early GABAergic neurons (which are very likely these unique DD neurons) is not explored in sufficient detail and depth. The facts that these neurons are not fully developed at L1, that they still undergo limited neurite growth, and that they are poised for striking synaptic plasticity in a few hours set them apart from the other explored neurons, such as early cholinergic neurons, which show a more stable dynamics and connectivity at L1 (see Mulcahy et al., Curr Biol 2022).

      The authors use their observation that daf-18/PTEN mutants present morphological defects in GABAergic neurons prior to DD remodeling to conclude that the DAF-18/PTEN-dependent effects are "not a consequence of deficient rearrangement during the early larval stages". However, DD remodeling is just another cell-fate-determined process and as such, its timing, for instance, can be affected by mutations in genes that affect cell fates and developmental decisions, such as daf-18 and daf-16, which affect developmental fates such as those related to the dauer fate. Specifically, the authors do not exclude the possibility that the defects observed in the absence of either gene could be explained by precocious DD remodeling. Precocious DD remodeling can occur when certain pathways, such as the lin-14 heterochronic pathway, are affected. Interestingly, lin-14 has been linked with daf-16/FOXO in at least two ways: during lifespan determination (Boehm and Slack, Science 2005) and in the L1/L2 stages via the direct negative regulation of an insulin-like peptide gene ins-33 (Hristova et al., Mol Cell Bio 2005). It is likely that the prevention of DD dysfunction requires keeping insulin signaling in check (downregulated) in DD neurons in early larval stages, which seems to coincide with the critical timing and function of daf-18/PTEN. Hence, it will be interesting to test the involvement of these genes in the daf-18/daf-16 effects observed by the authors.

      This is another interesting point raised by the reviewer. We have demonstrated that defects manifest in early L1 (30 min-1 hour post-hatching) which corresponds to a pre-remodeling time in wild-type worms.

      We acknowledge the possibility of early remodeling in specific mutants as pointed out by the reviewer.

      However, the following points suggest that the effects of these mutations may extend beyond the particularity of DD remodeling: i) Our experiments also show defects in VD neurons in daf-18 mutants (Supplementary Figure 3, revised version), as discussed in our previous response. These neurons do not undergo significant remodeling during their development. ii) DAF-18 and DAF-16 deficiencies produce neurodevelopmental alterations in other non-remodeling neurons: severe neurite defects in neurons that are nearly fully formed at larval hatching, such as AIY in daf-18 and daf-16 mutants, have been previously reported (Christensen et al., 2011). Additionally, the migration of another neuron, HSN, is severely affected in these mutants (Kennedy et al., 2013). iii) To the best of our knowledge, DD remodeling only alters synaptic polarity without forming new commissures or significantly altering the trajectory of the formed ones. Thus, it is unlikely (though not impossible) for remodeling defects to cause the observed commissural branching and handedness abnormalities in DD neurons. Therefore, we think that the impact of daf-18 mutations on GABAergic neurons is not primarily linked to DD remodeling but extends to various neuron types. The apparent resilience of cholinergic motor neurons to these mutations is intriguing and requires further exploration in the future. This resilience is not limited to daf-18/PTEN animals, since mutants in certain genes expressed in both neuron types (such as neuronal integrin ina-1 or eel-1, the C. elegans ortholog of HUWE1) alter the function or morphology of GABAergic neurons but not cholinergic motor neurons (Kowalski, J. R. et al. Mol Cell Neurosci 2014; Oliver, D. et al. J Dev Biol 2019; Opperman, K. J. et al. Cell Rep 2017). These points are discussed in the manuscript (Discussion, page 15, lines 311-322, revised version) and reveal the existence of compensatory or redundant mechanisms in these excitatory neurons, rendering them much more resistant to both morphological and functional abnormalities.

      Discussion on the impact of the work on the field and beyond:

      The authors significantly advance the field by bringing insight into how DAF-18/PTEN affects neurodevelopment, but fall short of understanding the mechanism of selectivity towards GABAergic neurons, and most importantly, of properly contextualizing their findings within the state-of-the-art C. elegans biology.

      For instance, the authors do not pinpoint which type of GABAergic neuron is affected, despite the fact that there are two very well-described populations of ventral nerve cord inhibitory GABAergic neurons with clear temporal and cell fate differences: the embryonically-born DD neurons and the postembryonically-born VD neurons. The time point of the critical period apparently defined by the authors (pending clarifications of methods, presentation of all data, and confirmation of inconsistencies between the text and figures in the submitted manuscript) could suggest that DAF-18/PTEN is required in either or both populations, which would have important and different implications. An effect on DD neurons seems more likely because an image is presented (Figure 2D) of a defect in an L1 daf-18/PTEN mutant larva with 6 neurons (which means the larva was processed at a time when VD neurons were not yet born or expressing pUnc-47, so supposedly it is an image of a larva in the first half of the L1 stage (0-~7h?)). DD neurons are also likely the critical cells here because the neurodevelopment errors are partially suppressed when the ketogenic diet is provided at an "early" L1 stage, but not later (e.g., from L2-L3, according to the text, L2-L4 according to the figure? ).

      Thank you for this insightful input. As previously mentioned, we conducted experiments in this revision to clarify the specificity of GABAergic errors in daf-18/PTEN mutants, in particular, whether they affect DDs, VDs, or both. Our results suggest that commissural defects are not limited to DD neurons but also occur in VD neurons (Supplementary Figure 3). Regarding the effect of βHB, our findings suggest that VD neurons are targets of βHB action. As mentioned in the previous response and the discussion section (Page 17, lines 369-383, revised version), we might speculate that lingering βHB or a metabolite from prior exposure could mitigate these defects in VD neurons that are born in Late L1s-Early L2s. Additionally, βHB has been noted for its capacity to induce long-term epigenetic changes. Therefore, it could act on precursor cells of VD neurons, with the resulting changes manifesting during VD development independently of whether exposure has ceased. All these possibilities are now discussed in the manuscript.

      Acknowledging that our work raises several questions that we aim to address in the future, we believe our manuscript provides valuable information regarding how the PI3K pathway modulates neuronal development and how dietary interventions can influence this process.

      This study brings important contributions to the understanding of GABAergic neuron development in C. elegans, but unfortunately, it is justified and contextualized mostly in distantly-related fields, where the study has a dubious impact at this stage, rather than in the central field of the work (post-embryonic development of C. elegans inhibitory circuits), where the study has stronger impact. This study is fundamentally about a cell fate determination event that occurs in a nutritionally-sensitive developmental stage (post-embryonic L1 larval stage), yet the introduction and discussion are focused on more distantly related problems such as excitatory/inhibitory (E/I) balance, pathophysiology of human diseases, and treatments for them. Whereas speculation is warranted in the discussion, the limited in-depth consideration of the known biology of these neurons and organisms weakens the impact of the study as written. For instance, the critical role of DAF-18/PTEN seems to occur at the early L1 larval stage, a stage that is particularly sensitive to nutritional conditions. The developmental progression of L1 larvae is well-known to be sensitive to nutrition, e.g., L1 larvae arrest development in the absence of food, something that is exploited in nematode labs to synchronize animals at the L1 stage by allowing embryos to hatch into starvation conditions (water). Development resumes when they are exposed to food. Hence, the extensive postembryonic developmental trajectory that GABAergic neurons need to complete is expected to be highly susceptible to nutrition. Is it? The sensitivity towards the ketogenic diet intervention seems to favor this. In this sense, the attribution of the findings to issues with the nutrition-sensitive insulin-like signaling pathway seems quite plausible, yet this possibility seems insufficiently considered and discussed.

      We greatly appreciate the reviewer's emphasis on the sensitivity of the L1 stage to nutritional status. As the reviewer points out, C. elegans adjusts its development based on food availability, potentially arresting development in L1 in the absence of food. It is therefore reasonable that both the completion of DD neuron trajectories and the initial development steps of VD neurons are particularly sensitive to dietary modulation of the insulin pathway, in which both DAF-18 and DAF-16 play roles. This important point has also been included in the discussion (Page 18, lines 384-407, revised version).

      Finally, the fact that imbalances in excitatory/inhibitory (E/I) inputs are linked to Autism Spectrum Disorders (ASD) is used to justify the relevance of the study and its findings. Maybe at this stage, the speculation would be more appropriate if restricted to the discussion. In order to be relevant to ASD, for instance, the selectivity of PTEN towards inhibitory neurons should occur in humans too. However, at present, the E/I balance alteration caused by the absence of daf-18/PTEN in C. elegans could simply be a coincidence due to the uniqueness of the post-embryonic developmental program of GABAergic neurons in C. elegans. To be relevant, human GABAergic neurons should also pass through a unique developmental stage that is critically susceptible to the PI3K-PDK1-AKT pathway in order for DAF-18/PTEN to have any role in determining their function. Is this the case? Hence, even in the discussion, where the authors state that "this study provides universally relevant information on.... the mechanisms underlying the positive effects of ketogenic diets on neuronal disorders characterized by GABA dysfunction and altered E/I ratios", this claim seems unsubstantiated as written, particularly without acknowledging or mentioning the criteria that would have to be fulfilled and demonstrated for this claim to be true.

      Our results suggest that defects in GABAergic neurons are not limited to DDs, which, as the reviewer rightly notes, are quite unique in their post-embryonic development primarily due to the synaptic remodeling process they undergo. These defects also extend to VD neurons, which do not exhibit significant developmental peculiarities once they are born. Therefore, we think that the defects are not specific to the developmental program of DD neurons but are more related to all GABAergic motoneurons. Additionally, the observation of defects in non-GABAergic neurons in C. elegans daf-18 mutants supports the hypothesis that the role of daf-18 is not limited to DD neurons (Christensen et al., 2011; Kennedy et al., 2013).

      In mammals, Pten conditional knockout (cKO) animals have been extensively studied for synaptic connectivity and plasticity, revealing an imbalance between synaptic excitation and inhibition (E/I balance) (Reviewed in Rademacher and Eickholt, 2019, Cold Spring Harbor Perspect Med, https://doi.org/10.1101%2Fcshperspect.a036780). This imbalance is now widely accepted as a key pathological mechanism linked to the development of ASD-related behavior (Lee et al, 2017; Biological Psychiatry, https://doi.org/10.1016/j.biopsych.2016.05.011). The importance of PTEN in the development of GABAergic neurons in mammals is well-documented. For instance, embryonic PTEN deletion from inhibitory neurons impacts the establishment of appropriate numbers of parvalbumin and somatostatin-expressing interneurons, indicating a central role for PTEN in inhibitory cell development (Vogt et al, 2015, Cell Rep, https://doi.org/10.1016%2Fj.celrep.2015.04.019). Additionally, conditional PTEN knockout in GABAergic neurons is sufficient to generate mice with seizures and autism-related behavioral phenotypes (Shin et al, 2021, Molecular Brain, https://doi.org/10.1186%2Fs13041-021-00731-8). Moreover, while mice in which PV GABAergic neurons lacked both copies of Pten experienced seizures and died, heterozygous animals (PV-Pten+/−) showed impaired formation of perisomatic inhibition (Baohan et al, 2016, Nature Comm, DOI: 10.1038/ncomms12829). Therefore, there is substantial evidence in mammals linking PTEN mutations to neurodevelopmental disorders in general and affecting GABAergic neurons in particular. Hence, we believe that the role of daf-18/PTEN in GABAergic development could be a more widespread phenomenon across the animal kingdom rather than a specific process unique to C. elegans.

      Beyond the points discussed, we have addressed the reviewer's comment regarding the last sentence of the abstract. We have revised it to more cautiously frame the relationship between our findings, ASD, and mammalian neurodevelopmental disorders.

      Reviewer #2 (Public Review):

      Summary:

      Disruption of the excitatory/inhibitory (E/I) balance has been reported in Autism Spectrum Disorders (ASD), with which PTEN mutations have been associated. Giunti et al choose to explore the impact of PTEN mutations on the balance between E/I signaling using as a platform the C. elegans neuromuscular system where both cholinergic (E) and GABAergic (I) motor neurons regulate muscle contraction and relaxation. Mutations in daf-18/PTEN specifically affect morphologically and functionally the GABAergic (I) system, while leaving the cholinergic (E) system unaffected. The study further reveals that the observed defects in the GABAergic system in daf-18/PTEN mutants are attributed to reduced activity of DAF-16/FOXO during development.

      Moreover, ketogenic diets (KGDs), known for their effectiveness in disorders associated with E/I imbalances such as epilepsy and ASD, are found to induce DAF-16/FOXO during early development. Supplementation with β-hydroxybutyrate in the nematode at early developmental stages proves to be both necessary and sufficient to correct the effects on GABAergic signaling in daf-18/PTEN mutants.

      Strengths:

      The authors combined pharmacological, behavioral, and optogenetic experiments to show the GABAergic signaling impairment at the C. elegans neuromuscular junction in DAF-18/PTEN and DAF-16/FOXO mutants. Moreover, by studying the neuron morphology, they point towards neurodevelopmental defects in the GABAergic motoneurons involved in locomotion. Using the same set of experiments, they demonstrate that a ketogenic diet can rescue the inhibitory defect in the daf-18/PTEN mutant at an early stage.

      Weaknesses:

      The morphological experiments hint towards a pre-synaptic defect to explain the GABAergic signaling impairment, but it would have also been interesting to check the post-synaptic part of the inhibitory neuromuscular junctions such as the GABA receptor clusters to assess if the impairment is only presynaptic or both post and presynaptic.

      Moreover, all observations done at the L4 and/or adult stage do not discriminate between the different GABAergic neurons of the ventral nerve cord, i.e., the DDs, which are born embryonically and undergo remodeling at the late L1 stage, and the VDs, which are born post-embryonically at the end of the L1 stage. Those additional elements would provide information on the mechanism of action of the FOXO pathway and the ketone bodies.

      Thank you for your insightful suggestions. 

      This is an initial study that serves as a cornerstone, demonstrating the sensitivity of GABAergic neuron development to alterations in the PI3K pathway and how these alterations can be mitigated by a dietary intervention with a ketone body. While we have determined that the transcription factor DAF-16/FOXO is essential in the neurodevelopmental process and is the target of ketone bodies to alleviate defects, there are still underlying mechanisms to be elucidated. This is only the first step that opens many avenues for further investigation, including the study of post-synaptic partners.

      While our current study primarily focuses on neuronal alterations without delving into potential postsynaptic effects, we do plan to investigate this aspect in future research. This includes examining GABAergic receptors as well as cholinergic receptors, as exacerbation of cholinergic signaling cannot be ruled out. To conduct a comprehensive study of post-synaptic structure and functionality, we would need strains with fluorescent markers for both pre- and post-synaptic components (such as rab-3, unc-49, unc-29, or acr-16 fusions to GFP or mCherry). Unfortunately, most of these strains are not currently available in our laboratory. Unlike in the US or Europe, acquiring these strains from the C. elegans CGC repository is challenging in Argentina due to common customs delays, which require significant time and resources to navigate. Discussions at the Latin American C. elegans conference with CGC administrators, such as Ann Rougvie, have been initiated to address this issue, but a solution has not been reached yet. Additionally, to analyze post-synaptic functionality in depth, studying the response to perfusion with various agonists using electrophysiology would be beneficial. We are in the process of acquiring the capability to conduct electrophysiology experiments in our laboratory, but progress is slow due to limited funding.

      While we believe these experiments would be very informative, they will require a considerable amount of time due to our current circumstances. We consider them non-essential to the primary message of the paper, which focuses on neuronal developmental defects leading to functional alterations in daf-18/PTEN mutants and the novel finding that these can be mitigated by supplementing food with β-hydroxybutyrate. We will study the structure and functionality of the post-synapse in our future projects and also plan to extend this investigation to mutants with deficiencies in genes closely related to neurodevelopmental defects, such as neuroligin, neurexin, or shank-3, which have been implicated in synaptic architecture.

      We also agree that discriminating between DD and VD neurons provides significant insights into the neurodevelopmental phenomena dependent on the FOXO pathway and the action of βHB. In this revised version, we present evidence that not only DD neurons are affected but also VD neurons (see Supplementary Figure 3, revised version). This allows us to suggest that daf-18 affects the development of GABAergic neurons regardless of whether they are born embryonically (DDs) or post-embryonically (VDs) (see also our response to the previous reviewer). We hope to distinguish the defects observed in each type of neuron in future studies. For this, we would need to use strains specifically marked in one neuronal type or another, which, for the same reasons mentioned earlier, would take a considerable amount of time under current conditions.

      Conclusion:

      Giunti et al provide fundamental insights into the connection between PTEN mutations and neurodevelopmental defects through DAF-16/FOXO and shed light on the mechanisms through which ketogenic diets positively impact neuronal disorders characterized by E/I imbalances.  

      Reviewer #3 (Public Review):

      Summary:

      This is a conceptually appealing study by Giunti et al in which the authors identify a role for PTEN/daf-18 and daf-16/FOXO in the development of inhibitory GABA neurons, and then demonstrate that a diet rich in ketone body β-hydroxybutyrate partially suppresses the PTEN mutant phenotypes. The authors use three assays to assess their phenotypes: (1) pharmacological assays (with levamisole and aldicarb); (2) locomotory assays and (3) cell morphological assays. These assays are carefully performed and the article is clearly written. While neurodevelopmental phenotypes had been previously demonstrated for PTEN/daf-18 and daf-16/FOXO (in other neurons), and while KB β-hydroxybutyrate had been previously shown to increase daf-16/FOXO activity (in the context of aging), this study is significant because it demonstrates the importance of KB β-hydroxybutyrate and DAF-16 in the context of neurodevelopment. Conceptually, and to my knowledge, this is the first evidence I have seen of a rescue of a developmental defect with dietary metabolic intervention, linking, in an elegant way, the underpinning genetic mechanisms with novel metabolic pathways that could be used to circumvent the defects.

      Strengths:

      What their data clearly demonstrate is conceptually appealing and, in my opinion, the biggest contribution of the study: the ability to revert a neurodevelopmental defect with a dietary intervention that acts upstream of or in parallel to DAF-16/FOXO.

      Weaknesses:

      The model shows AKT-1 as an inhibitor of DAF-16, yet their studies show no differences from wildtype in akt-1 and akt-2 mutants. AKT is not a major protein studied in this paper, and it can be removed from the model to avoid confusion, or the result can be discussed in the context of the model to clarify interpretation.

      Thank you very much for the suggestion. We agree with the reviewer's assessment that the study of AKT's action itself is too limited in this study to draw conclusions that would allow its inclusion in the proposed model. Therefore, following the reviewer's suggestion, we have removed this protein from our model.

      When testing additional genes in the DAF-18/FOXO pathway, there were no significant differences from wild-type in most cases. This should be discussed. Could there be an alternate pathway via DAF-18/DAF-16, excluding the PI3K pathway, or are there variations in activity of PI3K genes during a ketogenic diet that are hard to detect with current assays?

      Thank you for bringing up this point. Our pharmacological experiments indeed demonstrate that all mutants associated with an exacerbation of the PI3K pathway, which typically inhibits nuclear translocation and activity of the transcription factor DAF-16, lead to imbalances in E/I (excitation/inhibition) that manifest as hypersensitivity to cholinergic drugs. This includes the gain of function of pdk-1 and the loss of function of daf-18 and daf-16 itself. In our subsequent experiments, we demonstrate that this exacerbation of the PI3K pathway leads to errors in the neurodevelopment of GABAergic neurons, which explains the hypersensitivity to aldicarb and levamisole.

      As the reviewer remarks, it is intriguing why mutants inhibiting this pathway do not show differences in their sensitivity to cholinergic drugs compared to wild-type animals. We can speculate, for instance, that during neurodevelopment, there is a critical period where the PI3K pathway must remain at very low activity (or even be deactivated) for proper development of GABAergic neurons. This could explain why there are no differences in sensitivity to cholinergic drugs between mutants that inhibit the PI3K pathway and the wild type. The PI3K pathway depends on insulin-like signals, which are in turn positively modulated by molecules associated with the presence of food. Interestingly, larval stage 1 is particularly sensitive to nutritional status, being able to completely arrest development in the absence of food. Therefore, dietary intervention with βHB may generate a signal of dietary restriction (as seen in mammals) and, as a consequence of this dietary restriction, the PI3K pathway is inhibited, resulting in increased DAF-16 activity. This could restore the proper neurodevelopment of GABAergic neurons. However, this is mere speculation, and experiments that go deeper than the pharmacological assays performed here, using mutants in different genes within the PI3K pathway, may shed light on this point.

      Following the reviewer's suggestion, this point has been discussed in the revised version of the manuscript (Discussion, page 18, lines 384-407).

      The consequence of SOD-3 expression in the broader context of GABA neurons was not discussed. SOD-3 was also measured in the pharynx, but measuring it in neurons would bolster the claims.

      SOD-3 is a known target of DAF-16. Previous studies have shown that βHB induces SOD-3 expression through the induction of DAF-16 (Edwards et al, 2014, Aging, https://doi.org/10.18632%2Faging.100683). The highest levels of SOD-3 expression are typically observed in the pharynx or intestine (DeRosa et al, 2019 https://doi.org/10.1038/s41586-019-1524-5; Zheng et al., 2021, PNAS, https://doi.org/10.1073/pnas.2021063118), and it is often used as a measure of general upregulation of DAF-16. Therefore, we used this parameter as a measure of βHB upregulating systemic DAF-16 activity. While we agree with the reviewer that observing variations in SOD-3 expression in neurons would further support our conclusions, unfortunately, we did not detect measurable signals of SOD-3 in motor neurons in either the control condition or the daf-18 background, even upon stress or βHB exposure. This may be because SOD-3 is a minor target of DAF-16 in these neurons, or its modulation may not correspond to the timing of fluorescence measurements (L4-adults).

      Despite this, our genetic experiments and neuron-specific rescue experiments lead us to conclude that DAF-16 must act autonomously in GABAergic neurons to ensure proper neurodevelopment.

      If they want to include AKT-1, seeing its effect on SOD-3 expression could be meaningful to the model.

      Thank you for this suggestion. We believe that even measuring SOD-3 levels in akt mutant backgrounds would still provide limited information to give it a predominant value in our work. Additionally, to have a complete understanding of the total role of AKT, it would be necessary to measure it in a double mutant background of akt-1; akt-2, and these double mutants generate 100% dauers even at 15°C (Oh et al., PNAS 2005, https://doi.org/10.1073/pnas.0500749102; Quevedo et al., Current Biology 2007, http://dx.doi.org/10.1016/j.cub.2006.12.038; Gatzi et al., PLOS ONE 2014, https://doi.org/10.1371/journal.pone.0107671), greatly complicating the execution of these experiments. Therefore, following the first advice of this reviewer, we have decided to modify our model by excluding AKT.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      ⁃ Please include earlier in the main text the rationale for using unc-25 as a control/reference already when mentioning Figure 1A.

      Thank you for pointing out the need to reference this control earlier. We have included the following paragraph in the description of Figure 1 (Page 5, line 71, revised version):

      “Hypersensitivity to cholinergic drugs is typical of animals with an increased E/I ratio in the neuromuscular system, such as mutants in unc-25 (the C. elegans orthologue for glutamic acid decarboxylase, an essential enzyme for synthesizing GABA). While daf-18/PTEN mutants become paralyzed earlier than wild-type animals, their hypersensitivity to cholinergic drugs is not as severe as that observed in animals completely deficient in GABA synthesis, such as unc-25 null mutants (Figures 1B and 1C), indicating a less pronounced imbalance between excitatory and inhibitory signals.”

      ⁃ Please discuss the greater sensitivity of pdk-1(gf) animals to levamisole than to aldicarb.

      Thank you for bringing up this subtle point.  We understand that the reviewer is referring to the paralysis curve in response to aldicarb in pdk-1(gf), which is closer to unc-25 than the curve for levamisole (in both cases, they are more sensitive than the wild type). Therefore, pdk-1(gf) animals seem to be more sensitive to aldicarb than to levamisole. These results are now shown in Figure 1D (revised version).

      The PI3K pathway does not only act in neurons but also in muscles. Gain of function in pdk-1 has been shown to modulate muscle protein degradation (Szewczyk et al, EMBO Journal, 2008. https://doi.org/10.1038/sj.emboj.7601540). In contrast,  no effect on protein degradation has been reported for null mutants in this gene. Several studies have demonstrated that protein degradation levels can differentially affect receptor subunits, particularly acetylcholine receptors (Reviewed in Crespi et al, Br J Pharmacol, 2018). C. elegans is characterized by a wide repertoire of AChR subunits, and there are at least two subtypes of ACh receptors in muscles (one multimeric sensitive to levamisole and one homomeric (ACR-16) insensitive to levamisole) (Richmond et al, 1999 Nature Neuroscience http://dx.doi.org/10.1038/12160; Touroutine D, JBC 2005 https://doi.org/10.1074/jbc.M502818200).

      Interestingly, acr-16 null mutants are hypersensitive to aldicarb (Zeng et al, JCB, 2023, https://doi.org/10.1083/jcb.202301117) while the electrophysiological response to levamisole in this mutant remains similar to that of wild-type (Touroutine et al, 2005). Therefore, it may be that the gain of function in pdk-1 induces a change in the expression of AChR subtypes in muscle that differentially affects sensitivity to levamisole and ACh. This is purely speculative, and there may be many other explanations. While it would be interesting to explore this difference further, it goes far beyond the scope of this study. The cholinergic drug sensitivity assay is purely exploratory and allowed us to delve into the GABAergic and cholinergic signals in daf-18 mutants. In this sense, the hypersensitivity of pdk-1(gf) to both drugs supports the idea that an increase in PI3K signaling leads to an increased E/I ratio.

      ⁃ Please explain the rationale for performing the akt-1 and akt-2 assays separately. Why not test double mutants? Has their lack of redundancy been determined?

      Our pharmacological assays are conducted at the L4 larval stage, making it impossible to analyze the potential redundancy of akt-1 and akt-2 in sensitivity to levamisole and aldicarb. This impossibility arises because the akt-1;akt-2 double mutant exhibits nearly 100% arrest as dauer even at 15°C, as reported in several prior studies (Oh et al., PNAS 2005, https://doi.org/10.1073/pnas.0500749102; Quevedo et al., Current Biology 2007, http://dx.doi.org/10.1016/j.cub.2006.12.038; Gatzi et al., PLOS ONE 2014, https://doi.org/10.1371/journal.pone.0107671). While the increased dauer arrest in the double mutant compared to the single mutants might suggest redundant functions in dauer entry, there are also reports indicating the absence of redundancy in other processes, such as vulval development (Nakdimon et al., PLOS Genetics 2012, https://doi.org/10.1371%2Fjournal.pgen.1002881).

      The complete dauer arrest likely underlies why other studies focusing on the role of the PI3K pathway in neurodevelopment utilize both mutants separately (Christensen et al, Development 2011, https://doi.org/10.1242/dev.069062). While determining the potential redundancy of these genes is not feasible for this assay, we utilized various mutants of the pathway (age-1, pdk-1, daf-18, daf-16 and daf-16;daf-18 in addition to the akt mutants) that support the conclusion that exacerbating PI3K pathway activity makes animals hypersensitive to cholinergic drugs.

      In response to the reviewer's concern, we have added a sentence in the text explaining the impossibility of performing the assay in the akt-1;akt-2 double mutant (Page 6, lines 90-92).

      ⁃ Figure 1C and D (this applies to all similarly presented bar figures). Please show data points and dispersion (preferably data, median ± 25-75% or average ± SD).

      Thank you. Done
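
      The two dispersion summaries the reviewer asks for can be illustrated with a minimal Python sketch. This is not the analysis code used for the figures; the function name `summarize` and the input values are hypothetical, and the "inclusive" quartile method is an assumption chosen only for illustration.

```python
from statistics import mean, median, quantiles, stdev

def summarize(values):
    """Return mean, sample SD, median and quartiles for one group.

    `values` is a hypothetical list of per-animal measurements
    (at least two values are needed for the SD and quartiles).
    """
    # quantiles(..., n=4) returns the three cut points Q1, Q2, Q3.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "mean": mean(values),
        "sd": stdev(values),      # sample standard deviation (n - 1)
        "median": median(values),
        "q1": q1,                 # 25th percentile
        "q3": q3,                 # 75th percentile
    }

if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    print(summarize([1, 2, 3, 4, 5]))
```

      Plotting the individual values together with either summary (median with the 25-75% range, or mean with SD) lets readers judge both the spread and the sample size directly.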

      ⁃ Line 112 -maybe "and resumes"? 

      Thank you. Done (Line 126, revised version)

      ⁃ Figure 1E and F. Please present mean ± SD (not SEM) of fluctuations. Please change slightly the tones so that the dispersion is easier to distinguish on the "blue light on" box.

      Thank you for the suggestion. We have adjusted the tones as recommended to enhance the visualization of the "blue light on" box. For visualization purposes, we present the shading of the standard error of the mean (SEM), as is usual in these types of optogenetic experiments where traces of animal length variations are measured (Liewald et al, Nature Methods, 2008, doi: 10.1038/nmeth.1252; Schultheis et al, J. Neurophysiology, 2011, doi: 10.1152/jn.00578.2010; Koopman et al, BMC Biology 2021, https://doi.org/10.1186/s12915-021-01085-2; Seidenthal et al, microPublication Biology, 2022, https://doi.org/10.17912%2Fmicropub.biology.000607).

      For the revised version, we have also included bar graphs for each optogenetic experiment, representing the mean of the length average of each worm measured from the first second after the blue light was turned on until the second before the light was turned off (in the graph, this corresponds to the period between seconds 6 and 9 of the traces). These graphs include the standard deviation and the corresponding significance levels. All of this has been included in the new legend (Figure 2D, 2E, 4E-J).
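
      The per-worm averaging described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the Fiji macro or analysis script used in the study: the function names, the one-sample-per-second timing, and the example traces are all hypothetical. Only the averaging window (seconds 6 to 9 of the trace) comes from the description above.

```python
from math import sqrt
from statistics import mean, stdev

def window_mean(times, lengths, start=6.0, end=9.0):
    """Mean of one worm's length trace over start <= t <= end (seconds)."""
    selected = [v for t, v in zip(times, lengths) if start <= t <= end]
    if not selected:
        raise ValueError("no samples fall inside the averaging window")
    return mean(selected)

def summarize_worms(times, traces, start=6.0, end=9.0):
    """Average each worm over the window, then summarize across worms.

    Requires at least two worms, because the sample SD is undefined for one.
    """
    per_worm = [window_mean(times, trace, start, end) for trace in traces]
    sd = stdev(per_worm)
    return {
        "n": len(per_worm),
        "mean": mean(per_worm),
        "sd": sd,                          # dispersion shown in the bar graphs
        "sem": sd / sqrt(len(per_worm)),   # dispersion shown as trace shading
    }

if __name__ == "__main__":
    # Hypothetical traces: relative body length sampled once per second.
    times = list(range(12))
    worm_a = [1.04 if 6 <= t <= 9 else 1.0 for t in times]
    worm_b = [1.02 if 6 <= t <= 9 else 1.0 for t in times]
    print(summarize_worms(times, [worm_a, worm_b]))
```

      Because SEM equals SD divided by the square root of the number of worms, the SEM shading on the traces is always narrower than the SD bars on the corresponding bar graphs for the same data.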

      ⁃ Figure 1A&1B & Supplementary Figure 1D x Supplementary Figure 1E&1F. What is the difference between these experiments? Whereas the unc-25 mutants paralyze in the same amount of time, the WT animals paralyze ~1 h later in Supplementary Figure 1E-1F in response to either drug. Please revise experimental conditions to see if anything can be learned eg, maybe this is a nutritional response from experiments done at different timepoints? Maybe different food recipes affected sensitivity to paralysis?

      Thank you for pointing this out. While the experiments with daf-18 (in both alleles) and daf-16 were conducted at the beginning of this project (2019-2020), the assays with the other mutants in the PI3K and mTOR pathways were performed years later. Changes in the reagents used (agar, peptone, cholesterol, etc.) to grow the worms have occurred, potentially altering the animals' response directly or through the nutritional quality of the bacteria they grow on. In addition, the difference may be attributed to the fact that experiments at the project's outset were conducted by one author, while more recent experiments were carried out by another. Paralysis is scored as the absence of a response to touch stimulation. The force of this probing or the thickness of the hair used for touching can be slightly operator-dependent and can lead to variable responses. For this reason, wild-type and unc-25 strains are always included as internal controls in every experiment. Moreover, despite this user-dependent variation, the experiments were always conducted blindly (except for unc-25, whose uncoordinated phenotype is easily identifiable); thus, we trust the outcomes.

      ⁃ Supplementary Figure 1G - Length and Width appear to be switched in both left and right panels - please revise and include a description of N and of statistics depicted. 

      Unfortunately, we don't see the switching error that the reviewer mentioned. In the left panel, we demonstrate that optogenetic activation of GABAergic neurons leads to an increase in length without modifying the width of the animal. Therefore, we conclude that the increase in area, as observed in our Fiji macro for optogenetic response analysis, is due to an increase in the animal's length. In the cholinergic activation shown in the right panel, the animal shortens (decreasing length) without modifying the width, resulting in the reduction of the total body area. 

      We have included information about N (sample size) and the statistical test used in the legends as suggested. These graphs are now shown as Figures 2F and G, revised version.

      ⁃ Supplementary Figure 1G legend lines 779-780. Please describe the post-hoc test applied following ANOVA to obtain the denoted p values. This applies to all datasets where ANOVA or Kruskal-Wallis tests were applied.

      Following the reviewer's suggestion, all the post-hoc tests applied after ANOVA or Kruskal-Wallis analysis have been included in the legend of each figure and in Materials and Methods (statistical analysis section).

      ⁃ Line 174 maybe "arises *from* the hyperactivation" instead of *for*?.

      Corrected. Thank you. Line 190, revised version.

      ⁃ Supplementary Figure 4. On line 816 it says n=40-90, but please check the n of the daf-18, daf-16 samples, which seem to have less than 40 animals.

      We understand that the reviewer is referring to Supplementary Figure 3 from the original version (now Supplementary Figure 5 in the revised version). We have now included the number of observations below each data point cloud to clearly indicate the sample size for each condition.

      ⁃ Supplementary Figure 4 - please state what are the bars on the graphs. Please state which post-hoc test was performed after Kruskal-Wallis and present at least the p values obtained between treated controls and each genotype. Alternatively, present the whole truth table in supplementary data.

      We understand that the reviewer is referring to Supplementary Figure 3 from the original version (now Supplementary Figure 5 in the revised version). There was an error in the original legend (thank you for bringing this to our attention): the statistics in this case were not performed using Kruskal-Wallis; rather, each treated condition was compared to its own untreated control using the Mann-Whitney test. We have now added the p-values to the graph. All raw data for this figure, as well as for all other figures, are available in Open Science Framework (https://osf.io/mdpgc/?view_only=3edb6edf2298421e94982268d9802050).

      ⁃ Please cite the figure panels in order: eg, Figure 3E is mentioned in the text after panels Figure 3F-K.

      Done. We have rearranged the figures to match the order in which they are cited in the text (Figure 4, revised version).

      ⁃ Figure 4 - line 610 please revise "(n=20-30 (n: 20-25 animals per genotype/trial)."

      Thank you. Corrected.

      ⁃ Figure 4 - there appears to be an inconsistency in the figure with the text (lines 223-225). In figures it says E-L1, but in the text, it says "solely in L1". Does E-L1 include the whole L1 stage? If not- E-L1 can be interpreted only as during the embryonic stage, hence, no exposure to betaHB due to the impermeable chitin eggshell. Then there is L1-L2, which should cover the L1 stage and the L2 or something else. Please revise. The text mentions L2-L3 or L3-L4 and these categories are not in the figures. This clarification is key for the interpretation of the results. The precise developmental time of the exposures is not defined either in the methods or in the figures. Please provide precise times relative to hours and/or molts and revise the text/figure for consistency.

      The reviewer is entirely correct in pointing out the lack of relevant data regarding the exposure time to βHB. We have now clarified this information. For the revised version, we have adjusted the nomenclature of each exposure period to precisely reflect the developmental stages involved.

      For the experiments involving continuous exposure to βHB throughout development, the NGM plate contained the ketone body. Therefore, the exposure encompassed, in principle, the ex-utero embryonic development period up to L4-Young adults (E-L4/YA, in Figure 5A), when the experiments were conducted. Since the chitin shell of the eggs may restrict drug penetration (see Supplementary Figure 7), we can ensure βHB exposure only from hatching onward.

      In experiments involving exposure at different developmental stages as those depicted in Figure 4 of the original version, (now Figure 5), animals were transferred between plates with and without βHB as required. We exposed daf-18/PTEN mutant animals to βHB-supplemented diets for 18-hour periods at different developmental stages (Figure 5A). The earliest exposure occurred during the 18 hours following egg laying, covering ex-utero embryonic development and the first 8-9 hours of the L1 stage (This period is called E-L1, in figure 5 revised version). The second exposure period encompassed the latter part of the L1 stage, the entire L2 stage, and most of the L3 stage (L1-L3). The third exposure spanned the latter part of the L3 stage (~1-2 hours), the entire L4 stage, and the first 6-7 hours of the adult stage (L3-YA).

      All this information has been included in Figure 5 (and its legend), the text (Page 13, lines 259-276), and the Materials and Methods of the revised manuscript.

      ⁃ Some methods are not sufficiently well described. Specifically, how the animals were exposed to treatments and how stages were obtained for each experiment. Was synchronization involved? If so, in which experiments and how exactly was it performed?

      As mentioned in previous responses all the experiments were performed in age-synchronized animals. We include the following sentence in Materials and Methods (C. elegans culture and maintenance section): “All experiments were conducted on age-synchronized animals. This was achieved by placing gravid worms on NGM plates and removing them after two hours. The assays were performed on the animals hatched from the eggs laid in these two hours”.

      Reviewer #2 (Recommendations For The Authors):

      Major points

      (1) To complete the study on the GABAergic signaling at the NMJs, it would be interesting to assess the status of the post-synaptic part of the synapse such as the GABAR clustering. It would also tell if the impairment is only presynaptic or both post and presynaptic.

      Thank you for your insightful suggestion. We agree that exploring post-synaptic elements can shed light on whether the impairment is solely presynaptic or involves both pre and post-synaptic components.

      While our current study primarily focuses on neuronal alterations without delving into potential postsynaptic effects, we do plan to investigate this aspect in the future. This includes not only examining GABAergic receptors but also exploring cholinergic receptors, as exacerbation of cholinergic signaling cannot be ruled out. To conduct a comprehensive study of post-synaptic structure and functionality, we would need strains with fluorescent markers for both pre and post-synaptic components (rab-3, unc-49, unc-29, acr-16 driving GFP or mCherry). However, most of these strains are not currently available in our laboratory. Unlike in the US or Europe, acquiring these strains from the C. elegans CGC repository is challenging in Argentina due to common customs delays, which require significant time and resources to navigate. Discussions at the Latin American C. elegans conference with CGC administrators, such as Ann Rougvie, have been initiated to address this issue, but a solution has not been reached yet.

      Additionally, to analyze post-synaptic functionality in-depth, studying the response to perfusion with various agonists using electrophysiology would be beneficial. We are in the process of acquiring the capability to conduct electrophysiology experiments in our laboratory, but progress is slow due to limited funding.

      While we believe these experiments are very informative, they will require a considerable amount of time due to our current circumstances. We consider them non-essential to the primary message of the paper, which focuses on neuronal morphological defects leading to functional alterations in daf-18/PTEN mutants.

      We will include these experiments in our future projects, also planning to extend this investigation to mutants with deficiencies in genes closely related to neurodevelopmental defects, such as neuroligin, neurexin, or shank-3, which have been implicated in synaptic architecture.

      (2) The author always referred to unc-47 promoter or unc-17 promoter, never specifying where those promoters are driving the expression (and in the Materials & Methods, no information on the corresponding sequence). Depending on the promoters they may not only be expressed in the motoneurons involved in locomotion (VA, VB, DA, DB, VD, and DD), but they could also be expressed in other neurons which could be of importance for the conclusions of the optogenetic assays but also the daf-18 expression in GABAergic neurons.

      We appreciate the reviewer's insight regarding the broader expression patterns of the unc-17 and unc-47 promoters in all cholinergic and GABAergic neurons, respectively. The strains expressing constructs with these promoters were obtained from the CGC or other labs and have been widely used in previous papers (Liewald et al., Nature Methods, https://www.nature.com/articles/nmeth.1252 (2008); Byrne, A. B. et al. Neuron 81, 561-573, doi:10.1016/j.neuron.2013.11.019 (2014)).

      Regarding the optogenetic assays, the readout utilized (body length elongation or contraction) is primarily associated with the activity of cholinergic and GABAergic motor neurons and has been used in numerous studies to measure motor neuron functionality (Liewald et al., Nature Methods, https://www.nature.com/articles/nmeth.1252 (2008); Hwang, H. et al. Sci Rep 6, 19900, doi:10.1038/srep19900 (2016); Schultheis et al., J Neurophysiol 106, 817-827, doi:10.1152/jn.00578.2010 (2011); Koopman, M., Janssen, L. & Nollen, E. A. BMC Biol 19, 170, doi:10.1186/s12915-021-01085-2 (2021)). It has previously been established that the shortening observed after optogenetic activation driven by the unc-17 promoter, which is also active in various interneurons, depends on the activity of cholinergic motor neurons (Liewald et al., Nature Methods, https://www.nature.com/articles/nmeth.1252 (2008)). This was demonstrated by examining transgenic worms expressing ChR2-YFP from another cholinergic, motoneuron-specific but weaker promoter, Punc-4. The authors observed contraction and coiling upon illumination, albeit to a milder degree.

      In terms of GABAergic neurons, only 3 do not directly synapse to body wall muscles (AVL, PDV, and RIS) and are primarily involved in defecation. Of the 23 GABAergic motor neurons, 19 are D-type motoneurons, while the remaining 4 innervate head muscles (Pereira et al, eLife 2015, https://doi.org/10.7554/eLife.12432). It is therefore expected that while there may be some contribution from these latter neurons to the elongation after optogenetic activation in animals containing punc-47::ChR2, the main contribution should be from the D-type neurons. Additionally, while there may be some influence on D-type neuron development due to daf-18 rescue in neurons like RME, DVB or AVL, the most direct explanation for the rescue is that daf-18 acts autonomously in D-type cells. Finally, we have pharmacological and behavioral assays that support the optogenetic findings and enable us to reach final conclusions.

      (3) DD neurons are born during embryogenesis and newborn L1s have neurites even though less than at a later stage. If possible, it would be interesting to take a look at them to see if βHB has an effect or not. It will corroborate the hypothesis that βHB action is prevented by the impermeable eggshell on a system that can respond at a later stage. Moreover, using a specific DD, DA, and DB promoter, it would be possible to check if there is a difference in the morphological defects between embryonic and post-embryonic neurons.

      This is a very interesting point raised by the reviewer. We conducted experiments to analyze the morphology of GABAergic neurons in animals exposed to βHB only during ex-utero embryonic development (that is, while still inside the laid egg). We observed that this incubation was not sufficient to rescue the defects in GABAergic neurons (Supplementary Figure 7, revised version). As reported by other authors and discussed in our paper, the chitinous eggshell might act as an impermeable barrier to most drugs. However, we cannot rule out that incubation during this period is necessary but not sufficient to mitigate the defects. We have included these experiments in Supplementary Figure 7 and in the text (Page 13, lines 272-276).

      Additionally, we analyzed confocal images where, based on their position, we could identify and assess errors in DD (embryonic) and VD (Post-embryonic) neurons (Supplementary Figure 3, revised version). These experiments show that the effects are observed in both types of neurons, and we did not observe any differential alterations in neuronal morphology between the two types of neurons.

      Minor points

      (1)   Expression of daf-18/PTEN in muscle or hypodermis, could it ensure a proper development? It could give insights into the action mechanism of βHB.

      The reviewer's observation is indeed very intriguing. Previous studies from the Grishok lab (Kennedy et al, 2013) have demonstrated that the expression of daf-18 or daf-16 in extraneuronal tissues, specifically in the hypodermis, can rescue migratory defects in the serotoninergic neuron HSN in daf-18 or daf-16 null mutants of C. elegans. Clearly, this could also be an option for rescuing the morphological and functional defects of GABAergic motoneurons.

      However, the fact that the expression of daf-18 in GABAergic neurons rescues these defects strongly suggests an autonomous effect. In this regard, autonomous effects of DAF-18 or DAF-16 on neurodevelopmental defects have also been reported in interneurons in C. elegans (Christensen et al, 2011). This is included in the discussion (Page 15, lines 330-335).

      (2) Re-organise the introduction. The paragraph on ketogenic diets (lines 35-38) is not logically linked.

      Following the reviewer's suggestion, we have reorganized the introduction and moved the explanation of the significance of ketogenic diets, linking it with their proven effectiveness in alleviating symptoms of diseases with E/I imbalance (Lines 23-60, revised version).

      (3) Incorporate titles in the result section to guide the reader.

      Done. Thank you

      (4) Systematically add PTEN or FOXO when daf-18 or daf-16 are mentioned (for example lines 69, 84, 85).

      Done. Thank you  

      (5) Strain lists: lines 646 to 653: some information is missing on the different transgenes used in this study (integrated (Is) or extrachromosomal (Ex) with their numbers).

      Thank you for bringing this to our attention. We have now included all the information regarding the different transgenes used in this study, including whether they are integrated (Is) or extrachromosomal (Ex) and their respective numbers. This information can be found in the revised version of the manuscript (Materials and Methods, C. elegans culture and maintenance section highlighted in yellow).

      Reviewer #3 (Recommendations For The Authors):

      In Figure 1, some experiments were done with the unc-25 control while others, such as the optogenetic experiments, were done without those controls.

      Thank you for pointing this out. In the optogenetic experiments, we waited for the worm to move forward for 5 seconds at a sustained speed before exposing it to blue light to standardize the experiment, as the response can vary if the animal is in reverse, going forward, or stationary. Due to the severity of the uncoordinated movement in unc-25 mutants, achieving this forward movement before exposure is very difficult. Additionally, this lack of coordination prevents these animals from performing the escape response tests, as they barely move. Therefore, we limited the use of this severe GABAergic-deficient control to pharmacological or post-prodding shortening experiments.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      […]

      (1) The authors claim that the negative frequency dependence that maintains polymorphism in their model results from a non-linear relationship between the display trait and sexual success [...] Maybe I missed something, but the authors do not provide support for their claim about the negative frequency-dependence of sexual selection in their simulations. To do so they could (1) extract the relationship between the relative mating success of the two male types from the simulations and (2) demonstrate that polymorphism is not maintained if the relationship between male display trait and mating success is linear.

      We believe that there is a confusion of terminology here. We agree that for the two alleles at a locus impacting male display in our model, the allele conferring inferior display quality will have a fitness that increases as its frequency increases, so this allele displays positive frequency dependent fitness. And, the alternate, display-favoring allele at the locus does display negative frequency dependence. Our use of the terminology ‘negative frequency dependence’ was meant to refer to the negative dependence of the fitness of the display-favoring allele with respect to its own frequency. However, a significant body of literature instead discusses models in which both an allele and its alternate(s) are beneficial when at low frequency and deleterious when at high frequency under the same selective challenge, entailing negative frequency dependence of fitness for all alleles involved. This benefit-when-rare model of a single trait is often described simply as negative frequency dependence, and generates balancing selection at the locus, but is not the model we are presenting here, and does not encompass all models involving negative frequency dependent fitness. This lexical expectation may make the interpretation of our work more difficult, and we have amended the manuscript to make our model clearer (lines 227-231). In this model, we have a negative frequency dependence for the fitness of the display-favoring allele in mate competition, but the net selective disadvantage of this allele at high frequency is due to a cost in another, pleiotropic, fitness challenge: the constant survival effect. So, the alleles are under balancing selection where alternate alleles are favored by selection when rare, but not due solely to selection during mate competition. Instead, our model relies on pleiotropy for an emergent form of frequency-dependent balancing selection (in the sense that each allele is predicted to be beneficial on balance when rare).

      In the reviewer’s model of the success of two alleles at one locus, the ratio of success is vaguely linear with allele frequency for n=3, though it starts quite convex and has an inflection point between convex and concave segments (for the disfavored allele) at p≈0.532. This is visualized easily by plotting the function and its derivatives in Wolfram-Alpha. For n>=4, the fitness function with respect to the display-favoring/disfavoring allele becomes increasingly concave/convex respectively, and this specific nonlinearity is needed to act along with the antagonistic pleiotropy to maintain balancing selection, rather than being maintained by a model that favors any rare allele on the basis of its rarity in some manner. In an attempt to make the importance of the encounter number parameter clearer, we’ve generated new panels for Figure S1 which simulate encounter numbers 2, 3, and 4, and we have updated corresponding text and figure references in lines 335-338.

      For (1-2), it is not clear how to modify the simulation such that the relationship between the trait value and mating success can be perfectly linear - either linear with respect to allele frequency in a one locus model or linear with respect to trait value at a specific population composition, without removing the simulation of mate competition altogether. While it may be of interest to explore a more comprehensive range of biological trade-offs in future studies, we are not able to meaningfully do so within the context of the present manuscript.

      (2) The authors only explore versions of the model where the survival costs are paid by females or by both sexes. We do not know if polymorphism would be maintained or not if the survival cost only affected males, and thus if sexual antagonism is crucial.

      We now present simulations with male costs only as added panels to Figure S1 and mention these results in the main text (lines 334-335). Maintenance of the polymorphism is significantly reduced or completely absent in such simulations.

      (3) The authors assume no cost to aneuploidy, with no justification. Biologically, investment in aneuploid eggs would not be recoverable by Drosophila females and thus would potentially act against inversions when they are rare.

      We did offer some discussion and justification of our decision to model no inherent fitness cost of the inversion mutation itself, specifically from aneuploidy, in lines 36-39 and 78-80 of the original reviewed preprint. Previous research suggests that D. melanogaster females may not actually invest in aneuploid eggs generated from crossover within paracentric inversions. While surprising, and potentially limited to a subset of clades, many ‘r-selected’ taxa or those in which maternal investment is spread out over time may have some degree of reproductive compensation for non-viable offspring, which can reduce the costs of generating aneuploids significantly (for example, t-haplotypes in mice). We have added this example and citation to lines 34ff in the current draft.

      (4) The authors appear to define balanced polymorphism as a situation in which the average allele frequency from multiple simulation runs is intermediate between zero and one (e.g., Figure 3). However, a situation where 50% of simulation runs end up with the fixation of allele A and the rest with the fixation of allele B (average frequency of 0.5) is not a balanced polymorphism. The conditions for balanced polymorphism require that selection favors either variant when it is rare.

      We originally chose mean final frequency for presenting the single locus simulations based on the ease of generating a visual plot that included information on fixation vs loss and equilibrium frequency. Figure 3 and related supplemental images have been changed to now also represent the proportion of simulations retaining polymorphism at the locus in the final generation.
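The distinction raised in point (4) can be made concrete with a small sketch. The lists of final allele frequencies below are hypothetical, and `summarize` is a name introduced only for this illustration; it is not the simulation code used in the manuscript.

```python
from statistics import mean


def summarize(final_freqs):
    """Return (mean final frequency, proportion of runs still polymorphic)."""
    # A run counts as polymorphic if the allele is neither lost nor fixed.
    still_polymorphic = [f for f in final_freqs if 0.0 < f < 1.0]
    return mean(final_freqs), len(still_polymorphic) / len(final_freqs)


# Hypothetical outcome 1: half of the runs fix the allele, half lose it.
fixed_or_lost = [1.0] * 50 + [0.0] * 50
# Hypothetical outcome 2: every run ends at an intermediate frequency.
balanced = [0.5] * 100

mean_a, prop_a = summarize(fixed_or_lost)
mean_b, prop_b = summarize(balanced)
```

Both sets of runs have the same mean final frequency (0.5), but only the second retains polymorphism in any run, which is why reporting the proportion of polymorphic runs alongside the mean removes the ambiguity.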

      (5) Possibly the most striking result of the experiment is the fact that for 14 out of 16 combinations of inversion x maternal background, the changes in allele frequencies between embryo and adult appear greater in magnitude in females than in males irrespective of the direction of change, being the same in the remaining two combinations. The authors interpret this as consistent with sexually antagonistic pleiotropy in the case of In(3L)Ok and In(3R)K. The frequencies of adult inversion frequencies were, however, measured at the age of 2 months, at which point 80% of flies had died. For all we know, this may have been 90% of females and 70% of males that died at this point. If so, it might well be that the effects of inversion on longevity do not systematically differ between the ages and the difference in Figure 9B results from the fact that the sample includes 30% longest-lived males and 10% longest-lived females.

      This critique deserves some consideration. The aging adults were separated by sex during aging, but while we recorded the number of survivors, we did not record the numbers of eclosed adults and their sexes initially collected out of an interest in maintaining high throughput collection. We therefore cannot directly calculate the associated survival proportions, but we can estimate them. We collected 1960 females and 3156 males, and we can very roughly estimate survival if we assume that equal numbers of each sex eclosed, and that the survivors represent 20% of the original population. That gives 12790 individuals per sex, or 84.7% female mortality and 75.3% male mortality.

      So, we have added a qualification discussing the possibility of stronger selection on females and its influence on observed sex-specific frequency changes, on lines 602-605.
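The rough mortality estimate above can be reproduced step by step. The survivor counts are those given in the response; the equal sex ratio at eclosion and the 20% overall survival are the stated assumptions, and the function name is introduced here only for illustration.

```python
def estimate_mortality(females_surviving, males_surviving, survival_fraction):
    """Estimate sex-specific mortality assuming equal numbers of each sex eclosed."""
    total_survivors = females_surviving + males_surviving
    # Assumed: the survivors make up `survival_fraction` of the initial adults.
    initial_total = total_survivors / survival_fraction
    initial_per_sex = initial_total / 2
    female_mortality = 1 - females_surviving / initial_per_sex
    male_mortality = 1 - males_surviving / initial_per_sex
    return initial_per_sex, female_mortality, male_mortality


per_sex, female_mortality, male_mortality = estimate_mortality(1960, 3156, 0.20)
print(f"{per_sex:.0f} per sex; female {female_mortality:.1%}, male {male_mortality:.1%}")
# prints "12790 per sex; female 84.7%, male 75.3%"
```

The estimate is sensitive to both assumptions: a departure from a 1:1 sex ratio at eclosion or from the 20% overall survival shifts both mortality figures.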

      (6) Irrespective of the above problem, survival until the age of 2 months is arguably irrelevant from the viewpoint of fitness consequences and thus maintenance of inversion polymorphism in nature. It would seem that trade-offs in egg-to-adult survival (as assumed in the model), female fecundity, and possibly traits such as females resistance to male harm would be much more relevant to the maintenance of inversion polymorphisms.

      Adult Drosophila will continue to reproduce in good conditions until mortality, and the estimated age of a mean reproductive event for a Drosophila melanogaster individual is 24 days (Pool 2015), and likewise for D. simulans (Turelli and Hoffman 1995). Given that reproduction is centered around 24 days, we expect sampling at 2 months of age to still be relevant to fitness. In seasonally varying climates, either temperate or with a long dry season, survival through challenging conditions is expected to require several months. In many such cases, females are in reproductive diapause, and so longevity is the main selective pressure. See lines 931-936 in the revised manuscript.

      As we agreed above, it would be of interest to investigate a wider range of trade-offs in future studies. We focused here on the balance between survival and male reproductive success because the latter trait generates negative frequency dependence for display-favoring alleles and a disproportionate skew towards higher quality competitors, whereas many other fitness-relevant traits lack that property.

      (7) The experiment is rather minimalistic in size, with four cages in total; given that each cage contains a different female strain, it essentially means N=1. The lack of replication makes statements like "In(2L)t and In(2R)NS each showed elevated survival with all maternal strains except ZI418N" (l. 493) unsubstantiated because the claimed special effect of ZI418N is based on a single cage subject to genetic drift and sampling error. The same applies to statements on inversion x female background interaction (e.g., l. 550), as this is inseparable from residual variation. It is fortunate that the most interesting effects appear largely consistent across the cages/female backgrounds. Still, I am wondering why more replicates had not been included.

      Our experimental approach might be described as “diversity replication”. Essentially, the four maternal genetic backgrounds are serving dual purposes – both to assess experimental consistency and to ensure that our conclusions are not solely driven by a single non-representative genotype (which, in so many published studies, cannot be ruled out). It would indeed be interesting if we could have quadrupled the size of our experiment by having four replicates per maternal background. However, we suspect the reviewer may not recognize the substantial effort involved in our four existing experiments. Each of these involved collecting 500+ virgin females, hand-picking thousands of embryos during the duration of egg-laying, and repeatedly transferring offspring to maintain conditions during aging, such that cages had to be staggered by more than a month. These four cages took a year of benchwork just to collect frozen samples, before any preparation and quality control of the associated amplicon libraries for sequencing. Adding a further multiplier would take it well beyond the scope of a single PhD thesis. Fortunately, we were able to obtain the key results of interest without that additional effort, even if clearer insights into the role of maternal background would also be of strong interest.

      We do agree that no firm conclusions about maternal background can be reached without further replication, and so we have qualified or removed relevant statements accordingly (lines 568ff, 620-622).

      Reviewer #1 (Recommendations For The Authors):

      The description of the model is confusing and incomplete, e.g., the values of several parameters used to obtain the numerical results are not given. It is first stated (l. 223) that the model is haploid, but text elsewhere talks about homozygotes and heterozygotes. If the model is diploid (this in itself is not clear), what is assumed about dominance?

      We are not presenting results for a mathematical model estimated numerically. We have now clarified our transition from a conceptual depiction of our model, in which we use haploid representations for simplified presentation, to our forward population genetic simulations, which are entirely diploid. More broadly, we have improved our communication of the assumptions and parameters used in our simulations. The scenarios we investigate involve purely additive trait effects within and between loci (except that survival probabilities are multiplicative to avoid negative values). We think that considering other dominance scenarios would be a worthy subject for a follow-up study, whereas the present manuscript is already covering a great deal of ground.   
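The reason for combining survival probabilities multiplicatively can be illustrated with a minimal sketch. The per-allele cost values and both function names are hypothetical and chosen only to show the arithmetic; the exact form used in the simulations is not specified here.

```python
from math import prod


def additive_survival(costs):
    """Subtract every per-allele survival cost from 1 (can drop below zero)."""
    return 1.0 - sum(costs)


def multiplicative_survival(costs):
    """Multiply per-allele survival factors (stays within [0, 1])."""
    return prod(1.0 - c for c in costs)


# Hypothetical example: four costly alleles, each with a survival cost of 0.3.
costs = [0.3] * 4
survival_additive = additive_survival(costs)              # about -0.2, not a valid probability
survival_multiplicative = multiplicative_survival(costs)  # 0.7 ** 4, about 0.24
```

With few or weak costs the two schemes give similar values, so the multiplicative form mainly matters when many costly alleles accumulate in one individual.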

      Similarly, it is hard to understand the design (l.442ff). I was confused as to whether a population was set up for each inversion or for all of them and what the unit or replication was. I found the description in Methods (l. 763-771) much clearer and only slightly longer; I suggest the authors transfer it to the Results. Also, Figure 8 should contain the entire crossing scheme; the current version is misleading in that it implies males with only two genotypes.

      All four tested inversions were segregating within the same karyotypically diverse population of males, and were assayed from the same experiments. We have attempted to improve the relevant description. For Figure 8, we had trouble conceiving a graphic update that contained a more complete cross scheme without seeming much more confused and cluttered. We have tried to clarify in the relevant text and the figure caption instead.

      There are a number of small issues that should be addressed:

      - No epistasis for viability assumed - what would be the consequence?

      We explored a model in which we intentionally included no terms for epistatic effects on phenotype. All epistasis with regard to fitness is emergent from competition between individuals with phenotypes composed of non-epistatic, non-dominant genetic effects. So, the simplest model of antagonism would have no epistasis for viability whatsoever. One could explore a model that has emergent viability epistasis in a similar way, by implementing stabilizing selection on a quantitative trait with a gaussian or similar non-linear phenotype-to-fitness map, but that might be better served as a topic for a future study. We have, however, tried to make this intent clearer in the text.

      l. 750 implies that aneuploidy generated by the inversion has no cost (aneuploid gametes are resampled).

      Yes, as addressed in public review item (3). Alternately see lines 34ff, 293, 369, 392 for in-text edits.

      l. 24-25: unclear; is this to mean that there is haplotype x sex interaction for survival?

      l. 25: success in what? (I assume this will be explained in the paper, but the abstract should stand on its own).

      l. 193-4: "producing among most competitive males": something missing or a word too much??

      Figure 1B,C: a tiny detail, but the plots would be more intuitive if the blue (average) bars were after (i.e., to the right of) the male and female ones, given that the average is derived from the two sex-specific values.

      Each of the above has been edited or implemented as suggested.

      l. 205. It is a convex function, but I do not understand what the authors mean by "convex distribution".

      Hopefully the updated text is clearer: “yielding a distribution of male reproductive output that follows a relatively convex trend”.

      l. 223ff: some references to Fig 1 panels in this paragraph seem off by one letter (i.e., A should be B, etc.).

      l. 231 "fitness...are equally fit": rephrase 

      l. 260: maybe "thrown out" is not the most fortunate term, maybe "eliminated" would be better?

      Each of the above has been edited or implemented as suggested.

      Figure 3: I do not understand the meaning of "additive" and "multiplicative" in the case of a single locus haploid model

      All presented simulations are diploid, and these refer to the interactions between the two alleles at the locus. Hopefully the language is overall clearer in this draft.

      l. 274: "Mutation of new nucleotide" meaning what? Or is it mutation _to_ a new nucleotide?

      Hopefully the revised text is clearer.

      Figure 5. The right panel of figure 5A implies that, with the inversion, the population evolves to an extreme display trait that is so costly that it kills 95% of all individuals (or of all females? What is assumed about this here?). Apart from the biological realism of this result, what does it say about the accumulation of polymorphism and maintenance of the inversion? The graphs in fig 5B do plot a divergence between haplotypes, but it is not clear how they relate to those in panel A - the parameter values used to generate these plots are again not listed. Furthermore, from the viewpoint of the polymorphism, it would be good to report the frequencies at the steady-state.

      We have now clarified the figure description, including the parameter values used. The distribution of frequencies at the end of the simulation is represented in figure 6. Given that we set up the simulation with assumptions that are otherwise common to population models, what biological process would prevent this extreme? Why isn’t this extreme observed in natural populations? One possible explanation is that such inversions become sex chromosomes, with increasing likelihood as the cost increases. Or other compensatory changes may occur that we don’t simulate, like regulatory evolution giving a complementary phenotype. Maybe genetic constraints in natural populations prevent the occurrence of the kind of pleiotropic mutations that drive this dynamic. The simulated populations still survive, because they are parameterized by relative fitness. What would an absolute fitness function for the population be? Would the population go extinct or not? It would be of interest to explore a wider range of models, but it is the purpose of this paper to establish that this is a viable model for the maintenance of sexually antagonistic polymorphism and association with inversions. We have added a paragraph motivated by this comment to the Discussion starting on line 765.

      l. 401-2: Z-like, W-like : please specify you are talking about patterns resembling sex chromosomes. 

      l. 738: "population calculates"?

      l. 743-4 and 746-7: is this the same thing said twice, or are there two components of noise?

      l. 357: there is no figure 5C.

      Each of the above has been addressed with text edits.

      L. 473-5: Yes, the offspring did not contain inversion homozygotes, but the sire pool did, didn't it? So homozygous inversions may have affected male reproductive success. Anyway, most of this paragraph (from line 473) seems to belong in Discussion rather than Results.

      We have revised this sentence to focus on offspring survival. 

      We can understand the reviewer’s suggestion about Results vs. Discussion text. While this can often be a challenging balance, we find that papers are often clearer if some initial interpretation is offered within the Results text. However, we moved the portion of this paragraph relating our findings to the published literature to the Discussion.

      l. 516: " In(3L)Ok favored male survival": this is misleading/confusing given the data, " In(3L)Ok reduced female survival more strongly than male survival..."

      Hopefully the phrasing is clearer now.

      l. 663ff: I did not have an impression that this section added anything new and could safely be cut.

      We have done some editing to make this more concise and emphasize what we think is essential, but we believe that the model of an autosomal, sexually antagonistic inversion differentiating before contributing to the origin of a sex chromosome is novel and interesting, and that this additional emphasis is worthwhile to encourage consideration of this idea in future research.

      l. 751: "flat probability per locus": do the authors mean a constant probability?

      Edited.

      Reviewer #2 (Public Review):

      The manuscript lacks clarity of writing. It is impossible to fully grasp what the authors did in this study and how they reached their conclusions. Therefore, I will highlight some cases that I found problematic.

      Hopefully the revised manuscript improves writing clarity. 

      Although this is an interesting idea, it clearly cannot explain the apparent influence of seasonal and clinal variation on inversion frequencies.

      We do not believe that our model predicts a non-existence of temporal and spatial dependence of the fitness of inverted haplotypes, nor do we seek to identify the manner in which seasonal and clinal differences affect fitness of inverted haplotypes. Rather, we argued that the influence of seasonal and clinal selection on inversions does not on its own predict the observed maintenance of inversions at low to intermediate frequencies across such a diverse geographic range, along with the higher frequencies of many derived inversions in more ancestral environments. 

      We might imagine that trade-offs between life history traits such as mate competition and survival should be universal across the range of an organism. But in practice, the fitness benefits and costs of a pleiotropic variant (or haplotype) may be heavily dependent on the environment. A harsh environment such as a temperate winter may both reduce the number of females that a male encounters (decreasing the benefit of display-enhancing variants) and also increase the likelihood that survival-costly variants lead to mortality (thus increasing their survival penalty). In light of such dynamics, our model would predict that equilibrium inversion frequencies should be spatially and temporally variable, in agreement with a number of empirical observations regarding D. melanogaster inversions.

      We have edited the introduction to emphasize that inversion frequencies vary seasonally as well as geographically, on lines 144ff. We also note relevant discussion of the potential interplay between the environment and trade-offs such as those we investigate, on lines 153-155.

      The simulations are highly specific and make very strong assumptions, which are not well-justified.

      We respond to all specific concerns expressed in the Recommendations For The Authors section below. We also note that we have made further clarifications throughout the text regarding the assumptions made in our analysis and their justification.  

      Reviewer #2 (Recommendations For The Authors):

      I think that the manuscript would greatly benefit from a major rewrite and probably also a reanalysis of the empirical data.

      In particular, a genome-wide analysis of differences in SNP frequencies between sexes and developmental stages would help the reader to appreciate that inversions are special.

      [moved up within this section for clarity] We are lacking a genomic null model: how often do the authors see similar allele frequency differences when looking at the entire genome? This could be easily done with whole genome Pool-Seq and would tell us whether inversions are really different from the genomic background. I think that this information would be essential given the many uncertainties about the statistical tests performed.

      We expect that autosome-wide SNP frequencies will be heavily influenced by the frequencies of inversions, which occur on all four major autosomal chromosome arms. These inversions often show moderate disequilibrium with distant variants (e.g. Corbett-Detig & Hartl 2012).

      Furthermore, the limited number of haplotypes present, given that the paternal population was founded from 10 inbred lines, would further enhance associations between inversions and distant variants. Therefore, we do not expect that whole-genome Pool-Seq data would provide an appropriate empirical null distribution for frequency changes. Instead, we have generated appropriate null predictions by accounting for both sampling effects and experimental variance, and we have aimed to make this methodology clearer in the current draft. 

      Some basic questions:

      why start at a frequency of 50% (line 287)?

      Isn't it obvious that in this scenario strong alleles with sexually antagonistic effects can survive?

      The initial goal of the associated Figure 4 was not to show that a strongly antagonistic variant could persist. Instead, we wanted to test the linkage conditions in which a second, relatively weaker antagonistic variant survived – which did not occur in the absence of strong linkage. 

      We have now added simulations with relatively lower initial frequencies, in which the weaker variant and the inversion both start at 0.05 frequency, while the stronger variant is still initialized at 0.5 to reflect the initial presence of one balanced locus with a strongly antagonistic variant. Here, the weaker antagonistic variant is still usually maintained when it is close to the stronger variant. Although inversion-mediated maintenance of the weaker variant at greater distance from the stronger variant becomes less frequent than in the originally investigated case, it still happens often enough to hypothetically allow for such outcomes over evolutionary time-scales.

      Still, we should also emphasize that the goals of this proof-of-concept analysis are to establish and convey some basic elements of our model. Subsequently, analyses such as those presented in Figures 5 and 6 provide clearer evidence that the hypothesized dynamics of inversions facilitating the accumulation of sexual antagonism actually occur in our simulations.

      The experiments seem to be conducted in replicate (which is of course essential), but I could not find a clear statement of how many replicates were done for each maternal line cross.

      How did the authors arrive at 16 binomial trials (line 473)? 4 inversions, 4 maternal genotypes?

      How were replicates dealt with?

      In Figure 9, it would be important to visualize the variation among replicates.

      Unfortunately, we did not have the bandwidth to perform replicates of each maternal line. Instead, we use four maternal backgrounds to simultaneously establish consistency across independent experiments and genetic backgrounds (see our response to Reviewer 1, point 7). We’ve edited the draft to make this clearer and more clearly delineate what is supported and not supported by our data. Replicate variation for the control replicates of the extraction and sequencing process, and the exact read counts of the experiment, are available in Supplemental Tables S5, S6, and S7.

      The statistical analysis of trade-off is not clear: which null model was tested? No frequency change? In my opinion, two significances are needed: a significant difference between parental and embryo and then embryo and adult offspring. The issue with this is, however, that the embryo data are used twice and an error in estimating the frequency of the embryos could be easily mistaken as antagonistic selection.

      Hopefully the description of our null model is clearer in the text, now starting around line 967 in the Methods. We are aware of the positive dependence when performing tests comparing the paternal to embryo and then embryo to offspring frequencies, and this is accounted for by our analysis strategy - see lines 1009-1012.

      It was not clear how the authors adjusted their chi-squared test expectations. Were they reinventing the wheel? There is an improved version of the chi-squared test, which accounts for sampling variation.

      We did not actually perform chi-square tests. Instead, we used the chi statistic from the chi-squared test as a quantitative summary of the differences in read counts between samples. We compared an observed value of chi to values for this statistic obtained from simulated replicates of the experiment. Sampling from this simulation generated our ‘expected’ distribution of read counts, sampled to match sources of variance introduced in the experimental procedure, but without any effect of natural selection, per lines 825ff in the original submission. Hence, we are approximating the likelihood of observing an empirical chi statistic by generating random draws from a model of the experiment and comparing values calculated from each draw to the experimental value: a Monte Carlo method of approximating a p-value for our data. We have attempted to make the structure of these simulations and their use as a null-model clearer in this draft.
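
      The general logic described above can be sketched as follows. This is a simplified illustration under stated assumptions, not our analysis code: the null simulator here applies only one binomial read-sampling step per sample, the summary statistic is Pearson's chi-squared on a 2x2 table of read counts, and all function names, frequencies, and depths are hypothetical.

```python
import numpy as np

def chi_squared_statistic(counts_a, counts_b):
    """Pearson chi-squared statistic for two (inverted, standard) read-count pairs."""
    table = np.array([counts_a, counts_b], dtype=float)
    row_totals = table.sum(axis=1, keepdims=True)
    col_totals = table.sum(axis=0, keepdims=True)
    expected = row_totals * col_totals / table.sum()
    if np.any(expected == 0):
        return 0.0  # degenerate table: no detectable difference
    return float(np.sum((table - expected) ** 2 / expected))

def simulate_null_statistic(rng, shared_freq, depth_a, depth_b):
    """One simulated experiment with no selection: both samples share one frequency."""
    inv_a = rng.binomial(depth_a, shared_freq)
    inv_b = rng.binomial(depth_b, shared_freq)
    return chi_squared_statistic((inv_a, depth_a - inv_a), (inv_b, depth_b - inv_b))

def monte_carlo_p_value(observed, n_sims, rng, **null_kwargs):
    """Fraction of null draws at least as extreme as the observed statistic."""
    sims = np.array([simulate_null_statistic(rng, **null_kwargs) for _ in range(n_sims)])
    # The +1 terms keep the estimate strictly above zero for a finite number of draws.
    return (1 + np.sum(sims >= observed)) / (n_sims + 1)

rng = np.random.default_rng(0)
observed = chi_squared_statistic((30, 70), (50, 50))
p = monte_carlo_p_value(observed, 2000, rng, shared_freq=0.4, depth_a=100, depth_b=100)
print(observed, p)
```

      Because the p-value comes from the simulated distribution rather than from the chi-squared reference distribution, additional variance sources can be added to the null simulator without changing the rest of the procedure.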

      It is not sufficiently motivated why the authors model differences in the extraction procedure with a binomial distribution.

      Adding a source of variance here seemed necessary as running control sequencing replicates revealed that there was residual variance not fully recapitulated by sample-size-dependent resampling. Given that we were still sampling a number of draws from a binomial outcome (the read being from the inverted or standard arrangement), a binomial distribution seemed a reasonable model, and we fit the level of this additional noise source to an experiment-wide constant, read-count or genome-count independent parameter that best fit the variance observed in the controls (lines 830ff in the original draft). Clarification is made in this manuscript draft, lines 979-989.

      How many reads were obtained from each amplicon? It looks like the authors tried to mimic differences between technical replicates by a binomial distribution, which matches the noise for a given sample size, but this depends on the sequence coverage of the technical replicates.

      We provide read counts in Supplemental Tables S6 and S7. The relevant paragraph in the methods has been edited for clarity, lines 972ff. Accounting for sampling differences between replicates used a hypergeometric distribution for paternal samples to account for paternal mortality before collection, and the rest were resampled with a binomial distribution. There were two additional binomial samplings, to account for resampling the read counts and to capture further residual variance in the library prep that did not seem to depend on either allele or read counts.
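
      A rough sketch of how such layered sampling might be chained for a paternal sample is given below. The ordering of the layers, the parameter names, and the use of a fixed effective size for the library-preparation noise are assumptions made for illustration; the Methods give the procedure actually used.

```python
import numpy as np

def simulate_paternal_reads(rng, n_inverted, n_standard, n_collected,
                            read_depth, prep_noise_size):
    """Simulate (inverted, standard) read counts for one paternal sample under the null.

    n_inverted, n_standard: chromosome copies in the starting paternal pool.
    n_collected: copies remaining at collection (mortality removes the rest).
    read_depth: number of reads sequenced.
    prep_noise_size: hypothetical effective size controlling extra library-prep variance.
    """
    # Mortality before collection: sampling without replacement from a finite pool.
    inv_collected = rng.hypergeometric(n_inverted, n_standard, n_collected)
    freq = inv_collected / n_collected
    # Residual library-preparation variance, independent of read depth.
    freq = rng.binomial(prep_noise_size, freq) / prep_noise_size
    # Read sampling at the observed depth.
    inv_reads = rng.binomial(read_depth, freq)
    return int(inv_reads), int(read_depth - inv_reads)

rng = np.random.default_rng(1)
print(simulate_paternal_reads(rng, n_inverted=40, n_standard=60, n_collected=80,
                              read_depth=1000, prep_noise_size=200))
```

      For embryo and offspring samples, the hypergeometric step would be replaced by a binomial draw, in line with the description above.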

      It would be good to see an estimate for the strength of selection: 10% difference in a single generation appears rather high to me.

      Estimates of selection strength based on solving for a Wright-Fisher selection coefficient for each tested comparison can now be found in Table S8, mentioned in text on lines 589-590. The mean magnitude of selection coefficients for all paternal to embryo comparisons was 0.322, and for embryo to all adult offspring it was 0.648. For In(3L)Ok the mean selection coefficients were 0.479 and -0.53, and for In(3R)K they were -0.189 and 1.28, respectively. Some are of quite large magnitude, but we emphasize that the coefficients for embryo to adult are based on survival to old age, rather than developmental viability. That factor, in addition to the laboratory environment, makes these estimates distinct from selection coefficients that might be experienced in natural populations.
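
      For orientation, one common way to solve for a selection coefficient from a single-step frequency change is shown below. It assumes a simple genic model, p' = p(1 + s) / (1 + ps); this parameterization is an assumption for the sketch, and the values in Table S8 should be interpreted using the definition given in the manuscript.

```python
def next_frequency(p, s):
    """Expected frequency after one round of genic selection with coefficient s."""
    return p * (1.0 + s) / (1.0 + p * s)

def selection_coefficient(p_before, p_after):
    """Invert next_frequency: the s that moves p_before to p_after in one step."""
    if not (0.0 < p_before < 1.0 and 0.0 < p_after < 1.0):
        raise ValueError("frequencies must lie strictly between 0 and 1")
    return (p_after - p_before) / (p_before * (1.0 - p_after))

# Hypothetical example: an arrangement rising from 20% to 25% in one step
s = selection_coefficient(0.20, 0.25)
print(s, next_frequency(0.20, s))
```

      Under this parameterization, coefficients are bounded below by -1 but can exceed 1 when the favored arrangement more than doubles its relative fitness.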

      Reviewer #3 (Public Review):

      Strengths:

      (1) …the authors developed and used a new simulator (although it was not 100% clear as to why SLiM could not have been used as SLiM has been used to study inversions).

      Before SLiM 3.7 or so (and including when we did the bulk of our simulation work), we do not think it would have been feasible to use SLiM to model the mutation of inversions with random breakpoints and recombination between them without altering the SLiM internals. Separately, needing to script custom selection, mutation, and recombination functions in Eidos would have slowed SLiM down significantly. Given our greater familiarity with Python and NumPy, and the ability to implement a similarly efficient simulator more quickly than through learning C++ and Eidos, we chose to write our own.

      It should be a fair bit easier to implement comparable simulations in SLiM now, but it will still require scripting custom mutation, selection, and recombination functions and would still result in a similarly slow runtime. The current script recipe recommended by SLiM for simulating inversions uses constants to specify the breakpoints of a single inversion, without the ability to draw multiple inversions from a mutational distribution, or model recombination between more complicated karyotypes. Hence, our simulator still seems to be a more versatile and functional option for the purposes of this study.

      Weaknesses:

      [Comments 1 through 4 on Weaknesses included numerous citation suggestions, and some discussion recommendations as well. In our revised manuscript, we have substantially implemented these suggestions. In particular, we have deepened our introduction of mechanisms of balancing selection and prior work on inversion polymorphism, integrating many suggested references. While especially helpful, these suggestions are too extensive to completely quote and respond to in this already-copious document. Therefore, we focus our response on two select topics from these comments, and then proceed to comment 5 thereafter.]

      (2) The general reduction principle and inversion polymorphism. In Section 1.2., the authors state that "there has not been a proposed mechanism whereby alleles at multiple linked loci would directly benefit from linkage and thereby maintain an associated inversion polymorphism under indirect selection." Perhaps I am misunderstanding something, but in my reading, this statement is factually incorrect. In fact, the simplest version of Dobzhansky's epistatic coadaptation model (see Charlesworth 1974; also see Charlesworth and Charlesworth 1973 and discussion in Charlesworth & Flatt 2021; Berdan et al. 2023) seems to be an example of exactly what the authors seem to have in mind here: two loci experiencing overdominance, with the double heterozygote possessing the highest fitness (i.e., 2 loci under epistatic selection, inducing some degree of LD between these loci), with subsequent capture by an inversion; in such a situation, a new inversion might capture a haplotype that is present in excess of random expectation (and which is thus fitter than average)…

      We agree that the quoted statement could be misleading and have rewritten it. We intended to point out that we are presenting a model in which all loci contribute additively (with respect to display) or multiplicatively (with respect to survival probability), without any dominance relationships or genetic interaction terms. And yet, the model generates epistatic balancing selection in a panmictic population under a constant environment. This represents a novel mechanism by which (the life-history characteristics of) a population would generate epistatic balancing selection as an emergent property, instead of assuming a priori that there is some balancing mechanism and representing frequency dependence, dominance effects, or epistatic interactions directly using model parameters. We have therefore refined the scope of the statement in question (lines 155-158). 

      (4) Hearn et al. 2022 on Littorina saxatilis snails. 

      A good reference. There is considerable work on ecotype-associated inversions in L. saxatilis, but we previously cut some discussion of this and of other populations with high gene flow but identifiable spatial structure for inversion-associated phenotypes (e.g. butterfly mimicry polymorphisms, Mimulus, etc.). Due to the spatially discrete environmental preferences and sampled ranges of the inversions in these populations, we considered these examples to be somewhat distinct from explaining inversion polymorphism in a potentially homogeneous and panmictic environment.

      (4) cont. A very interesting paper that may be worth discussing is Connallon & Chenoweth (2019) about dominance reversals of antagonistically selected alleles (even though C&C do not discuss inversions): AP alleles (with dominance reversals) affecting two or more life-history traits provide one example of such antagonistically selected alleles (also see Rose 1982, 1985; Curtsinger et al. 1994) and sexually antagonistically selected alleles provide another. The two are of course not necessarily mutually exclusive, thus making a conceptual connection to what the authors model here.

      We had removed a previously drafted discussion of dominance reversal for brevity’s sake, but this topic is once again represented in the updated draft of the manuscript with a short reference in the introduction, lines 76-80. We also mention ‘segregation lift’ (Wittmann et al. 2017) involving a similar reversal of dominance for fitness between temporally fluctuating conditions, as opposed to between sexes or life history stages. 

      (5) The model. In general, the description of the model and of the simulation results was somewhat hard to follow and vague. There are several aspects that could be improved:  [5](1) it would help the reader if the terminology and distinction of inverted vs. standard arrangements and of the three karyotypes would be used throughout, wherever appropriate.

      We have attempted to do so, using the suggested heterokaryotypic/homokaryotypic terminology.

      [5](2) The mention of haploid populations/situations and haploid loci (e.g., legend to Figure 1) is somewhat confusing: the mechanism modelled here, of course, requires suppressed recombination in the inversion/standard heterokaryotype; and thus, while it may make sense to speak of haplotypes, we're dealing with an inherently diploid situation. 

      While eukaryotes with haploid-dominant life history may still experience similar dynamics, we do expect that most male display competition is in diploid animals, and we are only simulating diploid fitnesses and experimenting with diploid Drosophila. We have tried to minimize the discussion of haploids in this draft.

      [5](3) The authors have a situation in mind where the 2 karyotypes (INV vs. STD) in the heterokaryotype carry distinct sets of loci in LD with each other, with one karyotype/haplotype carrying antagonistic variants favoring high male display success and with the other karyotype/haplotype carrying non-antagonistic alternative alleles at these loci and which favor survival. Thus, at each of the linked loci, we have antagonistic alleles and non-antagonistic alleles - however, the authors don't mention or discuss the degree of dominance of these alleles. The degree of dominance of the alleles could be an important consideration, and I found it curious that this was not mentioned (or, for that matter, examined). 

      In this study, our goal was to show that the investigated model could produce balanced and increasing antagonism without the need to invoke dominance. We think there would be a strong case for a follow-up study that more fully investigates how dominance and other variables impact the parameter space of balanced antagonism, but this goal is beyond our capacity to pursue in this initial study. We’ve added several lines clarifying the absence of dominance from our investigated models, and pointing out that dominance could modulate the predictions of these models (lines 211-213, 278-282).

      [5](4) In many cases, the authors do not provide sufficient detail (in the main text and the main figures) about which parameter values they used for simulations; the same is true for the Materials & Methods section that describes the simulations. Conversely, when the text does mention specific values (e.g., 20N generations, 0.22-0.25M, etc.), little or no clear context or justification is being provided. 

      We have sought to clarify in this draft that 20N was chosen as an ample time frame to establish equilibrium levels and frequencies of genetic variation under neutrality. We present a time sequence in Figure 5, and these results indicate that antagonism has stabilized in models without inversions or with higher recombination rates, whereas its rate of increase has slowed in a model with inversions and lower levels of crossing over.

      The inversion breakpoints and the position of the locus with stronger antagonistic effects in Figure 4 were chosen arbitrarily for this simple proof of concept demonstration, with the intent that this locus was close to one breakpoint. Hopefully these and other parameters are clearer in the revised manuscript.

      [5](5) The authors sometimes refer to "inversion mutation(s)" - the meaning of this terminology is rather ambiguous.

      Edited; hopefully the wording is clearer now. The quoted phrase had uniformly referred to the origin of new inversions by a mutagenic process.

      (6) Throughout the manuscript, especially in the description and the discussion of the model and simulations, a clearer conceptual distinction between initial "capture" and subsequent accumulation / "gain" of variants by an inversion should be made. This distinction is important in terms of understanding the initial establishment of an inversion polymorphism and its subsequent short- as well as long-term fate. For example, it is clear from the model/simulations that an inversion accumulates (sexually) antagonistic variants over time - but barely anything is said about the initial capture of such loci by a new inversion.

      We do not have a good method of assessing a transition between these two phases for the simulations in which both antagonistic alleles and inversions arise stochastically by a mutagenic process. However, we have tried to be clearer on the distinction in this draft: we have included simulations in Figure 4 with variants starting at lower frequencies, and we have tried to better contextualize the temporal trajectories in Figure 5 as (in part) modeling the accumulation of variants after such an origin.

      Reviewer #3 (Recommendations For The Authors):

      - In general: the whole paper is quite long, and I felt that many parts could be written more clearly and succinctly - the whole manuscript would benefit from shortening, polishing, and making the wording maximally precise. Especially the Introduction (> 8 pages) and Discussion (7.5 pages) sections are quite long, and the description of the model and model results was quite hard to follow.

      We have attempted to condense some portions of the manuscript, but inevitably added to others based on important reviewer suggestions. Regarding the length of the Introduction and Discussion, we are covering a lot of intellectual territory in this study, and we aim to make it accessible to readers with less prior familiarity. At this point, we have well over 100 citations – far more than a typical primary research paper – in part thanks to the relevant sources provided by this reviewer. We are therefore optimistic that our text will provide a valuable reference point for future studies. We have also made significant efforts to clarify the Results and Methods text in this draft without notably expanding these sections.

      - In general: the conceptual parts of the paper (introduction, discussion) could be better connected to previous work - this concerns e.g. the theoretical mechanisms of balancing selection that might be involved in maintaining inversions; the general, theoretical role of antagonistic pleiotropy (AP) and trade-offs in maintaining polymorphisms; previously made empirical connections between inversions and AP/trade-offs; previously made empirical connections between inversions and sexual antagonism.

      In the revised manuscript, we have improved the connection of these topics to prior work.

      - L3: "accumulate". A clearer distinction could be made, throughout, between initial capture of alleles/haplotypes by an inversion vs. subsequent gain.

      Please see point 6 in the response to the Public Review, above.

      - L29: I basically agree about the enigma, however, there are quite many empirical examples in D. melanogaster / D. pseudoobscura and other species where we do know something about the nature of selection involved, e.g., cases of NFDS, spatially and temporally varying selection, fitness trade-offs, etc.

      At least for our focal species, we have emphasized that geographic (and now temporal) associations have been found for some inversions. For the sake of length and focus, we probably should not go down the road of documenting each phenotypic association that has been reported for these inversions, or say too much about specific inversions found in other species. As indicated in our response to reviewer 2, some previously documented inversion-associated trade-offs may be compatible with the model presented here. However, we did locate and add to our Discussion one report of frequency-dependent selection on a D. melanogaster inversion (Nassar et al. 1973).

      - L43: it is actually rather unlikely, though not impossible, that new inversions are ever completely neutral (see the review by Berdan et al. 2023).

      This line was intended to convey that, in line with Said et al. 2018’s results, the structural alterations involved in common segregating inversions are not expected to contribute significantly to the phenotype and fitness (as indicated by lack of strong regulatory effects), and that their phenotypic consequences are instead due to linked variation. We have rewritten this passage to better communicate this point, now lines 44-52. Interpreting Section 2 and Figure 1 of Berdan et al. 2023, the linked variation may be what is in mind when saying that inversions are almost never neutral. We have also added a line referencing the expected linked variation of a new inversion (lines 49-52).

      - L51-73: I felt this overview should be more comprehensive. The model by Kirkpatrick & Barton (2016) is in many ways less generic than the one of Charlesworth (1974) which essentially represents one way of modeling Dobzhansky's epistatic coadaptation. Also, the AOD mechanism is perhaps given too much weight here as this mechanism is very unlikely to be able to explain the establishment of a balanced inversion polymorphism (see Charlesworth 2023 preprint on bioRxiv). NFDS, spatially varying selection and temporally varying selection (for all of which there is quite good empirical evidence) should all be mentioned here, including the classical study of Wright and Dobzhansky (1946) which found evidence for NFDS (also see Chevin et al. 2021 in Evol. Lett.)

      On reflection, we agree that we put too much emphasis on AOD and have edited the section to be more representative.

      - L57. Two earlier Dobzhansky references, about epistatic coadaptation, would be: Dobzhansky, T. (1949). Observations and experiments on natural selection in Drosophila. Hereditas, 35(S1), 210-224. https://doi.org/10.1111/j.1601-5223.1949.tb03334.x; Dobzhansky, T. (1950). Genetics of natural populations. XIX. Origin of heterosis through natural selection in populations of Drosophila pseudoobscura. Genetics, 35, 288-302. https://doi.org/10.1093/genetics/35.3.288

      - In general, in the introduction, the classical chapter by Lemeunier and Aulard (1992) should be cited as the primary reference and most comprehensive review of D. melanogaster inversion polymorphisms.

      - L101: this is of course true, though there are some exceptions, such as In(3R)Mo.

      - L110: the papers by Knibb, the chapter by Lemeunier and Aulard (1992), and the meta-analysis of INV frequencies by Kapun & Flatt (2019) could be cited here as well.

      Citation suggestions integrated.

      - L123 and elsewhere: the common D. melanogaster inversions are old but perhaps not THAT old - if we take the Corbett-Detig & Hartl (2012) estimates, then most of them do not really exceed an age of Ne generations, or at least not by much. I mean: yes, they are somewhat old but not super-old (cf. discussion in Andolfatto et al. 2001).

      Edited to curb any hyperbole. We agree that there are much more ancient polymorphisms in populations.

      - L133-135. This needs to be rewritten: this claim is incorrect, to my mind (Charlesworth 1974; also see Charlesworth and Charlesworth 1973; discussion in Charlesworth & Flatt 2021).

      Edited. See public review response (2).

      - L154: the example of inversion polymorphism is actually explicitly discussed in Altenberg's and Feldman's (1987) paper on the reduction principle.

      Edited to mention this. Inversions are also mentioned in Feldman et al. 1980, Feldman and Balkau 1973, Feldman 1972, and have been in discussion since the origins of the idea.

      - L162ff: see Connallon & Chenoweth (2019).

      Citation suggestion integrated, along with Cox & Calsbeek 2009 which seems more directly applicable, now line 185ff.

      - L169: why? There is much evidence for other important trade-offs in this system.

      Reworded.

      - L178-179: other studies have found that trade-offs/AP contribute to the maintenance of inversion polymorphisms, e.g. Mérot et al. 2020 and Betrán et al. 1998, etc.

      Added Betrán et al. 1998 - a good reference. Moved up mention of Mérot et al. 2020 from later in the text and directed readers to the Discussion, lines 202-205.

      - L198. "alternate inversion karyotypes" - you mean INV vs. STD? It would be good to adopt a maximally clear, uniform terminology throughout.

      Edited to communicate this better.

      - L215-217: this is a theoretically well-known result due to Hazel (1943); Dickerson (1955); Robertson (1955); e.g., see the discussion in the quantitative genetics book by Roff (1997) or in the review of Flatt (2020).

      Citations integrated, now lines 232ff.

      - L223 and L245: "haploid" - somewhat confusing (see public review). 

      - L259-260: This may need some explanation. 

      - L261-262: simply state that there is no recombination in D. melanogaster males.

      Edited for increased clarity.

      - L274 (and elsewhere): the meaning of "mutation...of new..inversion polymorphisms" is ambiguous - do you mean a polymorphic inversion and hence a new inversion polymorphism or do you mean polymorphisms/variants accumulating in an inversion?

      - L275: maybe better heterokaryotypic instead of heterozygous? (note that INV homokaryotypes or STD homokaryotypes can be homo- or heterozygous, so when referring to chromosomal heterozygotes instead of heterozygous chromosomes it may be best to refer to heterokaryotypes).

      Per [5](1) and [5](5) in the public review, we have edited our terminology.

      - L276: referral to M&M - I found the description of the model/simulation details there to be somewhat vague, e.g. in terms of parameter settings, etc.

      Further described.

      - L281-282: would SLiM not have worked?

      See public review response.

      - L286-287: why these parameters?

      Further described.

      - L296ff: it is not immediately clear that the loci under consideration are polymorphic for antagonistic alleles vs. non-antagonistic alternative alleles - maybe this could be made clear very explicitly.

      Edited to be explicit as suggested.

      - L341, 343: "inversion mutation" - meaning ambiguous.

      - L348, 352: "specified rate" - vague.

      - L354-357: initial capture and/or accumulation/gain? 

      - L401, 402, 404: Z-, W- and Y- are brought up here without sufficient context/explanation.

      The above have been addressed by edits in the text.

      - L523, 557, 639, 646, and elsewhere: not the first evidence - see the paper by Mérot et al. (2020) (and e.g. also by Yifan Pei et al. (2023)). 

      Citations integrated in the introduction and discussion. Mérot et al. (2020) was cited (L486 in original) but discussion was curtailed in the previous draft. 

      - L558-559. I agree but it is clear that there are many mechanisms of balancing selection that can achieve this, at least in principle; for some of them (NFDS, etc.) we have pretty good evidence. 

      - L576-577. This is correct but for In(3R)C that study did find a differential hot vs. cold selection response.

      Addressed with text edit. 

      - L584-L586: cf. Betrán et al. (1998), Mérot et al. (2020), Pei et al. (2023), etc.

      - L591. "other forms of balancing selection": yes! This should be stressed throughout. Multiple forms of balancing selection exist and they are not mutually exclusive. 

      - L593: consider adding Dobzhansky (1943), Machado et al. (2021) 

      - L596-597: this is rather unlikely, at least in terms of inversion establishment (see Charlesworth 2023; https://www.biorxiv.org/content/10.1101/2023.10.16.562579v1).

      - L608: consider adding Kapun & Flatt (2019).

      - L611-612: see studies by Mukai & Yamaguchi, 1974; and Watanabe et al., 1976. 

      - L639, 646: AP - see general literature on AP as a factor in maintaining polymorphism (Rose 1982, 1985; Curtsinger et al. 1994; Charlesworth & Hughes 2000 chapter in Lewontin Festschrift; Connallon & Chenoweth 2019 - this latter paper is particularly relevant in terms of AP effects in the context of sexual antagonism)

      Citation suggestions integrated.

      - L657: inversion polymorphism is explicitly discussed in Altenberg's and Feldman's (1987) paper on the reduction principle.

      Hopefully this is better communicated.

      - L724-755: I felt that this section generally lacks sufficient details, especially in terms of parameter choices and settings for the simulations.

      - L732: why not state these rates?

      Parameter values are now given a fuller description in figure legends and in the methods.  

      - L746: but we know that mutational effect sizes are not uniformly distributed (?).

      We made this choice for simplicity and to avoid invoking a seemingly arbitrary distribution, but one could instead simulate trait effects drawn from a gamma distribution. Display values would still have variable fitness effects that fluctuate with population composition, but we agree that a distribution shifted toward small effects would be more realistic.
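
      The contrast between the two choices can be illustrated with a minimal sketch. The uniform range, the gamma shape and scale, and the 0.1 cutoff below are illustrative assumptions, not parameter values from the manuscript.

```python
import random

def draw_effects(n, shape=0.5, scale=0.2, seed=1):
    """Draw n effect sizes from a uniform and from a gamma distribution.

    The uniform range [0, 1] and the gamma shape/scale are hypothetical
    values chosen only to illustrate the comparison.
    """
    rng = random.Random(seed)
    uniform = [rng.uniform(0.0, 1.0) for _ in range(n)]
    gamma = [rng.gammavariate(shape, scale) for _ in range(n)]
    return uniform, gamma

def share_small(effects, cutoff=0.1):
    """Fraction of effects below an (arbitrary) small-effect cutoff."""
    return sum(x < cutoff for x in effects) / len(effects)

uniform_effects, gamma_effects = draw_effects(10_000)
print(share_small(uniform_effects), share_small(gamma_effects))
```

      With a shape parameter below 1, most gamma draws fall near zero, which is the "shifted toward small effects" property referred to above.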

      - L765: In(3R)P is not mentioned elsewhere - is this really correct?

      That was incorrect, fixed.

    1. Author response:

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      This paper investigates the effects of the explicit recognition of statistical structure and sleep consolidation on the transfer of learned structure to novel stimuli. The results show a striking dissociation in transfer ability between explicit and implicit learning of structure, finding that only explicit learners transfer structure immediately. Implicit learners, on the other hand, show an intriguing immediate structural interference effect (better learning of novel structure) followed by successful transfer only after a period of sleep.

      Strengths:

      This paper is very well written and motivated, and the data are presented clearly with a logical flow. There are several replications and control experiments and analyses that make the pattern of results very compelling. The results are novel and intriguing, providing important constraints on theories of consolidation. The discussion of relevant literature is thorough. In summary, this work makes an exciting and important contribution to the literature.

      Weaknesses:

      There have been several recent papers that have identified issues with alternative forced choice (AFC) tests as a method of assessing statistical learning (e.g. Isbilen et al. 2020, Cognitive Science). A key argument is that while statistical learning is typically implicit, AFC involves explicit deliberation and therefore does not match the learning process well. The use of AFC in this study thus leaves open the question of whether the AFC measure benefits the explicit learners in particular, given the congruence between knowledge and testing format, and whether, more generally, the results would have been different had the method of assessing generalization been implicit. Prior work has shown that explicit and implicit measures of statistical learning do not always produce the same results (eg. Kiai & Melloni, 2021, bioRxiv; Liu et al. 2023, Cognition).

      We agree that numerous papers in the statistical learning literature discuss how different test measures can lead to different results and that, in principle, using a different measure could have led to different results in our study. We also believe that several further factors are relevant to this issue, including the dichotomous vs. continuous nature of implicit vs. explicit learning and the complexity of the interactions between the (degree of) explicitness of the participants' knowledge and the applied test method. These factors transcend a simple labeling of tests as implicit or explicit and strongly constrain the type of variation that the results of different tests would produce. Therefore, running the same experiments with different learning measures in future studies could provide additional interesting data with potentially different results.

      However, the most important aspect of our reply concerning the reviewer's comment is that although quantitative differences between the learning rate of explicit and implicit learners are reported in our study, they are not of central importance to our interpretations. What is central are the different qualitative patterns of performance shown by the explicit and the implicit learners, i.e., the opposite directions of learning differences for “novel” and “same” structure pairs, which are seen in comparisons within the explicit group vs. within the implicit group and in the reported interaction. Following the reviewer's concern, any advantage an explicit participant might have in responding to 2AFC trials using “novel” structure pairs should also be present in the replies of 2AFC trials using the “same” structure pairs and this effect, at best, could modulate the overall magnitude of the across groups (Expl/Impl.) effect but not the relative magnitudes within one group. Therefore, we see no parsimonious reason to believe that any additional interaction between the explicitness level of participants and the chosen test type would impede our results and their interpretation. We will make a note of this argument in the revised manuscript.

      Given that the explicit/implicit classification was based on an exit survey, it is unclear when participants who are labeled "explicit" gained that explicit knowledge. This might have occurred during or after either of the sessions, which could impact the interpretation of the effects.

      We agree that this is a shortcoming of the current design, and obtaining the information about participants’ learning immediately after Phase 1 would have been preferred. However, we made this choice deliberately as the disadvantage of assessing the level of learning at the end of the experiment is far less damaging than the alternative of exposing the participants to the exit survey question earlier and thereby letting them achieve explicitness or influence their mindset otherwise through contemplating the survey questions before Phase 2. Our Experiment 5 shows how realistic this danger of unwanted influence is: with a single sentence alluding to pairs in the instructions of Exp 5, we  could completely change participants' quantitative performance and qualitative response pattern. Unfortunately, there is no implicit assessment of explicitness we could use in our experimental setup. We also note that given the cumulative nature of statistical learning, we expect that the effect of using an exit survey for this assessment only shifts absolute magnitudes (i.e. the fraction of people who would fall into the explicit vs. implicit groups) but not aspects of the results that would influence our conclusions.

      Reviewer #2 (Public Review):

      Summary:

      Sleep has not only been shown to support the strengthening of memory traces but also their transformation. A special form of such transformation is the abstraction of general rules from the presentation of individual exemplars. The current work used large online experiments with hundreds of participants to shed further light on this question. In the training phase, participants saw composite items (scenes) that were made up of pairs of spatially coupled (i.e., they were next to each other) abstract shapes. In the initial training, they saw scenes made up of six horizontally structured pairs, and in the second training phase, which took place after a retention phase (2 min awake, 12 h incl. sleep, 12 h only wake, 24 h incl. sleep), they saw pairs that were horizontally or vertically coupled. After the second training phase, a two-alternatives-forced-choice (2-AFC) paradigm, where participants had to identify true pairs versus randomly assembled foils, was used to measure the performance of all pairs. Finally, participants were asked five questions to identify if they had insight into the pair structure, and post-hoc groups were assigned based on this. Mainly, the authors find that participants in the 2-minute retention experiment without explicit knowledge of the task structure were at chance level performance for the same structure in the second training phase, but had above chance performance for the vertical structure. The opposite was true for both sleep conditions. In the 12 h wake condition these participants showed no ability to discriminate the pairs from the second training phase at all.

      Strengths:

      All in all, the study was performed to a high standard and the sample size in the implicit condition was large enough to draw robust conclusions. The authors make several important statistical comparisons and also report an interesting resampling approach. There is also a lot of supplemental data regarding robustness.

      Weaknesses:

      My main concern regards the small sample size in the explicit group and the lack of experimental control.  

      The sample sizes of the explicit participants in our experiments are, indeed, much smaller than those of the implicit participants due to the process of how we obtain the members of the two groups. However, these sample sizes of the explicit groups are not small at all compared to typical experiments reported in Visual Statistical Learning studies, rather they tend to be average to large sizes. It is the sizes of the implicit subgroups that are unusually high due to the aforementioned data collecting process. Moreover, the explicit subgroups have significantly larger effect sizes than the implicit subgroup, bolstering the achieved power that is also confirmed by the reported Bayes Factors that support the “effect” or the “no effect” conclusions in the various tests ranging in value from substantial to very strong.  Based on these statistical measures,  we think the sample sizes of the explicit participants in our studies are adequate.

      However, we do agree that the unbalanced nature of the sample and effect sizes can be problematic for the between-group comparisons. We aim to replace the Student's t-tests that directly compare explicit and implicit participants with Welch's t-tests, which are better suited to unequal sample sizes and variances.
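
      For readers unfamiliar with the difference, a minimal sketch of Welch's test is given below. The scores are hypothetical and are not data from the study; the function name `welch_t` is likewise only illustrative.

```python
import math
from statistics import mean, variance

def welch_t(a, b):
    """Return (t, df) for Welch's unequal-variance two-sample t-test."""
    na, nb = len(a), len(b)
    # Squared standard errors of each group mean (sample variance / n).
    se2_a, se2_b = variance(a) / na, variance(b) / nb
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (na - 1) + se2_b ** 2 / (nb - 1))
    return t, df

# Hypothetical proportion-correct scores for a small and a larger group.
explicit = [0.90, 0.80, 0.95, 0.85]
implicit = [0.55, 0.60, 0.50, 0.65, 0.58, 0.52, 0.61, 0.57]
t_stat, dof = welch_t(explicit, implicit)
print(t_stat, dof)
```

      Unlike Student's test, the variances are not pooled, and the degrees of freedom are estimated from the data. In practice the same test is available as `scipy.stats.ttest_ind(a, b, equal_var=False)`.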

      As for the lack of experimental control, indeed, we could not fully randomize consolidation condition assignment. Instead, the assignment was a product of when the study was made available on the online platform Prolific. This method could, in theory, lead to an unobserved covariate, such as morningness, being unbalanced between conditions. We do not have any reasons to believe that such a condition would critically alter the effects reported in our study, but as it follows from the nature of unobserved variables, we obviously cannot state this with certainty. Therefore, we will explicitly discuss these potential pitfalls in the revised version of the manuscript.  

      Reviewer #3 (Public Review):

      In this project, Garber and Fiser examined how the structure of incidentally learned regularities influences subsequent learning of regularities, that either have the same structure or a different one. Over a series of six online experiments, it was found that the structure (spatial arrangement) of the first set of regularities affected the learning of the second set, indicating that it has indeed been abstracted away from the specific items that have been learned. The effect was found to depend on the explicitness of the original learning: Participants who noticed regularities in the stimuli were better at learning subsequent regularities of the same structure than of a different one. On the other hand, participants whose learning was only implicit had an opposite pattern: they were better in learning regularities of a novel structure than of the same one. This opposite effect was reversed and came to match the pattern of the explicit group when an overnight sleep separated the first and second learning phases, suggesting that the abstraction and transfer in the implicit case were aided by memory consolidation.

      These results are interesting and can bridge several open gaps between different areas of study in learning and memory. However, I feel that a few issues in the manuscript need addressing for the results to be completely convincing:

      (1) The reported studies have a wonderful and complex design. The complexity is warranted, as it aims to address several questions at once, and the data is robust enough to support such an endeavor. However, this work would benefit from more statistical rigor. First, the authors base their results on multiple t-tests conducted on different variables in the data. Analysis of a complex design should begin with a large model incorporating all variables of interest. Only then, significant findings would warrant further follow-up investigation into simple effects (e.g., first find an interaction effect between group and novelty, and only then dive into what drives that interaction). Furthermore, regardless of the statistical strategy used, a correction for multiple comparisons is needed here. Otherwise, it is hard to be convinced that none of these effects are spurious. Last, there is considerable variation in sample size between experiments. As the authors have conducted a power analysis, it would be good to report that information per each experiment, so readers know what power to expect in each.

      Answering the questions we were interested in required us to investigate two related but separate types of effects within our data: general above-chance performance in learning, and within- and across-group differences.

      Above-chance performance: As is typical in SL studies, we needed to assess whether learning happened at all and which types of items were learned. For this, a comparison to the chance level is crucial and, therefore, a one-sample t-test is the statistical test of choice. Note that all our t-tests were subject to experiment-wise correction for multiple comparisons using the Holm-Bonferroni procedure, as reported in the Supplementary Materials.
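
      The Holm-Bonferroni step-down procedure can be sketched as follows. The p-values are hypothetical and the function name is only illustrative; this is not the analysis code used for the manuscript.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Return a list of booleans, True where the null hypothesis is rejected."""
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is compared to alpha / (m - k + 1).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: all larger p-values are retained as well
    return reject

# Hypothetical p-values from four tests within one experiment.
decisions = holm_bonferroni([0.010, 0.040, 0.030, 0.005])
print(decisions)
```

      The procedure controls the family-wise error rate like the plain Bonferroni correction but is uniformly more powerful, because the threshold relaxes after each rejection.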

      Within- and across-group differences: To obtain our results regarding group and pair-type differences and their interactions, we used mixed ANOVAs and appropriate post-hoc tests, as the reviewer suggested. These results are reported in the Methods section.

      Concerning power analysis, we will add the requested information on achieved power by experiment to the revised version of the manuscript.  

      (2) Some methodological details in this manuscript I found murky, which makes it hard to interpret results. For example, the secondary results section of Exp1 (under Methods) states that phase 2 foils for one structure were made of items of the other structure. This is an important detail, as it may make testing in phase 2 easier, and tie learning of one structure to the other. As a result, the authors infer a "consistency effect", and only 8 test trials are said to be used in all subsequent analyses of all experiments. I found the details, interpretation, and decision in this paragraph to lack sufficient detail, justification, and visibility. I could not find either of these important design and analysis decisions reflected in the main text of the manuscript or in the design figure. I would also expect to see a report of results when using all the data as originally planned.  

      We thank the reviewer for pointing out these critical open questions in our manuscript that need further clarification. The inferred “consistency effect” is based on patterns found in the data, which show an increase in negative correlation between test types during the test phase. As this is apparently an effect of the design of the test phase and not an effect of the training phase, which we were interested in, we decided to minimize this effect as far as possible by focusing on the early test trials. For the revised version of the manuscript, we will revamp and expand how this issue was handled and also add a short comment in the main text, mentioning the use of only a subset of test trials and pointing the interested reader to the details.

      Similarly, the matched sample analysis is a great addition, but details are missing. Most importantly, it was not clear to me why the same matching method should be used for all experiments instead of choosing the best matching subgroup (regardless of how it was arrived at), and why the nearest-neighbor method with replacement was chosen, as it is not evident from the numbers in Supplementary Table 1 that it was indeed the best-performing method overall. Such omissions hinder interpreting the work.

      Since our approach provided four different balance metrics (see Supp. Tables 1-4) for each matching method, it is not completely straightforward to make a principled decision across the methods. In addition, selecting the best method for each experiment separately carries the suspicion of cherry-picking the most suitable results for our purposes. For the revised version, we will expand on our description of the matching and decision process and add descriptive plots showing what our data look like under each matching method for each experiment. These plots highlight that the matching techniques produce qualitatively near-identical results and that picking one of them over another does not alter the conclusions of the test. The plots will give the interested reader all the necessary information to assess the extent to which our design decisions influence our results.
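
      As a minimal illustration of nearest-neighbor matching with replacement, consider the sketch below. The single matching score and the values are hypothetical; the actual matching variables and software are those described in the manuscript's Methods and Supplementary Tables.

```python
def match_with_replacement(targets, pool):
    """For each target score, return the index of the closest score in the pool.

    Matching is done with replacement, so the same pool member may be
    selected for more than one target.
    """
    return [
        min(range(len(pool)), key=lambda j: abs(pool[j] - t))
        for t in targets
    ]

# Hypothetical matching scores for a small group and a larger comparison pool.
small_group = [0.81, 0.78, 0.40]
large_pool = [0.30, 0.42, 0.80, 0.95]
matches = match_with_replacement(small_group, large_pool)
print(matches)
```

      Here the first two members of the small group are both matched to the same pool member, which is what distinguishes matching with replacement from matching without it.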

      (3) To me, the most surprising result in this work relates to the performance of implicit participants when phase 2 followed phase 1 almost immediately (Experiment 1 and Supplementary Experiment 1). These participants had a deficit in learning the same structure but a benefit in learning the novel one. The first part is easier to reconcile, as primacy effects have been reported in statistical learning literature, and so new learning in this second phase could be expected to be worse. However, a simultaneous benefit in learning pairs of a new structure ("structural novelty effect") is harder to explain, and I could not find a satisfactory explanation in the manuscript.  

      Although we might not have worded it clearly, we do not claim that our "structural novelty effect" comes from a “benefit” in learning pairs of the novel structure. Rather, we used the term “interference” and the lack of this interference. In other words, we believe that one possible explanation is that there is no actual benefit for learning pairs of the novel structure but simply unhindered learning for pairs of the novel structure and simultaneous interference for learning pairs of the same structure. Stronger interference for the same compared to the novel structure items seems a reasonable interpretation, as similarity-based interference is well established in the general (not SL-specific) literature under the label of proactive interference. We will clarify these ideas in the revised manuscript.

      After possible design and statistical confounds (my previous comments) are ruled out, a deeper treatment of this finding would be warranted, both empirically (e.g., do explicit participants collapsed across Experiments 1 and Supplementary Experiment 1 show the same effect?) and theoretically (e.g., why would this phenomenon be unique only to implicit learning, and why would it dissipate after a long awake break?).

      Across all experiments, the explicit participants showed the same pattern of results but no significant difference between pair types, probably due to the insufficient sample sizes available. We already included in the main text the collapsed explicit results across Experiments 1-4 and Supplementary Experiment 1 (p. 16). This analysis confirmed that, indeed, there was a significant generalization for explicit participants across the two learning phases. We could re-run the same analysis for only Experiment 1 and Supplementary Experiment 1, but due to the small sample of N=12 in Suppl. Exp. 1, this test would likely be completely underpowered. Obtaining a sufficient sample size for this one test would require an excessive number (several hundred) of new participants.

      In terms of theoretical treatment, we already presented our interpretation of our results in the discussion section, which we can expand on in the revised manuscript.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Reply to the reviewers

      We would like to thank all the reviewers for their thorough reading and helpful comments. Below, please find our point-by-point response. The reviewer comments received through ReviewCommons have not been altered except for formatting.

      Reviewer #1 (Evidence, reproducibility and clarity (Required)):

      The authors extended the existing recombination-induced tag exchange (RITE) technology to show that they can image a subset of NPCs, improving signal-to-noise ratios for live cell imaging in yeast, and to track the stability or dynamics of specific nuclear pore proteins across multiple cell divisions. Further, the authors use this technology to show that the nuclear basket proteins Mlp1, Mlp2 and Pml39 are stably associated with "old NPCs" through multiple cell cycles. The authors show that the presence of Mlp1 in these "old NPCs" correlates with exclusion of Mlp1-positive NPCs from the nucleolar territory. A surprising result is that basket-less NPCs can be excluded from the non-nucleolar region, an observation that correlates with the presence of Nup2 on the NPC regardless of maturation state of the NPC. In support of the proposal that retention of NPCs via Mlp1 and Nup2 in non-nucleolar regions, simulation data is presented to suggest that basket-less NPCs diffuse faster in the plane of the nuclear envelope.

      However, there are some points that do need addressing:

      Major Points

      1. Taking into account that the Nup2 result in Figure 4B forms the basis for one half of the proposed model in Figure 6 regarding the exclusion of NPCs from the nucleolar region of the NE, there is a relatively small amount of data in support of this finding and this proposed model. For example, the only data for Nup2 in the manuscript is a column chart in Figure 4B with no supporting fluorescence microscopy examples for any Nup2 deletion. Further, the Nup60 deletion mutant will have zero basket-containing NPCs, whereas the Nup2 deletion will be a mixture of basket-containing and basket-less NPCs. The only support for the localization of basket-containing NPCs in the Nup2 deletion mutant is through a reference "Since Mlp1-positive NPCs remain excluded from the nucleolar territory in nup2Δ cells (Galy et al., 2004), the homogenous distribution observed in this mutant must be caused predominantly by the redistribution of Mlp-negative NPCs into the nucleolar territory."

      As suggested by the reviewer, we have added fluorescence microscopy examples for the Nup2 deletion to new Figure 4D. In addition, we have added data on Nup1 as suggested by reviewer 3. Since we observed a significant effect on nucleolar NPC density also upon depletion of Nup1 (new Figure 4A), we have overall revised the text and model to now reflect the shared role of Nup1 and Nup2.

      We have also localized Mlp1-GFP in a nup2Δ background as well as in the Nup60ΔC background where Nup2 can no longer bind to the NPC. In both strains, Mlp1-containing NPCs remain excluded from the nucleolus as now shown in the new Figure 4E. Although we also observed partial Mlp1 mislocalization to a nuclear focus in the nup2Δ strain, such mislocalization was only minimal in the strain with the Nup2-binding domain in Nup60 deleted (nup60ΔC), supporting our conclusion that Nup2 contributes to nucleolar exclusion of NPCs independent of Mlp1. Similarly, Mlp1-positive NPCs remained excluded from the nucleolar territory in cells depleted of Nup1 (new Figure 4B).

      1. The authors could consider utilizing this opportunity to discuss their technological innovations in the context of the prior work of Onischenko et al., 2020. This work is referenced for the statement "RITE can be used to distinguish between old and new NPCs" Page 2, Line 43. However, it is not referenced for the statement "We constructed a RITE-cassette that allows the switch from a GFP-labelled protein to a new protein that is not fluorescently labelled (RITE(GFP-to-dark))" despite Onischenko et al., 2020 having already constructed a RITE-cassette for the GFP-to-dark transition. The authors could consider taking this opportunity to instead focus on their innovative approach to apply this technology to decrease the number of fluorescently-tagged NPCs by dilution across multiple cell divisions and to interpret this finding as a measure of the stability of nuclear pore proteins within the broader NPC.

      We apologize for this imprecise citation. We have modified the text to indicate that our RITE cassette was previously used in two publications. It now reads: "We used a RITE-cassette that allows the switch from a GFP-labelled protein to a new protein that is not fluorescently labelled (RITE(GFP-to-dark)) (Onischenko et al., 2020, Kralt et al., 2022)." Together with additional changes to the text throughout, we hope that our new manuscript version more clearly highlights the innovation of our approach relative to previous use cases.

      1. The authors could also consider taking this opportunity to discuss their results in the context of the Saccharomyces cerevisiae nuclear pore complex structures published e.g. in Kim et al., 2018, Akey et al., 2022, Akey et al., 2023 in which the arrangement of proteins in the nuclear basket is presented, and also work from the Kohler lab (Mészáros et al., 2015) on how the basket proteins are anchored to the NPC. There is additional literature that also might help provide some perspective to the findings in the current manuscript, such as the observation that a lesser amount of Mlp2 to Mlp1 observed is consistent with prior work (e.g. Kim et al., 2018) and that intranuclear Mlp1 foci are also formed after Mlp1 overexpression (Strambio-de-Castillia et al., 1999).

      Following the reviewer's suggestion, we extended our treatment of basket Nup stoichiometry and organization in the discussion section, including most of the citations mentioned as well as the recent articles on nuclear basket structure and organization (Stankunas & Köhler 2024, 10.1038/s41556-024-01484-x; Singh et al. 2024, 10.1016/j.cell.2024.07.020).

      Minor Points 1. What is the "lag time" of the doRITE switching? Do the authors believe that it is comparable to the approximate 1-hour timeframe following beta-estradiol induction as shown previously in Chen et al. Nucleic Acids Research, Volume 28, Issue 24, 15 December 2000, Page e108, https://doi.org/10.1093/nar/28.24.e108

      We thank the reviewer for suggesting we analyze the kinetics of RITE switching. We carried out quantitative real-time PCR on genomic DNA and found that the half-time of switching is below 20 min. The majority of the population is switched after 1 hour, similar to the results in Chen et al. This data is now included in Supplemental Figure 1A.

      1. The authors could consider a brief explanation of radial position (um) for the benefit of the reader, in Figures 1E (right panel) and 2B (right panel), perhaps using a diagram to make it easier to understand the X-axis (um).

      To address this, we have now included a diagram and refer to it in the figure legend and the text.

      1. In Figure 1G, would the authors consider changing the vertical axis title and the figure legend wording from "mean number of NPCs per cell" to "mean labeled NPC # per cell" to reflect that what is being characterized are the remaining GFP-bearing NPCs over time?

      Thank you for spotting this inaccuracy. We have changed the label to "mean # of labeled NPCs per cell".

      1. In Figure 2C, the magenta-labeled protein in the micrographs is not described in the figure or the legend.

      A description has been added to both the figure and the legend.

      1. In Figure S2A, there is an arrow indicating a Nup159 focus, but this is not described in the figure legend, as is done in Figure 2C.

      A description has been added to the legend.

      1. In Figure S3C, the figure legend does not match the figure. Was this supposed to be designed like Figure 3C and is missing part of the figure? Or is the legend a typographical error?

      We apologize for this error and thank the reviewer for spotting it. The legend has been corrected (now Figure S4B).

      1. In Figure S4B, the spontaneously recombined RITE (GFP-to-dark) Nup133-V5 appears in the western blot as equally abundant to pre-recombined Nup133-V5-GFP. In the figure legend, this is explained as cells grown in synthetic media without selection to eliminate cells that have lost their resistance marker from the population. In Cheng et al. Nucleic Acids Res. 2000 Dec 15; 28(24): e108, Cre-EBD was not active in the absence of B-estradiol, despite galactose-induced Cre-EBD overexpression. Would the authors be able to comment further on the Cre-Lox RITE system in the manuscript?

      We note that also in the cited publication, cells are grown in the presence of selection to select (as stated in this publication) "against pre-excision events that occur because of low but measurable basal expression of the recombinase". Although the authors report that spontaneous recombination is reduced with the b-estradiol inducible system (compared to pGAL expression control of the recombinase only), they show negligible spontaneous recombination only within a two-hour time window. Indeed, we also observe low levels of uninduced recombination on a short timeframe, but occasional events can become significant in longer incubation times (e.g. overnight growth) in the absence of selection. It should be noted that in our system, Cre expression is continuously high (TDH3-promoter) and not controlled by an inducible GAL promoter. We have added the information about the promoter controlling Cre-expression in the methods section.

      1. In Figure 6, the authors may want to consider inverting the flow of the cartoon model to start from the wild type condition and apply the deletion mutations at each step to "arrive" at the mutant conditions, rather than starting with mutant conditions and "adding back" proteins.

      Following the suggestions of this reviewer as well as reviewer 3, we have modified our model to more clearly represent the contributions of the different basket components.

      Reviewer #1 (Significance (Required)):

      Recent work has drawn attention to the fact that not all NPCs are structurally or functionally the same, even within a single cell. In this light, the work here from Zsok et al. is an important demonstration of the kind of methodologies that can shed light on the stability and functions of different subpopulations of NPCs. Altogether, these data are used to support an interesting and topical model for Nup2 and nuclear-basket driven retention of NPCs in non-nucleolar regions of the nuclear envelope.

      We thank the reviewer for this positive assessment of our work.

      Reviewer #2 (Evidence, reproducibility and clarity (Required)):

      In this study, Zsok et al. develop innovative methods to examine the dynamics of individual nuclear pore complexes (NPCs) at the nuclear envelope of budding yeast. The underlying premise is that with the emergence of biochemically distinct NPCs that co-exist in the same cell, there is a need to develop tools to functionally isolate and study them. For example, there is a pool of NPCs that lack the nuclear basket over the nucleolus. Although the nature of this exclusion has been investigated in the past, the authors take advantage of a modification of recombination induced tag exchange (RITE), the slow turnover of scaffold nups, the closed mitosis of budding yeast, and extensive high quality time lapse microscopy to ultimately monitor the dynamics of individual NPCs over the nucleolus. By leveraging genetic knockout approaches and auxin-induced degradation with sophisticated quantitative and rigorous analyses, the authors conclude that there may be two mechanisms dependent on nuclear basket proteins that impact nucleolar exclusion. They also incorporate some computational simulations to help support their conclusions. Overall, the data are of the highest quality and are rigorously quantified, the manuscript is well written, accessible, and scholarly - the conclusions are thus on solid footing.

      We thank the reviewer for this assessment.

      Reviewer #2 (Significance (Required)):

      I have no concerns about the data or the conclusions in this manuscript. However, the significance is not overly clear as there is no major conceptual advance put forward, nor is there any new function suggested for the NPCs over nucleoli. As NPCs are immobile in metazoans, the significance may also be limited to a specialized audience.

      We respectfully disagree with this assessment. First, our work demonstrates the use of a novel approach in the application of RITE that can be useful for other researchers in the field of NPC biology and beyond. For example, doRITE could be applied to study the properties of aged NPCs, an area of considerable interest due to links between the NPC and age-related neurodegenerative diseases.

      Second, we characterize the interaction between conserved nuclear components, the NPC, the nucleolus and chromatin. While the specific architecture of the nucleus varies between species, many of these interactions are conserved. For example, Nup2's homologue Nup50 also interacts with chromatin in other systems, including mammalian cells, and thus may contribute to regulating the interplay between the nuclear basket and adjoining chromatin. This adds to our understanding of the multiple pathways and interactions that contribute to nuclear organization. Therefore, although the depletion of NPCs from the nucleolar territory in budding yeast may not be of direct importance, understanding the relationships between NPCs and their environment provides insight into nuclear organization throughout different eukaryotic lineages.

      In the revised manuscript, we attempt to better highlight and discuss these aspects.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)):

      The manuscript of Zsok et al. describes the role of nuclear basket proteins in the distribution and mobility of nuclear pore complexes in budding yeast. In particular, the authors showed that the doRITE approach can be used for the analysis of stable and dynamically associated NUPs. Moreover, it can distinguish individual NUPs and follow the inheritance of individual NPCs from mother to daughter cells. The authors' findings highlight that Mlp1, Mlp2, and Pml39 are stably associated with the nuclear pore; deletion of Mlp1-Mlp2 and Nup60 leads to higher NPC density in the nucleolar territory; and NPCs exhibit increased mobility in the absence of the nuclear basket components.

      The manuscript contains most figures supporting the data, and data supports the conclusions. However, authors need to include better explanations for figures in the text and figure legends. Lack of detailed explanation can pose challenges for non-experts. In addition, the authors jump over figures and shuffle them through the manuscript, which disrupts the flow and coherence of the manuscript.

      We thank the reviewer for pointing this out. In response to the detailed comments given below, we have moved some figures and added more explicit explanations to the text to improve the flow and make it easier to follow. In addition, we have modified the figure legends throughout the manuscript to make them more accessible to the reader.

      Major comments: - The nuclear basket contains Nup1, Nup2, Nup60, Mlp1, and Mlp2 in yeast. Nup60 works as a seed for Mlp1/Mlp2 and Nup2 recruitment and plays a key role in the assembly of nuclear pore basket scaffold (PMID: 35148185). Logically, the authors focused primarily on Nup60 in the current manuscript. However, NUP153 has another ortholog of yeast - Nup1, which has not been studied in this work. I recommend adjusting the title of the manuscript to: Nup60 and Mlp1/Mlp2 regulate the distribution and mobility of nuclear pore complexes in budding yeast. I also suggest discussing why work on Nup1 was not included/performed in the manuscript.

      We thank the reviewer for suggesting we should test the role of Nup1. Although we had originally not considered it, since we were focusing on the interactors of Mlp1/2, we found that indeed Nup1 also contributes to nucleolar exclusion. We have therefore changed the title to "Nuclear basket proteins regulate the distribution and mobility of nuclear pore complexes in budding yeast".

      • Figure 2B: I suggest choosing a more representative image for Pml39. It looks not like a stable component but rather dynamic as NUP60 or Gle1 based on figure showed in Figure 2B.

      We thank the reviewer for pointing out this poor choice of panel. We selected a panel for the 14h timepoint that more clearly shows that individual foci can still be seen for Pml39 after this time. Due to its lower copy number, the foci are dimmer for Pml39 than the other stable Nups. Nevertheless, at both the 11 and 14 h timepoint, clear dots can be detected for Pml39, while e.g. Nup116 in the same figure exhibits a more distributed signal and the signal for Nup60 and Gle1 is no longer visible.

      • Depletion of AID-tagged proteins needs to be supported by Western blot analysis with protein-specific antibodies, and PCR results should be included in supplementary data to demonstrate the homozygosity of the strains.

      The correct genomic tagging of the depleted proteins by AID was confirmed by PCR. We include this PCR analysis for the reviewer below. Since we are working with haploid yeast cells, all strains only carry a single copy of the genes. Unfortunately, we do not have protein-specific antibodies against the depleted proteins. However, other phenotypes support the successful depletion of the protein: Mlp1-mislocalization upon Nup60 depletion, reduced transcript production in Pol II depletion (characterized previously: PMID: 31753862, PMID: 36220102), growth defect upon Nup1 depletion.

      • Figure 5B: Snapshots of images from the movie are required. There are no images, only quantifications.

      We have replaced the supplemental movie with a movie showing the detection by Trackmate as well as overlaid tracks. As requested, a snapshot of this movie was inserted in figure 5B. We have also moved the example tracks from the supplement to the main figure. Furthermore, we will deposit the tracking dataset in the ETH Research Collection to make it available to the community.

      Description of figure legends is more technical than supporting/explaining the figure. For example, below my suggestions for Figure 1D. Please, consider more detailed explanation for other figures. (D) Left: Schematic of the RITE cassette. NUP of interest is tagged with V5 tag and eGFP fluorescent protein where LoxP sites flank eGFP. Before the beta-estradiol-induced recombination, the old NPCs are marked with eGFP signal, whereas new NPCs lack an eGFP signal after the recombination. ORF: open reading frame; V5: V5-tag; loxP: loxP recombination site; eGFP: enhanced green fluorescent protein. Right: doRITE assay schematic of stable or dynamic Nup behavior over cell divisions in yeast after the recombination.

      We have modified the figure legends throughout the manuscript to make them more explanatory and helpful for the reader.

      In addition, I recommend highlighting the result in the title of the figures. Please, re-consider titles for Figure S3.

      We have split this figure to better group related results. The new figures S4 and S5 are entitled: "A RITE(dark-to-GFP) cassette to visualize newly assembled NPC." and "Mlp1 truncations localize predominantly to non-nucleolar NPCs."

      Minor: P.1 Line 31. Extra period symbol before the "(Figure 1A)".

      Fixed

      P.2 Line 10. Inconsistent writing of PML39 and MLP1. Both genes are capitalized. The same for P.4 Line 16. In some cases all letters are capitalized in other only the first one.

      We are following the official yeast gene nomenclature by spelling gene names in italicized capitals and protein names with only the first letter capitalized. We are sorry that this can be confusing for readers more familiar with other model systems.

      P.2 Line 18-22. The sentence is too long and hard to read. I recommend splitting it into two sentences.

      We agree and have fixed this.

      P.2-3 Line 46-47. The sentence is unclear. Suggestion: We expected that successive cell divisions would dilute the signal of labelled and stably associated with the NPC nucleoporins. By contrast, ...

      We have modified the sentence to read: "When tagging a Nup that stably associates with the NPC, we expected that successive cell divisions would dilute labelled NPCs by inheritance to both mother and daughter cells leading to a low density of labelled NPCs. By contrast,..."
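
      The dilution expectation in this sentence can be made concrete with a short calculation. The following is a minimal sketch, not the authors' analysis: it assumes that labelled NPCs are split equally between mother and daughter at each division, that no labelled NPCs are lost, and that none are made after the switch. Budding yeast divides asymmetrically, so equal partitioning is a simplification, and the starting count of 100 used in the usage note is a hypothetical number chosen only for illustration.

```python
def expected_labelled_npcs(initial_count: float, divisions: int) -> float:
    """Expected number of labelled NPCs per cell after a number of divisions.

    Assumptions (illustrative only, not taken from the manuscript):
    - labelled NPCs are stable and are not degraded,
    - no new labelled NPCs are produced after the tag switch,
    - each division splits the labelled pool equally between two cells.
    """
    if divisions < 0:
        raise ValueError("divisions must be non-negative")
    # Each division halves the labelled pool per cell.
    return initial_count / (2 ** divisions)


if __name__ == "__main__":
    # Hypothetical starting value of 100 labelled NPCs per cell.
    for n in range(5):
        print(n, expected_labelled_npcs(100, n))
```

      Under these assumptions a stably associated Nup is expected to halve in labelled NPC number per cell with every division, whereas a dynamically exchanging Nup would lose its signal faster than this dilution curve predicts.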

      P.4 Line 17-21. Please, consider adding extra information and clarifying lines 19-21. For example, in Line 19 Figure 2B you can add that the reader needs to compare row 1 and row 4.

      Thank you, we have fixed this as suggested.

      P. 5 Line 15. When a number begins a sentence, that number should always be spelled out. You can rephrase the sentence to avoid it. Also, I recommend adding an explanation/hypothesis of why new NPCs are less frequently detected in nucleolar territory.

      We have formatted the text. Interestingly, new NPCs are more frequently detected in the nucleolar territory than old NPCs. We have reformulated this section to make it clearer, also in response to the next comment.

      P.5 Line 17-22. I recommend re-phrasing these two sentences. Logically, it is clear that Mlp1/Mlp2 loss mimics "old NPCs" to look more like "new NPCs", and for that reason, they are more frequently included in the nucleolar territory, but it is not clear when you read these two sentences from the first time.

      We have reformulated this section to make it clearer.

      P6. Line 16. No figure supporting data on graph (Figure 3B).

      We have added fluorescent images of the nup2Δ strain to the figure (new Figure 4D).

      P.7 Line 10-13. The sentence is unclear.

      We have shortened the sentence and moved part of the content to the discussion in the next paragraph.

      P.13,14 etc. If 0h timepoint has been used for normalization, why is it present on the graph?

      The 0h timepoint is shown for comparison and to illustrate the standard deviation in the data.

      P.15. Line 32-33. There is no image here. Potentially wrong description of the figure.

      Thank you for spotting this. This was fixed (new Figure S4B).

      Figures: - Inconsistent labeling of figures. For example, Fig.1, Fig.1S, Figure 2 etc.

      Thank you, this has been corrected.

      • Inconsistent labeling of figures. For example, Fig.1 G "mean number of NPCs per cell" - no capitalization of the first letter. Fig.1S "Fraction in population" is capitalized. In general, titles of axis should be capitalized.

      Thank you for spotting this. This was fixed.

      Suggestions for Figure 1D and Figure 6 are attached as a separate file.

      We thank the reviewer for their suggestions to improve these figures. We have taken their recommendation and revised the figures accordingly (see also response to reviewer 1, minor point 8).

      Reviewer #3 (Significance (Required)):

      Zsok et al. used the recombination-induced tag exchange (RITE) approach, which is an interesting and powerful method to follow individual NUPs over time with respect to their localization and abundance. This approach has been used before in PMID: 36515990 to distinguish pre-existing and newly synthesized Nup2 populations and has been extended to other basket NUPs in this work. Using this method, the authors support the earlier data on basket nucleoporins and highlight new insights on how basket nucleoporins regulate NPCs distribution and mobility. Overall, the manuscript provides new details on the stability of nucleoporins in yeast and how these data align with the mass spectrometry and FRAP data performed earlier in other studies. The limitation of this study is the absence of data on Nup1. It was unclear why these data were not present. Additional data can be included on the dynamics of Pml39, for example, using the FRAP method. The dynamic of Pml39 at the pore was shown only using the doRITE method.

      As suggested, we have tested the role of Nup1 (see above).

      Unfortunately, we are not able to provide orthogonal data for the dynamics of Pml39. As we discuss in the manuscript, FRAP is not suitable for the analysis of the dynamics of most nucleoporins in yeast due to the high lateral mobility of NPCs in the nuclear envelope and has previously generated misleading results for Mlp1. Furthermore, the low expression levels of Pml39 will make it difficult to obtain reliable FRAP curves for this protein. We therefore do not think that adding FRAP experiments with Pml39 will provide valuable insight.

      However, in addition to the Pml39 doRITE result itself, our observation that the Pml39-dependent pool of Mlp1 exhibits stable association with the NPC supports the interpretation of Pml39 as a stable protein as well.

      In general, this study represents a unique research study of basic research on nuclear pore proteins that will be of general interest to the nuclear transport field.

      Field of expertise: nuclear-cytoplasmic transport, nuclear pore, inducible protein degradation. I do not have sufficient expertise in ExTrack.

    1. Poorly supported claims may be true, but without good reasons to accept those claims, a person’s support of them is irrational. In philosophy, we want to understand and evaluate the reasons for a claim. Just as a house that is built without a solid foundation will rapidly deteriorate and eventually fall, the philosopher who accepts claims without good reasons is likely to hold a system of beliefs that will crumble.

      I think this is vital information, as it can be applied to everyday life too. Without evidence, claims and reasoning are weak and therefore lack external validity. Evidence improves the reliability of, and trust in, the initial source.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.

      Learn more at Review Commons


      Reply to the reviewers

      Reviewer #1

      Evidence, reproducibility, and clarity

      Singh et al. analyze the expression and putative contribution of TEs in CD4+ T cells in HIV elite controllers. Through re-analysis of existing datasets, the authors describe broad differences in expression of TEs in ECs through analysis of RNA-seq and ATAC-seq data and come up with convincing examples where differentially-expressed innate immune genes correlate with increased accessibility of proximal TEs. Overall, the authors' conclusions are appropriately measured, though the manuscript text should be re-organized for clarity and a few further analyses are needed to support the main message of the paper.

      Major comments

      The manuscript would benefit from a re-organization of the figures to focus on TEs - in particular, Fig 1B, Fig 2, and Fig 3 reproduce known transcriptional differences between ECs and HCs and serve as quality controls for the authors' computational analysis. Conversely, Supplementary Fig 6 contains very interesting data on KZNF expression and should be included in the main figures.

      Authors: Thank you for the suggestion. We agree that Figure S6 should be featured more prominently in the manuscript. Accordingly, we have now incorporated it into the main text as Figure 6. The TE-KZNF correlation plots, previously Figure 5C, have been relocated to this new figure to provide a cohesive presentation of all KZNF-related data within the same figure.

      We’ve chosen to keep Figures 1B, 2, and 3 in their original places. We contend that they provide a foundational view of transcriptional variances in gene expression between patient groups, encompassing both previously identified and novel DEGs, which we believe warrants their placement in the main text. Furthermore, they serve as robust quality control measures for subsequent TE-centric transcriptional analyses. Given that there is no limitation in the number of figures in Genome Biology articles, we think it’s adequate to retain them as main figures.

      It remains unclear whether differences in TE expression described are specific to ECs or to EC-like CD4+ T cell states. As there are plenty of datasets available that compare the transcriptome of naïve, activated, exhausted, and regulatory CD4+ T cells, the authors should compare the TE expression patterns observed in ECs to activated CD4+ T cells, particularly those with a Th1 and cytotoxic phenotype analogous to those observed in ECs, from healthy donors.

      Authors: We thank the reviewer for this constructive suggestion to further study the foundations of HIV-1 elite control. In our initial study, we demonstrate that PBMCs from elite controllers (ECs) exhibit a heightened proportion of activated CD4+ T cells compared to PBMCs of healthy controls (HCs) and a heightened proportion of macrophages, naïve CD4+ T cells, and NK cells compared to PBMCs of treatment-naïve viremic progressors (VPs) (Figure 2D). Additionally, through clustering analysis of deconvoluted CD4+ T cell samples from elite controllers, we ascertain that the clustering pattern is not predicated on the CD4+ T cell subtype (Figure 3B). To further explore the reviewer’s inquiry, we compared the TE expression profile of ECs with that of unstimulated and stimulated CD4+ T cell subsets from HCs (data source: PMID 31570894), integrated into the revised manuscript as Figure S3B.

      “Unsupervised clustering of these samples shows that the TE expression pattern of ECs is most similar to that of Th2 progenitor cells, which are associated with HIV-1-specific adaptive immune responses (61). Still, we observed that, for the majority of families, TE expression was higher on average in all EC CD4+ T cell subsets than in CD4+ T cell subsets from HCs, regardless of stimulation (Figure S3B). While a subset of TE families exhibited an expression pattern in ECs similar to that of activated CD4+ T cells of HCs (e.g., high expression of L1s and THE1B), multiple TE families appear to be upregulated in an EC-specific way (e.g., LTR12C and LTR7). Together, these findings underscore the unique immune cell composition, transcriptome, and retrotranscriptome of ECs.” [pg.13-14, L226-235]

      While these observations are interesting, pursuing this question further falls beyond the scope of our study, as we note in the Discussion of the revised manuscript. We believe the reviewer’s inquiry pertains to a distinct research question, namely whether the potential for elite control of HIV-1 infection manifests as a detectable phenotype pre-infection within healthy CD4 T cell subsets (i.e., EC-like CD4+ T cell states) or is a unique phenotype that emerges solely after HIV-1 infection.

      “Another outstanding question is whether the gene and TE signatures revealed by our analysis of ECs exist in the general population independent of HIV-1 infection or if they are driven by the initial infection. While this inquiry is beyond the scope of this study, we have presented here evidence of common TE signatures between EC CD4+ T cells and Th2 progenitors from HCs (Figure S3B) and established that ECs possess a unique CD4+ T cell retrotranscriptome with potential implications for natural HIV-1 control. Future studies designed to assess elite control prediction should explore whether these TE profiles can serve as predictive variables for whether an individual displays enhanced viral control.” [pg. 38, L663-671]

      Therefore, while we appreciate the reviewer's suggestion and offer the addition of these preliminary findings, we believe that further investigation would be better suited for future studies specifically designed to address that question. Our manuscript aims to provide insight into the retrotranscriptome dynamics in ECs and their potential implications for natural HIV-1 control.

      In Fig 1, the authors demonstrate differential expression of both innate immune genes and TEs, but the link between the two is unclear. Is there any enrichment in differential expression for TEs located proximal to innate immune genes? This type of analysis should be possible using the authors' own software to map TE expression to specific genomic loci.

      Authors: Thank you for this excellent question. To answer this inquiry, we used the paired ATAC-seq and RNA-seq datasets from ECs and HCs (used in Figures 1 and 4) to produce a new list of TE-gene pairs on which we could perform gene set enrichment analysis, the results of which have been integrated into the revised manuscript as Figure 4A.

      “We used paired ATAC-seq – which measures chromatin accessibility – and RNA-seq datasets for ECs (n=4) and HCs (n=4) to create a list of TE-gene pairs where the TE locus and gene show increased accessibility and expression, respectively, in ECs compared to HCs (Table S7, see Methods for details). These loci and genes were paired based on proximity, with a maximum distance of 10kb between the TE locus and the gene’s transcription start site, to increase the likelihood of a direct cis-regulatory influence of the TE over the nearby gene. Subsequent gene set enrichment analysis revealed that these genes were predominantly involved in cellular activation, cytokine production, and immune response regulation (Figure 4A). The enrichment for differential accessibility of TE loci near genes involved in these pathways suggests that the distinct TE landscape observed in ECs may contribute significantly to a unique immune regulome in these individuals.” [pg. 21, L357-368]

      Thus, we conclude that yes, there is an enrichment for immune-related genes with higher expression in ECs, proximal to differentially accessible TEs. We highlight six of these TE-gene pairs in Figure 4B-C. While we have high confidence in our analyses, future experimental validation is needed to confirm these regulatory relationships.
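
      The proximity-based pairing described in the quoted passage can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the class names, function names, and the way distance is measured (from the nearest edge of the TE locus to the transcription start site) are assumptions, and the filtering for increased accessibility and expression in ECs is taken to have happened upstream. Only the 10 kb maximum distance comes from the text.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

MAX_DISTANCE_BP = 10_000  # the 10 kb window stated in the response


@dataclass(frozen=True)
class TELocus:
    """A TE locus already filtered for increased accessibility (assumed)."""
    name: str
    chrom: str
    start: int
    end: int


@dataclass(frozen=True)
class Gene:
    """A gene already filtered for increased expression (assumed)."""
    name: str
    chrom: str
    tss: int  # transcription start site coordinate


def distance_to_tss(te: TELocus, gene: Gene) -> Optional[int]:
    """Distance in bp from the TE interval to the gene's TSS.

    Returns 0 if the TSS lies inside the TE, and None if the two
    features are on different chromosomes.
    """
    if te.chrom != gene.chrom:
        return None
    if te.start <= gene.tss <= te.end:
        return 0
    return min(abs(gene.tss - te.start), abs(gene.tss - te.end))


def pair_tes_with_genes(
    tes: List[TELocus],
    genes: List[Gene],
    max_distance: int = MAX_DISTANCE_BP,
) -> List[Tuple[str, str, int]]:
    """Return (TE name, gene name, distance) for every pair within range."""
    pairs = []
    for te in tes:
        for gene in genes:
            d = distance_to_tss(te, gene)
            if d is not None and d <= max_distance:
                pairs.append((te.name, gene.name, d))
    return pairs
```

      The nested loop is quadratic and is kept only for readability; for genome-scale inputs an interval-based tool or a sorted sweep would be the more practical choice. The resulting gene list is what would then be passed to a gene set enrichment analysis.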

      Optional: In Fig 3, the authors cluster CD4+ T cells based on transcriptomic profiles. It would be interesting to re-cluster these samples based on TE expression alone, given the differences in TE expression described in Fig 5.

      Authors: Thank you for the suggestion. We agree that it would be valuable to assess how the EC clustering is altered when considering TE expression alone, as opposed to combining gene and TE family expression. To address this, we used the same graph-based k-nearest neighbors method to re-cluster the EC CD4+ T cell RNA-seq samples based only on locus-level TE expression, integrated into the revised manuscript as Figure S7.

      “To further explore locus-level expression patterns, we re-clustered the same EC samples (n=128) using only locus-level TE expression. This again resolved four EC clusters (Figure S7A), which interestingly appeared even more distinct than those identified by gene and TE family expression (Figure 3A). The TE locus-based clusters (TL-Cs) aligned well with the gene and TE family clusters (GT-Cs), with an average 70% overlap in samples between each GT-C and its corresponding TL-C (Figure S7B), indicating high consistency (Table S8). The remaining 30% of samples that shifted between clusters did so consistently within individuals, not cohorts, maintaining heterogeneous TL-C compositions similar to the GT-Cs (Figures S7C & S5A). An exception to this heterogeneity was TL-C4, comprising 22 samples from GT-C1 that were almost entirely from the CD4+ T cell subsets of only four participants in the Jiang cohort (Figure S7C, Table S8). No other samples from the Jiang cohort shifted to this cluster from other GT-Cs, suggesting that these patterns reflect individual variation rather than cohort bias. Like the GT-Cs, each TL-C included samples from all five CD4+ T cell subsets and was largely heterogeneous (Figure S7C). Notably, TL-C2 mirrored corresponding GT-C3 in its overrepresentation of EM and TM cells, while TL-C1 uniquely showed an overrepresentation of naïve CD4+ T cells. Beyond sample composition, each TL-C was characterized by a unique pattern of expressed TE loci (Figure S7D). These signatures were heterogeneous across families, with subsets of variable loci from one TE family marking separate clusters (Figure S7E), some of which did not reach the threshold of significance in earlier analyses when analyzed at the family-level, like SVA-D. Many families maintained their cluster-specific signatures, like THE1B (a marker of GT-C2), for which the majority of variable loci were found in corresponding TL-C1. However, some TE families, like the L1s that marked GT-C1, showed more heterogeneous signatures with variable loci marking multiple TL-Cs. These findings underscore the need for future locus-level investigations with high-depth sequencing to fully capture the complexity of TE expression.” [pg. 27-28, L462-488]

      We believe these findings not only validate the distinct clustering patterns observed but also highlight the potential of locus-level TE analysis to reveal additional layers of retrotranscriptomic diversity in EC CD4+ T cells.
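
      The overlap comparison between the two clusterings described above can be sketched as follows. This is a minimal illustration, not the analysis code behind Figure S7B: the function names and the toy sample labels are hypothetical, and the "corresponding" cluster is assumed here to be simply the TL-C that shares the most samples with a given GT-C.

```python
from collections import Counter, defaultdict


def cluster_overlap(gt_labels, tl_labels):
    """For each GT cluster, find the TL cluster that shares the most samples.

    Both arguments map sample ID -> cluster label. Returns a dict of
    GT cluster -> (best-matching TL cluster, fraction of shared samples).
    """
    shared = defaultdict(Counter)
    for sample, gt in gt_labels.items():
        if sample in tl_labels:  # skip samples absent from one clustering
            shared[gt][tl_labels[sample]] += 1

    result = {}
    for gt, counts in shared.items():
        best_tl, n_shared = counts.most_common(1)[0]
        result[gt] = (best_tl, n_shared / sum(counts.values()))
    return result


def mean_overlap(overlaps):
    """Average the per-cluster overlap fractions."""
    return sum(frac for _, frac in overlaps.values()) / len(overlaps)


# Hypothetical toy labels, not data from the study
gt = {"s1": "GT-C1", "s2": "GT-C1", "s3": "GT-C1", "s4": "GT-C2", "s5": "GT-C2"}
tl = {"s1": "TL-C4", "s2": "TL-C4", "s3": "TL-C1", "s4": "TL-C1", "s5": "TL-C1"}
overlaps = cluster_overlap(gt, tl)
average = mean_overlap(overlaps)
```

      A best-match rule like this does not guarantee a one-to-one mapping between clusterings; a stricter comparison would assign each TL-C to at most one GT-C.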

      Significance

      The manuscript by Singh et al. describes for the first time the role of TEs in HIV elite controllers, suggesting that TEs may be co-opted for cis-regulatory function. This study builds off prior work demonstrating that HIV-infected CD4+ T cells activate LTR elements that may regulate the expression of interferon-inducible genes, demonstrating that ECs show further upregulation of innate immune genes. While these findings will need to be experimentally validated, this study constitutes a useful resource and adds to the growing body of evidence implicating TEs in cis-regulatory control of immune genes. This study will be of interest to basic scientists interested in genetic mechanisms of HIV control, and if further developed may comprise a useful source of biomarkers to predict viral kinetics in HIV-infected individuals. My expertise is in immunology, TE biology, and viral infection.

      Authors: We greatly appreciate this positive evaluation of our manuscript and recognition of its significance in uncovering novel evidence of TE co-option for immune regulatory function in HIV-1 elite control, as well as the suggestion of promising avenues for future research in this field.

      Reviewer #2

      Evidence, reproducibility and clarity

      The authors have re-analyzed published RNA-Seq data from CD4 T cells isolated from HIV elite controllers and reference cohorts, including HIV negative persons, viremic progressors and ART-treated persons. Their main finding is that in some of their comparisons, EC have higher levels of interferon-stimulated genes (ISG), paired with distinct expression patterns of transposable elements. The authors suggest that expression of transposable elements may induce altered expression of ISG, presumably due to immune recognition of TE. They also suggest that reduced expression of KZNF genes, which encode for transcription factors that can suppress TE, may be responsible for enhanced expression of TE. I have the following comments:

      1. All data included in this manuscript derive from previously published data. A new dataset, specifically designed to focus on a high-resolution analysis of TE expression, would be better suited to address the proposed questions.

      Authors: We agree that a new dataset tailored specifically for high-resolution analysis of TE expression would be optimal for addressing the proposed inquiries, and we emphasize this point in the Discussion of the revised manuscript.

      “We found that distinct sets of innate immunity genes and restriction factors are upregulated in different EC clusters even in the absence of active viremia, suggesting that elevated basal expression of these factors plays a previously underappreciated role in the EC phenotype. Further studies will be necessary to cement this idea and would especially benefit from the integration of single-cell omics to dissect TE regulation and clustering in deconvoluted CD4+ T cells of ECs. We also acknowledge that our study is limited by the small number of EC individuals with available omics data, which likely limited our ability to identify significant relationships between transcriptome clustering and available participant metadata (Figure S5). While the rarity of ECs in the seropositive population makes it challenging to study this phenotype, the transcriptomic heterogeneity revealed by our analyses underscores the need for surveying larger and more diverse EC cohorts.” [pg. 37-38, L651-662]

      Regrettably, we do not have access to elite controller samples (which are exceedingly rare), and as such the addition of a novel dataset was not feasible within the scope of this revision. Nevertheless, we assert that the publicly available sequencing data analyzed here is robust and suitable for locus- and family-level TE analysis. All sequencing runs were paired-end and of high depth, ensuring proper alignment to and high coverage of TEs at a locus-specific resolution. Additionally, we use in-house pipelines curated for TE analysis, to optimize the accuracy and quantity of TE-assigned reads (see Methods and our GitHub Repository for more details).

      2. As the authors acknowledge, the described investigations are exploratory and do not allow firm conclusions to be drawn. Mechanistic experiments are recommended to address the authors' hypotheses.

      Authors: We agree and have duly acknowledged throughout the Discussion the exploratory nature of our investigations and the need for future mechanistic experiments to validate our model. Below are passages from the revised manuscript which we’ve added to emphasize these points.

      “These findings underscore the need for future locus-level investigations with high-depth sequencing to fully capture the complexity of TE expression.” [pg. 28, L486-488]

      “Each step in the model will require experimental work to be validated. First and foremost, it will be important to confirm that the TEs exhibiting increased transcript levels and accessibility in ECs are indeed boosting the innate immune response and control of HIV-1 in these individuals.” [pg. 34, L583-586]

      “CRISPR-Cas9 editing was used in cell lines to demonstrate that a subset of MER41 elements function as enhancers driving the interferon-inducibility of several innate immune genes. However, the specific MER41 loci we identified here as differentially active in ECs have not been tested experimentally for enhancer activity. Thus, further work is warranted to confirm the regulatory function of these loci under the control of STAT1 or other immune TFs, as well as other TE families identified as targets of immune-related TFs (Figure S8).” [pg. 35, L594-600]

      “Overall, our results reinforce the concept that TEs are important players in the human antiviral response (25,93) and uncover specific candidate elements for boosting cellular defenses against HIV-1 in ECs. We acknowledge that these associations are drawn from correlative patterns and manipulative experiments are needed to infer causality between chromatin changes at these TEs and increased expression of nearby immunity genes.” [pg. 36, L618-623]

      “Further work is needed to validate TE-KZNF regulatory interactions in T cells, probe their connection to epigenetic variation at individual TE loci, and explore their repercussions on gene expression variation in CD4+ T cells, with and without HIV-1 infection.” [pg. 40, L715-718]

      Thus, while we appreciate and agree with the suggestion of experimental validation, we contend that these experiments fall beyond the scope of the present study, which is a computational investigation providing insight into the EC retrotranscriptome and its potential implications for natural HIV-1 control.

      3. An important limitation is that virological data of EC are not considered. For example, I believe it is a lot more likely that the upregulation of ISG in EC relates to ongoing low-level viral replication. The authors could analyze cell-associated HIV RNA and DNA levels and determine how they associate with ISG expression.

      Authors: Thank you for bringing up this important consideration. It's worth noting that the public datasets used in our study reported undetectable viremia in the EC volunteers (PMIDs 30964004, 29269040, 32848246, 27453467). Nonetheless, we sought to address this limitation and explore the potential association between ISG expression and viremia as recommended by the reviewer. These analyses were integrated into the revised manuscript as Figure S6.

      “To exclude the possibility that these gene expression signatures in ECs are associated with viremia, we quantified HIV-1 transcript levels in deconvoluted CD4+ T cell RNA-seq samples from ECs and ART-treated PLWH for comparison. In the original studies, all samples were reported to have undetected viremia by blood tests (9,37-39). Consistent with this, we found that the vast majority of the EC and ART samples taken from PBMCs exhibited very low HIV-1 transcript levels, with TPM values generally below 1. However, in samples originating from the lymph nodes of EC individuals (n = 22) (37), we detected HIV-1 expression in some subsets (Figure S6A&B). In agreement with the corresponding study (37), we found elevated HIV-1 transcript levels in germinal center and non-germinal center T follicular helper cells (GC Tfh & nGC Tfh, not included in our clustering analyses) -- and to a lesser extent in T effector memory (EM) cells (Figure S6A, average TPM […]

      This added analysis confirms that the increased expression of ISGs in ECs is not correlated with viral transcription and is therefore unlikely to be driven by viremia.
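
      For reference, the TPM values mentioned in the quoted passage follow the standard transcripts-per-million normalization, which can be sketched as below. This is a generic illustration of the definition, not the quantification pipeline used for Figure S6; the feature names, counts, and lengths are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million from read counts and feature lengths in bp."""
    # Step 1: length-normalize each feature (reads per kilobase)
    rpk = {f: counts[f] / (lengths_bp[f] / 1000) for f in counts}
    # Step 2: rescale so that all values in the sample sum to one million
    scale = sum(rpk.values()) / 1e6
    if scale == 0:
        return {f: 0.0 for f in counts}
    return {f: value / scale for f, value in rpk.items()}


# Hypothetical counts and lengths, for illustration only
counts = {"geneA": 500, "geneB": 1500, "HIV1": 2}
lengths = {"geneA": 1000, "geneB": 3000, "HIV1": 9000}
values = tpm(counts, lengths)
```

      Because TPM values sum to one million within each sample, a value below 1 means the feature contributes less than one in a million length-normalized reads.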

      4. KZNF genes seem downregulated in EC. Can the authors propose a reason/mechanism for that?

      Authors: There is the possibility that KZNF regulatory loops are the cause of their transcriptional downregulation, which has been documented in embryogenesis (PMID 31006620) and cancer (PMID 33087347). We’ve incorporated this hypothesis into the Discussion as an additional consideration for the reader.

      “These observations suggest that interindividual variation in KZNF expression in CD4+ T cells could explain why certain TEs are variably expressed and accessible across ECs. But what are the mechanisms underlying variation in ZNF expression? It is possible that TE-KZNF regulatory loops are involved, in which a copy of the TE family targeted by a KZNF is inserted near and regulates the KZNF gene, thereby introducing a negative feedback loop. This phenomenon has been documented in prior studies of KZNF activity in embryogenesis (51) and cancer (115).” [pg. 39-40, L705-711]

      While we believe this is a viable hypothesis, it requires further experimentation to confirm the existence of this phenomenon and its impacts in the context of immune cells.

      Significance

      Overall, I think this is an interesting manuscript that proposes distinct and potentially important mechanisms that may contribute to immune control of HIV. My suggestions to improve the manuscript are complex and cannot be easily addressed through experimental work. I believe a possible option would be to publish the present manuscript without my proposed modifications but highlight the weaknesses of the current paper more clearly; mechanistic studies could then be deferred to a future study.

      Authors: We appreciate the reviewer's positive assessment of our manuscript and their recognition of its significance in elucidating novel TE-derived mechanisms that may contribute to natural HIV-1 control. We agree that mechanistic studies are required to test our predictions. As the reviewer suggests, these would be complex experiments that we feel fall beyond the scope of this study. With the additions detailed above in response to the reviewer’s point #2, we believe that we have clearly highlighted the limitations of our work and emphasized the need for future experimentation to validate our findings.

      Reviewer #3

      Evidence, reproducibility, and clarity

      Summary: This manuscript presents an analysis of published gene expression (RNA-seq and ATAC-seq) data from a couple of cohorts of HIV-infected elite controllers (EC), as compared to uninfected controls (HC) and virological progressors (VP). The authors report that HIV elite controllers may exhibit 4 distinct patterns of TE (and gene) expression and suggest that TE expression may drive some form of antiviral gene expression. Further, they show that heterogeneous TE expression may be determined by differential KZNF gene activity among the different clusters of elite controllers. These results are very interesting, even though the conclusions are very preliminary. The manuscript presents intriguing correlations between the expression of certain TE groups of LINEs and HERVs and the clustering into 4 gene expression groups in EC, which is a novel finding. That said, correlation is not causation, and the authors need to be more cautious in presenting their highly preliminary model in Figure 6.

      Authors: We are grateful for the reviewer's insightful assessment of our manuscript, acknowledging the novelty and interest of our findings regarding TE expression patterns in HIV-1 elite controllers. We also appreciate their constructive feedback regarding the cautious interpretation of preliminary conclusions. In the revised manuscript, we have underscored the exploratory nature of our investigations and the need for future mechanistic experiments to validate our model.

      “These findings underscore the need for future locus-level investigations with high-depth sequencing to fully capture the complexity of TE expression.” [pg. 28, L486-488]

      “Each step in the model will require experimental work to be validated. First and foremost, it will be important to confirm that the TEs exhibiting increased transcript levels and accessibility in ECs are indeed boosting the innate immune response and control of HIV-1 in these individuals.” [pg. 34, L583-586]

      “CRISPR-Cas9 editing was used in cell lines to demonstrate that a subset of MER41 elements function as enhancers driving the interferon-inducibility of several innate immune genes. However, the specific MER41 loci we identified here as differentially active in ECs have not been tested experimentally for enhancer activity. Thus, further work is warranted to confirm the regulatory function of these loci under the control of STAT1 or other immune TFs, as well as other TE families identified as targets of immune-related TFs (Figure S8).” [pg. 35, L594-600]

      “Overall, our results reinforce the concept that TEs are important players in the human antiviral response (25,93) and uncover specific candidate elements for boosting cellular defenses against HIV-1 in ECs. We acknowledge that these associations are drawn from correlative patterns and manipulative experiments are needed to infer causality between chromatin changes at these TEs and increased expression of nearby immunity genes.” [pg. 36, L618-623]

      “Further work is needed to validate TE-KZNF regulatory interactions in T cells, probe their connection to epigenetic variation at individual TE loci, and explore their repercussions on gene expression variation in CD4+ T cells, with and without HIV-1 infection.” [pg. 40, L715-718]

      We hope these passages provide sufficient caution and clarity in the presentation of our scientific inquiry.

      Major comments:

      Overall, although preliminary, as the authors note, the results are interesting and worthy of follow-up. At this point, however, a number of issues arise that need further clarification and analysis before I would consider this study complete.

      First, the analyses shown in Figures 3-5 based on data from studies on EC of CD4 cells are apparently motivated by the differential TE expression in total PBMCs shown in Fig 1 and 2. Yet, the TE groups (please don't use taxonomic terms like "subfamily") identified in Fig 2 and Fig 4 are completely different, with no overlap. This discrepancy underscores the possibility that the differential expression observed is, at least in part, due to the differences among the groups or clusters in cell type composition, as seen in Fig 2D and 3B which, themselves, could be a consequence of HIV infection and elite control (which has been shown to involve ongoing, albeit low-level, virus replication). This issue must be addressed.

      Authors: Thank you for the suggestion. First, we’d like to clarify that the data used in Figures 1 and 2 were not both derived from PBMCs. Figures 1 and S1 examine the differential expression of TEs in EC CD4+ T cells compared to HCs and ART-treated PLWH, respectively. Figure 2 examines differential expression of TEs in EC PBMCs compared to treatment-naïve VPs. Second, regarding Figure 4B-C, the TE loci that we chose to highlight were not based on our results from the PBMC analysis in Figure 2, which is why there is no overlap in the TE families presented. Instead, we selected those TE-gene pairs based on 1) known function of the genes in immunity and/or HIV-1 restriction, 2) known contribution of the TE families to immunity, and 3) differential accessibility and expression of the TEs and genes respectively in ECs compared to HCs. Thus, Figure 4B-C represents select examples that we deemed particularly relevant to the EC phenotype. We have revised the manuscript to better explain the process of TE-gene pair identification and the rationale behind our selection for Figure 4B-C.

      “We used paired ATAC-seq – which measures chromatin accessibility – and RNA-seq datasets from the CD4+ T cells of ECs (n=4) and HCs (n=4) (39) to create a list of TE-gene pairs where the TE locus and gene show increased accessibility and expression, respectively, in ECs compared to HCs (Table S7, see Methods for details). These loci and genes were paired based on proximity, with a maximum distance of 10kb between the TE locus and the gene’s transcription start site, to increase the likelihood of a direct cis-regulatory influence of the TE over the nearby gene.” [pg. 21, L357-363]

      “In Figure 4B & 4C, we have highlighted six of the TE-gene pairs from Table S7 based on the gene’s function in HIV-1 restriction and the TE family’s known contribution to immune gene regulation.” [pg. 21, L369-371]
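
      The proximity rule in the quoted passage (a TE locus within 10 kb of a gene's transcription start site) can be sketched as follows. This is an illustrative sketch only, covering the distance step and not the accessibility or expression filters. The class names, the toy coordinates, and the choice to measure distance from the nearest edge of the TE are assumptions; the Methods should be consulted for the actual definition.

```python
from dataclasses import dataclass

MAX_DISTANCE_BP = 10_000  # 10 kb threshold quoted above


@dataclass(frozen=True)
class TELocus:
    name: str
    chrom: str
    start: int  # 0-based, half-open interval [start, end)
    end: int


@dataclass(frozen=True)
class Gene:
    name: str
    chrom: str
    tss: int  # transcription start site position


def distance_to_tss(te, gene):
    """Distance in bp from the TSS to the nearest TE base (0 if the TSS is inside the TE)."""
    if te.start <= gene.tss < te.end:
        return 0
    if gene.tss < te.start:
        return te.start - gene.tss
    return gene.tss - (te.end - 1)


def pair_tes_with_genes(tes, genes, max_distance=MAX_DISTANCE_BP):
    """Return (TE name, gene name, distance) for every pair within max_distance."""
    pairs = []
    for te in tes:
        for gene in genes:
            if te.chrom != gene.chrom:
                continue
            d = distance_to_tss(te, gene)
            if d <= max_distance:
                pairs.append((te.name, gene.name, d))
    return pairs


# Hypothetical loci and genes, for illustration only
tes = [TELocus("TE_locus_1", "chr1", 100_000, 100_500)]
genes = [
    Gene("GENE_A", "chr1", 105_000),  # within 10 kb
    Gene("GENE_B", "chr1", 200_000),  # too far away
    Gene("GENE_C", "chr2", 100_200),  # different chromosome
]
pairs = pair_tes_with_genes(tes, genes)
```

      The nested loop is adequate for a short candidate list; genome-wide pairing would normally use sorted intervals or an interval tree instead.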

      Regarding cell type composition, we acknowledge that the differences observed in the proportion of immune cell subtypes may contribute to the differential expression between ECs, VPs, and HCs (Figures 2D and S3A). However, we provide evidence that cell type composition cannot be the sole driver for the clustering of deconvoluted CD4+ T cell RNA-seq samples (Figure 3B and S5D). Cell subtype alone could not explain the observed clustering of EC samples by gene and TE family expression. Clusters 1 and 2, for example, had nearly identical subtype compositions, but were clearly separated on the UMAP (Figures 3A, 3B, and S5D). We remark on this in the Results of the revised manuscript.

      “[W]e visualized the samples by cellular subtype, as identified in the original studies, to assess whether the clustering could be explained by CD4+ T cell subtype composition (Figure S5D). Clusters 1 and 2 were essentially indistinguishable in cell type composition, whereas Clusters 3 and 4 showed an overrepresentation of TM/EM and naïve/CM cell types, respectively (Figure 3B). Thus, cell subtype composition could only partially explain the clustering.” [pg. 16, L271-276]

      The EC CD4+ T cell clusters also had unique gene ontology, gene & TE expression, and TE accessibility profiles (Figures 3C, 3D, 5). Moreover, while we do not have parallel RNA- and ATAC-seq data from similarly deconvoluted CD4+ T cells of ECs like those used in the clustering analysis (PMIDs 32848246 & 27453467), the original article from which we sourced the parallel RNA- and ATAC-seq data used in Figures 1 and 4 reported that these samples are predominantly effector memory CD4+ T cells (PMID 30964004). If new deconvoluted, multi-omic datasets from ECs become available, we would be interested in further exploring the contribution of cell type composition. However, the current data indicate that it is not a major contributor to the differential TE expression identified in our analyses.

      Regarding the impact of ongoing HIV-1 replication upon the unique expression patterns in the EC participants, it's worth noting that the public datasets used in our study reported undetectable viremia in the EC volunteers (PMIDs 30964004, 29269040, 32848246, 27453467). Nonetheless, we sought to address this by quantifying HIV-1 transcription and exploring its potential association with interferon-stimulated gene (ISG) expression, a group of genes that we know would be reactive to active viremia. These analyses were integrated into the revised manuscript as Figure S6.

      “To exclude the possibility that these gene expression signatures in ECs are associated with viremia, we quantified HIV-1 transcript levels in deconvoluted CD4+ T cell RNA-seq samples from ECs and ART-treated PLWH for comparison. In the original studies, all samples were reported to have undetected viremia by blood tests (9,37-39). Consistent with this, we found that the vast majority of the EC and ART samples taken from PBMCs exhibited very low HIV-1 transcript levels, with TPM values generally below 1. However, in samples originating from the lymph nodes of EC individuals (n = 22) (37), we detected HIV-1 expression in some subsets (Figure S6A&B). In agreement with the corresponding study (37), we found elevated HIV-1 transcript levels in germinal center and non-germinal center T follicular helper cells (GC Tfh & nGC Tfh, not included in our clustering analyses) -- and to a lesser extent in T effector memory (EM) cells (Figure S6A, average TPM […]

      Based on these results, we have concluded that the differential expression of genes and TEs in the EC clusters is not a consequence of low-level viral transcription in ECs.

      Finally, a remark on TE nomenclature: The reviewer suggests that we use the term “TE groups” as opposed to taxonomic terms such as TE subfamily or TE family. We respectfully disagree. This nomenclature of TEs has been well defined (PMIDs 26612867, 17984973) and is widely used in TE literature. Throughout the manuscript, we have conformed to the nomenclature used to annotate the human genome. One can debate the way TE families and subfamilies have been classified in Dfam (the database through which repetitive elements in the human genome have been annotated), but it is outside the scope of this study to revisit that nomenclature.

      Similarly, of the 12 DE TE groups in EC in Fig 5A, only 3 overlap with the 16 in EC Fig S1.

      Authors: This is correct, but we don’t believe it’s concerning. In Figure 5A, we are comparing the expression of TE families between separate EC clusters. In Figure S1, we are comparing the expression of TE families in ECs compared to ART-treated PLWH. These are fundamentally different comparisons and thus the differences in the identified DE-TEs between the two figures reflect the distinct biological contexts being investigated in each analysis.

      Second, the introduction points out the strongly supported association between elite control and immunogenetic determinants, most notably specific HLA-B types, but also innate immunity factors. This cries out for inclusion of these factors in the analyses of this manuscript, in the format of Figure S4, for example, but none is to be found. The relevant genotypes are likely available in the metadata in the references cited, but, if not, could be inferred from the RNA-seq data.

      Authors: Thank you for the recommendation. While our project’s primary focus is on the transcriptomic and epigenomic signatures, we agree that studying the HLA-B genotypes of all EC participants could provide valuable context for understanding the clustering of elite controllers. To explore this, we inferred the HLA-B alleles in each EC participant whose RNA-seq data was included in the clustering analysis, utilizing the arcasHLA tool (PMID: 31173059) on the total CD4+ T cell samples. We then validated these inferred HLA-B alleles against the available metadata from one of the source studies (PMID 27453467) and found that they matched for all participants. This strengthened our confidence in the accuracy of the HLA-B genotype inferences for the other samples where comprehensive HLA-B data was not provided.

      In order to assess how these protective HLA-B alleles segregated between the four EC clusters derived from gene and TE family expression, we chose to visualize three of the most common alleles associated with HIV-1 elite control: HLA-B*27:03, *57:01, and *57:03 (PMIDs 30964004, 25119688, 21051598) (Figure R1, available in the Response to Reviewers PDF).

      Our analysis revealed that these major protective alleles were not significantly overrepresented in any particular cluster. Consequently, we believe that HLA-B genotype does not have a major impact on the clustering observed in Figure 3.
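
      One common way to ask whether an allele is overrepresented in a cluster is a one-sided hypergeometric test, sketched below. This response does not state which test was applied, so the choice of test, the function name, and the counts are all assumptions made for illustration.

```python
from math import comb


def hypergeom_upper_tail(k, cluster_size, carriers, total):
    """P(X >= k) for a hypergeometric draw.

    total:        number of participants overall
    carriers:     participants carrying the allele
    cluster_size: participants in the cluster of interest
    k:            carriers observed in that cluster
    """
    upper = min(cluster_size, carriers)
    denominator = comb(total, cluster_size)
    numerator = sum(
        comb(carriers, i) * comb(total - carriers, cluster_size - i)
        for i in range(k, upper + 1)
    )
    return numerator / denominator


# Hypothetical counts: 10 participants, 3 carriers, a cluster of 5 with 3 carriers
p_value = hypergeom_upper_tail(k=3, cluster_size=5, carriers=3, total=10)
```

      With participant numbers this small, such a test has little power, so a non-significant result would not by itself rule out a modest enrichment.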

      It would also be very useful to present the KZNF data in Figure 5 the same way, since, looking at Fig 5C, the correlation of high and low KZNF expression, while clearly correlated with that of a few groups of elements, with clustering into specific groups does not appear to be well supported.

      Authors: Thank you for the insightful suggestion. While the KZNF genes are included in the gene set used for the clustering analysis in Figure 3, we agree that clustering based solely on KZNF expression and displaying it as we have in Figures 3A and S5 could provide valuable insights. However, when we attempted to cluster the EC RNA-seq samples using only KZNF expression data, we were limited by the relatively low number of KZNF genes that showed sufficient variability across samples (n = 120). For robust statistical power, we require at least 200 features to reliably cluster the 128 EC CD4+ T cell samples. We believe this limitation does not diminish the relevance of KZNFs in the observed clustering patterns but rather highlights the nuanced role each KZNF plays in the regulation of the transcriptome. Each individual KZNF is responsible for the regulation of hundreds to thousands of TE loci (PMID 37730438). Thus, while a clustering approach based solely on KZNF expression may not be feasible, the integral role of KZNFs in modulating the transcriptome through TE regulation remains evident and supports their inclusion in Figure 6 of the revised manuscript.

      In general, other than the cell type composition differences, there is no presentation of evidence for any biologically important feature associated with the clusters found.

      Authors: We agree that the root cause of the transcriptomic differences between the EC clusters is hard to pin down but we do identify several distinctive features of the clusters that we believe are biologically significant. First, having extracted the lists of genes whose differential expression defined the four EC clusters, gene set enrichment analysis revealed that the clusters were functionally distinct, each characterized by a unique list of top GO terms (Figure 3C). Second, we provide evidence that KZNFs expressed in CD4+ T cells significantly bind to the candidate TE families whose expression defines each of these clusters (Figure 6D) and have significantly decreased expression in ECs compared to VPs (Figure 6C). This is corroborated by pairwise correlation analysis that revealed cluster-specific anticorrelation patterns between these KZNFs and their target TEs (Figure 6A). We present this data in support of our hypothesized KZNF-based mechanism for TE co-option in viral immunity. We do not yet have data indicative of the mechanism by which KZNF expression is in turn regulated. However, we speculate that negative feedback loops may be contributing to changes in KZNF expression.

      “These observations suggest that interindividual variation in KZNF expression in CD4+ T cells could explain why certain TEs are variably expressed and accessible across ECs. But what are the mechanisms underlying variation in ZNF expression? It is possible that TE-KZNF regulatory loops are involved, in which a copy of the TE family targeted by a KZNF is inserted near and regulates the KZNF gene, thereby introducing a negative feedback loop. This phenomenon has been documented in prior studies of KZNF activity in embryogenesis (51) and cancer (115).” [pg. 39-40, L705-711]

      Overall, our study presents preliminary evidence that the four EC clusters derived from gene & TE family expression may be distinguished by complex interplay of activators (Figure S8) and repressors (Figure 6) altering the activity of infection-responsive TE families to co-opt specific elements for immune regulatory function. While not yet validated in an experimental setting, we believe these results are of biological significance.
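
      The pairwise correlation analysis between KZNFs and their target TEs mentioned above can be sketched as follows. This is a minimal illustration that assumes Pearson correlation and an arbitrary cutoff of -0.5; the response does not specify either, and the gene names, TE names, and expression values are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("inputs must have the same length (at least 2)")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def anticorrelated_pairs(kznf_expr, te_expr, threshold=-0.5):
    """Return (KZNF, TE, r) for every pair with r at or below the threshold."""
    pairs = []
    for kznf, kznf_values in kznf_expr.items():
        for te, te_values in te_expr.items():
            r = pearson(kznf_values, te_values)
            if r <= threshold:
                pairs.append((kznf, te, r))
    return pairs


# Hypothetical expression values across five samples
kznf_expr = {"KZNF_a": [5, 4, 3, 2, 1]}
te_expr = {"TE_x": [1, 2, 3, 4, 5], "TE_y": [5, 4, 3, 2, 1]}
hits = anticorrelated_pairs(kznf_expr, te_expr)
```

      An anticorrelation found this way is consistent with repression but does not demonstrate it, which is why experimental validation remains necessary.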

      Third, the figures present values that have been very heavily analyzed, and it is difficult to impossible to infer what the underlying data look like. For example, with the exception of a few selected examples in Figs 4 and 5, individual provirus data are lacking. Nor can we tell how consistent the distribution of expression values within a TE group is, whether the TEs included solo LTRs (which constitute the majority of all ERVs), or what the possible contribution of other TFs to expression is (with the exception of a brief mention of STAT1).

      Authors: We respectfully disagree that the values presented in our figures are heavily analyzed. As this manuscript represents the first investigation of TEs’ role in HIV-1 elite control, we believe the most reasonable initial approach was to compile and visualize the data at the family level, rather than at the level of individual loci, which is harder to interpret due to mapping issues, commonly low transcription, and often idiosyncratic behavior of individual loci. Nonetheless, we did not limit our analysis to full-length HERVs (proviruses) and thus retain all solo LTR data in our analyses. This was added to the Methods of the revised manuscript.

      “To facilitate comprehensive expression quantification, we curated a reference transcriptome by combining gene, TE, and HIV-1 genomic sequences. This was achieved by integrating the locus-level TE classification from RepeatMasker, the hg19 GenCode gene annotation, and the HXB2 reference HIV-1 annotation. For the TEs, we removed simple repeats, SINE elements, and DNA transposons, retaining LINE and HERV loci, including all solo LTRs. We also removed any loci within gene exons/UTRs. The remaining sequences were appended in fasta format, and all sequences were annotated with their respective gene, TE locus, or HIV subunit and modeled in GTF format.” [pg. 55, L869-878]
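
      The filtering described in the quoted Methods text can be illustrated with a minimal Python sketch. It is not the authors' pipeline (their scripts are in the GitHub repository they mention). The `Locus` record, the class labels in `KEPT_CLASSES`, and the helper names are assumptions made for this example, including the assumption that HERVs and solo LTRs carry the RepeatMasker class label "LTR".

```python
from typing import NamedTuple


class Locus(NamedTuple):
    chrom: str
    start: int      # 0-based, inclusive
    end: int        # exclusive
    name: str
    rep_class: str  # e.g. "LINE", "LTR", "SINE", "DNA", "Simple_repeat"


# Assumed labels: keep LINEs and LTR-class elements (HERVs, solo LTRs).
KEPT_CLASSES = {"LINE", "LTR"}


def overlaps(a: Locus, b: Locus) -> bool:
    """True when two half-open intervals on the same chromosome intersect."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def filter_te_loci(te_loci, exon_utr_intervals):
    """Drop unwanted repeat classes, then drop loci overlapping exons/UTRs."""
    kept = []
    for te in te_loci:
        if te.rep_class not in KEPT_CLASSES:
            continue  # simple repeats, SINEs, DNA transposons, etc.
        if any(overlaps(te, ex) for ex in exon_utr_intervals):
            continue  # locus lies within a gene exon or UTR
        kept.append(te)
    return kept
```

      The nested loop is quadratic and only suitable for illustration; an annotation with over a million loci would call for an interval index or a dedicated tool.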

      For the sake of transparency, all relevant details on sequencing data analysis and the corresponding scripts are available in the Methods and our GitHub Repository.

      Additionally, while most of our figures make comparisons at the family level, we do visualize multiple TE loci (Figure 4C) and provide a list of putative locus-level TE-gene pairs from which those shown in Figure 4C were selected (Table S7). In our revisions, we also re-clustered the 128 EC CD4+ T cell RNA-seq samples based only on locus-level TE expression, using the same graph-based k-nearest neighbors method as in Figure 3. The results of this new analysis have been integrated into the revised manuscript as Figure S7.

      “To further explore locus-level expression patterns, we re-clustered the same EC samples (n=128) using only locus-level TE expression. This again resolved four EC clusters (Figure S7A), which interestingly appeared even more distinct than those identified by gene and TE family expression (Figure 3A). The TE locus-based clusters (TL-Cs) aligned well with the gene and TE family clusters (GT-Cs), with an average 70% overlap in samples between each GT-C and its corresponding TL-C (Figure S7B), indicating high consistency (Table S8). The remaining 30% of samples that shifted between clusters did so consistently within individuals, not cohorts, maintaining heterogeneous TL-C compositions similar to the GT-Cs (Figures S7C & S5A). An exception to this heterogeneity was TL-C4, comprising 22 samples from GT-C1 that were almost entirely from the CD4+ T cell subsets of only four participants in the Jiang cohort (Figure S7C, Table S8). No other samples from the Jiang cohort shifted to this cluster from other GT-Cs, suggesting that these patterns reflect individual variation rather than cohort bias. Like the GT-Cs, each TL-C included samples from all five CD4+ T cell subsets and was largely heterogeneous (Figure S7C). Notably, TL-C2 mirrored corresponding GT-C3 in its overrepresentation of EM and TM cells, while TL-C1 uniquely showed an overrepresentation of naïve CD4+ T cells. Beyond sample composition, each TL-C was characterized by a unique pattern of expressed TE loci (Figure S7D). These signatures were heterogeneous across families, with subsets of variable loci from one TE family marking separate clusters (Figure S7E), some of which did not reach the threshold of significance in earlier analyses when analyzed at the family-level, like SVA-D. Many families maintained their cluster-specific signatures, like THE1B (a marker of GT-C2), for which the majority of variable loci were found in corresponding TL-C1. 
      However, some TE families, like the L1s that marked GT-C1, showed more heterogeneous signatures with variable loci marking multiple TL-Cs. These findings underscore the need for future locus-level investigations with high-depth sequencing to fully capture the complexity of TE expression.” [pg. 27-28, L462-488]

      With this addition, we include significantly more data analyzed at the locus level, which we believe not only validate the distinct clustering observed in Figure 3, but also underscore the potential for locus resolution analysis to reveal additional layers of retrotranscriptomic diversity in EC CD4+ T cells.

      Finally, we agree with the reviewer that TFs other than STAT1 may contribute to the observed changes in TE expression. To investigate this, we analyzed several TFs expressed in CD4+ T cells and, for TFs enriched over TEs of interest, subsequently examined the correlation between TF and target TE expression in the deconvoluted EC CD4+ T cell samples used for the clustering. The results of this analysis have been integrated into the revised manuscript at Figure S8.

      “In addition to KZNF repressors, transcriptional activators may also drive the differential expression of specific TE families across ECs (83). To investigate this, we focused on transcription factors (TFs) expressed in CD4+ T cells and mined ChIP-seq data from the ENCODE Consortium (84) to identify TFs with binding enrichment to TE families of interest, selected for their elevated, cluster-specific expression in ECs (highlighted in Figures 4, 5, and S4). We then examined the correlation between TF and target TE expression in the deconvoluted CD4+ T cell samples from ECs used for our clustering analysis (Figure 3) (9,37). We observed several significant positive correlations between TF and TE expression across ECs (Figure S8). Thus, differential expression of immune-related TFs may also contribute to the variation in TE expression and cis-regulatory activity across ECs, in tandem with the repressive activities of KZNFs.” [pg. 30, L517-527]

      This evidence supports the reviewer’s suggestion that other TFs may be contributing to the unique EC retrotranscriptome we profile in this study. These added analyses, mimicking those conducted for KZNFs in Figure 6B & 6D, demonstrate that transcriptional activators may indeed play a crucial role in shaping the TE landscape in ECs.

      Other issues

      Figure 1:

      A) Log2 fold change of what? TPM values? Needs to be specified.

      Authors: Thank you for pointing out this ambiguity. The log2-transformed fold change values plotted in Figure 1A refer to DESeq2-normalized expression. They were extracted from the results of the DESeq2 pipeline, which we applied to the raw count expression matrix (see our Methods for more details). Following your suggestion, we have clarified this point in the figure legend in the revised manuscript.

      “Total detected genes and TE loci are plotted by log2-transformed fold change of DESeq2-normalized counts (EC vs. HC).” [pg. 10, L163-164]

      We have similarly made these changes to any figure legend which was ambiguous in its description of the expression data.

      Why Bonferroni correction? Usually BH q values or other less stringent adjustments are used nowadays.

      Authors: In our analysis, we opted for the Bonferroni correction due to its well-established reliability and stringent control of the family-wise error rate when conducting multiple tests. Given the exploratory nature of our investigation and the desire to minimize the risk of false positive findings, we chose to employ this traditional correction method within our analytical pipelines.
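
      For readers weighing the two adjustments discussed in this exchange, here is a minimal Python sketch of both. It is a generic illustration of the standard Bonferroni and Benjamini-Hochberg (BH) procedures, not the code used in the manuscript; the function names are chosen for this example.

```python
def bonferroni(pvals):
    """Family-wise error control: multiply each p-value by the number of tests."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """False discovery rate control: BH-adjusted p-values (q-values)."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value to the smallest so that adjusted
    # values never increase as the raw p-values decrease.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

      For any set of p-values the BH-adjusted value is never larger than the Bonferroni-adjusted one, which is why Bonferroni is the more conservative choice.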

      B,C): Z-score of what? Scaled, normalized counts? Scaled TPM values?

      Authors: Thank you again for highlighting this point of uncertainty. We now clarify this in the figure legend in the revised manuscript.

      “Heatmap displaying the expression of the top differentially expressed genes in CD4+ T cells of ECs (n=4; red bar) vs. HCs (n=5; blue bar). Relative expression levels are representative of row-wise scaled, log2-transformed expression in transcripts per million (TPM). Heatmap coloration is based on the z-score distribution from low (gold) to high (purple) expression.” [pg. 11, L167-171]
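
      The scaling named in the revised legend (row-wise scaled, log2-transformed TPM) can be sketched in Python as follows. The pseudocount of 1 and the use of the sample standard deviation are assumptions for this example; the legend does not state either.

```python
import math
import statistics


def rowwise_zscores(tpm_rows, pseudocount=1.0):
    """Return per-row z-scores of log2(TPM + pseudocount).

    Each row is one gene or TE family; each column is one sample.
    Rows need at least two samples for a standard deviation to exist.
    """
    scaled = []
    for row in tpm_rows:
        logged = [math.log2(v + pseudocount) for v in row]
        mean = statistics.fmean(logged)
        sd = statistics.stdev(logged)  # sample SD (n - 1 denominator)
        if sd == 0:
            # A flat row carries no relative information; report zeros.
            scaled.append([0.0] * len(row))
        else:
            scaled.append([(v - mean) / sd for v in logged])
    return scaled
```

      Because each row is centered and scaled independently, heatmap colors reflect relative expression across samples within a row, not absolute abundance between rows.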

      Figure 2:

      B) The blue font color is very difficult to see.

      Authors: We have changed the blue font color to make it more easily distinguishable from the black.

      C) This heatmap should demarcate or separate genes versus TE clades. If that's not possible, then the two should be shown separately.

      Authors: We appreciate your suggestion regarding the heatmap presentation. While we understand the rationale for demarcating genes versus TE clades, we have chosen to retain the original figure layout. In this analysis, TEs were analyzed simultaneously with genes. The order in which they are shown was obtained by default clustering of the expression matrix using the hclust function. We chose to present them together and in this order to provide a comprehensive visualization of the differential expression patterns between the two groups and highlight the homogenous nature of gene and TE expression across VPs.

      L191: How many groups (NOT families) and how many total elements were examined?

      Authors: We begin with the RepeatMasker annotation of the hg19 assembly and filter out the SINE elements, DNA transposons, simple repeats, and all loci within gene exons/UTRs. These details are provided in the Methods of the revised manuscript, as was quoted above. In total, our analyses examine 1,104,828 loci from 603 TE groups (which we refer to as families). We apologize if this figure is not accurate to a separate classification of TEs into groups, rather than families. Any such method of grouping TEs is unfamiliar to us and outside of the Dfam annotation.

      L198: 2B, not C

      Authors: Thank you for catching this. The panel labels in Figure 2 were swapped in error and have been corrected to match the in-text references.

      L205: Did the expressed proviruses have STAT1 sites?

      Authors: Thank you for your question. The identification of LTR13’s increased expression in ECs compared to VPs was the result of a family level analysis which considered expression additively across the LTR13 loci in our annotation. To answer your question, we analyzed STAT1 ChIP-seq data from the ENCODE Consortium to characterize which LTR13 loci were bound by STAT1 (corroborated by motif prediction calls). We then integrated the EC RNA-seq data and found that the expressed LTR13 proviruses significantly overrepresented those with bound STAT1 sites (Figure R2, available in the Response to Reviewers PDF).

      These data suggest that STAT1 binding may play a critical role in the transcriptional regulation of LTR13 in ECs, contributing to their differential expression profile. Further exploration into the contribution of activating, immune-related TFs is explored in Figure S8 in the revised manuscript.

      L333: 10 kb is very close. Why was it chosen?

      Authors: We chose 10 kb as our cutoff for selection because it allowed for very high confidence in the TE loci’s cis-regulatory capacity over the nearby genes. For transparency, we have made this clearer in the Results text of the revised manuscript.

      “These loci and genes were paired based on proximity, with a maximum distance of 10kb between the TE locus and the gene’s transcription start site, to increase the likelihood of a direct cis-regulatory influence of the TE over the nearby gene.” [pg. 21, L360-363]
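
      The proximity rule in the quoted sentence can be sketched in Python as below. This is an illustration under stated assumptions, not the authors' implementation: loci and genes are plain dictionaries with hypothetical keys, and distance is measured from the nearest edge of the TE locus to the gene's transcription start site (TSS).

```python
def tss(gene):
    """TSS is the gene start on the '+' strand and the gene end on '-'."""
    return gene["start"] if gene["strand"] == "+" else gene["end"]


def distance_to_tss(te, gene):
    """Distance in bp from the nearest TE edge to the TSS (0 if it spans it)."""
    site = tss(gene)
    if te["start"] <= site <= te["end"]:
        return 0
    return min(abs(te["start"] - site), abs(te["end"] - site))


def pair_te_with_genes(te_loci, genes, max_dist=10_000):
    """List (TE name, gene name, distance) for pairs within max_dist bp."""
    pairs = []
    for te in te_loci:
        for gene in genes:
            if te["chrom"] != gene["chrom"]:
                continue
            d = distance_to_tss(te, gene)
            if d <= max_dist:
                pairs.append((te["name"], gene["name"], d))
    return pairs
```

      Raising `max_dist` to 50,000 would reproduce the less stringent cutoff the authors mention as an alternative.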

      However, if desired, a less stringent cutoff could also be used with relative confidence (e.g., 50 kb).

      L351-352: Again, correlation is not causation. How do the authors know it's not the other way around?

      Authors: The candidates that we chose to display in Figure 4 (the figure to which these lines refer) are from MER41, ERV3-16, and LTR12C. Our lab and others have shown that these specific loci or other loci in these TE families are capable of regulating neighboring genes’ expression, with specific evidence in the context of immunity (PMID Smitha, Ed, APOBEC, etc.). Based on this knowledge, we believe that it’s most likely that TE-derived regulatory sequences are the cause of the increased restriction factor expression, rather than TE accessibility being a consequence of the transcriptional activation of the neighboring genes. However, we recognize that these results are correlative, as the reviewer notes, and we emphasize this in the revised manuscript. Most notably:

      “We acknowledge that these associations are drawn from correlative patterns and manipulative experiments are needed to infer causality between chromatin changes at these TEs and increased expression of nearby immunity genes.” [pg. 36, L620-623]

      Figure 4

      B) Need to show a scale of the genome region, the orientation of both the gene and the TE, whether it is a solo LTR

      Authors: Thank you for the suggestion. Genomic scale and orientation have been added to Figure 4C. All loci visualized were solo LTRs, save for HCP5, which is a lncRNA derived from a full-length ERV3 element.

      Figure 5

      A) Would benefit from also showing HCs

      Authors: Thank you for the recommendation. The RNA-seq datasets used in this analysis do not include HC samples. Additionally, this analysis is meant to highlight differences in TE expression between the four EC clusters. Thus, we have chosen to keep Figure 5A as it appears in the original manuscript.

      C) Would be helped by showing adjusted p-values, and also should show examples of non-correlating relationships between these KZNF genes and other TEs.

      Authors: Thank you for the suggestion. All correlation analyses had adjusted p-values below 0.01, derived from corr.test in R. We’ve added this to the figure legends of Figure 6B [pg. 32, L539] and S8B [pg. 53, L835]. However, we have chosen not to integrate non-correlating examples into the revised manuscript for the sake of space.

      Figure 6

      Title: should start with "proposed model for.." or some such.

      Authors: Thank you for the suggestion. The title has been changed to “Proposed model for the interplay of KZNFs and TEs regulating proximal antiviral gene expression in elite controllers of HIV-1” in the revised manuscript [pg. 34, L580-581].

      L 537: Again, how do the alleles segregate in the clusters?

      Authors: This question has been addressed in response to an earlier comment from Reviewer #3.

      Generally, in the correlation analyses, I'd like to see adjusted p-values and examples of non-correlated results.

      Authors: Thank you for the suggestion. As mentioned above, all correlation analyses have been annotated with the adjusted p-value threshold. Additionally, below we’ve included examples of non-correlated results from two analyses. First, we show a TE-gene pair whose increased TE accessibility in HCs compared to ECs does not correlate with increased expression of the proximal gene (Figure R3, available in the Response to Reviewers PDF). Notably, this gene does not play a role in HIV-1 infection response. Here, we show that genes with proximal (Second, we show the pairwise correlation and linear regression results of L1PA6 and ZNF2 (Figure R4, available in the Response to Reviewers PDF). ZNF2 is one of the KZNFs highlighted in Figure 6 for its low expression in ECs, anticorrelated to its repressive target LTR12C. On the other hand, L1PA6 is active in ECs, with variably high expression across samples. ZNF2 ChIP-exo revealed that ZNF2 has no capacity to bind to L1PA6 loci (adj. p-value = 1; PMID 37730438). Thus, even though both genes are variable across samples, we observe no significant (anti)correlation between the two variables (rho = 0.051 & p-value = 0.866).
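
      As an aside on the statistic reported above, the following Python sketch computes a rank correlation coefficient. It assumes the reported rho is a Spearman correlation, which the response does not state, and it is a generic textbook implementation, not the authors' analysis (they cite corr.test in R).

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def spearman_rho(x, y):
    """Spearman's rho is the Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))
```

      A value near zero, as in the L1PA6 and ZNF2 example, indicates no monotonic relationship between the two expression profiles.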

      While we have not integrated these results into the revised manuscript for the sake of space, we hope that the provided examples satisfactorily demonstrate the presence of non-correlated results in our analyses, further reinforcing the specificity and robustness of our significant findings.

      Significance:

      This study presents an in-depth analysis of the reverse transcriptome in Elite controllers. It will be of interest to both HIV researchers and those interested in the regulation of the human retrotranscriptome and its consequences.

      Provides an avenue for future explanation into elite controllers and TE involvement in the phenotype.

      Does a good job of placing the work in the context of existing lit, synthesizing other papers regarding TEs and immune control.

      Potential immune regulatory involvement of specific HERV clades.

      Authors: We’d like to thank the reviewer for their encouraging feedback. We’re pleased that they found our analysis of the EC retrotranscriptome to be of broad interest and appreciate their recognition of our efforts to synthesize existing literature, contextualizing our findings within the broader field. We agree that our study opens new avenues for exploring the role of TEs, particularly specific HERV clades, in not only the EC phenotype but immune regulation as a whole.

    1. Automatic thinking causes us to simplify problems and see them through narrow frames. We fill in missing information based on our assumptions about the world and evaluate situations based on associations that automatically come to mind and belief systems that we take for granted. In so doing, we may form a mistaken picture of a situation, just as looking through a small window overlooking an urban park could mislead someone into thinking he or she was in a more bucolic place. page 12

      I think that the results from the research on cuing a stigmatized identity and its effect on students' performance made me realize how much of a mental toll stereotypes can take on people. It's disheartening and ironic at the same time to see how both the high and low caste groups collectively performed worse when they were told their respective roles. It influences the way that I think as a student at an international school because it is interesting to see how these ideas can parallel the experiences of students around me. Regardless, this passage relates to today's inquiry question because it can be used to argue that poor people's individual economic actions are shaped by neglect and by stereotypes that affect their lives subconsciously.

    2. Automatic thinking causes us to simplify problems and see them through narrow frames. We fill in missing information based on our assumptions about the world and evaluate situations based on associations that automatically come to mind and belief systems that we take for granted. In so doing, we may form a mistaken picture of a situation, just as looking through a small window overlooking an urban park could mislead someone into thinking he or she was in a more bucolic place. page 6

      (Question)

      Throughout this passage I noticed that automatic thinking or "thinking fast" is commonly portrayed as being bad due to its associations with irrational decisions, prejudice, and intuition. Given this context, what are the positive benefits of automatic thinking and why did the author fail to shed light on this in the book?

      I would say that this section of the book relates to today's inquiry because it teaches us more about the first system of thinking and how it can be malicious to mainly use this for our everyday thinking. For example, statistically poor people fail to make it out of the bottom income threshold for most of their life. I hypothesize that this is due to their lack of choices, power, and education in order to think deliberately.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Thank you very much for your editorial handling of our manuscript entitled 'A conserved fungal Knr4/Smi1 protein is vital for maintaining cell wall integrity and host plant pathogenesis'. We have taken on board the reviewers' comments and thank them for their diligence and time in improving our manuscript.

      Please find our responses to each of the comments below.

      Reviewer(s)' comments

      Reviewer #1


      Major comments:


      1.1. As a more critical comment, I find the presentation of the figures somewhat confusing, especially with the mixing of main figures, supplements to the main figures, and actual supplemental data. On top of that, the figures are not called up in the right order (e.g. Figure 4 follows 2D, while 3 comes after 4; Figure 6 comes before 5...), and some are never called up (I think) (e.g. Figure 1B, Figure 2B).


      Response: The figure order has been revised according to the reviewer's suggestion, while still following eLife's formatting guidelines for naming supplementals. Thank you.

      1.2. I agree that there should be more CWI-related genes in the wheat module linked to the FgKnr4 fungal module, or, vice-versa, CW-manipulating genes in the fungal module. It would at least be good if the authors could comment further on if they find such genes, and if not, how this fits their model.


      Response: Thank you for your insightful suggestion regarding the inclusion of more CWI-related genes in the wheat module linked to the FgKnr4 fungal module F16, or vice versa. We did observe a co-regulated response between the FgKnr4 module F16 and the correlated wheat module W05. Namely, we observed an enrichment of oxidative stress genes including respiratory burst oxidases and two catalases (lines 304 - 313) in the correlated wheat module (W05). Early expression of these oxidative stress inducing genes likely induces the CWI pathway in the fungus, which is regulated by FgKnr4. Knr4 functions as both a regulatory protein in the CWI pathway and as a scaffolding protein across multiple pathways in S. cerevisiae (Martin-Yken et al., 2016, https://onlinelibrary.wiley.com/doi/10.1111/cmi.12618 ). Scaffolding protein-encoding genes are typically expressed earlier than the genes they regulate to enable pre-assembly with their interacting partners, ensuring that signaling pathways are ready to activate when needed. In this context, the CWI MAPKs Bck1 and Mkk1 are part of module F05, which includes two chitin synthases and a glucan synthase. This module is highly expressed during the late symptomless phase. The MAPK Mgv1, found in module F13, is expressed consistently throughout the infection process, which aligns with the expectation that MAPKs are mainly post-transcriptionally regulated. Thank you for bringing this to our attention. It is now included in the discussion (lines 427 - 443), and eigengene expression plots of all modules have been added to the supplementary material (Figure 3 - figure supplement 1).

      To explore potential shared functions of FgKnr4 with other genes in its module, we re-analyzed the high module membership genes within module F16, which includes FgKnr4, using Knetminer (Hassani-Pak et al., 2021; https://onlinelibrary.wiley.com/doi/10.1111/pbi.13583 ). This analysis revealed that 8 out of 15 of these genes are associated with cell division and ATP binding. Four of the candidate genes are also part of a predicted protein-protein interaction subnetwork of genes within module F16, which relate to cell cycle and ATP binding. In S. cerevisiae, the absence of Knr4 results in cell division dysfunction (Martin-Yken et al., 2016, https://onlinelibrary.wiley.com/doi/10.1111/cmi.12618 ). Accordingly, we tested the sensitivity of ΔFgknr4 to the microtubule inhibitor benomyl (a compound commonly used to identify mutants with cell division defects; Hoyt et al., 1991 https://www.cell.com/cell/pdf/0092-8674(81)90014-3.pdf). We found that the ΔFgknr4 mutant was more susceptible to benomyl, both when grown on solid agar and in liquid culture. These data have now been added as Figure 7 and are referred to in lines 338-348.

      Specific issues:


      1.3. In the case of figure 5, I generally find it hard to follow. In the text (line 262/263), the authors state that 5C shows "eye-shaped lesions" caused by ΔFgknr4 and ΔFgtri5, but I can't see neither (5C appears to be a ΔFgknr4 complementation experiment). The figure legend also states nothing in this regard.

      Response: Thank you for your suggestion. We have amended the manuscript to include an additional panel that shows the dissected spikelet without its outer glumes, making the eye-shaped diseased regions more visible in Figure 5.

      1.4. Figure 5D supposedly shows 'visibly reduced fungal burden' in ΔFgknr4-infected plants, but I can't really see the fungal burden in this picture, but the infected section looks a lot thinner and more damaged than the control stem, so in a way more diseased.


      Response: Thank you for your insight. We have revised our conclusions based on this image to state that while ΔFgknr4 can colonise host tissue, it does so less effectively than the wild-type strain. We are unable to quantitatively evaluate fungal burden using image-colour thresholding because of the overlapping colours of the fungal cells and wheat tissues. Decreased host colonisation is evidenced by (i) reduced fungal hyphae proliferation, particularly in the thicker adaxial cell layer, (ii) collapsed air spaces in wheat cells, and (iii) increased polymer deposition at the wheat cell walls, indicating an enhanced defence response. Figure 5 has been amended to include these observations in the corresponding figure legend and the resin images now include insets with detailed annotation.

      1.5. The authors then go on to state (lines 272-273) that they analyzed the amounts of DON mycotoxin in infected tissues, but don't seem to show any data for this experiment.

      Response: We have amended this to include the data in Figure 5 - figure supplement 2B. Thank you.

      Reviewer #2


      Major issues:


      2.1 If Knf4 is involved in the CWI pathway, what other genes involved in the CWI pathway are in this fungal module? one of the reasons for developing modules or sub-networks is to assign common function and identify new genes contributing to the function. since FgKnr4 is noted to play a role in the CWI pathways, then genes in that module should have similar functions. If WGCN does not do that, what is the purpose of this exercise?


      Response: Thank you for raising this point regarding the role of FgKnr4 in the CWI pathway and the expectations for genes of shared function within the FgKnr4 module F16. We did observe that the module containing FgKnr4 (F16) was also correlated to a wheat module (W05) which was significantly enriched for oxidative stress genes. This pathogen-host correlated pattern led us to study module F16, which otherwise lacks significant gene ontology term enrichment, unique gene set enrichments, and contains few characterised genes. This is now highlighted in lines 233-246. This underscores the strength of the WGCNA. By using high-resolution RNA-seq data to map modules to specific infection stages, we identified an important gene that would have otherwise been overlooked. This approach contrasts with other network analyses that often rely on the guilt-by-association principle to identify novel virulence-related genes within modules containing known virulence factors, potentially overlooking significant pathways outside the scope of prior studies. Therefore, our analysis has already benefited from several advantages of WGCNA, including the identification of key genes with high module membership that may be critical for biological processes, as well as generating a high-resolution, stage-specific co-expression map of the F. graminearum infection process in wheat. This point is now emphasised in lines 233-252. As discussed in response to reviewer 1, Knr4 functions as both a regulatory protein in the CWI pathway and as a scaffolding protein across multiple pathways in S. cerevisiae (Martin-Yken et al., 2016, https://onlinelibrary.wiley.com/doi/10.1111/cmi.12618 ) which would explain its clustering separate from the CWI pathway genes. 
      The high module membership genes within module F16 containing FgKnr4 were re-analysed using Knetminer (Hassani-Pak et al., 2021; https://onlinelibrary.wiley.com/doi/10.1111/pbi.13583 ), which found that 8/15 of these genes were related to cell division and ATP binding. Four of the candidate genes are also part of a predicted protein-protein interaction subnetwork of genes within module F16, which relate to cell cycle and ATP binding. In S. cerevisiae, the absence of Knr4 leads to dysfunction in cell division. Accordingly, we tested the sensitivity of ΔFgknr4 to the microtubule inhibitor benomyl (a compound commonly used to identify mutants with cell division defects; Hoyt et al., 1991 https://www.cell.com/cell/pdf/0092-8674(81)90014-3.pdf). We found that the ΔFgknr4 mutant was more susceptible to benomyl, both when grown on solid agar and in liquid culture. These data have now been added as Figure 7 and are referred to in lines 338-348.


      2.2. Due to development defects in the Fgknr1 mutant, I would not equate to as virulence factor or an effector gene.


      Response: We are in complete agreement with the reviewer and are not suggesting that FgKnr4 is an effector or virulence factor. We have been careful with our wording to indicate that FgKnr4 is simply necessary for full virulence and that its disruption results in reduced virulence, and we have outlined how we believe FgKnr4 participates in a fungal signaling pathway required for infection of wheat.


      2.3. What new information is provided with WGCN modules compared with other GCN network in Fusarium (examples of GCN in Fusarium is below) https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5069591/ https://doi.org/10.1186/s12864-020-6596-y DOI: 10.1371/journal.pone.0013021. The GCN networks from Fusarium have already identified modules necessary/involved in pathogenesis.

      Response: The 2016 New Phytologist gene regulatory network (GRN) by Guo et al. is large and comprehensive. However, only three of the eleven datasets are in planta, with just one dataset focusing on F. graminearum infection on wheat spikes. The other two in planta datasets involve barley infection and Fusarium crown rot. By combining numerous in planta and in vitro datasets, the previous GRNs lack the fine resolution needed to identify genetic relationships under specific conditions, such as the various stages of symptomatic and symptomless F. graminearum infection of mature flowering wheat plants. This limitation is highlighted in the 2016 paper itself. This network is expanded in the Guo et al., 2020 BMC Genomics paper where it includes one additional in planta and nine in vitro datasets. However, the in planta dataset involves juvenile wheat coleoptile infection, which serves as an artificial model for wheat infection but is not on mature flowering wheat plants reminiscent of Fusarium Head Blight of cereals in the field. This model differs significantly in the mode of action of F. graminearum, notably DON mycotoxin is not essential for virulence in this context (Armer et al. 2024, https://pubmed.ncbi.nlm.nih.gov/38877764/ ). The Guo et al., 2020 paper still faces the same issues in terms of resolution and the inability to draw conclusions specific to the different stages of F. graminearum infection. Additionally, these GRNs use Affymetrix data, which miss over 400 genes (~ 3 % of the genome) from newer gene models. In contrast, our study addresses these limitations by analysing a meticulously sampled, stage- and tissue-specific in planta RNA-seq dataset using the latest reference annotation. Our approach provides higher resolution and insights into host transcriptomic responses during the infection process. The importance of our study in the context of these GRNs is now addressed in the introduction (lines 85-92).


      2.4. Ideally, the WGCN should have been used identify plant targets of Fusarium pathogenicity genes. This would have provided credibility and usefulness of the WGCN. Many bioinformatic tools are available to identify virulence factors and the utility of WGCN in this regard is not viable. However, if the authors had overlapped the known virulence factors in a fungal module to a particular wheat module, the impact of the WGCN would be great. The module W12 has genes from numerous traits represented and WGCN could have been used to show novel links between Fg and wheat. For example, does tri5 mutant affect genes in other traits?

      Response: Thank you for your suggestions. In this study we have shown the association between the main fungal virulence factor of F. graminearum, DON mycotoxin, with wheat detoxification responses. Through this we have identified a set of tri5 responsive genes and validated this correlation in two genes belonging to the phenylalanine pathway and one transmembrane detoxification gene. Although we could validate more genes in this tri5 responsive wheat module, our paper aimed to investigate previously unstudied aspects of the F. graminearum infection process and how the fungus responded to changing conditions within the host environment. We accomplished this by characterising a gene within a fungal module that had limited annotation enrichment and few characterised genes. Tri5 on the other hand is the most extensively studied gene in F. graminearum and while the network we generated may offer new insights into tri5 responsive genes, this is beyond the scope of our current study. In addition to the tri5 co-regulated response, we have also demonstrated the coordinated response between the fungal module F16, which contains FgKnr4 that is necessary for tolerance to oxidative stress, and the wheat module W05, which is enriched for oxidative stress genes.


      While our co-expression network approach can be used to explore and validate other early downstream signaling and defense components in wheat cells, several challenges must be considered: (a) the poor quality of wheat gene calls, (b) genetic redundancy due to both homoeologous genes and large gene families, and (c) the presence of DON, which can inhibit translation and prevent many transcriptional changes from being realised within the host responses. Additionally, most plant host receptors are not transcriptionally upregulated in response to pathogen infection (most R gene studies for the NBS-LRR and exLRR-kinase classes), making their discovery through a transcriptomics approach unlikely. These points will be included in our discussion (lines 408-413), thank you.

      Specific issues


      2.5. Since the tri5 mutant was used as a proof of concept to link wheat/Fg modules, it would have been useful to show that TRI14, which is not involved in DON biosynthesis but is involved in virulence ( https://doi.org/10.3390/applmicrobiol4020058 ), impacts the wheat module genes.


      Response: Our goal was to show that wheat genes respond to the whole TRI cluster, not just individual TRI genes. The tri5 mutant therefore serves as a solid proof of concept: TRI5 is essential for DON biosynthesis, the primary function of the TRI gene cluster, and so represents the function of the cluster as a whole. This is now clarified in lines 217-219. Additionally, the uncertainties surrounding other TRI mutants would complicate the question we were addressing, namely whether a wheat module enriched in detoxification genes is responding to DON mycotoxin, as implied by shared co-expression patterns with the TRI cluster. For instance, the referenced TRI14 paper indicates that DON is produced in the same amount in vitro in a single medium. Although the difference is not significant, the average DON produced is lower for the two Δtri14 transformants tested. Therefore, we cannot definitively rule out that TRI14 is involved in DON biosynthesis, nor can we extrapolate this result to DON production in planta. Despite this, the suggestion is interesting and would make a nice experiment, but we believe it does not contribute to the overall aim of this study.

      2.6. Moreover, prior RNAseq studies with tri5 mutant strain on wheat would have revealed the expression of PAL and other phenylpropanoid pathway genes?

      Response: We agree that this would be an interesting comparison to make, but unfortunately no dataset comparing in planta expression of the tri5 mutant within wheat spikes exists.

      2.7. Table S1 lists 15 candidate genes of the F16 module; however, Supplementary File 1 indicates 74 genes in the same module. The basis of exclusion should be explained. The authors have indicated that genes with high MM were used as representatives of the module. Did the 59 remaining genes of this module not meet this criterion? Give examples.


      Response: The 15 genes with the highest module membership were selected as initial candidates for further shortlisting from the 74 genes within module F16. In WGCNA, genes with high module membership (MM), i.e. intramodular connectivity, are predicted to be central to the biological functions of the module (Langfelder and Horvath, 2008; https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-9-559 ), and MM continues to be used as a metric to identify biologically significant genes within WGCN analyses (Tominello-Ramirez et al., 2024, https://bmcplantbiol.biomedcentral.com/articles/10.1186/s12870-024-05366-0 ; Zheng et al., 2022, https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9151341/ ; Panahi and Hejazi, 2021, https://www.nature.com/articles/s41598-020-80945-3 ). Following the methods of Mateus et al. (2019) (https://academic.oup.com/ismej/article/13/5/1226/7475138 ), key genes were defined as those exhibiting elevated MM within the module that were also strongly correlated (|R| > 0.70) with modules of the partner organism (wheat). We have clarified this point in the manuscript (lines 253-263). Thank you for the suggestion.
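To make the selection rule concrete, the following is a minimal Python sketch of ranking genes by module membership, taken here as the Pearson correlation between each gene's expression profile and the module eigengene. It is illustrative only and not the code used in the study: the function names, the toy expression values, and the choice to rank by absolute MM are assumptions, and the eigengene is assumed to have been computed beforehand (in WGCNA it is the first principal component of the module's expression matrix).

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    dx = [v - mean_x for v in x]
    dy = [v - mean_y for v in y]
    num = sum(a * b for a, b in zip(dx, dy))
    den = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    return num / den


def module_membership(expression, eigengene):
    """Return {gene: MM}, where MM = cor(gene profile, module eigengene)."""
    return {gene: pearson(profile, eigengene) for gene, profile in expression.items()}


def top_by_membership(mm, k=15):
    """Return the k gene names with the largest absolute module membership."""
    return sorted(mm, key=lambda gene: abs(mm[gene]), reverse=True)[:k]


# Hypothetical toy data: one eigengene and three genes measured in four samples.
eigengene = [1.0, 2.0, 3.0, 4.0]
expression = {
    "geneA": [2.0, 4.0, 6.0, 8.0],
    "geneB": [1.0, 1.0, 2.0, 1.0],
    "geneC": [1.0, 3.0, 2.0, 4.0],
}

mm = module_membership(expression, eigengene)
top = top_by_membership(mm, k=2)
```

With `k=15` applied to a 74-gene module, the same ranking would yield a 15-gene candidate list; the additional cross-kingdom filter (|R| > 0.70 against partner modules) would be a second step on top of this.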

      2.8. A list from every module of the genes that pass this criterion would be a useful resource for functional characterization studies.


      Response: A supplementary spreadsheet has been generated which includes full lists of the top 15 genes with the highest module membership within the five fungal modules correlated to wheat modules and a summary of shared attributes among them. Thank you for this suggestion.

      2.9. Figure 3 indicates TRI genes in the module F12; your PHI-base list in Supp File S2 includes only TRI14. Why are other TRI genes such as TRI5 not present in this file?


      Response: For clarity, the TRI genes in module F12 are TRI3, TRI4, TRI11, TRI12, and TRI14, as stated in Table 1. TRI5 clusters with its neighboring regulatory gene TRI6 in module F11, which exhibits a similar but reduced expression pattern compared to module F12. To improve clarity, the TRI genes in module F12 are now also listed in-text in line 168 and added to Figure 4. The enrichment and correlated relationship of W12 to a cluster's expression still imply a correlated response of the wheat gene to the TRI cluster's biosynthetic product (DON), which is absent in the Δtri5 mutant.

      TRI14 and TRI12 are listed in PHI-base. TRI12 was mistakenly excluded due to an unmapped UniProt ID; unmapped IDs were added separately in the spreadsheet. We will recheck all unmapped ID lists to ensure all PHI-base entries are included in the final output. Thank you for pointing out this error.


      2.10. What is the purpose of listing the same gene multiple times? For example, osp24 (a single gene in Fg) is listed 13 times in the F01 module.


      Response: This is a consequence of each entry having a separate PHI ID, which represents different interactions, including inoculations on different cultivars. Cultivar and various experimental details were omitted from the spreadsheet to reduce information density; however, the multiple PHI-base IDs will be kept separate to make the data more user-friendly when working with the PHI-base database. An explanation for this is now provided in the file's explanatory worksheet, thank you.

      Reviewer #3:


      3.1. Why only use of high confidence transcripts maize to map the reads and not the full genome like Fusarium graminearum? I have never analyzed plant transcriptome.


      Response: In the wheat genome, only high-confidence gene calls are used by the global community (Choulet et al., 2023; https://link.springer.com/chapter/10.1007/978-3-031-38294-9_4 ) until a suitable and stable wheat pan-genome becomes available.

      3.2. The regular output of DESeq are TPMs, how did the authors obtain the FPKM used in the analysis?


      Response: FPKM was calculated using the GenomicFeatures package and made available on GitHub to improve the interoperability of the data for future users. However, the input for WGCNA, and for this study as a whole, was normalised counts rather than FPKM. The information regarding FPKM calculation is now included in the methods section of the revised manuscript (line 491).
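For readers unfamiliar with the unit, the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments) can be sketched in a few lines of Python. This is a generic illustration under stated assumptions, not the authors' R code: the function name, the toy counts, and the gene lengths are hypothetical, and the library size is assumed to be the sum of the supplied counts.

```python
def fpkm(counts, lengths_bp):
    """Compute FPKM per gene.

    counts:     {gene: number of fragments assigned to the gene}
    lengths_bp: {gene: gene (exonic) length in base pairs}

    FPKM = fragments * 1e9 / (length_bp * total_fragments),
    i.e. fragments per kilobase (1e3) per million (1e6) mapped fragments.
    """
    total = sum(counts.values())
    return {gene: counts[gene] * 1e9 / (lengths_bp[gene] * total) for gene in counts}


# Hypothetical toy library with two genes.
counts = {"g1": 100, "g2": 300}
lengths_bp = {"g1": 1000, "g2": 2000}

values = fpkm(counts, lengths_bp)
```

Because FPKM rescales by both gene length and library size, it is convenient for sharing expression levels, whereas count-based normalisation remains the appropriate input for the network analysis described above.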

      3.3. Do the authors have a Southern blot to prove the location of the insertion and number of insertions in Zymoseptoria tritici mutant and complemented strains?


      Response: No, but the phenotype is attributed to the presence or absence of ZtKnr4, as the mutant was successfully complemented in multiple phenotypic aspects. This satisfies Koch's postulates, which are the gold standard for reverse genetics experimentation (Falkow 1988; https://www.jstor.org/stable/4454582 ).

      3.4. Boxplots and bar graphs should have the same format. In Figures 5 B and F and Supplementary Figure 6.3 the authors showed the distribution of samples, but it is lacking in Figure 3 B and all bar graphs.


      Response: Graphs have been modified to display the distribution of all samples, thank you.

      3.5. Line 247 FGRAMPH1_0T23707 should be FGRAMPH1_01T23707


      Response: Thank you, this has now been amended.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment

      The authors develop a self-returning self-avoiding polymer model of chromosome organization and show that their framework can recapitulate at the same time local density and large-scale contact structural properties observed experimentally by various technologies. The presented theoretical framework and the results are valuable for the community of modelers working on 3D genomics. The work provides solid evidence that such a framework can be used, is reliable in describing chromatin organization at multiple scales, and could represent an interesting alternative to standard molecular dynamics simulations of chromatin polymer models.

      We thank the editor for an accurate description of the scope of the paper.

      Public Reviews:

      Reviewer #1 (Public Review):

      Carignano et al propose an extension of the self-returning random walk (SRRW) model for chromatin to include excluded volume aspects and use it to investigate generic local and global properties of the chromosome 3D organization inside eukaryotic nuclei. In particular, they focus on chromatin volumic density, contact probability, and domain size and suggest that their framework can recapitulate several experimental observations and predict the effect of some perturbations.

      We thank the reviewer for the attention paid to the manuscript and all the relevant comments.

      Strengths:

      - The developed methodology is convincing and may offer an alternative - less computationally demanding - framework to investigate the single-cell and population structural properties of 3D genome organization at multiple scales.

      - Compared to the previous SRRW model, it allows for investigation of the role of excluded volume locally.

      Excluded volume is accounted for everywhere, not locally. We emphasized this on page 3, line 182:

      “The method that we employ to remove overlaps is a low-temperature-controlled molecular dynamics simulation using a soft repulsive interaction potential between initially overlapping beads, that is terminated as soon as all overlaps have been resolved, as described in the Appendix 3.”


      - They perform some experiments to compare with model predictions and show consistency between the two.

      Weaknesses:

      - The model is a homopolymer model and currently cannot fully account for specific mechanisms that may shape the heterogeneous, complex organization of chromosomes (TAD at specific positions, A/B compartmentalization, promoter-enhancer loops, etc.).

      The SR-EV model is definitely not a homo-polymer, as it is not a regular concatenation of a single monomeric unit.

      The model includes loops, which may happen in two ways: 1) As in the SRRW, branching structures emerging from the configuration backbone can be interpreted as nested loops and 2) A relatively long forward step followed by a return is a single loop. The model induces the formation of packing domains, which are not TADs, and are quantitatively in agreement with ChromSTEM experiments.

      We consider it useful to add a new figure that further clarifies the structures obtained with the SR-EV model. The following paragraph and figure have been added on page 5:

      “The density heterogeneity displayed by the SR-EV configurations can be analyzed in terms of accessibility. One way to reveal this accessibility is by calculating the coordination number (CN) for each nucleosome, using a coordination radius of 11.5 nm, along the SR-EV configuration. CN values range from 0 for an isolated nucleosome to 12 for a nucleosome immersed in a packing domain. In Figure 3 we show the SR-EV configuration shown in Figure 2, but colored according to CN. CN can also be considered a measure to discriminate heterochromatin (red) from euchromatin (blue). Figure 3-A shows how the density inhomogeneity is coupled to different CN, with high CN represented in red and low CN represented in blue. Figure 3-B shows a 50 nm thick slab obtained from the same configuration, which clearly shows that the nucleosomes at the center of each packing domain are almost completely inaccessible, while those outside are open and accessible. It is also clear that the surfaces of the packing domains are characterized by nearly white nucleosomes, i.e. coordinated towards the center of the domain and open in the opposite direction.”
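The coordination number described in the quoted paragraph can be illustrated with a short Python sketch: for each bead, count the other beads whose centers lie within the coordination radius. This is only an illustration under stated assumptions, not the SR-EV analysis code. The function name and toy coordinates are hypothetical, coordinates are assumed to be bead centers in nanometres, and the brute-force pairwise loop is chosen for clarity rather than speed (a spatial index would be preferable for full configurations).

```python
from math import dist


def coordination_numbers(coords, radius=11.5):
    """Return, for each bead, the number of other beads within `radius` (nm).

    coords: list of (x, y, z) bead centers in nanometres.
    The 11.5 nm default follows the coordination radius quoted in the text.
    """
    counts = [0] * len(coords)
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= radius:
                counts[i] += 1
                counts[j] += 1
    return counts


# Hypothetical toy configuration: three beads 10 nm apart and one isolated bead.
coords = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (20.0, 0.0, 0.0), (100.0, 0.0, 0.0)]
cn = coordination_numbers(coords)
```

In this toy case the isolated bead has CN = 0, matching the lower limit mentioned in the paragraph; beads buried in a dense domain would approach the upper limit.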

      - By construction of their framework, the effect of excluded volume is only local and larger-scale properties for which excluded volume could be a main actor (formation of chromosome territories [Rosa & Everaers, PLoS CB 2009], bottle-brush effects due to loop extrusion [Polovnikov et al, PRX 2023], etc.) cannot be captured.

      Excluded volume is considered for all nucleosomes, including overlapping beads distant along the polymer chain. Chromosome territories can be treated, but it is not in this case because we look at a single model chromosome.

      - Apart from being a computationally interesting approach to generating realistic 3D chromosome organization, the method offers fewer possibilities than standard polymer models (eg, MD simulations) of chromatin (no dynamics, no specific mechanisms, etc.) with likely the same predictive power under the same hypotheses. In particular, authors often claim the superiority of their approach to describing the local chromatin compaction compared to previous polymer models without showing it or citing any relevant references that would show it.

      We apologize if the text conveys an idea of superiority over other methods; that was not intended. SR-EV is an alternative tool that may give a different, even complementary, point of view to standard polymer models.

      - Comparisons with experiments are solid but are not quantified.

      The comparisons that we have presented are quantitative. So far, we do not have a way to characterize alpha or phi a priori for a particular system.

      Impact:

      Building on the presented framework in the future to incorporate TAD and compartments may offer an interesting model to study the single-cell heterogeneity of chromatin organization. But currently, in this reviewer's opinion, standard polymer modeling frameworks may offer more possibilities.

      We thank the reviewer for the positive opinion on the potential of the presented method. The incorporation of TADs and compartments is left for a future evolution of the model as its complexity will make this work extremely long.

      Reviewer #2 (Public Review):

      Summary:

      The authors introduce a simple Self Returning Excluded Volume (SR-EV) model to investigate the 3D organization of chromatin. This is a random walk with a probability to self-return accounting for the excluded volume effects. The authors use this method to study the statistical properties of chromatin organization in 3D. They compute contact probabilities, 3D distances, and packing properties of chromatin and compare them with a set of experimental data.

      We thank the reviewer for the attention paid to our manuscript.

      Strengths:

      (1) Typically, to generate a polymer with excluded volume interactions, one needs to run long simulations with computationally expensive repulsive potentials like the Weeks-Chandler-Andersen potential. However, here, instead of performing long simulations, the authors have devised a method where they can grow the polymer, enabling quick generation of configurations.

      (2) Authors show that the chromatin configurations generated from their models do satisfy many of the experimentally known statistical properties of chromatin. Contact probability scalings and packing properties are comparable with Chromatin Scanning Transmission Electron Microscopy (ChromSTEM) experimental data from some of the cell types.

      Weaknesses:

      This can only generate broad statistical distributions. This method cannot generate sequence-dependent effects, specific TAD structures, or compartments without a prior model for the folding parameter alpha. It cannot generate a 3D distance between specific sets of genes. This is an interesting soft-matter physics study. However, the output is only as good as the alpha value one provides as input.

      We proposed a model that creates realistic chromatin configurations, which we have compared with specific single-cell experiments and which also reproduce ensemble-average properties. 3D distances between genes can be calculated after mapping the genome to the SR-EV configuration. The future incorporation of the genome sequence will also allow us to describe TADs and A/B compartments. See the paragraph added to the Discussion section:

      “The incorporation of genomic character to the SR-EV model will allow us to study all individual single chromosome properties, and also topologically associating domains and A/B compartmentalization from ensembles of configurations as in Hi-C experiments.”

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      Major:

      - In the introduction and along the text, the authors are often making strong criticisms of previous works (mostly polymer simulation-based) to emphasize the need for an alternative approach or to emphasize the outcomes of their model. Most of these statements (see below) are incomplete if not wrong. I would suggest tuning down or completely removing them unless they are explicitly demonstrated (eg, by explicit quantitative comparisons). There is no need to claim any - fake - superiority over other approaches to demonstrate the usefulness of an approach. Complementarity or redundance in the approaches could also be beneficial.

      We regret if we unintentionally transmitted a claim of superiority. We have made several small edits to change that.

      - Line 42-43: at least there exist many works towards that direction (including polymer modeling, but also statistical modeling). For eg, see the recent review of Franck Alber.

      Line removed. Citation to Franck Alber included below in the text.

      - Line 54-57: Point 1 is correct but is it a fair limitation? These models can predict TADs & compartments while SR-EV no. Point 2 is wrong, it depends on the resolution of the model and computer capacity but it is not an intrinsic limitation. Point 3 is wrong, such models can predict very well single-cell properties, and again it is not an intrinsic limitation of the model. Point 4 is incorrect. The space-filling/fractal organization was an (unfortunate) picture to emphasize the typical organization of chromosomes in the early times (2009), but crumpled polymers which are a more realistic description are not space-filling (see Halverson et al, 2013).

      Text involving points 1 to 4 removed. It was unnecessary and does not change the line of the paper.

      - L400-402 + 409-411: in such a model, the biphasic structure may emerge from loop extrusion but also naturally from the crumpled polymer organization. Simple crumpled polymer without loop extrusion and phase separation would also produce biphasic structures.

      Yes, we agree. Also SR-EV leads to biphasic structures.

      - L 448-449: any data to show that existing polymer modeling would predict a strong dependency of C_p(n) on the volumic fraction (in the range studied here)?

      No, I don’t know a work predicting that.

      - Fig. 4:

      - Large-scale structural properties (R^2(n) and C_p(n)) are not dependent on phi. Is it surprising that by construction, SR-EV only relaxes the system locally after SRRW application?

      Excluded volume is considered at all length scales. However, as the decreasing C_p curves observed in theories and experiments imply, the fraction of overlap (or contacts) is more important at small separations (local) than at large separations. Still, it was a surprise for us to observe a negligible effect of phi.

      - Why not make a quantitative comparison between predicted and measured C_p(n)? Or at least plotting them on the same panel.

      Panels B and C are on the same scale and show good agreement between SR-EV and experiments. However, the agreement is not perfectly quantitative. SR-EV represents the generic structure of chromatin, and perfect agreement should not be expected.

      - Comparison with an average C_p(n) over all the chromosomes would be better.

      Possibly, but we don’t think it adds anything to the paper.

      - In Figure 5,6,7 (and related text): authors often describe some parameter values that are 'closest to experiment findings'. Can the authors quantify/justify this? The various 'closest' parameters are different. Can the authors comment?

      The folding parameter and average volume fraction are chosen so that the agreement is best with the displayed experimental system, which is a different cell in each case.

      - Figure 5: why not show the experimental distribution from Ou et al?

      - Figure 6 & 7: experimental results. Can the authors show images from their own experiments? Can they show that cohesion/RAD21 is really depleted after auxin treatment?

      It is currently under review in a different journal.

      - In the Discussion, a fair discussion on the limitations of the methods (dynamics, etc) is missing.

      Minor

      - Line 34-36: the logical relationship between this sentence and the ones before and after is very unclear.

      - Along the text, authors use the term 'connectivity' to describe 3D (Hi-C) contacts between different regions of the same chromosome/polymer. This is misleading as connectivity in polymer physics describes the connection along the polymer and not in the 3D space.

      No. I don’t think we used connectivity in that sense. We agree with your statement on the use of connectivity in polymer physics, and that is what we always had in mind for this model.

      - Line 92: typo.

      - On the SR-EV method: does the relaxation process create local knots in the structure?

      We have not checked for knots.

      - Table 1: the good correspondence with linker length is remarkable but likely 'fortunate', other chosen resolutions would have led to other results. Moreover, the model cannot account for the fine structure of chromatin fiber. Can the authors comment on that?

      It is fortunate only to the extent that we sampled the model parameters to capture the overall structure of chromatin.

      - Line 211: 'without the need of imposing any parameter': alpha is a parameter, no?

      Correct. Phrase deleted.

      - L267-269 & 450-451: actually in Liu & Dekker, they do observe an effect on Hi-C map (C_p(n)), weak but significant and not negligible.

      Our statements read ‘minimal’ and ‘relatively insensitive’. It is observed, but very small.

      - L283-286: This is a perspective statement that should be in the discussion.

      Moved to the Discussion, as suggested.

      - L239-241: The authors seem to emphasize some contradictions with recent results on phase separation. This is unclear and should be relocated to discussion.

      We just pointed out recent experiments, as stated. No intention to generate a discussion with any of them.

      - L311-313: Unclear statement.

      - L316-325: This is not results but discussion/speculation.

      Moved to Discussion

      - Along the text: 'promotor'-> 'promoter'. 

      Corrected.

      - L364: explain more in detail PWS microscopy.

      Reviewer #2 (Recommendations For The Authors):

      Even though there are claims about nucleosome-resolution chromatin polymer, it is not clear that this work can generate structures with known nucleosome-resolution features. Nucleosome-level structure is much beyond a random walk with excluded volume and is driven by specific interactions. The authors should clarify this.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews

      Reviewer #1 (Public Review):

      Summary:

      Federer et al. tested AAVs designed to target GABAergic cells and parvalbumin-expressing cells in marmoset V1. Several new results were obtained. First, AAV-h56D targeted GABAergic cells with >90% specificity, and this varied with serotype and layer. Second, AAV-PHP.eB.S5E2 targeted parvalbumin-expressing neurons with up to 98% specificity. Third, the immunohistochemical detection of GABA and PV was attenuated near viral injection sites.

      Strengths:

      Vormstein-Schneider et al. (2020) tested their AAV-S5E2 vector in marmosets by intravenous injection. The data presented in this manuscript are valuable in part because they show the transduction pattern produced by intraparenchymal injections, which are more conventional and efficient.

      Our manuscript additionally provides detailed information on the laminar specificity and coverage of these viral vectors, which was not investigated in the original studies.

      Weaknesses:

      The conclusions regarding the effects of serotype are based on data from single injection tracks in a single animal. I understand that ethical and financial constraints preclude high throughput testing, but these limitations do not change what can be inferred from the measurements. The text asserts that "...serotype 9 is a better choice when high specificity and coverage across all layers are required". The data presented are consistent with this idea but do not make a strong case for it.

      We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we have tempered our claims about such differences and use more caution in the interpretation of these data (Results p. 6 and Discussion p.10). Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.

      A related criticism extends to the analysis of Injection volume on viral specificity. Some replication was performed here, but reliability across injections was not reported. My understanding is that individual ROIs were treated as independent observations. These are not biological replicates (arguably, neither are multiple injection tracks in a single animal, but they are certainly closer). Idiosyncrasies between animals or injections (e.g., if one injection happened to hit one layer more than another) could have substantial impacts on the measurements. It remains unclear which results regarding injection volume or serotype would hold up had a large number of injections been made into a large number of marmosets.

      For the AAV-S5E2, we made a total of 7 injections (at least 2 at each volume), all of which, irrespective of volume, resulted in high specificity and efficiency for PV interneurons. Our conclusion is that larger volumes are slightly less specific, but the differences are minimal and do not warrant additional injections. Additionally, we kept all the other parameters across animals constant (see new Supplementary Table 1), all of our injections involved all cortical layers, and the ROIs we selected for counts encompassed reporter protein expression across all layers. To provide a better sense of the reliability of the results across injections, in the revised version of the manuscript we now provide results for each AAV-S5E2 injection case separately in a new Supplementary Table 2. This table indicates that the results are indeed rather consistent across cases, with slightly greater specificity for injection volumes in the range of 105-180 nl.

      Reviewer #2 (Public Review):

      This is a straightforward manuscript assessing the specificity and efficiency of transgene expression in marmoset primary visual cortex (V1), for 4 different AAV vectors known to target transgene expression to either inhibitory cortical neurons (3 serotypes of AAV-h56D-tdTomato) or parvalbumin (PV)+ inhibitory cortical neurons in mice. Vectors are injected into the marmoset cortex and then postmortem tissue is analyzed following antibody labeling against GABA and PV. It is reported that: "in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% efficiency, depending on viral serotype and cortical layer. AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency."

      These claims are largely supported but slightly exaggerated relative to the actual values in the results presented. In particular, the overall efficiency for the best h56D vectors described in the results is: "Overall, across all layers, AAV9 and AAV1 showed significantly higher coverage (66.1±3.9 and 64.9%±3.7)". The highest coverage observed is just in middle layers and is also less than 80%: "(AAV9: 78.5%±9.1; AAV1: 76.9%±7.4)".

      In the Abstract, we indeed summarize the overall data, round the decimals, and state that these percentages are upper bounds that vary by serotype and layer, whereas in the Results we report the detailed counts with decimals. To clarify this, in the revised version of the Abstract we have changed 80% to 79% and emphasize even more clearly the dependence on serotype and layer. We have amended this sentence of the Abstract as follows: “We show that in marmoset V1 AAV-h56D induces transgene expression in GABAergic cells with up to 91-94% specificity and 79% efficiency, but this depends on viral serotype and cortical layer.”

      For the AAV-PHP.eB-S5E2 the efficiency reported in the abstract (“86-90%”) is also slightly exaggerated relative to the results: “Overall, across all layers coverage ranged from 78%±1.9 for injection volumes >300nl to 81.6%±1.8 for injection volumes of 100nl.”

      Indeed, the numbers in the Abstract are upper bounds; for example, efficiency in L4A/B with S5E2 reaches 90%. To further clarify this important point, in the revised Abstract we now state: “AAV-PHP.eB-S5E2 induces transgene expression in PV cells across all cortical layers with up to 98% specificity and 86-90% efficiency, depending on layer”.
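Since the discussion above turns on how specificity and efficiency (coverage) are computed from cell counts, a minimal Python sketch may help. The definitions used here are assumptions based on common usage in this kind of study and are not quoted from the text: specificity is taken as the percentage of reporter-expressing cells that are also immunopositive for the marker (GABA or PV), and coverage as the percentage of marker-positive cells that express the reporter. The function name and the counts are hypothetical.

```python
def specificity_and_coverage(double_labeled, reporter_total, marker_total):
    """Return (specificity %, coverage %) from cell counts in one ROI.

    double_labeled: cells positive for both reporter and marker
    reporter_total: all reporter-expressing cells (e.g. tdTomato+)
    marker_total:   all marker-immunopositive cells (e.g. GABA+ or PV+)

    Assumed definitions (not quoted from the manuscript):
      specificity = double_labeled / reporter_total * 100
      coverage    = double_labeled / marker_total * 100
    """
    specificity = 100.0 * double_labeled / reporter_total
    coverage = 100.0 * double_labeled / marker_total
    return specificity, coverage


# Hypothetical counts for a single ROI.
spec, cov = specificity_and_coverage(double_labeled=45, reporter_total=50, marker_total=60)
```

Under these definitions the two measures can diverge: a vector can be highly specific while labeling only a fraction of the target population, which is why per-layer values and overall values are reported separately.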

      These data will be useful to others who might be interested in targeting transgene expression in these cell types in monkeys. Suggestions for improvement are to include more details about the vectors injected and to delete some comments about results that are not documented based on vectors that are not described (see below).

      Major comments:

      Details provided about the AAV vectors used with the h56D enhancer are not sufficient to allow assessment of their potential utility relative to the results presented. All that is provided is: "The fourth animal received 3 injections, each of a different AAV serotype (1, 7, and 9) of the AAV-h56D-tdTomato (Mehta et al., 2019), obtained from the Zemelman laboratory (UT Austin)." At a minimum, it is necessary to provide the titers of each of the vectors. It would also be helpful to provide more information about viral preparation for both these vectors and the AAV-PHP.eB-S5E2.tdTomato. Notably, what purification methods were used, and what specific methods were used to measure the titers?

      We thank the Reviewer for this comment. In the revised version of the manuscript, we now provide a new Supplementary Table 1 with titers and other information for each viral vector injection. We also provide information regarding viral preparation in a new section in the Methods entitled “Viral Preparation” (p. 12).

      The first paragraph of the results includes brief anecdotal claims without any data to support them and without any details about the relevant vectors that would allow any data that might have been collected to be critically assessed. These statements should be deleted. Specifically, delete: “as well as 3 different kinds of PV-specific AAVs, specifically a mixture of AAV1-PaqR4-Flp and AAV1-h56D-mCherry-FRT (Mehta et al., 2019), an AAV1-PV1-ChR2-eYFP (donated by G. Horwitz, University of Washington),” and delete “Here we report results only from those vectors that were deemed to be most promising for use in primate cortex, based on infectivity and specificity. These were the 3 serotypes of the GABA-specific pAAV-h56D-tdTomato, and the PV-specific AAVPHP.eB-S5E2.tdTomato.” These tools might in fact be just as useful or even better than what is actually tested and reported here, but maybe the viral titer was too low to expect any expression.

      These data are indeed anecdotal, but we felt this could be useful information, potentially preventing other primate labs from wasting resources, animals and time, particularly, as some of these vectors have been reported to be selective and efficient in primate cortex, which we have not been able to confirm. We made several injections in several animals of those vectors that failed either to infect a sufficient number of cells or turned out to be poorly specific. Therefore, the negative results have been consistent in our hands. But we agree with the Reviewer that our negative results could have depended on factors such as titer. In the revised version of the manuscript, following the reviewer’s suggestion, we have deleted this information.

      Based on the description in the Methods it seems that no antibody labeling against TdTomato was used to amplify the detection of the transgenes expressed from the AAV vectors. It should be verified that this is the case - a statement could be added to the Methods.

      That is indeed the case. We used no immunohistochemistry to enhance the reporter proteins as this was unnecessary. The native/ non-amplified tdT signal was strong. This is now stated in the methods (p.12).

      Reviewer #3 (Public Review):

      Summary:

      Federer et al. describe the laminar profiles of GABA+ and of PV+ neurons in marmoset V1. They also report on the selectivity and efficiency of expression of a PV-selective enhancer (S5E2). Three further viruses were tested, with a view to characterizing the expression profiles of a GABA-selective enhancer (h56d), but these results are preliminary.

      Strengths:

      The derivation of cell-type specific enhancers is key for translating the types of circuit analyses that can be performed in mice - which rely on germline modifications for access to cell-type specific manipulation - in higher-order mammals. Federer et al. further validate the utility of S5E2 as a PV-selective enhancer in NHPs.

      Additionally, the authors characterize the laminar distribution pattern of GABA+ and PV+ cells in V1. This survey may prove valuable to researchers seeking to understand and manipulate the microcircuitry mediating the excitation-inhibition balance in this region of the marmoset brain.

      Weaknesses:

      Enhancer/promoter specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      This is an important point that was also brought up by Reviewer 1, which we have addressed in our reply to Reviewer 1. For clarity and convenience, below we copy our response to Reviewer 1.

      “We are aware of the limitations of our results on the AAV-h56D. We agree with the Reviewer that a single injection per serotype does not allow us to make strong statements about differences between the 3 serotypes. Therefore, in the revised version of the manuscript we will temper our claims about such differences and use more caution in the interpretation of these data. Despite this weakness, we feel that these data still demonstrate high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested. We feel that in itself this is sufficiently useful information for the primate community, worthy of being reported. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 would have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.”

      The language used throughout conflates the cell-type specificity conferred by the regulatory elements with that conferred by the serotype of the virus.

      Authors’ reply. In the revised version of the manuscript, we have corrected ambiguous language throughout.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      My Public Review comments can be addressed by dialing down the interpretation of the data or providing appropriate caveats in the presentation of the relevant results and their discussion.

      We have done so. See text additions on p. 6 of the Results and p.10 of the Discussion.

      Minor comments:

      92% of PV+ neurons in the marmoset cortex were GABAergic. Can the authors speculate on the identity of the 8% PV+/GABA- neurons (e.g., on the basis of morphology)? Are they likely excitatory? Are they more likely to represent failures of GABA staining?

      We do not know what the other 8% of PV+/GABA- neurons are because we did not perform any other kind of IHC staining. Our best guess is that at least to some extent these represent failures of GABA staining, which is always challenging to perform in primate cortex. However, in mouse PV expression has been demonstrated in a minority of excitatory neurons.

      "Coverage of the PV-AAV was high, did not depend on injection volume.." The fact that the coverage did not depend on injection volume presumably depends, at least in part, on how ROIs were selected. Surely different volumes of injection transduce different numbers of neurons at different distances from the injection track. This should be clarified.

      The ROIs were selected at the center of the injected site/expression core from sections in which the expression region encompassed all cortical layers. Of course, larger volumes of injection resulted in larger transduced regions and therefore an overall larger number of transduced neurons, but we counted cells only within 100 µm wide ROIs at the center of the injection, and the percent of transduced PV cells in this core region did not vary significantly across volumes. We have clarified the methods of ROI selection (see Methods p. 13).
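
      As an illustration of the two per-ROI measures discussed in this exchange, the following minimal sketch assumes the conventional definitions: specificity is the percent of reporter-expressing (tdT+) cells that are also marker-positive (e.g. PV+), and coverage is the percent of marker-positive cells that also express the reporter. The function name and the cell counts are hypothetical and are not taken from the manuscript.

```python
def specificity_and_coverage(n_reporter, n_marker, n_double):
    """Return (specificity, coverage) in percent for one counting ROI.

    n_reporter: cells expressing the viral reporter (e.g. tdT+)
    n_marker:   cells immunopositive for the marker (e.g. PV+ or GABA+)
    n_double:   cells positive for both reporter and marker
    """
    if n_double > min(n_reporter, n_marker):
        raise ValueError("double-labeled count cannot exceed either single count")
    # Specificity: of the cells the vector labeled, how many are the intended type?
    specificity = 100.0 * n_double / n_reporter if n_reporter else float("nan")
    # Coverage: of the intended cell type, how many did the vector label?
    coverage = 100.0 * n_double / n_marker if n_marker else float("nan")
    return specificity, coverage


# Hypothetical counts for a single 100 µm wide ROI (illustration only).
spec, cov = specificity_and_coverage(n_reporter=50, n_marker=60, n_double=48)
print(f"specificity = {spec:.1f}%, coverage = {cov:.1f}%")
```

      Because both measures are ratios computed inside a fixed-width ROI, a larger injection can transduce more cells overall without changing either percentage in the expression core.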

      Figure 2. What is meant by “absolute” in the legend for Figure 2? (How does “mean absolute density” differ from “mean density?”)

      We meant not relative, but this is obvious from the units, so we have removed the word “absolute” in the legend.

      Some non-significant p-values are indicated by "p>0.05" whereas others are given precisely (e.g., p = 1). Please provide precise p-values throughout. Also, the p-value from a surprisingly large number of comparisons in the first section of the results is "1". Is this due to rounding? Is it possible to get significance in a Bonferroni-corrected Kruskal-Wallis test with only 6 observations per condition?

      We now report exact p values throughout the manuscript, with a couple of exceptions where, to avoid reporting a large number of p values that would interrupt the flow of the manuscript, we provide the upper bound value and state that all those comparisons were below that value. The minimum sample size for the Kruskal-Wallis test is 5 for each group being compared, and our sample is 6 per group.
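
      The reviewer's question about whether significance is attainable with 6 observations per condition can be approached with a short calculation. The sketch below is not the authors' analysis: it assumes pairwise exact two-sided rank-sum comparisons without ties, and a hypothetical family of 6 layers (15 pairwise comparisons) for the Bonferroni correction. Post hoc procedures that follow a Kruskal-Wallis test, such as Dunn's test, use different approximations and will give different numbers.

```python
from math import comb


def min_two_sided_exact_p(n1, n2):
    """Smallest two-sided p-value of an exact rank-sum test without ties.

    Under the null, every assignment of ranks to group 1 is equally likely,
    so the most extreme arrangement and its mirror image each have
    probability 1 / C(n1 + n2, n1).
    """
    return min(1.0, 2.0 / comb(n1 + n2, n1))


def bonferroni(p, n_comparisons):
    """Bonferroni-adjusted p-value, capped at 1."""
    return min(1.0, p * n_comparisons)


n_per_group = 6                # sample size per group stated in the reply
n_layers = 6                   # hypothetical number of layers being compared
n_pairs = comb(n_layers, 2)    # all pairwise layer comparisons

p_min = min_two_sided_exact_p(n_per_group, n_per_group)
p_adj = bonferroni(p_min, n_pairs)
print(f"pairs = {n_pairs}, minimum raw p = {p_min:.5f}, Bonferroni-adjusted = {p_adj:.4f}")
```

      Under these assumptions the smallest attainable adjusted p-value stays below 0.05, so significance is possible in principle with 6 observations per group, although only for completely separated samples.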

      Figure 3: The density of tdTomato-expressing cells appears to be greater at the AAV9 injection site than at the AAV1 injection site in the example sections shown. Might some of the differences between serotypes be due to this difference? I would imagine that resolving individual cells with certainty becomes more difficult as the amount of tdTomato expression increases.

      There was an error in the scale bar of Fig. 3C, so that the AAV1 injection site was shown at higher magnification than indicated by the wrong scale bar. Hence the density of tdTomato appeared lower than it is. Moreover, the tdT expression region shown in Fig. 3A is a merge of two sections, while it is only from a single section in panels B and C, leading to the impression of higher density of infected cells in panel A. The pipette used for the injection in panel A was not inserted perfectly vertical to the cortical surface, resulting in an injection site that did not span all layers in a single section; thus, to demonstrate that the injection indeed encompassed all layers (and that the virus infected cells in all layers), we collapsed label from two sections. We have now corrected the magnification of panel C so that it matches the scale bar in panel A, and specify in the figure legend that panel A label is from two sections.

      Text regarding Figure 3: The term “injection sizes” is confusing. I think it is intended to mean “the area over which tdTomato-expressing cells were found” but this should be clarified.

      Throughout the manuscript, we have changed the term injection site to “viral-expression region”.

      Figure 3: What were the titers of the three AAV-h56D vectors?

      Titers are now reported in the new Supplementary Table 1.

      Figure 3: The yellow box in Figure 3C is slightly larger than the yellow boxes in 3A and 3B. Is this an error or should the inset of Figure 3 have a scale bar that differs from the 50 µm scale bar in 3A?

      There were indeed errors in scale bars in this figure, which we have now corrected. Now all boxes have the same scale bar.

      Was MM423 one of the animals that received the AAV-h56D injections or one of the three that received AAV-S5E2 injection?

      This is an animal that received a 315nl injection of AAV-PHP.eB-S5E2.tdTomato. This is now specified in the Methods (see p. 12) and in the new Supplementary Table 1.

      Please provide raw cell counts and post-injection survival times for each animal.

      We now provide this information in Supplementary Tables 1 and 2.

      How were the different injection volumes of the AAV-S5E2 virus arranged by animal? Which volume of the AAV-S5E2 virus was injected into the two animals who received single injections?

      We now provide this information in Supplementary Table 1.

      Figure 6A: the point is made in the text that "[the distribution of tdT+ and PV+ neurons] did not differ significantly... peaking in L2/3 and 4C " Is the fact that the number of tdT+ and PV+ peak in layers 2/3 and 4C a consequence of these layers being thicker than the others? If so, this statement seems trivial.

      No, and this is the reason why we measured density in addition to percent of cells across layers in Figure 2. Figure 2B shows that even when measuring density, therefore normalizing by area, GABA+ and PV+ cell density still peaks in L2/3 and 4. Thus, these peaks do not simply reflect the greater thickness of these layers.

      Do the authors have permission to use data from Xu et al. 2010?

      Yes, we do.

      Reviewer #2 (Recommendations For The Authors):

      Minor comments:

      "Viral strategies to restrict gene expression to PV neurons have also been recently developed (Mehta et al., 2019; Vormstein-Schneider et al., 2020)." Mich et al. should also be cited here. Cell Rep. 2021;34(13):108754.

      We thank the reviewer for pointing out this missing reference. This is now cited.

      “GABA density in L4C did not differ from any other layers, but the percent of GABA+ cells in L4C was significantly higher than in L1 (p=0.009) and 4A/B (p=<0.0001).” This and other similar observations depend on calculating the percentage of cells relative to the total number of DAPI-labeled cells in each layer. Since it is apparent that there must be considerable variability between layers, it would be helpful to add a histogram showing the densities of all DAPI-labeled cells for each layer.

      This is not how we calculated density. Density, as now clarified in the Results on p. 4, was defined as the number of cells per unit area. Counts in each layer were divided by each layers’ counting area. This corrects for differences in number of total labeled cells per layer. Therefore, reporting DAPI density is not necessary (we did not count DAPI cell density per layer).
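
      The normalization described in this reply (counts in each layer divided by that layer's counting area) can be sketched as follows. The ROI width of 100 µm is taken from the responses above; the layer thicknesses and cell counts are hypothetical values chosen only to show that a thicker layer can hold more cells without having a higher density.

```python
import math


def density_per_mm2(count, layer_thickness_um, roi_width_um=100.0):
    """Cells per mm^2 for one layer within a rectangular counting ROI."""
    area_mm2 = (roi_width_um * layer_thickness_um) / 1e6  # µm^2 -> mm^2
    return count / area_mm2


# Hypothetical layer thicknesses (µm) and cell counts (illustration only).
thickness_um = {"L1": 100.0, "L2/3": 400.0, "L4C": 250.0}
counts = {"L1": 5, "L2/3": 80, "L4C": 50}

densities = {
    layer: density_per_mm2(counts[layer], thickness_um[layer])
    for layer in counts
}
for layer, value in densities.items():
    print(f"{layer}: {counts[layer]} cells, {value:.0f} cells/mm^2")
```

      In this example L2/3 contains more cells than L4C but the two layers have the same density, which is the distinction between raw counts (or percent of cells) and area-normalized density.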

      "Identical injection volumes of each serotype, delivered at 3 different cortical depths (see Methods), resulted in different injection sizes, suggesting the different serotypes have different capacity of infecting cortical neurons. AAV7 produced the smallest injection site, which additionally was biased to the superficial and deep layers, with only few cells expressing tdT in the middle layers (Fig. 3B). AAV9 (Fig. 3A) and AAV1 (Fig. 3C) resulted in larger injection sites and infected all cortical layers." Differences noted here might reflect either differences related to the AAV serotype or to differences in titers. Please add details about titers for each vector and add comments as appropriate. Another interpretation would be that there are differences in viral spread within the tissue.

      We have now added Supplementary Table 1, which reports titers in addition to other information about injections. The titers and volumes used for AAV9 and AAV7 were identical, while the titer for AAV1 was higher. Therefore, the differences in infectivity, particularly the much smaller expression region obtained with AAV7, cannot be attributed to titer. Likely this is due to differences in tropism and/or viral spread among serotypes. This is now discussed (see Results p. 5 bottom and p. 6 top).

      “Recently, several viral vectors have been identified that selectively and efficiently restrict gene expression to GABAergic neurons and their subtypes across several species, but a thorough validation and characterization of these vectors in primate cortex has lacked.” Is this really a fair statement, or is the characterization presented here also lacking? Methods used by others for quantifying specificity and efficiency are essentially the same as used here. See for example Mich et al. (which is not cited).

      The original validation in primates of the vectors examined in our study was based on small tissue samples and did not examine the laminar expression profile of transgene expression induced by these enhancer-AAVs. For example, the validation of the h56D-AAV in marmoset cortex in the original paper by Mehta et al. (2019) was performed on a tissue biopsy with no knowledge of which cortical layers were included in the tissue sample. The only study that shows laminar expression in primate cortex (Mich et al., which is now cited) shows only qualitative images of viral expression across layers, reporting total specificity and coverage pooled across samples; moreover, the study by Mich et al. deals with different PV-specific enhancers than the ones characterized in our study. Unlike any of the previous studies, here we have quantified specificity and coverage across layers.

      "Specifically, we have shown that the GABA-specific AAV9-h56D (Mehta et al., 2019) induces transgene expression in GABAergic cells with up to 91-94% specificity and 80% coverage, and the PV-specific AAV-PHP.eB-S5E2 (Vormstein-Schneider et al., 2020) induces transgene expression in PV cells with up to 98% specificity and 86-90% coverage." These statements in the discussion repeat the somewhat exaggerated coverage numbers noted above for the Abstract.

      The averages across all layers are reported in the Results. The Abstract and Discussion report upper limits, and this is made clear by stating “up to”; we have now also added “depending on layer”.

      Reviewer #3 (Recommendations For The Authors):

      Abstract:

      • Ln 2: Can you be more specific about what you mean by the 'various functions of inhibition'? e.g. do you mean 'the various inhibitory influences on the local microcircuit' or similar?

      These are listed in the introduction to the paper but there is no space in the abstract to do so. Now the sentence reads: “various computational functions of…”.

      • Ln 5: 'has' to 'is'/'has been'.

      The grammar here is correct “has derived”.

      • Ln 6: humans are primates! Maybe change this to 'nonhuman primates'?

      We have added “non-human”

      • Ln n-1: 'viral vectors represent' -> 'viral vectors are'.

      We have changed it to “are”

      Intro:

      • Many readers may expect 'VIP' to be listed as the third major sub-class of interneurons. Could you note that the 5HT3a receptor-expressing group includes VIP cells?

      Done (p.3).

      • "Understanding cortical inhibitory neuron function in the primate is critical for understanding cortical function and dysfunction in the model system closest to humans" - this seems close to being circular logic (not quite, but close). Could you modify this sentence to reflect why understanding cortical function and dysfunction in NHP may be of interest?

      This sentence now reads (p. 3): “Understanding cortical inhibitory neuron function in the primate is critical for understanding cortical function and dysfunction in the model system closest to humans, where cortical inhibitory neuron dysfunction has been implicated in many neurological and psychiatric disorders, such as epilepsy, schizophrenia and Alzheimer’s disease (Cheah et al., 2012; Verret et al., 2012; Mukherjee et al., 2019)”. We also note that this was already stated in the previous version of the paper, but in the Discussion section, which read (and still reads on p. 9, 2nd paragraph): “It is important to study inhibitory neuron function in the primate, because it is unclear whether findings in mice apply to higher species, and inhibitory neuron dysfunction in humans has been implicated in several neurological and psychiatric disorders (Marin, 2012; Goldberg and Coulter, 2013; Lewis, 2014).”

      • "In particular, two recent studies have developed recombinant adeno-associated viral vectors (AAV) that restrict gene expression to GABAergic neurons". This sentence places the emphasis on the wrong component of the technology. The fact that AAV was used is irrelevant; these constructs could equally have been packaged in a lenti, CAV, HSV, rabies, etc. The emphasis should be on the recently developed regulatory elements (the enhancers/promoters).

      Same problem with the following excerpts; this text implies that the serotype/vector confers cell-type selectivity, but the results presented do not support this assertion (the promoter/enhancer is what confers the selectivity).

      • "specifically, three serotypes of an AAV that restricts gene expression to GABAergic neurons".

      • "one serotype of an AAV that restricts gene expression to PV cells".

      • "GABA- and PV-specific AAVs".

      • "GABA-specific AAV" (in results).

      • "PV-specific AAVs".

      • "In this study, we have characterized several AAV vectors designed to restrict expression to GABAergic cells" (in discussion).

      • "GABA-virus". GABA is a NT, not a virus.

      We have modified the language in all these sections and throughout the manuscript.

      Results:

      • Enhancer specificity and efficiency cannot be directly compared, because they were packaged in different serotypes of AAV.

      We agree, and in fact we are not making comparisons between different enhancers (i.e., S5E2 and h56D).

      The three different serotypes of AAV expressing reporter under the h56D promoter were only tested once each, and all in the same animal. There are many variables that can contribute to the success (or failure) of a viral injection, so observations with an n=1 cannot be considered reliable.

      The authors need to either: (1) replicate the h56D virus injections in (at least) a second animal, or (2) rewrite the paper to focus on the AAV.PhP mDlx virus alone - for which they have adequate data - and mention the h56D data as an anecdotal result, with clear warnings about the preliminary nature of the observations due to lack of replication.

      We agree about the lack of sufficient data to make strong statements about the differences between serotypes for the h56D-AAV. In the revised version of the manuscript, following the Reviewers’ suggestion, we have chosen to temper our claims about differences between serotypes for the h56D enhancer and use more caution in the interpretation of these data. We feel that these data still demonstrate sufficiently high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested, to warrant their use in primates. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species. Our edits in regard to this point can be found in the Results on p. 6 and Discussion on p. 10.

      • Did the authors compare h56D vs mDlx? This would be a useful and interesting comparison.

      We did not.

      • 3 tissue sections were used for analysis. How were these selected? Did the authors use a stereological approach?

      For the analysis in Fig. 2, the 3 sections were randomly selected and for the positioning of the ROIs we selected a region in dorsal V1 anterior to the posterior pole  (to avoid laminar distortions due to the curvature of the brain). This is now specified (see p. 4).

      • "both GABA+ and PV+ cells peak in layers" revise for clarity (e.g., the counts peak).

      It now reads “GABA+ and PV+ cell percent and density” (see p. 4).

      • "we refer to this virus as GABA-AAV" these are 3 different viruses!

      The idea here was to use an abbreviation instead of using the full viral name every single time. Clearly the reviewer does not like this, so we have removed this convention throughout the paper and now specify the entire viral name each time.

      • "Identical injection volumes of each serotype, delivered at 3 different cortical depths (see Methods), resulted in different injection sizes". Do you mean 'resulted in different volumes of expression'?

      Yes. We have now rephrased this as follows: “…resulted in viral expression regions that differed in both size as well as laminar distribution” (p.5).

      • “suggesting the different serotypes have different capacity of infecting cortical neurons”. You can’t draw any firm conclusions from a single injection. The rest of this section of the results, along with the whole of Figure 4, and Figure 7a-d, is in danger of being misleading. Please remove. The best you can do here is to say ‘we injected 3 different viruses that express reporter under the h56D promoter. The results are shown in Figure 3, but these are anecdotal, as only a single injection of each virus was performed’. You could then note in the discussion to what extent these results are consistent with the existing literature (e.g., AAV9 often produces good coverage in NHP – anterograde and retrograde, AAV1 also works well in the CNS, although generally doesn’t infect as aggressively as AAV9. I’m not familiar with any attempts to use AAV7).

      With respect to Fig. 4, our approach in the revised version is detailed above. For convenience we copy it below. With respect to Fig. 7A-D, we feel the results are more robust because the data from the 3 serotypes were pooled: all 3 serotypes similarly downregulated GABA and PV expression at the injection site, and we do not make any statement about differences among serotypes for the data shown in Fig. 7A-D.

      “In the revised version of the manuscript, following the Reviewer’s suggestion, we have chosen to temper our claims about differences between serotypes for the h56D enhancer and use more caution in the interpretation of these data (see revised text in the Results on p. 6 and in the Discussion on p. 10). We feel that these data still demonstrate sufficiently high efficiency and specificity across cortical layers of transgene expression in GABA cells using the h56D promoter, at least with two of the 3 AAV serotypes we tested, to warrant their use in primates. Due to cost, time and ethical considerations related to the use of primates, we chose not to perform additional experiments to determine precise differences among serotypes. Thus, for example, while it is possible that had we replicated these experiments, serotype 7 could have proven equally efficient and specific as the other two serotypes, we felt answering this question did not warrant additional experiments in this precious species.”

      • Figure 3: why the large variation in tissue quality? Are the 3 upper images taken at the same magnification? If not, they need different scale bars. The cells in A (upper row) look much smaller than those in B and C, and the size of the 'inset' box varies.

      We thank the reviewer for noticing this. We discovered an error in the scale bar of Fig. 3C, so that the AAV1 injection site was shown at higher magnification than indicated by the wrong scale bar. We have now corrected the error in scale bars. We have also fixed the different box sizes.

      • "Overall, across all layers coverage ranged from 78%±1.9 for injection volumes >300nl to 81.6%±1.8 for injection volumes of 100nl." Coverage didn't differ between layers, so revise this to: "Overall, across all layers coverage ranged from 78% to 81.6%." or give an overall mean (~80%).

      We have corrected the sentence as suggested by the Reviewer (see p. 8 first paragraph).

      • "extending farther from the borders" -> "extending beyond the borders".

      We have corrected the sentence as suggested by the Reviewer (see p. 8).

      • "The reduced GABA and PV immunoreactivity caused by the viruses implies that the specificity of the viruses we have validated in this study is likely higher than estimated". Yes, but for balance you should also note that they may harm the physiology of the cell.

      We have added a sentence acknowledging this to the Discussion. Specifically, on p. 10, we now state: “However, this reduced immunoreactivity raises concerns about the virus or high levels of reporter protein possibly harming the cell physiology.”

      Discussion:

      • "but a thorough validation and characterization of these vectors in primate cortex has lacked" better to say "has been limited", because Dimidschstein 2016 (marmoset V1) and Vormstein-schneider 2020 (macaque S1 and PFC) both reported expression in NHP.

      We have added the following sentence to this paragraph of the Discussion. “In particular, previous studies have not characterized the specificity and coverage of these vectors across cortical layers.”(see p. 8).

      • "whether finding in mice" -> 'whether findings in mice'.

      Corrected, thanks.

      • The discussion re: species differences is missing reference to Krienen 2020 (10.1038/s41586-020-2781-z).

      This reference has been added. Thanks.

      • “Injections of about 200nl volume resulted in higher specificity (95% across layers) and coverage” – this is misleading. The coverage was not statistically different among injection volumes.

      We have added the following sentence: “although coverage did not differ significantly across volumes.” (see p. 10).

      • "it is possible that subtle alteration of the cortical circuit upon parenchymal injection of viruses (including AAVs) leads to alteration of activity-dependent expression of PV and GABA." Or (and I would argue, more likely) the expression of large quantities of your big reporter protein compromised the function of the cell, leading to reduced expression of native proteins. You don't mention any IHC to amplify the RFP signal, so I'm assuming that your images are of direct expression. If so, you are expressing A LOT of reporter protein.

      We have added a sentence acknowledging this to the Discussion. Specifically, on p. 10, we now state: “However, this reduced immunoreactivity raises concerns about the virus or high levels of reporter protein possibly harming the cell physiology.”

      Methods:

      • It's difficult to piece together which viruses were injected in which monkeys, at what volumes, and at what titer. Please compile this info into a table for ease of reference (including any other relevant parameters).

      We now provide a Supplementary Table 1.

    1. The Gilgamesh Epic is the most notable literary product of Babylonia as yet discovered in the mounds of Mesopotamia. It recounts the exploits and adventures of a favorite hero, and in its final form covers twelve tablets, each tablet consisting of six columns (three on the obverse and three on the reverse) of about 50 lines for each column, or a total of about 3600 lines. Of this total, however, barely more than one-half has been found among the remains of the great collection of cuneiform tablets gathered by King Ashurbanapal (668–626 B.C.) in his palace at Nineveh, and discovered by Layard in 1854 in the course of his excavations of the mound Kouyunjik (opposite Mosul). The fragments of the epic painfully gathered, chiefly by George Smith, from the circa 30,000 tablets and bits of tablets brought to the British Museum were published in model form by Professor Paul Haupt; and that edition still remains the primary source for our study of the Epic. For the sake of convenience we may call the form of the Epic in the fragments from the library of Ashurbanapal the Assyrian version, though like most of the literary productions in the library it not only reverts to a Babylonian original, but represents a late copy of a much older original. The absence of any reference to Assyria in the fragments recovered justifies us in assuming that the Assyrian version received its present form in Babylonia, perhaps in Erech; though it is of course possible that some of the late features, particularly the elaboration of the teachings of the theologians or schoolmen in the eleventh and twelfth tablets, may have been produced at least in part under Assyrian influence. A definite indication that the Gilgamesh Epic reverts to a period earlier than Hammurabi (or Hammurawi), i.e., beyond 2000 B.C., was furnished by the publication of a text clearly belonging to the first Babylonian dynasty (of which Hammurabi was the sixth member) in CT. VI, 5; which text Zimmern recognized as a part of the tale of Atra-ḫasis, one of the names given to the survivor of the deluge, recounted on the eleventh tablet of the Gilgamesh Epic. This was confirmed by the discovery of a fragment of the deluge story dated in the eleventh year of Ammisaduka, i.e., c. 1967 B.C. In this text, likewise, the name of the deluge hero appears as Atra-ḫasis (col. VIII, 4). But while these two tablets do not belong to the Gilgamesh Epic and merely introduce an episode which has also been incorporated into the Epic, Dr. Bruno Meissner in 1902 published a tablet, dating, as the writing and the internal evidence showed, from the Hammurabi period, which undoubtedly is a portion of what by way of distinction we may call an old Babylonian version. It was picked up by Dr. Meissner at a dealer’s shop in Bagdad and acquired for the Berlin Museum. The tablet consists of four columns (two on the obverse and two on the reverse) and deals with the hero’s wanderings in search of a cure from disease with which he has been smitten after the death of his companion Enkidu. The hero fears that the disease will be fatal and longs to escape death. It corresponds to a portion of Tablet X of the Assyrian version. Unfortunately, only the lower portion of the obverse and the upper of the reverse have been preserved (57 lines in all); and in default of a colophon we do not know the numeration of the tablet in this old Babylonian edition. Its chief value, apart from its furnishing a proof for the existence of the Epic as early as 2000 B.C., lies (a) in the writing Gish instead of Gish-gi(n)-mash in the Assyrian version, for the name of the hero, (b) in the writing En-ki-dũ (abbreviated from dũg), “Enki is good”, for En-ki-dú in the Assyrian version, and (c) in the remarkable address of the maiden Sabitum, dwelling at the seaside, to whom Gilgamesh comes in the course of his wanderings.
From the Assyrian version we know that the hero tells the maiden of his grief for his lost companion, and of his longing to escape the dire fate of Enkidu. In the old Babylonian fragment the answer of Sabitum is given in full, and the sad note that it strikes, showing how hopeless it is for man to try to escape death which is in store for all mankind, is as remarkable as is the philosophy of “eat, drink and be merry” which Sabitum imparts. The address indicates how early the tendency arose to attach to ancient tales the current religious teachings. “Why, O Gish, dost thou run about? The life that thou seekest, thou wilt not find. When the gods created mankind, Death they imposed on mankind; Life they kept in their power. Thou, O Gish, fill thy belly, Day and night do thou rejoice, Daily make a rejoicing! Day and night a renewal of jollification! Let thy clothes be clean, Wash thy head and pour water over thee! Care for the little one who takes hold of thy hand! Let the wife rejoice in thy bosom!” Such teachings, reminding us of the leading thought in the Biblical Book of Ecclesiastes,¹⁰ indicate the didactic character given to ancient tales that were of popular origin, but which were modified and elaborated under the influence of the schools which arose in connection with the Babylonian temples. The story itself belongs, therefore, to a still earlier period than the form it received in this old Babylonian version. The existence of this tendency at so early a date comes to us as a genuine surprise, and justifies the assumption that the attachment of a lesson to the deluge story in the Assyrian version, to wit, the limitation in attainment of immortality to those singled out by the gods as exceptions, dates likewise from the old Babylonian period. The same would apply to the twelfth tablet, which is almost entirely didactic, intended to illustrate the impossibility of learning anything of the fate of those who have passed out of this world.
It also emphasizes the necessity of contenting oneself with the comfort that the care of the dead, by providing burial and food and drink offerings for them affords, as the only means of ensuring for them rest and freedom from the pangs of hunger and distress. However, it is of course possible that the twelfth tablet, which impresses one as a supplement to the adventures of Gilgamesh, ending with his return to Uruk (i.e., Erech) at the close of the eleventh tablet, may represent a later elaboration of the tendency to connect religious teachings with the exploits of a favorite hero. [13] We now have further evidence both of the extreme antiquity of the literary form of the Gilgamesh Epic and also of the disposition to make the Epic the medium of illustrating aspects of life and the destiny of mankind. The discovery by Dr. Arno Poebel of a Sumerian form of the tale of the descent of Ishtar to the lower world and her release11—apparently a nature myth to illustrate the change of season from summer to winter and back again to spring—enables us to pass beyond the Akkadian (or Semitic) form of tales current in the Euphrates Valley to the Sumerian form. Furthermore, we are indebted to Dr. Langdon for the identification of two Sumerian fragments in the Nippur Collection which deal with the adventures of Gilgamesh, one in Constantinople,12 the other in the collection of the University of Pennsylvania Museum.13 The former, of which only 25 lines are preserved (19 on the obverse and 6 on the reverse), appears to be a description of the weapons of Gilgamesh with which he arms himself for an encounter—presumably the encounter with Ḫumbaba or Ḫuwawa, the ruler of the cedar forest in the mountain.14 The latter deals with the building operations of Gilgamesh in the city of Erech. A text in Zimmern’s Sumerische Kultlieder aus altbabylonischer Zeit (Leipzig, 1913), No. 
196, appears likewise to be a fragment of the Sumerian version of the Gilgamesh Epic, bearing on the episode of Gilgamesh’s and Enkidu’s relations to the goddess Ishtar, covered in the sixth and seventh tablets of the Assyrian version.15 Until, however, further fragments shall have turned up, it would be hazardous to institute a comparison between the Sumerian and the Akkadian versions. All that can be said for the present is that there is every reason to believe in the existence of a literary form of the Epic in Sumerian which presumably antedated the Akkadian recension, [14]just as we have a Sumerian form of Ishtar’s descent into the nether world, and Sumerian versions of creation myths, as also of the Deluge tale.16 It does not follow, however, that the Akkadian versions of the Gilgamesh Epic are translations of the Sumerian, any more than that the Akkadian creation myths are translations of a Sumerian original. Indeed, in the case of the creation myths, the striking difference between the Sumerian and Akkadian views of creation17 points to the independent production of creation stories on the part of the Semitic settlers of the Euphrates Valley, though no doubt these were worked out in part under Sumerian literary influences. The same is probably true of Deluge tales, which would be given a distinctly Akkadian coloring in being reproduced and steadily elaborated by the Babylonian literati attached to the temples. The presumption is, therefore, in favor of an independent literary origin for the Semitic versions of the Gilgamesh Epic, though naturally with a duplication of the episodes, or at least of some of them, in the Sumerian narrative. Nor does the existence of a Sumerian form of the Epic necessarily prove that it originated with the Sumerians in their earliest home before they came to the Euphrates Valley. 
They may have adopted it after their conquest of southern Babylonia from the Semites who, there are now substantial grounds for believing, were the earlier settlers in the Euphrates Valley.18 We must distinguish, therefore, between the earliest literary form, which was undoubtedly Sumerian, and the origin of the episodes embodied in the Epic, including the chief actors, Gilgamesh and his companion Enkidu. It will be shown that one of the chief episodes, the encounter of the two heroes with a powerful guardian or ruler of a cedar forest, points to a western region, more specifically to Amurru, as the scene. The names of the two chief actors, moreover, appear to have been “Sumerianized” by an artificial process,19 and if this view turns out to be [15]correct, we would have a further ground for assuming the tale to have originated among the Akkadian settlers and to have been taken over from them by the Sumerians. New light on the earliest Babylonian version of the Epic, as well as on the Assyrian version, has been shed by the recovery of two substantial fragments of the form which the Epic had assumed in Babylonia in the Hammurabi period. The study of this important new material also enables us to advance the interpretation of the Epic and to perfect the analysis into its component parts. In the spring of 1914, the Museum of the University of Pennsylvania acquired by purchase a large tablet, the writing of which as well as the style and the manner of spelling verbal forms and substantives pointed distinctly to the time of the first Babylonian dynasty. The tablet was identified by Dr. Arno Poebel as part of the Gilgamesh Epic; and, as the colophon showed, it formed the second tablet of the series. 
He copied it with a view to publication, but the outbreak of the war which found him in Germany—his native country—prevented him from carrying out this intention.20 He, however, utilized some of its contents in his discussion of the historical or semi-historical traditions about Gilgamesh, as revealed by the important list of partly mythical and partly historical dynasties, found among the tablets of the Nippur collection, in which Gilgamesh occurs21 as a King of an Erech dynasty, whose father was Â, a priest of Kulab.22 The publication of the tablet was then undertaken by Dr. Stephen Langdon in monograph form under the title, “The Epic of Gilgamish.”23 In a preliminary article on the tablet in the Museum Journal, Vol. VIII, pages 29–38, Dr. Langdon took the tablet to be of the late [16]Persian period (i.e., between the sixth and third century B. C.), but his attention having been called to this error of some 1500 years, he corrected it in his introduction to his edition of the text, though he neglected to change some of his notes in which he still refers to the text as “late.”24 In addition to a copy of the text, accompanied by a good photograph, Dr. Langdon furnished a transliteration and translation with some notes and a brief introduction. The text is unfortunately badly copied, being full of errors; and the translation is likewise very defective. A careful collation with the original tablet was made with the assistance of Dr. Edward Chiera, and as a consequence we are in a position to offer to scholars a correct text. We beg to acknowledge our obligations to Dr. Gordon, the Director of the Museum of the University of Pennsylvania, for kindly placing the tablet at our disposal. Instead of republishing the text, I content myself with giving a full list of corrections in the appendix to this volume which will enable scholars to control our readings, and which will, I believe, justify the translation in the numerous passages in which it deviates from Dr. 
Langdon’s rendering. While credit should be given to Dr. Langdon for having made this important tablet accessible, the interests of science demand that attention be called to his failure to grasp the many important data furnished by the tablet, which escaped him because of his erroneous readings and faulty translations. The tablet, consisting of six columns (three on the obverse and three on the reverse), comprised, according to the colophon, 240 lines²⁵ and formed the second tablet of the series. Of the total, 204 lines are preserved in full or in part, and of the missing thirty-six quite a number can be restored, so that we have a fairly complete tablet. The most serious break occurs at the top of the reverse, where about eight lines are missing. In consequence of this the connection between the end of the obverse (where about five lines are missing) and the beginning of the reverse is obscured, though not to the extent of our entirely losing the thread of the narrative. About the same time that the University of Pennsylvania Museum purchased this second tablet of the Gilgamesh Series, Yale University obtained a tablet from the same dealer, which turned out to be a continuation of the University of Pennsylvania tablet. That the two belong to the same edition of the Epic is shown by their agreement in the dark brown color of the clay, in the writing as well as in the size of the tablet, though the characters on the Yale tablet are somewhat cramped and in consequence more difficult to read. Both tablets consist of six columns, three on the obverse and three on the reverse. The measurements of both are about the same, the Pennsylvania tablet being estimated at about 7 inches high, as against 7 2/16 inches for the Yale tablet, while the width of both is 6½ inches. The Yale tablet is, however, more closely written and therefore has a larger number of lines than the Pennsylvania tablet.
The colophon to the Yale tablet is unfortunately missing, but from internal evidence it is quite certain that the Yale tablet follows immediately upon the Pennsylvania tablet and, therefore, may be set down as the third of the series. The obverse is very badly preserved, so that only a general view of its contents can be secured. The reverse contains serious gaps in the first and second columns. The scribe evidently had a copy before him which he tried to follow exactly, but finding that he could not get all of the copy before him in the six columns, he continued the last column on the edge. In this way we obtain for the sixth column 64 lines as against 45 for column IV, and 47 for column V, and a total of 292 lines for the six columns. Subtracting the 16 lines written on the edge leaves us 276 lines for our tablet as against 240 for its companion. The width of each column being the same on both tablets, the difference of 36 lines is made up by the closer writing. Both tablets have peculiar knobs at the sides, the purpose of which is evidently not to facilitate holding the tablet in one’s hand while writing or reading it, as Langdon assumed26 (it would be quite impracticable for this purpose), but simply to protect the tablet in its position on a shelf, where it would naturally be placed on the edge, just as we arrange books on a shelf. Finally be it noted that these two tablets of the old Babylonian version do not belong to the same edition as the Meissner tablet above described, for the latter consists [18]of two columns each on obverse and reverse, as against three columns each in the case of our two tablets. We thus have the interesting proof that as early as 2000 B.C. there were already several editions of the Epic. As to the provenance of our two tablets, there are no definite data, but it is likely that they were found by natives in the mounds at Warka, from which about the year 1913, many tablets came into the hands of dealers. 
It is likely that where two tablets of a series were found, others of the series were also dug up, and we may expect to find some further portions of this old Babylonian version turning up in the hands of other dealers or in museums. Coming to the contents of the two tablets, the Pennsylvania tablet deals with the meeting of the two heroes, Gilgamesh and Enkidu, their conflict, followed by their reconciliation, while the Yale tablet in continuation takes up the preparations for the encounter of the two heroes with the guardian of the cedar forest, Ḫumbaba—but probably pronounced Ḫubaba27—or, as the name appears in the old Babylonian version, Ḫuwawa. The two tablets correspond, therefore, to portions of Tablets I to V of the Assyrian version;28 but, as will be shown in detail further on, the number of completely parallel passages is not large, and the Assyrian version shows an independence of the old Babylonian version that is larger than we had reason to expect. In general, it may be said that the Assyrian version is more elaborate, which points to its having received its present form at a considerably later period than the old Babylonian version.29 On the other hand, we already find in the Babylonian version the tendency towards repetition, which is characteristic of Babylonian-Assyrian tales in general. Through the two Babylonian tablets we are enabled to fill out certain details [19]of the two episodes with which they deal: (1) the meeting of Gilgamesh and Enkidu, and (2) the encounter with Ḫuwawa; while their greatest value consists in the light that they throw on the gradual growth of the Epic until it reached its definite form in the text represented by the fragments in Ashurbanapal’s Library. Let us now take up the detailed analysis, first of the Pennsylvania tablet and then of the Yale tablet. The Pennsylvania tablet begins with two dreams recounted by Gilgamesh to his mother, which the latter interprets as presaging the coming of Enkidu to Erech. 
In the one, something like a heavy meteor falls from heaven upon Gilgamesh and almost crushes him. With the help of the heroes of Erech, Gilgamesh carries the heavy burden to his mother Ninsun. The burden, his mother explains, symbolizes some one who, like Gilgamesh, is born in the mountains, to whom all will pay homage and of whom Gilgamesh will become enamoured with a love as strong as that for a woman. In a second dream, Gilgamesh sees some one who is like him, who brandishes an axe, and with whom he falls in love. This personage, the mother explains, is again Enkidu. Langdon is of the opinion that these dreams are recounted to Enkidu by a woman with whom Enkidu cohabits for six days and seven nights and who weans Enkidu from association with animals. This, however, cannot be correct. The scene between Enkidu and the woman must have been recounted in detail in the first tablet, as in the Assyrian version,30 whereas here in the second tablet we have the continuation of the tale with Gilgamesh recounting his dreams directly to his mother. The story then continues with the description of the coming of Enkidu, conducted by the woman to the outskirts of Erech, where food is given him. The main feature of the incident is the conversion of Enkidu to civilized life. Enkidu, who hitherto had gone about naked, is clothed by the woman. Instead of sucking milk and drinking from a trough like an animal, food and strong drink are placed before him, and he is taught how to eat and drink in human fashion. In human fashion he also becomes drunk, and his “spree” is naïvely described: “His heart became glad and his face shone.”31 [20]Like an animal, Enkidu’s body had hitherto been covered with hair, which is now shaved off. He is anointed with oil, and clothed “like a man.” Enkidu becomes a shepherd, protecting the fold against wild beasts, and his exploit in dispatching lions is briefly told. 
At this point—the end of column 3 (on the obverse), i.e., line 117, and the beginning of column 4 (on the reverse), i.e., line 131—a gap of 13 lines—the tablet is obscure, but apparently the story of Enkidu’s gradual transformation from savagery to civilized life is continued, with stress upon his introduction to domestic ways with the wife chosen or decreed for him, and with work as part of his fate. All this has no connection with Gilgamesh, and it is evident that the tale of Enkidu was originally an independent tale to illustrate the evolution of man’s career and destiny, how through intercourse with a woman he awakens to the sense of human dignity, how he becomes accustomed to the ways of civilization, how he passes through the pastoral stage to higher walks of life, how the family is instituted, and how men come to be engaged in the labors associated with human activities. In order to connect this tale with the Gilgamesh story, the two heroes are brought together; the woman taking on herself, in addition to the rôle of civilizer, that of the medium through which Enkidu is brought to Gilgamesh. The woman leads Enkidu from the outskirts of Erech into the city itself, where the people on seeing him remark upon his likeness to Gilgamesh. He is the very counterpart of the latter, though somewhat smaller in stature. There follows the encounter between the two heroes in the streets of Erech, where they engage in a fierce combat. Gilgamesh is overcome by Enkidu and is enraged at being thrown to the ground. The tablet closes with the endeavor of Enkidu to pacify Gilgamesh. Enkidu declares that the mother of Gilgamesh has exalted her son above the ordinary mortal, and that Enlil himself has singled him out for royal prerogatives. After this, we may assume, the two heroes become friends and together proceed to carry out certain exploits, the first of which is an attack upon the mighty guardian of the cedar forest. 
This is the main episode in the Yale tablet, which, therefore, forms the third tablet of the old Babylonian version. In the first column of the obverse of the Yale tablet, which is badly preserved, it would appear that the elders of Erech (or perhaps the people) are endeavoring to dissuade Gilgamesh from making the [21]attempt to penetrate to the abode of Ḫuwawa. If this is correct, then the close of the first column may represent a conversation between these elders and the woman who accompanies Enkidu. It would be the elders who are represented as “reporting the speech to the woman,” which is presumably the determination of Gilgamesh to fight Ḫuwawa. The elders apparently desire Enkidu to accompany Gilgamesh in this perilous adventure, and with this in view appeal to the woman. In the second column after an obscure reference to the mother of Gilgamesh—perhaps appealing to the sun-god—we find Gilgamesh and Enkidu again face to face. From the reference to Enkidu’s eyes “filled with tears,” we may conclude that he is moved to pity at the thought of what will happen to Gilgamesh if he insists upon carrying out his purpose. Enkidu, also, tries to dissuade Gilgamesh. This appears to be the main purport of the dialogue between the two, which begins about the middle of the second column and extends to the end of the third column. Enkidu pleads that even his strength is insufficient, “My arms are lame, My strength has become weak.” (lines 88–89) Gilgamesh apparently asks for a description of the terrible tyrant who thus arouses the fear of Enkidu, and in reply Enkidu tells him how at one time, when he was roaming about with the cattle, he penetrated into the forest and heard the roar of Ḫuwawa which was like that of a deluge. The mouth of the tyrant emitted fire, and his breath was death. 
It is clear, as Professor Haupt has suggested,32 that Enkidu furnishes the description of a volcano in eruption, with its mighty roar, spitting forth fire and belching out a suffocating smoke. Gilgamesh is, however, undaunted and urges Enkidu to accompany him in the adventure. “I will go down to the forest,” says Gilgamesh, if the conjectural restoration of the line in question (l. 126) is correct. Enkidu replies by again drawing a lurid picture of what will happen “When we go (together) to the forest…….” This speech of Enkidu is continued on the reverse. In reply Gilgamesh emphasizes his reliance upon the good will of Shamash and reproaches Enkidu with cowardice. He declares himself superior to Enkidu’s warning, and in bold terms [22]says that he prefers to perish in the attempt to overcome Ḫuwawa rather than abandon it. “Wherever terror is to be faced, Thou, forsooth, art in fear of death. Thy prowess lacks strength. I will go before thee, Though thy mouth shouts to me: ‘thou art afraid to approach,’ If I fall, I will establish my name.” (lines 143–148) There follows an interesting description of the forging of the weapons for the two heroes in preparation for the encounter.33 The elders of Erech when they see these preparations are stricken with fear. They learn of Ḫuwawa’s threat to annihilate Gilgamesh if he dares to enter the cedar forest, and once more try to dissuade Gilgamesh from the undertaking. “Thou art young, O Gish, and thy heart carries thee away, Thou dost not know what thou proposest to do.” (lines 190–191) They try to frighten Gilgamesh by repeating the description of the terrible Ḫuwawa. Gilgamesh is still undaunted and prays to his patron deity Shamash, who apparently accords him a favorable “oracle” (têrtu). The two heroes arm themselves for the fray, and the elders of Erech, now reconciled to the perilous undertaking, counsel Gilgamesh to take provision along for the undertaking. 
They urge Gilgamesh to allow Enkidu to take the lead, for “He is acquainted with the way, he has trodden the road [to] the entrance of the forest.” (lines 252–253) The elders dismiss Gilgamesh with fervent wishes that Enkidu may track out the “closed path” for Gilgamesh, and commit him to the care of Lugalbanda, here perhaps an epithet of Shamash. They advise Gilgamesh to perform certain rites, to wash his feet in the stream of Ḫuwawa and to pour out a libation of water to Shamash. Enkidu follows in a speech likewise intended to encourage the hero; and with the actual beginning of the expedition against Ḫuwawa the tablet ends. The encounter itself, with the triumph of the two heroes, must have been described in the fourth tablet. Now before taking up the significance of the additions to our knowledge of the Epic gained through these two tablets, it will be well to discuss the forms in which the names of the two heroes and of the ruler of the cedar forest occur in our tablets. As in the Meissner fragment, the chief hero is invariably designated as ᵈGish in both the Pennsylvania and Yale tablets; and we may therefore conclude that this was the common form in the Hammurabi period, as against the writing ᵈGish-gì(n)-mash³⁴ in the Assyrian version. Similarly, as in the Meissner fragment, the second hero’s name is always written En-ki-dũ³⁵ (abbreviated from dúg) as against En-ki-dú in the Assyrian version. Finally, we encounter in the Yale tablet for the first time the writing Ḫu-wa-wa as the name of the guardian of the cedar forest, as against Ḫum-ba-ba in the Assyrian version, though in the latter case, as we may now conclude from the Yale tablet, the name should rather be read Ḫu-ba-ba.³⁶ The variation in the writing of the latter name is interesting as pointing to the aspirate pronunciation of the labial in both instances.
The name would thus present a complete parallel to the Hebrew name Ḫowawa (or Ḫobab), who appears as the brother-in-law of Moses in the P document, Numbers 10, 29.³⁷ Since the name also occurs, written precisely as in the Yale tablet, among the “Amoritic” names in the important lists published by Dr. Chiera,³⁸ there can be no doubt that Ḫuwawa or Ḫubaba is a West Semitic name. This important fact adds to the probability that the “cedar forest” in which Ḫuwawa dwells is none other than the Lebanon district, famed since early antiquity for its cedars. This explanation of the name Ḫuwawa disposes of suppositions hitherto brought forward for an Elamitic origin. Gressmann³⁹ still favors such an origin, though realizing that the description of the cedar forest points to the Amanus or Lebanon range. In further confirmation of the West Semitic origin of the name, we have in Lucian, De Dea Syria, § 19, the name Kombabos⁴⁰ (the guardian of Stratonika), which forms a perfect parallel to Ḫu(m)baba. Of the important bearings of this western character of the name Ḫuwawa on the interpretation and origin of the Gilgamesh Epic, suggesting that the episode of the encounter between the tyrant and the two heroes rests upon a tradition of an expedition against the West or Amurru land, we shall have more to say further on. The variation in the writing of the name Enkidu is likewise interesting. It is evident that the form in the old Babylonian version with the sign dũ (i.e., dúg) is the original, for it furnishes us with a suitable etymology “Enki is good.” The writing with dúg, pronounced dū, also shows that the sign dú as the third element in the form which the name has in the Assyrian version is to be read dú, and that former readings like Ea-bani must be definitely abandoned.⁴¹ The form with dú is clearly a phonetic writing of the Sumerian name, the sign dú being chosen to indicate the pronunciation (not the ideograph) of the third element dúg.
This is confirmed by the writing En-gi-dú in the syllabary CT XVIII, 30, 10. The phonetic writing is, therefore, a warning against any endeavor to read the name by an Akkadian transliteration of the signs. This would not of itself prove that Enkidu is of Sumerian origin, for it might well be that the writing En-ki-dú is an endeavor to give a Sumerian aspect to a name that may have been foreign. The element dúg corresponds to the Semitic ṭâbu, “good,” and En-ki being originally a designation of a deity as the “lord of the land,” which would be the Sumerian manner of indicating a Semitic Baal, it is not at all impossible that En-ki-dúg may be the “Sumerianized” form of a Semitic בַּעַל טוֹב “Baal is good.” It will be recalled that in the third column of the Yale tablet, Enkidu speaks of himself in his earlier period while still living with cattle, as wandering into the cedar forest of Ḫuwawa, while in another passage (ll. 252–253) he is described as “acquainted with the way … to the entrance of the forest.” This would clearly point to the West as the original home of Enkidu. We are thus led once more to Amurru, taken as a general designation of the West, as playing an important role in the Gilgamesh Epic.⁴² If Gilgamesh’s expedition against Ḫuwawa of the Lebanon district recalls a Babylonian campaign against Amurru, Enkidu’s coming from his home, where, as we read repeatedly in the Assyrian version, “He ate herbs with the gazelles, Drank out of a trough with cattle,”⁴³ may rest on a tradition of an Amorite invasion of Babylonia. The fight between Gilgamesh and Enkidu would fit in with this tradition, while the subsequent reconciliation would be the form in which the tradition would represent the enforced union between the invaders and the older settlers.
Leaving this aside for the present, let us proceed to a consideration of the relationship of the form ᵈGish, for the chief personage in the Epic in the old Babylonian version, to ᵈGish-gi(n)-mash in the Assyrian version. Of the meaning of Gish there is fortunately no doubt. It is clearly the equivalent to the Akkadian zikaru, “man” (Brünnow No. 5707), or possibly rabû, “great” (Brünnow No. 5704). Among various equivalents, the preference is to be given to itlu, “hero.” The determinative for deity stamps the person so designated as deified, or as in part divine, and this is in accord with the express statement in the Assyrian version of the Gilgamesh Epic which describes the hero as “Two-thirds god and one-third human.”⁴⁴ Gish is, therefore, the hero-god par excellence; and this shows that we are not dealing with a genuine proper name, but rather with a descriptive attribute. Proper names are not formed in this way, either in Sumerian or Akkadian. Now what relation does this form Gish bear to the writing of the hero’s name found invariably in the Assyrian version, the form which was at first read ᵈIz-tu-bar or ᵈGish-du-bar by scholars, until Pinches found in a neo-Babylonian syllabary⁴⁵ the equation of it with Gi-il-ga-mesh? Pinches’ discovery pointed conclusively to the popular pronunciation of the hero’s name as Gilgamesh; and since Aelian (De natura Animalium XII, 2) mentions a Babylonian personage Gilgamos (though what he tells us of Gilgamos does not appear in our Epic, but seems to apply to Etana, another figure of Babylonian mythology), there seemed to be no further reason to question that the problem had been solved. Besides, in a later Syriac list of Babylonian kings found in the Scholia of Theodor bar Koni, the name גלמגום with a variant גמיגמוס occurs,⁴⁶ and it is evident that we have here again the Gi-il-ga-mesh, discovered by Pinches.
The existence of an old Babylonian hero Gilgamesh who was likewise a king is thus established, as well as his identification with the name as written in the Assyrian version. It is evident that we cannot read this name as Iz-tu-bar or Gish-du-bar, but that we must read the first sign as Gish and the third as Mash, while for the second we must assume a reading Gìn or Gi. This would give us Gish-gì(n)-mash, which is clearly again (like En-ki-dú) not an etymological writing but a phonetic one, intended to convey an approach to the popular pronunciation. Gi-il-ga-mesh might well be merely a variant for Gish-ga-mesh, or vice versa, and this would come close to Gish-gi-mash. Now, when we have a name the pronunciation of which is not definite but approximate, and which is written in various ways, the probabilities are that the name is foreign. A foreign name might naturally be spelled in various ways. The Epic in the Assyrian version clearly depicts ᵈGish-gì(n)-mash as a conqueror of Erech, who forces the people into subjection, and whose autocratic rule leads the people of Erech to implore the goddess Aruru to create a rival to him who may withstand him. In response to this appeal ᵈEnkidu is formed out of dust by Aruru and eventually brought to Erech.⁴⁷ Gish-gì(n)-mash or Gilgamesh is therefore in all probability a foreigner; and the simplest solution suggested by the existence of the two forms (1) Gish in the old Babylonian version and (2) Gish-gì(n)-mash in the Assyrian version, is to regard the former as an abbreviation, which seemed appropriate, because the short name conveyed the idea of the “hero” par excellence.
If Gish-gì(n)-mash is a foreign name, one would think in the first instance of Sumerian; but here we encounter a difficulty in the circumstance that outside of the Epic this conqueror and ruler of Erech appears in quite a different form, namely, as dGish-bil-ga-mesh, with dGish-gibil(or bìl)-ga-mesh and dGish-bil-ge-mesh as variants.48 In the remarkable list of partly mythological and partly historical dynasties, published by Poebel,49 the fifth member of the first dynasty of Erech appears as dGish-bil-ga-mesh; and similarly in an inscription of the days of Sin-gamil, dGish-bil-ga-mesh is mentioned as the builder of the wall of Erech.50 Moreover, in the several fragments of the Sumerian version of the Epic we have invariably the form dGish-bil-ga-mesh. It is evident, therefore, that this is the genuine form of the name in Sumerian and presumably, therefore, the oldest form. By way of further confirmation we have in the syllabary above referred to, CT, XVIII, 30, 6–8, three designations of our hero, viz: dGish-gibil(or bíl)-ga-mesh muḳ-tab-lu (“warrior”) a-lik pa-na (“leader”) All three designations are set down as the equivalent of the Sumerian Esigga imin i.e., “the seven-fold hero.” [28] Of the same general character is the equation in another syllabary:51 Esigga-tuk and its equivalent Gish-tuk = “the one who is a hero.” Furthermore, the name occurs frequently in “Temple” documents of the Ur dynasty in the form dGish-bil-ga-mesh52 with dGish-bil-gi(n)-mesh as a variant.53 In a list of deities (CT XXV, 28, K 7659) we likewise encounter dGish-gibil(or bíl)-ga-mesh, and lastly in a syllabary we have the equation54 dGish-gi-mas-[si?] = dGish-bil-[ga-mesh]. The variant Gish-gibil for Gish-bil may be disposed of readily, in view of the frequent confusion or interchange of the two signs Bil (Brünnow No. 4566) and Gibil or Bíl (Brünnow No. 4642) which has also the value Gi (Brünnow 4641), so that we might also read Gish-gi-ga-mesh. 
Both signs convey the idea of “fire,” “renew,” etc.; both revert to the picture of flames of fire, in the one case with a bowl (or some such object) above it, in the other the flames issuing apparently from a torch.55 The meaning of the name is not affected whether we read dGish-bil-ga-mesh or dGish-gibil(or bíl)-ga-mesh, for the middle element in the latter case being identical with the fire-god, written dBil-gi and to be pronounced in the inverted form as Gibil with -ga (or ge) as the phonetic complement; it is equivalent, therefore, to the writing bil-ga in the former case. Now Gish-gibil or Gish-bíl conveys the idea of abu, “father” (Brünnow No. 5713), just as Bil (Brünnow No. 4579) has this meaning, while Pa-gibil-(ga) or Pa-bíl-ga is abu abi, “grandfather.”56 This meaning may be derived from Gibil, as also from Bíl = išatu, “fire,” then eššu, “new,” then abu, “father,” as the renewer or creator. Gish with Bíl or Gibil would, therefore, be “the father-man” or “the father-hero,” [29]i.e., again the hero par excellence, the original hero, just as in Hebrew and Arabic ab is used in this way.57 The syllable ga being a phonetic complement, the element mesh is to be taken by itself and to be explained, as Poebel suggested, as “hero” (itlu, Brünnow No. 5967). We would thus obtain an entirely artificial combination, “man (or hero), father, hero,” which would simply convey in an emphatic manner the idea of the Ur-held, the original hero, the father of heroes as it were—practically the same idea, therefore, as the one conveyed by Gish alone, as the hero par excellence. Our investigation thus leads us to a substantial identity between Gish and the longer form Gish-bil(or bíl)-ga-mesh, and the former might, therefore, well be used as an abbreviation of the latter.
Both the shorter and the longer forms are descriptive epithets based on naive folk etymology, rather than personal names, just as in the designation of our hero as muḳtablu, the “fighter,” or as âlik pâna, “the leader,” or as Esigga imin, “the seven-fold hero,” or Esigga tuk, “the one who is a hero,” are descriptive epithets, and as Atra-ḫasis, “the very wise one,” is such an epithet for the hero of the deluge story. The case is different with Gi-il-ga-mesh, or Gish-gì(n)-mash, which represent the popular and actual pronunciation of the name, or at least the approach to such pronunciation. Such forms, stripped as they are of all artificiality, impress one as genuine names. The conclusion to which we are thus led is that Gish-bil(or bíl)-ga-mesh is a play upon the genuine name, to convey to those to whom the real name, as that of a foreigner, would suggest no meaning an interpretation fitting in with his character. In other words, Gish-bil-ga-mesh is a “Sumerianized” form of the name, introduced into the Sumerian version of the tale which became a folk-possession in the Euphrates Valley. Such plays upon names to suggest the character of an individual or some incident are familiar to us from the narratives in Genesis.58 They do not constitute genuine etymologies and are rarely of use in leading to a correct etymology. Reuben, e.g., certainly does not mean “Yahweh has seen my affliction,” which the mother is supposed to have exclaimed at [30]the birth (Genesis 29, 32), with a play upon ben and be’onyi, any more than Judah means “I praise Yahweh” (v. 35), though it does contain the divine name (Yehô) as an element. The play on the name may be close or remote, as long as it fulfills its function of suggesting an etymology that is complimentary or appropriate. 
In this way, an artificial division and at the same time a distortion of a foreign name like Gilgamesh into several elements, Gish-bil-ga-mesh, is no more violent than, for example, the explanation of Issachar or rather Issaschar as “God has given my hire” (Genesis 30, 18) with a play upon the element sechar, and as though the name were to be divided into Yah (“God”) and sechar (“hire”); or the popular name of Alexander among the Arabs as Zu’l Karnaini, “the possessor of the two horns,” with a suggestion of his conquest of two hemispheres, or what not.59 The element Gil in Gilgamesh would be regarded as a contraction of Gish-bil or gi-bil, in order to furnish the meaning “father-hero,” or Gil might be looked upon as a variant for Gish, which would give us the “phonetic” form in the Assyrian version dGish-gi-mash,60 as well as such a variant writing dGish-gi-mas-(si). Now a name like Gilgamesh, upon which we may definitely settle as coming closest to the genuine form, certainly impresses one as foreign, i.e., it is neither Sumerian nor Akkadian; and we have already suggested that the circumstance that the hero of the Epic is portrayed as a conqueror of Erech, and a rather ruthless one at that, points to a tradition of an invasion of the Euphrates Valley as the background for the episode in the first tablet of the series. Now it is significant that many of the names in the “mythical” dynasties, as they appear in Poebel’s list,61 are likewise foreign, such as Mes-ki-in-ga-še-ir, son of the god Shamash (and the founder of the “mythical” dynasty of Erech of which dGish-bil-ga-mesh is the fifth member),62 and En-me-ir-kár his son.
In a still earlier “mythical” dynasty, we encounter names like Ga-lu-mu-um, Zu-ga-gi-ib, Ar-pi, [31]E-ta-na,63 which are distinctly foreign, while such names as En-me(n)-nun-na and Bar-sal-nun-na strike one again as “Sumerianized” names rather than as genuine Sumerian formations.64 Some of these names, as Galumum, Arpi and Etana, are so Amoritic in appearance, that one may hazard the conjecture of their western origin. May Gilgamesh likewise belong to the Amurru65 region, or does he represent a foreigner from the East in contrast to Enkidu, whose name, we have seen, may have been Baal-Ṭôb in the West, with which region he is according to the Epic so familiar? It must be confessed that the second element ga-mesh would fit in well with a Semitic origin for the name, for the element impresses one as the participial form of a Semitic stem g-m-š, just as in the second element of Meskin-gašer we have such a form. Gil might then be the name of a West-Semitic deity. Such conjectures, however, can for the present not be substantiated, and we must content ourselves with the conclusion that Gilgamesh as the real name of the hero, or at least the form which comes closest to the real name, points to a foreign origin for the hero, and that such forms as dGish-bil-ga-mesh and dGish-bíl-gi-mesh and other variants are “Sumerianized” forms for which an artificial etymology was brought forward to convey the [32]idea of the “original hero” or the hero par excellence. By means of this “play” on the name, which reverts to the compilers of the Sumerian version of the Epic, Gilgamesh was converted into a Sumerian figure, just as the name Enkidu may have been introduced as a Sumerian translation of his Amoritic name. dGish at all events is an abbreviated form of the “Sumerianized” name, introduced by the compilers of the earliest Akkadian version, which was produced naturally under the influence of the Sumerian version. 
Later, as the Epic continued to grow, a phonetic writing was introduced, dGish-gi-mash, which is in a measure a compromise between the genuine name and the “Sumerianized” form, but at the same time an approach to the real pronunciation. Next to the new light thrown upon the names and original character of the two main figures of the Epic, one of the chief points of interest in the Pennsylvania fragment is the proof that it furnishes for a striking resemblance of the two heroes, Gish and Enkidu, to one another. In interpreting the dream of Gish, his mother, Ninsun, lays stress upon the fact that the dream portends the coming of someone who is like Gish, “born in the field and reared in the mountain” (lines 18–19). Both, therefore, are shown by this description to have come to Babylonia from a mountainous region, i.e., they are foreigners; and in the case of Enkidu we have seen that the mountain in all probability refers to a region in the West, while the same may also be the case with Gish. The resemblance of the two heroes to one another extends to their personal appearance. When Enkidu appears on the streets of Erech, the people are struck by this resemblance. They remark that he is “like Gish,” though “shorter in stature” (lines 179–180). Enkidu is described as a rival or counterpart.66 This relationship between the two is suggested also by the Assyrian version. In the creation of Enkidu by Aruru, the people urge the goddess to create the “counterpart” (zikru) of Gilgamesh, someone who will be like him (ma-ši-il) (Tablet I, 2, 31).
Enkidu not only comes from the mountain,67 but the mountain is specifically designated [33]as his birth-place (I, 4, 2), precisely as in the Pennsylvania tablet, while in another passage he is also described, as in our tablet, as “born in the field.”68 Still more significant is the designation of Gilgamesh as the talimu, “younger brother,” of Enkidu.69 In accord with this, we find Gilgamesh in his lament over Enkidu describing him as a “younger brother” (ku-ta-ni);70 and again in the last tablet of the Epic, Gilgamesh is referred to as the “brother” of Enkidu.71 This close relationship reverts to the Sumerian version, for the Constantinople fragment (Langdon, above, p. 13) begins with the designation of Gish-bil-ga-mesh as “his brother.” By “his” no doubt Enkidu is meant. Likewise in the Sumerian text published by Zimmern (above, p. 13) Gilgamesh appears as the brother of Enkidu (rev. 1, 17). Turning to the numerous representations of Gilgamesh and Enkidu on Seal Cylinders,72 we find this resemblance of the two heroes to each other strikingly confirmed. Both are represented as bearded, with the strands arranged in the same fashion. The face in both cases is broad, with curls protruding at the side of the head, though at times these curls are lacking in the case of Enkidu. What is particularly striking is to find Gilgamesh generally a little taller than Enkidu, thus bearing out the statement in the Pennsylvania tablet that Enkidu is “shorter in stature.” There are, to be sure, also some distinguishing marks between the two. Thus Enkidu is generally represented with animal hoofs, but not always.73 Enkidu is commonly portrayed with the horns of a bison, but again this sign is wanting in quite a number of instances.74 The hoofs and the horns mark the period when Enkidu lived with animals and much like an [34]animal. Most remarkable, however, of all are cylinders on which we find the two heroes almost exactly alike as, for example, Ward No. 
199 where two figures, the one a duplicate of the other (except that one is just a shade taller), are in conflict with each other. Dr. Ward was puzzled by this representation and sets it down as a “fantastic” scene in which “each Gilgamesh is stabbing the other.” In the light of the Pennsylvania tablet, this scene is clearly the conflict between the two heroes described in column 6, preliminary to their forming a friendship. Even in the realm of myth the human experience holds good that there is nothing like a good fight as a basis for a subsequent alliance. The fragment describes this conflict as a furious one in which Gilgamesh is worsted, and his wounded pride assuaged by the generous victor, who comforts his vanquished enemy by the assurance that he was destined for something higher than to be a mere “Hercules.” He was singled out for the exercise of royal authority. True to the description of the two heroes in the Pennsylvania tablet as alike, one the counterpart of the other, the seal cylinder portrays them almost exactly alike, as alike as two brothers could possibly be; with just enough distinction to make it clear on close inspection that two figures are intended and not one repeated for the sake of symmetry. There are slight variations in the manner in which the hair is worn, and slightly varying expressions of the face, just enough to make it evident that the one is intended for Gilgamesh and the other for Enkidu. When, therefore, in another specimen, No. 173, we find a Gilgamesh holding his counterpart by the legs, it is merely another aspect of the fight between the two heroes, one of whom is intended to represent Enkidu, and not, as Dr. Ward supposed, a grotesque repetition of Gilgamesh.75 The description of Enkidu in the Pennsylvania tablet as a parallel figure to Gilgamesh leads us to a consideration of the relationship of the two figures to one another. 
Many years ago it was pointed out that the Gilgamesh Epic was a composite tale in which various stories of an independent origin had been combined and brought into more or less artificial connection with the heros eponymos of southern Babylonia.76 We may now go a step further and point out that not [35]only is Enkidu originally an entirely independent figure, having no connection with Gish or Gilgamesh, but that the latter is really depicted in the Epic as the counterpart of Enkidu, a reflection who has been given the traits of extraordinary physical power that belong to Enkidu. This is shown in the first place by the fact that in the encounter it is Enkidu who triumphs over Gilgamesh. The entire analysis of the episode of the meeting between the two heroes as given by Gressmann77 must be revised. It is not Enkidu who is terrified and who is warned against the encounter. It is Gilgamesh who, during the night on his way from the house in which the goddess Ishḫara lies, encounters Enkidu on the highway. Enkidu “blocks the path”78 of Gilgamesh. He prevents Gilgamesh from re-entering the house,79 and the two attack each other “like oxen.”80 They grapple with each other, and Enkidu forces Gilgamesh to the ground. Enkidu is, therefore, the real hero whose traits of physical prowess are afterwards transferred to Gilgamesh. Similarly in the next episode, the struggle against Ḫuwawa, the Yale tablet makes it clear that in the original form of the tale Enkidu is the real hero. All warn Gish against the undertaking—the elders of Erech, Enkidu, and also the workmen. “Why dost thou desire to do this?”81 they say to him. “Thou art young, and thy heart carries thee away. Thou knowest not what thou proposest to do.”82 This part of the incident is now better known to us through the latest fragment of the Assyrian version discovered and published by King.83 The elders say to Gilgamesh: “Do not trust, O Gilgamesh, in thy strength! Be warned(?) against trusting to thy attack! 
The one who goes before will save his companion,84 He who has foresight will save his friend.85 [36] Let Enkidu go before thee. He knows the roads to the cedar forest; He is skilled in battle and has seen fight.” Gilgamesh is sufficiently impressed by this warning to invite Enkidu to accompany him on a visit to his mother, Ninsun, for the purpose of receiving her counsel.86 It is only after Enkidu, who himself hesitates and tries to dissuade Gish, decides to accompany the latter that the elders of Erech are reconciled and encourage Gish for the fray. The two in concert proceed against Ḫuwawa. Gilgamesh alone cannot carry out the plan. Now when a tale thus associates two figures in one deed, one of the two has been added to the original tale. In the present case there can be little doubt that Enkidu, without whom Gish cannot proceed, who is specifically described as “acquainted with the way … to the entrance of the forest”87 in which Ḫuwawa dwells is the original vanquisher. Naturally, the Epic aims to conceal this fact as much as possible ad majorem gloriam of Gilgamesh. It tries to put the one who became the favorite hero into the foreground. Therefore, in both the Babylonian and the Assyrian version Enkidu is represented as hesitating, and Gilgamesh as determined to go ahead. Gilgamesh, in fact, accuses Enkidu of cowardice and boldly declares that he will proceed even though failure stare him in the face.88 Traces of the older view, however, in which Gilgamesh is the one for whom one fears the outcome, crop out; as, for example, in the complaint of Gilgamesh’s mother to Shamash that the latter has stirred the heart of her son to take the distant way to Ḫu(m)baba, “To a fight unknown to him, he advances, An expedition unknown to him he undertakes.”89 Ninsun evidently fears the consequences when her son informs her of his intention and asks her counsel. 
The answer of Shamash is not preserved, but no doubt it was of a reassuring character, as was the answer of the Sun-god to Gish’s appeal and prayer as set forth in the Yale tablet.90 [37] Again, as a further indication that Enkidu is the real conqueror of Ḫuwawa, we find the coming contest revealed to Enkidu no less than three times in dreams, which Gilgamesh interprets.91 Since the person who dreams is always the one to whom the dream applies, we may see in these dreams a further trace of the primary rôle originally assigned to Enkidu. Another exploit which, according to the Assyrian version, the two heroes perform in concert is the killing of a bull, sent by Anu at the instance of Ishtar to avenge an insult offered to the goddess by Gilgamesh, who rejects her offer of marriage. In the fragmentary description of the contest with the bull, we find Enkidu “seizing” the monster by “its tail.”92 That Enkidu originally played the part of the slayer is also shown by the statement that it is he who insults Ishtar by throwing a piece of the carcass into the goddess’ face,93 adding also an insulting speech; and this despite the fact that Ishtar in her rage accuses Gilgamesh of killing the bull.94 It is thus evident that the Epic alters the original character of the episodes in order to find a place for Gilgamesh, with the further desire to assign to the latter the chief rôle. Be it noted also that Enkidu, not Gilgamesh, is punished for the insult to Ishtar. Enkidu must therefore in the original form of the episode have been the guilty party, who is stricken with mortal disease as a punishment to which after twelve days he succumbs.95 In view of this, we may supply the name of Enkidu in the little song introduced at the close of the encounter with the bull, and not Gilgamesh as has hitherto been done. “Who is distinguished among the heroes? Who is glorious among men? 
[Enkidu] is distinguished among heroes, [Enkidu] is glorious among men.”96 [38]Finally, the killing of lions is directly ascribed to Enkidu in the Pennsylvania tablet: “Lions he attacked *     *     *     *     * Lions he overcame”97 whereas Gilgamesh appears to be afraid of lions. On his long search for Utnapishtim he says: “On reaching the entrance of the mountain at night I saw lions and was afraid.”98 He prays to Sin and Ishtar to protect and save him. When, therefore, in another passage some one celebrates Gilgamesh as the one who overcame the “guardian,” who dispatched Ḫu(m)baba in the cedar forest, who killed lions and overthrew the bull,99 we have the completion of the process which transferred to Gilgamesh exploits and powers which originally belonged to Enkidu, though ordinarily the process stops short at making Gilgamesh a sharer in the exploits; with the natural tendency, to be sure, to enlarge the share of the favorite. We can now understand why the two heroes are described in the Pennsylvania tablet as alike, as born in the same place, aye, as brothers. Gilgamesh in the Epic is merely a reflex of Enkidu. The latter is the real hero and presumably, therefore, the older figure.100 Gilgamesh resembles Enkidu, because he is originally Enkidu. The “resemblance” motif is merely the manner in which in the course of the partly popular, partly literary transfer, the recollection is preserved that Enkidu is the original, and Gilgamesh the copy. The artificiality of the process which brings the two heroes together is apparent in the dreams of Gilgamesh which are interpreted by his mother as portending the coming of Enkidu. Not the conflict is foreseen, but the subsequent close association, naïvely described as due to the personal charm which Enkidu exercises, which will lead Gilgamesh to fall in love with the one whom he is to meet. The two will become one, like man and wife. 
[39] On the basis of our investigations, we are now in a position to reconstruct in part the cycle of episodes that once formed part of an Enkidu Epic. The fight between Enkidu and Gilgamesh, in which the former is the victor, is typical of the kind of tales told of Enkidu. He is the real prototype of the Greek Hercules. He slays lions, he overcomes a powerful opponent dwelling in the forests of Lebanon, he kills the bull, and he finally succumbs to disease sent as a punishment by an angry goddess. The death of Enkidu naturally formed the close of the Enkidu Epic, which in its original form may, of course, have included other exploits besides those taken over into the Gilgamesh Epic. There is another aspect of the figure of Enkidu which is brought forward in the Pennsylvania tablet more clearly than had hitherto been the case. Many years ago attention was called to certain striking resemblances between Enkidu and the figure of the first man as described in the early chapters of Genesis.101 At that time we had merely the Assyrian version of the Gilgamesh Epic at our disposal, and the main point of contact was the description of Enkidu living with the animals, drinking and feeding like an animal, until a woman is brought to him with whom he engages in sexual intercourse. This suggested that Enkidu was a picture of primeval man, while the woman reminded one of Eve, who when she is brought to Adam becomes his helpmate and inseparable companion. The Biblical tale stands, of course, on a much higher level, and is introduced, as are other traditions and tales of primitive times, in the style of a parable to convey certain religious teachings. For all that, suggestions of earlier conceptions crop out in the picture of Adam surrounded by animals to which he assigns names. Such a phrase as “there was no helpmate corresponding to him” becomes intelligible on the supposition of an existing tradition or belief, that man once lived and, indeed, cohabited with animals. 
The tales in the early chapters of Genesis must rest on very early popular traditions, which have been cleared of mythological and other objectionable features in order to adapt them to the purpose of the Hebrew compilers, to serve as a medium for illustrating [40]certain religious teachings regarding man’s place in nature and his higher destiny. From the resemblance between Enkidu and Adam it does not, of course, follow that the latter is modelled upon the former, but only that both rest on similar traditions of the condition under which men lived in primeval days prior to the beginnings of human culture. We may now pass beyond these general indications and recognize in the story of Enkidu as revealed by the Pennsylvania tablet an attempt to trace the evolution of primitive man from low beginnings to the regular and orderly family life associated with advanced culture. The new tablet furnishes a further illustration for the surprisingly early tendency among the Babylonian literati to connect with popular tales teachings of a religious or ethical character. Just as the episode between Gilgamesh and the maiden Sabitum is made the occasion for introducing reflections on the inevitable fate of man to encounter death, so the meeting of Enkidu with the woman becomes the medium of impressing the lesson of human progress through the substitution of bread and wine for milk and water, through the institution of the family, and through work and the laying up of resources. This is the significance of the address to Enkidu in column 4 of the Pennsylvania tablet, even though certain expressions in it are somewhat obscure. The connection of the entire episode of Enkidu and the woman with Gilgamesh is very artificial; and it becomes much more intelligible if we disassociate it from its present entanglement in the Epic. In Gilgamesh’s dream, portending the meeting with Enkidu, nothing is said of the woman who is the companion of the latter. 
The passage in which Enkidu is created by Aruru to oppose Gilgamesh102 betrays evidence of having been worked over in order to bring Enkidu into association with the longing of the people of Erech to get rid of a tyrannical character. The people in their distress appeal to Aruru to create a rival to Gilgamesh. In response, “Aruru upon hearing this created a man of Anu in her heart.” Now this “man of Anu” cannot possibly be Enkidu, for the sufficient reason that a few lines further on Enkidu is described as an [41]offspring of Ninib. Moreover, the being created is not a “counterpart” of Gilgamesh, but an animal-man, as the description that follows shows. We must separate lines 30–33 in which the creation of the “Anu man” is described from lines 34–41 in which the creation of Enkidu is narrated. Indeed, these lines strike one as the proper beginning of the original Enkidu story, which would naturally start out with his birth and end with his death. The description is clearly an account of the creation of the first man, in which capacity Enkidu is brought forward. “Aruru washed her hands, broke off clay, threw it on the field103 … created Enkidu, the hero, a lofty offspring of the host of Ninib.”104 The description of Enkidu follows, with his body covered with hair like an animal, and eating and drinking with the animals. There follows an episode105 which has no connection whatsoever with the Gilgamesh Epic, but which is clearly intended to illustrate how Enkidu came to abandon the life with the animals. A hunter sees Enkidu and is amazed at the strange sight—an animal and yet a man. Enkidu, as though resenting his condition, becomes enraged at the sight of the hunter, and the latter goes to his father and tells him of the strange creature whom he is unable to catch. 
In reply, the father advises his son to take a woman with him when next he goes out on his pursuit, and to have the woman remove her dress in the presence of Enkidu, who will then approach her, and after intercourse with her will abandon the animals among whom he lives. By this device he will catch the strange creature. Lines 14–18 of column 3 in the first tablet in which the father of the hunter refers to Gilgamesh must be regarded as a later insertion, a part of the reconstruction of the tale to connect the episode with Gilgamesh. The advice of the father to his son, the hunter, begins, line 19, “Go my hunter, take with thee a woman.” [42]In the reconstructed tale, the father tells his son to go to Gilgamesh to relate to him the strange appearance of the animal-man; but there is clearly no purpose in this, as is shown by the fact that when the hunter does so, Gilgamesh makes precisely the same speech as does the father of the hunter. Lines 40–44 of column 3, in which Gilgamesh is represented as speaking to the hunter form a complete doublet to lines 19–24, beginning “Go, my hunter, take with thee a woman, etc.” and similarly the description of Enkidu appears twice, lines 2–12 in an address of the hunter to his father, and lines 29–39 in the address of the hunter to Gilgamesh. The artificiality of the process of introducing Gilgamesh into the episode is revealed by this awkward and entirely meaningless repetition. We may therefore reconstruct the first two scenes in the Enkidu Epic as follows:106 Tablet I, col. 2, 34–35: Creation of Enkidu by Aruru. 36–41: Description of Enkidu’s hairy body and of his life with the animals. 42–50: The hunter sees Enkidu, who shows his anger, as also his woe, at his condition. 3, 1–12: The hunter tells his father of the strange being who pulls up the traps which the hunter digs, and who tears the nets so that the hunter is unable to catch him or the animals. 
19–24: The father of the hunter advises his son on his next expedition to take a woman with him in order to lure the strange being from his life with the animals. Line 25, beginning “On the advice of his father,” must have set forth, in the original form of the episode, how the hunter procured the woman and took her with him to meet Enkidu. Column 4 gives in detail the meeting between the two, and naïvely describes how the woman exposes her charms to Enkidu, who is captivated by her and stays with her six days and seven nights. The animals see the change in Enkidu and run away from him. [43]He has been transformed through the woman. So far the episode. In the Assyrian version there follows an address of the woman to Enkidu beginning (col. 4, 34): “Beautiful art thou, Enkidu, like a god art thou.” We find her urging him to go with her to Erech, there to meet Gilgamesh and to enjoy the pleasures of city life with plenty of beautiful maidens. Gilgamesh, she adds, will expect Enkidu, for the coming of the latter to Erech has been foretold in a dream. It is evident that here we have again the later transformation of the Enkidu Epic in order to bring the two heroes together. Will it be considered too bold if we assume that in the original form the address of the woman and the construction of the episode were such as we find preserved in part in columns 2 to 4 of the Pennsylvania tablet, which forms part of the new material that can now be added to the Epic? The address of the woman begins in line 51 of the Pennsylvania tablet: “I gaze upon thee, Enkidu, like a god art thou.” This corresponds to the line in the Assyrian version (I, 4, 34) as given above, just as lines 52–53: “Why with the cattle Dost thou roam across the field?” correspond to I, 4, 35, of the Assyrian version. 
There follows in both the old Babylonian and the Assyrian version the appeal of the woman to Enkidu, to allow her to lead him to Erech where Gilgamesh dwells (Pennsylvania tablet lines 54–61 = Assyrian version I, 4, 36–39); but in the Pennsylvania tablet we now have a second speech (lines 62–63) beginning like the first one with al-ka, “come:” “Come, arise from the accursed ground.” Enkidu consents, and now the woman takes off her garments and clothes the naked Enkidu, while putting another garment on herself. She takes hold of his hand and leads him to the sheepfolds (not to Erech!!), where bread and wine are placed before him. Accustomed hitherto to sucking milk with cattle, Enkidu does not know what to do with the strange food until encouraged and instructed by the woman. The entire third column is taken up with this introduction [44]of Enkidu to civilized life in a pastoral community, and the scene ends with Enkidu becoming a guardian of flocks. Now all this has nothing to do with Gilgamesh, and clearly sets forth an entirely different idea from the one embodied in the meeting of the two heroes. In the original Enkidu tale, the animal-man is looked upon as the type of a primitive savage, and the point of the tale is to illustrate in the naïve manner characteristic of folklore the evolution to the higher form of pastoral life. This aspect of the incident is, therefore, to be separated from the other phase which has as its chief motif the bringing of the two heroes together. We now obtain, thanks to the new section revealed by the Pennsylvania tablet, a further analogy107 with the story of Adam and Eve, but with this striking difference, that whereas in the Babylonian tale the woman is the medium leading man to the higher life, in the Biblical story the woman is the tempter who brings misfortune to man. 
This contrast is, however, not inherent in the Biblical story, but due to the point of view of the Biblical writer, who is somewhat pessimistically inclined and looks upon primitive life, when man went naked and lived in a garden, eating of fruits that grew of themselves, as the blessed life in contrast to advanced culture which leads to agriculture and necessitates hard work as the means of securing one’s substance. Hence the woman through whom Adam eats of the tree of knowledge and becomes conscious of being naked is looked upon as an evil tempter, entailing the loss of the primeval life of bliss in a gorgeous Paradise. The Babylonian point of view is optimistic. The change to civilized life—involving the wearing of clothes and the eating of food that is cultivated (bread and wine) is looked upon as an advance. Hence the woman is viewed as the medium of raising man to a higher level. The feature common to the Biblical and Babylonian tales is the attachment of a lesson to early folk-tales. The story of Adam and Eve,108 as the story of Enkidu and the woman, is told with a purpose. Starting with early traditions of men’s primitive life on earth, that may have arisen independently, Hebrew and [45]Babylonian writers diverged, each group going its own way, each reflecting the particular point of view from which the evolution of human society was viewed. Leaving the analogy between the Biblical and Babylonian tales aside, the main point of value for us in the Babylonian story of Enkidu and the woman is the proof furnished by the analysis, made possible through the Pennsylvania tablet, that the tale can be separated from its subsequent connection with Gilgamesh. We can continue this process of separation in the fourth column, where the woman instructs Enkidu in the further duty of living his life with the woman decreed for him, to raise a family, to engage in work, to build cities and to gather resources. 
All this is looked upon in the same optimistic spirit as marking progress, whereas the Biblical writer, consistent with his point of view, looks upon work as a curse, and makes Cain, the murderer, also the founder of cities. The step to the higher forms of life is not an advance according to the J document. It is interesting to note that even the phrase the “cursed ground” occurs in both the Babylonian and Biblical tales; but whereas in the latter (Gen. 3, 17) it is because of the hard work entailed in raising the products of the earth that the ground is cursed, in the former (lines 62–63) it is the place in which Enkidu lives before he advances to the dignity of human life that is “cursed,” and which he is asked to leave. Adam is expelled from Paradise as a punishment, whereas Enkidu is implored to leave it as a necessary step towards progress to a higher form of existence. The contrast between the Babylonian and the Biblical writer extends to the view taken of viniculture. The Biblical writer (again the J document) looks upon Noah’s drunkenness as a disgrace. Noah loses his sense of shame and uncovers himself (Genesis 9, 21), whereas in the Babylonian description Enkidu’s jolly spirit after he has drunk seven jars of wine meets with approval. The Biblical point of view is that he who drinks wine becomes drunk;109 the Babylonian says, if you drink wine you become happy.110 If the thesis here set forth of the original character and import of the episode of Enkidu with the woman is correct, we may again regard lines 149–153 of the Pennsylvania tablet, in which Gilgamesh is introduced, as a later addition to bring the two heroes into association. [46]The episode in its original form ended with the introduction of Enkidu first to pastoral life, and then to the still higher city life with regulated forms of social existence. 
Now, to be sure, this Enkidu has little in common with the Enkidu who is described as a powerful warrior, a Hercules, who kills lions, overcomes the giant Ḫuwawa, and dispatches a great bull, but it is the nature of folklore everywhere to attach to traditions about a favorite hero all kinds of tales with which originally he had nothing to do. Enkidu, as such a favorite, is viewed also as the type of primitive man,111 and so there arose gradually an Epic which began with his birth, pictured him as half-animal half-man, told how he emerged from this state, how he became civilized, was clothed, learned to eat food and drink wine, how he shaved off the hair with which his body was covered,112 anointed himself—in short, “He became manlike.”113 Thereupon he is taught his duties as a husband, is introduced to the work of building, and to laying aside supplies, and the like. The fully-developed and full-fledged hero then engages in various exploits, of which some are now embodied in the Gilgamesh Epic. Who this Enkidu was, we are not in a position to determine, but the suggestion has been thrown out above that he is a personage foreign to Babylonia, that his home appears to be in the undefined Amurru district, and that he conquers that district. The original tale of Enkidu, if this view be correct, must therefore have been carried to the Euphrates Valley, at a very remote period, with one of the migratory waves that brought a western people as invaders into Babylonia. Here the tale was combined with stories current of another hero, Gilgamesh—perhaps also of Western origin—whose conquest of Erech likewise represents an invasion of Babylonia. The center of the Gilgamesh tale was Erech, and in the process of combining the stories of Enkidu and Gilgamesh, Enkidu is brought to Erech and the two perform exploits [47]in common. In such a combination, the aim would be to utilize all the incidents of both tales. 
The woman who accompanies Enkidu, therefore, becomes the medium of bringing the two heroes together. The story of the evolution of primitive man to civilized life is transformed into the tale of Enkidu’s removal to Erech, and elaborated with all kinds of details, among which we have, as perhaps embodying a genuine historical tradition, the encounter of the two heroes. Before passing on, we have merely to note the very large part taken in both the old Babylonian and the Assyrian version by the struggle against Ḫuwawa. The entire Yale tablet—forming, as we have seen, the third of the series—is taken up with the preparation for the struggle, and with the repeated warnings given to Gilgamesh against the dangerous undertaking. The fourth tablet must have recounted the struggle itself, and it is not improbable that this episode extended into the fifth tablet, since in the Assyrian version this is the case. The elaboration of the story is in itself an argument in favor of assuming some historical background for it—the recollection of the conquest of Amurru by some powerful warrior; and we have seen that this conquest must be ascribed to Enkidu and not to Gilgamesh. If, now, Enkidu is not only the older figure but the one who is the real hero of the most notable episode in the Gilgamesh Epic; if, furthermore, Enkidu is the Hercules who kills lions and dispatches the bull sent by an enraged goddess, what becomes of Gilgamesh? What is left for him? In the first place, he is definitely the conqueror of Erech. He builds the wall of Erech,114 and we may assume that the designation of the city as Uruk supûri, “the walled Erech,”115 rests upon this tradition. He is also associated with the great temple Eanna, “the heavenly house,” in Erech. 
To Gilgamesh belongs also the unenviable tradition of having exercised his rule in Erech so harshly that the people are impelled to implore Aruru to create a rival who may rid [48]the district of the cruel tyrant, who is described as snatching sons and daughters from their families, and in other ways terrifying the population—an early example of “Schrecklichkeit.” Tablets II to V inclusive of the Assyrian version being taken up with the Ḫuwawa episode, modified with a view of bringing the two heroes together, we come at once to the sixth tablet, which tells the story of how the goddess Ishtar wooed Gilgamesh, and of the latter’s rejection of her advances. This tale is distinctly a nature myth. The attempt of Gressmann116 to find some historical background to the episode is a failure. The goddess Ishtar symbolizes the earth which woos the sun in the spring, but whose love is fatal, for after a few months the sun’s power begins to wane. Gilgamesh, who in incantation hymns is invoked in terms which show that he was conceived as a sun-god,117 recalls to the goddess how she changed her lovers into animals, like Circe of Greek mythology, and brought them to grief. Enraged at Gilgamesh’s insult to her vanity, she flies to her father Anu and cries for revenge. At this point the episode of the creation of the bull is introduced, but if the analysis above given is correct it is Enkidu who is the hero in dispatching the bull, and we must assume that the sickness with which Gilgamesh is smitten is the punishment sent by Anu to avenge the insult to his daughter. This sickness symbolizes the waning strength of the sun after midsummer is past. The sun recedes from the earth, and this was pictured in the myth as the sun-god’s rejection of Ishtar; Gilgamesh’s fear of death marks the approach of the winter season, when the sun appears to have lost its vigor completely and is near to death. 
The entire episode is, therefore, a nature myth, symbolical of the passing of spring to midsummer and then to the bare season. The myth has been attached to Gilgamesh as a favorite figure, and then woven into a pattern with the episode of Enkidu and the bull. The bull episode can be detached from the nature myth without any loss to the symbolism of the tale of Ishtar and Gilgamesh. As already suggested, with Enkidu’s death after this conquest of the bull the original Enkidu Epic came to an end. In order to connect Gilgamesh with Enkidu, the former is represented as sharing [49]in the struggle against the bull. Enkidu is punished with death, while Gilgamesh is smitten with disease. Since both shared equally in the guilt, the punishment should have been the same for both. The differentiation may be taken as an indication that Gilgamesh’s disease has nothing to do with the bull episode, but is merely part of the nature myth. Gilgamesh now begins a series of wanderings in search of the restoration of his vigor, and this motif is evidently a continuation of the nature myth to symbolize the sun’s wanderings during the dark winter in the hope of renewed vigor with the coming of the spring. Professor Haupt’s view is that the disease from which Gilgamesh is supposed to be suffering is of a venereal character, affecting the organs of reproduction. This would confirm the position here taken that the myth symbolizes the loss of the sun’s vigor. The sun’s rays are no longer strong enough to fertilize the earth. In accord with this, Gilgamesh’s search for healing leads him to the dark regions118 in which the scorpion-men dwell. The terrors of the region symbolize the gloom of the winter season. At last Gilgamesh reaches a region of light again, described as a landscape situated at the sea. The maiden in control of this region bolts the gate against Gilgamesh’s approach, but the latter forces his entrance. 
It is the picture of the sun-god bursting through the darkness, to emerge as the youthful reinvigorated sun-god of the spring. Now with the tendency to attach to popular tales and nature myths lessons illustrative of current beliefs and aspirations, Gilgamesh’s search for renewal of life is viewed as man’s longing for eternal life. The sun-god’s waning power after midsummer is past suggests man’s growing weakness after the meridian of life has been left behind. Winter is death, and man longs to escape it. Gilgamesh’s wanderings are used as illustration of this longing, and accordingly the search for life becomes also the quest for immortality. Can the precious boon of eternal life be achieved? Popular fancy created the figure of a favorite of the gods who had escaped a destructive deluge in which all mankind had perished.119 Gilgamesh hears [50]of this favorite and determines to seek him out and learn from him the secret of eternal life. The deluge story, again a pure nature myth, symbolical of the rainy season which destroys all life in nature, is thus attached to the Epic. Gilgamesh after many adventures finds himself in the presence of the survivor of the Deluge who, although human, enjoys immortal life among the gods. He asks the survivor how he came to escape the common fate of mankind, and in reply Utnapishtim tells the story of the catastrophe that brought about universal destruction. The moral of the tale is obvious. Only those singled out by the special favor of the gods can hope to be removed to the distant “source of the streams” and live forever. The rest of mankind must face death as the end of life. 
That the story of the Deluge is told in the eleventh tablet of the series, corresponding to the eleventh month, known as the month of “rain curse”120 and marking the height of the rainy season, may be intentional, just as it may not be accidental that Gilgamesh’s rejection of Ishtar is recounted in the sixth tablet, corresponding to the sixth month,121 which marks the end of the summer season. The two tales may have formed part of a cycle of myths, distributed among the months of the year. The Gilgamesh Epic, however, does not form such a cycle. Both myths have been artificially attached to the adventures of the hero. For the deluge story we now have the definite proof for its independent existence, through Dr. Poebel’s publication of a Sumerian text which embodies the tale,122 and without any reference [51]to Gilgamesh. Similarly, Scheil and Hilprecht have published fragments of deluge stories written in Akkadian and likewise without any connection with the Gilgamesh Epic.123 In the Epic the story leads to another episode attached to Gilgamesh, namely, the search for a magic plant growing in deep water, which has the power of restoring old age to youth. Utnapishtim, the survivor of the deluge, is moved through pity for Gilgamesh, worn out by his long wanderings. At the request of his wife, Utnapishtim decides to tell Gilgamesh of this plant, and he succeeds in finding it. He plucks it and decides to take it back to Erech so that all may enjoy the benefit, but on his way stops to bathe in a cool cistern. A serpent comes along and snatches the plant from him, and he is forced to return to Erech with his purpose unachieved. Man cannot hope, when old age comes on, to escape death as the end of everything. 
Lastly, the twelfth tablet of the Assyrian version of the Gilgamesh Epic is of a purely didactic character, bearing evidence of having been added as a further illustration of the current belief that there is no escape from the nether world to which all must go after life has come to an end. Proper burial and suitable care of the dead represent all that can be done in order to secure a fairly comfortable rest for those who have passed out of this world. Enkidu is once more introduced into this episode. His shade is invoked by Gilgamesh and rises up out of the lower world to give a discouraging reply to Gilgamesh’s request, “Tell me, my friend, tell me, my friend, The law of the earth which thou hast experienced, tell me,” The mournful message comes back: “I cannot tell thee, my friend, I cannot tell.” Death is a mystery and must always remain such. The historical Gilgamesh has clearly no connection with the figure introduced into [52]this twelfth tablet. Indeed, as already suggested, the Gilgamesh Epic must have ended with the return to Erech, as related at the close of the eleventh tablet. The twelfth tablet was added by some school-men of Babylonia (or perhaps of Assyria), purely for the purpose of conveying a summary of the teachings in regard to the fate of the dead. Whether these six episodes covering the sixth to the twelfth tablets, (1) the nature myth, (2) the killing of the divine bull, (3) the punishment of Gilgamesh and the death of Enkidu, (4) Gilgamesh’s wanderings, (5) the Deluge, (6) the search for immortality, were all included at the time that the old Babylonian version was compiled cannot, of course, be determined until we have that version in a more complete form. Since the two tablets thus far recovered show that as early as 2000 B.C. 
the Enkidu tale had already been amalgamated with the current stories about Gilgamesh, and the endeavor made to transfer the traits of the former to the latter, it is eminently likely that the story of Ishtar’s unhappy love adventure with Gilgamesh was included, as well as Gilgamesh’s punishment and the death of Enkidu. With the evidence furnished by Meissner’s fragment of a version of the old Babylonian revision and by our two tablets, of the early disposition to make popular tales the medium of illustrating current beliefs and the teachings of the temple schools, it may furthermore be concluded that the death of Enkidu and the punishment of Gilgamesh were utilized for didactic purposes in the old Babylonian version. On the other hand, the proof for the existence of the deluge story in the Hammurabi period and some centuries later, independent of any connection with the Gilgamesh Epic, raises the question whether in the old Babylonian version, of which our two tablets form a part, the deluge tale was already woven into the pattern of the Epic. At all events, till proof to the contrary is forthcoming, we may assume that the twelfth tablet of the Assyrian version, though also reverting to a Babylonian original, dates as the latest addition to the Epic from a period subsequent to 2000 B.C.; and that the same is probably the case with the eleventh tablet. 
To sum up, there are four main currents that flow together in the Gilgamesh Epic even in its old Babylonian form: (1) the adventures of a mighty warrior Enkidu, resting perhaps on a faint tradition [53]of the conquest of Amurru by the hero; (2) the more definite recollection of the exploits of a foreign invader of Babylonia by the name of Gilgamesh, whose home appears likewise to have been in the West;124 (3) nature myths and didactic tales transferred to Enkidu and Gilgamesh as popular figures; and (4) the process of weaving the traditions, exploits, myths and didactic tales together, in the course of which process Gilgamesh becomes the main hero, and Enkidu his companion. Furthermore, our investigation has shown that to Enkidu belongs the episode with the woman, used to illustrate the evolution of primitive man to the ways and conditions of civilized life, the conquest of Ḫuwawa in the land of Amurru, the killing of lions and also of the bull, while Gilgamesh is the hero who conquers Erech. Identified with the sun-god, the nature myth of the union of the sun with the earth and the subsequent separation of the two is also transferred to him. The wanderings of the hero, smitten with disease, are a continuation of the nature myth, symbolizing the waning vigor of the sun with the approach of the wintry season. The details of the process which led to making Gilgamesh the favorite figure, to whom the traits and exploits of Enkidu and of the sun-god are transferred, escape us, but of the fact that Enkidu is the older figure, of whom certain adventures were set forth in a tale that once had an independent existence, there can now be little doubt in the face of the evidence furnished by the two tablets of the old Babylonian version; just as the study of these tablets shows that in the combination of the tales of Enkidu and Gilgamesh, the former is the prototype of which Gilgamesh is the copy. 
If the two are regarded as brothers, as born in the same place, even resembling one another in appearance and carrying out their adventures in common, it is because in the process of combination Gilgamesh becomes the reflex of Enkidu. That Enkidu is not the figure created by Aruru to relieve Erech of its tyrannical ruler is also shown by the fact that Gilgamesh remains in control of Erech. It is to Erech that he returns when he fails of his purpose to learn the secret of escape from old age and death. Erech is, therefore, not relieved of the presence of the ruthless ruler through Enkidu. The “Man of Anu” formed by Aruru as a deliverer is confused in the course of the growth of the [54]Epic with Enkidu, the offspring of Ninib, and in this way we obtain the strange contradiction of Enkidu and Gilgamesh appearing first as bitter rivals and then as close and inseparable friends. It is of the nature of Epic compositions everywhere to eliminate unnecessary figures by concentrating on one favorite the traits belonging to another or to several others. The close association of Enkidu and Gilgamesh, which becomes one of the striking features in the combination of the tales of these two heroes, naturally recalls the “Heavenly Twins” motif, which has been so fully and so suggestively treated by Professor J. Rendel Harris in his Cult of the Heavenly Twins (London, 1906). 
Professor Harris has conclusively shown how widespread the tendency is to associate two divine or semi-divine beings in myths and legends as inseparable companions125 or twins, like Castor and Pollux, Romulus and Remus,126 the Acvins in the Rig-Veda,127 Cain and Abel, Jacob and Esau in the Old Testament, the Kabiri of the Phoenicians,128 Herakles and Iphikles in Greek mythology, Ambrica and Fidelio in Teutonic mythology, Patollo and Potrimpo in old Prussian mythology, Cautes and Cautopates in Mithraism, Jesus and Thomas (according to the Syriac Acts of Thomas), and the various illustrations of “Dioscuri in Christian Legends,” set forth by Dr. Harris in his work under this title, which carries the motif far down into the period of legends about Christian Saints who appear in pairs, including the reference to such a pair in Shakespeare’s Henry V: “And Crispin Crispian shall ne’er go by From that day to the ending of the world.” (Act IV, 3, 57–58.) There are indeed certain parallels which suggest that Enkidu-Gilgamesh may represent a Babylonian counterpart to the “Heavenly [55]Twins.” In the Indo-Iranian, Greek and Roman mythology, the twins almost invariably act together. In unison they proceed on expeditions to punish enemies.129 But after all, the parallels are of too general a character to be of much moment; and moreover the parallels stop short at the critical point, for Gilgamesh though worsted is not killed by Enkidu, whereas one of the “Heavenly Twins” is always killed by the brother, as Abel is by Cain, and Iphikles by his twin brother Herakles. 
Even the trait which is frequent in the earliest forms of the “Heavenly Twins,” according to which one is immortal and the other is mortal, though applying in a measure to Enkidu who is killed by Ishtar, while Gilgamesh the offspring of a divine pair is only smitten with disease, is too unsubstantial to warrant more than a general comparison between the Enkidu-Gilgamesh pair and the various forms of the “twin” motif found throughout the ancient world. For all that, the point is of some interest that in the Gilgamesh Epic we should encounter two figures who are portrayed as possessing the same traits and accomplishing feats in common, which suggest a partial parallel to the various forms in which the twin-motif appears in the mythologies, folk-lore and legends of many nations; and it may be that in some of these instances the duplication is due, as in the case of Enkidu and Gilgamesh, to an actual transfer of the traits of one figure to another who usurped his place. In concluding this study of the two recently discovered tablets of the old Babylonian version of the Gilgamesh Epic which has brought us several steps further in the interpretation and in our understanding of the method of composition of the most notable literary production of ancient Babylonia, it will be proper to consider the literary relationship of the old Babylonian to the Assyrian version. We have already referred to the different form in which the names of the chief figures appear in the old Babylonian version, dGish as against dGish-gì(n)-mash, dEn-ki-dũ as against dEn-ki-dú, Ḫu-wa-wa as against Ḫu(m)-ba-ba. 
Erech appears as Uruk ribîtim, “Erech of [56]the Plazas,” as against Uruk supûri, “walled Erech” (or “Erech within the walls”), in the Assyrian version.130 These variations point to an independent recension for the Assyrian revision; and this conclusion is confirmed by a comparison of parallel passages in our two tablets with the Assyrian version, for such parallels rarely extend to verbal agreements in details, and, moreover, show that the Assyrian version has been elaborated. Beginning with the Pennsylvania tablet, column I is covered in the Assyrian version by tablet I, 5, 25, to 6, 33, though, as pointed out above, in the Assyrian version we have the anticipation of the dreams of Gilgamesh and their interpretation through their recital to Enkidu by his female companion, whereas in the old Babylonian version we have the dreams directly given in a conversation between Gilgamesh and his mother. In the anticipation, there would naturally be some omissions. So lines 4–5 and 12–13 of the Pennsylvania tablet do not appear in the Assyrian version, but in their place is a line (I, 5, 35), to be restored to “[I saw him and like] a woman I fell in love with him,” which occurs in the old Babylonian version only in connection with the second dream. The point is of importance as showing that in the Babylonian version the first dream lays stress upon the omen of the falling meteor, as symbolizing the coming of Enkidu, whereas the second dream more specifically reveals Enkidu as a man,131 of whom Gilgamesh is instantly enamored. Strikingly variant lines, though conveying the same idea, are frequent. Thus line 14 of the Babylonian version reads “I bore it and carried it to thee” and appears in the Assyrian version (I, 5, 35b supplied from 6, 26) as “I threw it (or him) at thy feet”132 [57]with an additional line in elaboration, “Thou didst bring him into contact with me,”133 which anticipates the speech of the mother (Line 41 = Assyrian version I, 6, 33). 
Line 10 of the Pennsylvania tablet has pa-ḫi-ir as against iz-za-az I, 5, 31. Line 8 has ik-ta-bi-it as against da-an in the Assyrian version I, 5, 29. More significant is the variant to line 9 “I became weak and its weight I could not bear” as against I, 5, 30. “Its strength was overpowering,134 and I could not endure its weight.” The important lines 31–36 are not found in the Assyrian version, with the exception of I, 6, 27, which corresponds to lines 33–34, but this lack of correspondence is probably due to the fact that the Assyrian version represents the anticipation of the dreams which, as already suggested, might well omit some details. As against this we have in the Assyrian version I, 6, 23–25, an elaboration of line 30 in the Pennsylvania tablet and taken over from the recital of the first dream. Through the Assyrian version I, 6, 31–32, we can restore the closing lines of column I of the Pennsylvania tablet, while with line 33 = line 45 of the Pennsylvania tablet, the parallel between the two versions comes to an end. Lines 34–43 of the Assyrian version (bringing tablet I to a close)135 represent an elaboration of the speech of Ninsun, followed by a further address of Gilgamesh to his mother, and by the determination of Gilgamesh to seek out Enkidu.136 Nothing of this sort appears to have been included in the old Babylonian version.[58]Our text proceeds with the scene between Enkidu and the woman, in which the latter by her charms and her appeal endeavors to lead Enkidu away from his life with the animals. From the abrupt manner in which the scene is introduced in line 43 of the Pennsylvania tablet, it is evident that this cannot be the first mention of the woman. The meeting must have been recounted in the first tablet, as is the case in the Assyrian version.137 The second tablet takes up the direct recital of the dreams of Gilgamesh and then continues the narrative. 
Whether in the old Babylonian version the scene between Enkidu and the woman was described with the same naïve details, as in the Assyrian version, of the sexual intercourse between the two for six days and seven nights cannot of course be determined, though presumably the Assyrian version, with the tendency of epics to become more elaborate as they pass from age to age, added some realistic touches. Assuming that lines 44–63 of the Pennsylvania tablet—the cohabitation of Enkidu and the address of the woman—is a repetition of what was already described in the first tablet, the comparison with the Assyrian version I, 4, 16–41, not only points to the elaboration of the later version, but likewise to an independent recension, even where parallel lines can be picked out. Only lines 46–48 of the Pennsylvania tablet form a complete parallel to line 21 of column 4 of the Assyrian version. The description in lines 22–32 of column 4 is missing, though it may, of course, have been included in part in the recital in the first tablet of the old Babylonian version. Lines 49–59 of the Pennsylvania tablet are covered by 33–39, the only slight difference being the specific mention in line 58 of the Pennsylvania tablet of Eanna, the temple in Erech, described as “the dwelling of Anu,” whereas in the Assyrian version Eanna is merely referred to as the “holy house” and described as “the dwelling of Anu and Ishtar,” where Ishtar is clearly a later addition. Leaving aside lines 60–61, which may be merely a variant (though independent) of line 39 of column 4 of the Assyrian version, we now have in the Pennsylvania tablet a second speech of the woman to Enkidu (not represented in the Assyrian version) beginning like the first one with alka, “Come” (lines 62–63), in which she asks Enkidu to leave the “accursed ground” in which he dwells. 
This speech, as the description which follows, extending into columns 3–4, [59]and telling how the woman clothed Enkidu, how she brought him to the sheep folds, how she taught him to eat bread and to drink wine, and how she instructed him in the ways of civilization, must have been included in the second tablet of the Assyrian version which has come down to us in a very imperfect form. Nor is the scene in which Enkidu and Gilgamesh have their encounter found in the preserved portions of the second (or possibly the third) tablet of the Assyrian version, but only a brief reference to it in the fourth tablet,138 in which in Epic style the story is repeated, leading up to the second exploit, the joint campaign of Enkidu and Gilgamesh against Ḫuwawa. This reference, covering only seven lines, corresponds to lines 192–231 of the Pennsylvania tablet; but the former being the repetition and the latter the original recital, the comparison to be instituted merely reveals again the independence of the Assyrian version, as shown in the use of kibsu, “tread” (IV, 2, 46), for šêpu, “foot” (l. 216), i-na-uš, “quake” (line 5C), as against ir-tu-tu (ll. 221 and 226). Such variants as dGish êribam ûl iddin (l. 217) against dGilgamesh ana šurûbi ûl namdin (IV, 2, 47), and again iṣṣabtûma kima lîm, “they grappled like an ox,” against iṣṣabtûma ina bâb bît emuti, “they grappled at the gate of the family house” (IV, 2, 48), all point once more to the literary independence of the Assyrian version. The end of the conflict and the reconciliation of the two heroes are likewise missing in the Assyrian version. They may have been referred to at the beginning of column 3 of Tablet IV.139 
Coming to the Yale tablet, the few passages in which a comparison [60]may be instituted with the fourth tablet of the Assyrian version, to which in a general way it must correspond, are not sufficient to warrant any conclusions, beyond the confirmation of the literary independence of the Assyrian version. The section comprised within lines 72–89, where Enkidu’s grief at his friend’s decision to fight Ḫuwawa is described140, and he makes confession of his own physical exhaustion, may correspond to Tablet IV, column 4, of the Assyrian version. This would fit in with the beginning of the reverse, the first two lines of which (136–137) correspond to column 5 of the fourth tablet of the Assyrian version, with a variation “seven-fold fear”141 as against “fear of men” in the Assyrian version. If lines 138–139 (in column 4) of the Yale tablet correspond to line 7 of column 5 of Tablet IV of the Assyrian version, we would again have an illustration of the elaboration of the later version by the addition of lines 3–6. But beyond this we have merely the comparison of the description of Ḫuwawa “Whose roar is a flood, whose mouth is fire, and whose breath is death” which occurs twice in the Yale tablet (lines 110–111 and 196–197), with the same phrase in the Assyrian version Tablet IV, 5, 3—but here, as just pointed out, with an elaboration. Practically, therefore, the entire Yale tablet represents an addition to our knowledge of the Ḫuwawa episode, and until we are fortunate enough to discover more fragments of the fourth tablet of the Assyrian version, we must content ourselves with the conclusions reached from a comparison of the Pennsylvania tablet with the parallels in the Assyrian version. 
It may be noted as a general point of resemblance in the exterior form of the old Babylonian and Assyrian versions that both were inscribed on tablets containing six columns, three on the obverse and three on the reverse; and that the length of the tablets, an average of 40 to 50 lines, was about the same, thus revealing in the external form a conventional size for the tablets in the older period, which was carried over into later times. [61]

1 See for further details of this royal library, Jastrow, Civilization of Babylonia and Assyria, p. 21 seq. 2 Das Babylonische Nimrodepos (Leipzig, 1884–1891), supplemented by Haupt’s article Die Zwölfte Tafel des Babylonischen Nimrodepos in BA I, pp. 48–79, containing the fragments of the twelfth tablet. The fragments of the Epic in Ashurbanapal’s library, some sixty in number, represent portions of several copies. Sin-liḳî-unnini, perhaps from Erech, since this name appears as that of a family in tablets from Erech (see Clay, Legal Documents from Erech, Index, p. 73), is named in a list of texts (K 9717 = Haupt’s edition No. 51, line 18) as the editor of the Epic, though probably he was not the only compiler. Since the publication of Haupt’s edition, a few fragments were added by him as an appendix to Alfred Jeremias’ Izdubar-Nimrod (Leipzig, 1891) Plates II–IV, and two more are embodied in Jensen’s transliteration of all the fragments in the Keilinschriftliche Bibliothek VI, pp. 116–265, with elaborate notes, pp. 421–531. Furthermore a fragment, obtained from supplementary excavations at Kouyunjik, has been published by L. W. King in his Supplement to the Catalogue of the Cuneiform Tablets in the Kouyunjik Collection of the British Museum No. 56 and PSBA Vol. 36, pp. 64–68. Recently a fragment of the 6th tablet from the excavations at Assur has been published by Ebeling, Keilschrifttexte aus Assur Religiösen Inhalts No. 115, and one may expect further portions to turn up.
The designation “Nimrod Epic” on the supposition that the hero of the Babylonian Epic is identical with Nimrod, the “mighty hunter” of Genesis 10, has now been generally abandoned, in the absence of any evidence that the Babylonian hero bore a name like Nimrod. For all that, the description of Nimrod as the “mighty hunter” and the occurrence of a “hunter” in the Babylonian Epic (Assyrian version Tablet I), though he is not the hero, points to a confusion in the Hebrew form of the borrowed tradition between Gilgamesh and Nimrod. The latest French translation of the Epic is by Dhorme, Choix de Textes Religieux Assyro-Babyloniens (Paris, 1907), pp. 182–325; the latest German translation by Ungnad-Gressmann, Das Gilgamesch-Epos (Göttingen, 1911), with a valuable analysis and discussion. These two translations now supersede Jensen’s translation in the Keilinschriftliche Bibliothek, which, however, is still valuable because of the detailed notes, containing a wealth of lexicographical material. Ungnad also gave a partial translation in Gressmann-Ranke, Altorientalische Texte und Bilder I, pp. 39–61. In English, we have translations of substantial portions by Muss-Arnolt in Harper’s Assyrian and Babylonian Literature (New York, 1901), pp. 324–368; by Jastrow, Religion of Babylonia and Assyria (Boston, 1898), Chap. XXIII; by Clay in Light on the Old Testament from Babel, pp. 78–84; by Rogers in Cuneiform Parallels to the Old Testament, pp. 80–103; and most recently by Jastrow in Sacred Books and Early Literature of the East (ed. C. F. Horne, New York, 1917), Vol. I, pp. 187–220. 3 See Luckenbill in JAOS, Vol. 37, p. 452 seq. Prof. Clay, it should be added, clings to the older reading, Hammurabi, which is retained in this volume. 4 ZA, Vol. 14, pp. 277–292.
5 The survivor of the Deluge is usually designated as Ut-napishtim in the Epic, but in one passage (Assyrian version, Tablet XI, 196), he is designated as Atra-ḫasis “the very wise one.” Similarly, in a second version of the Deluge story, also found in Ashurbanapal’s library (IV R² additions, p. 9, line 11). The two names clearly point to two versions, which in accordance with the manner of ancient compositions were merged into one. See an article by Jastrow in ZA, Vol. 13, pp. 288–301. 6 Published by Scheil in Recueil des Travaux, etc. Vol. 20, pp. 55–58. 7 The text does not form part of the Gilgamesh Epic, as the colophon, differing from the one attached to the Epic, shows. 8 Ein altbabylonisches Fragment des Gilgamosepos (MVAG 1902, No. 1). 9 On these variant forms of the two names see the discussion below, p. 24. 10 The passage is paralleled by Ecc. 9, 7–9. See Jastrow, A Gentle Cynic, p. 172 seq. 11 Among the Nippur tablets in the collection of the University of Pennsylvania Museum. The fragment was published by Dr. Poebel in his Historical and Grammatical Texts No. 23. See also Poebel in the Museum Journal, Vol. IV, p. 47, and an article by Dr. Langdon in the same Journal, Vol. VII, pp. 178–181, though Langdon fails to credit Dr. Poebel with the discovery and publication of the important tablet. 12 No. 55 in Langdon’s Historical and Religious Texts from the Temple Library of Nippur (Munich, 1914). 13 No. 5 in his Sumerian Liturgical Texts. (Philadelphia, 1917) 14 See on this name below, p. 23. 15 See further below, p. 37 seq. 16 See Poebel, Historical and Grammatical Texts, No. 1, and Jastrow in JAOS, Vol. 36, pp. 122–131 and 274–299. 17 See an article by Jastrow, Sumerian and Akkadian Views of Beginnings (JAOS Vol. 36, pp. 274–299). 18 See on this point Eduard Meyer, Sumerier und Semiten in Babylonien (Berlin, 1906), p. 107 seq., whose view is followed in Jastrow, Civilization of Babylonia and Assyria, p. 121. 
See also Clay, Empire of the Amorites (Yale University Press, 1919), p. 23 et seq. 19 See the discussion below, p. 24 seq. 20 Dr. Poebel published an article on the tablet in OLZ, 1914, pp. 4–6, in which he called attention to the correct name for the mother of Gilgamesh, which was settled by the tablet as Ninsun. 21 Historical Texts No. 2, Column 2, 26. See the discussion in Historical and Grammatical Texts, p. 123, seq. 22 See Fostat in OLZ, 1915, p. 367. 23 Publications of the University of Pennsylvania Museum, Babylonian Section, Vol. X, No. 3 (Philadelphia, 1917). It is to be regretted that Dr. Langdon should not have given full credit to Dr. Poebel for his discovery of the tablet. He merely refers in an obscure footnote to Dr. Poebel’s having made a copy. 24 E.g., in the very first note on page 211, and again in a note on page 213. 25 Dr. Langdon neglected to copy the signs 4 šú-si = 240 which appear on the edge of the tablet. He also misunderstood the word šú-tu-ur in the colophon which he translated “written,” taking the word from a stem šaṭâru, “write.” The form šú-tu-ur is III, 1, from atâru, “to be in excess of,” and indicates, presumably, that the text is a copy “enlarged” from an older original. See the Commentary to the colophon, p. 86. 26 Museum Journal, Vol. VIII, p. 29. 27 See below, p. 23. 28 I follow the enumeration of tablets, columns and lines in Jensen’s edition, though some fragments appear to have been placed by him in a wrong position. 29 According to Bezold’s investigation, Verbalsuffixformen als Alterskriterien babylonisch-assyrischer Inschriften (Heidelberg Akad. d. Wiss., Philos.-Histor. Klasse, 1910, 9te Abhandlung), the bulk of the tablets in Ashurbanapal’s library are copies of originals dating from about 1500 B.C. It does not follow, however, that all the copies date from originals of the same period. 
Bezold reaches the conclusion on the basis of various forms for verbal suffixes, that the fragments from the Ashurbanapal Library actually date from three distinct periods ranging from before c. 1450 to c. 700 B.C. 30 “Before thou comest from the mountain, Gilgamesh in Erech will see thy dreams,” after which the dreams are recounted by the woman to Enkidu. The expression “thy dreams” means here “dreams about thee.” (Tablet I, 5, 23–24). 31 Lines 100–101. 32 In a paper read before the American Oriental Society at New Haven, April 4, 1918. 33 See the commentary to col. 4 of the Yale tablet for further details. 34 This is no doubt the correct reading of the three signs which used to be read Iz-tu-bar or Gish-du-bar. The first sign has commonly the value Gish, the second can be read Gin or Gi (Brünnow No. 11900) and the third Mash as well as Bar. See Ungnad in Ungnad-Gressmann, Das Gilgamesch-Epos, p. 76, and Poebel, Historical and Grammatical Texts, p. 123. 35 So also in Sumerian (Zimmern, Sumerische Kultlieder aus altbabylonischer Zeit, No. 196, rev. 14 and 16.) 36 The sign used, LUM (Brünnow No. 11183), could have the value ḫu as well as ḫum. 37 The addition “father-in-law of Moses” to the name Ḫobab b. Re’uel in this passage must refer to Re’uel, and not to Ḫobab. In Judges 4, 11, the gloss “of the Bene Ḫobab, the father-in-law of Moses” must be separated into two: (1) “Bene Ḫobab,” and (2) “father-in-law of Moses.” The latter addition rests on an erroneous tradition, or is intended as a brief reminder that Ḫobab is identical with the son of Re’uel. 38 See his List of Personal Names from the Temple School of Nippur, p. 122. Ḫu-um-ba-bi-tu and ši-kin ḫu-wa-wa also occur in Omen Texts (CT XXVII, 4, 8–9 = Pl. 3, 17 = Pl. 6, 3–4 = CT XXVIII, 14, 12). The contrast to ḫuwawa is ligru, “dwarf” (CT XXVII, 4, 12 and 14 = Pl. 6, 7.9 = Pl. 3, 19). See Jastrow, Religion Babyloniens und Assyriens, II, p. 913, Note 7. 
Ḫuwawa, therefore, has the force of “monster.” 39 Ungnad-Gressmann, Das Gilgamesch-Epos, p. 111 seq. 40 Ungnad, l. c., p. 77, called attention to this name, but failed to draw the conclusion that Ḫu(m)baba therefore belongs to the West and not to the East. 41 First pointed out by Ungnad in OLZ 1910, p. 306, on the basis of CT XVIII, 30, 10, where En-gi-dú appears in the column furnishing phonetic readings. 42 See Clay, Amurru, pp. 74, 129, etc. 43 Tablet I, 2, 39–40; 3, 6–7 and 33–34; 4, 3–4. 44 Tablet I, 2, 1 and IX, 2, 16. Note also the statement about Gilgamesh that “his body is flesh of the gods” (Tablet IX, 2, 14; X, 1, 7). 45 BOR IV, p. 264. 46 Lewin, Die Scholien des Theodor bar Koni zur Patriarchengeschichte (Berlin, 1905), p. 2. See Gressmann in Ungnad-Gressmann, Das Gilgamesch-Epos, p. 83, who points out that the first element of גלמגוס compared with the second of גמיגמוס gives the exact form that we require, namely, Gilgamos. 47 Tablet I, col. 2, is taken up with this episode. 48 See Poebel, Historical and Grammatical Texts, p. 123. 49 See Poebel, Historical Texts No. 2, col. 2, 26. 50 Hilprecht, Old Babylonian Inscriptions I, 1 No. 26. 51 Delitzsch, Assyrische Lesestücke, p. 88, VI, 2–3. Cf. also CT XXV, 28 (K 7659) 3, where we must evidently supply [Esigga]-tuk, for which in the following line we have again Gish-bil-ga-mesh as an equivalent. See Meissner, OLZ 1910, 99. 52 See, e.g., Barton, Haverford Collection II No. 27, Col. I, 14, etc. 53 Deimel, Pantheon Babylonicum, p. 95. 54 CT XII, 50 (K 4359) obv. 17. 55 See Barton, Origin and Development of Babylonian Writing, II, p. 99 seq., for various explanations, though all centering around the same idea of the picture of fire in some form. 56 See the passages quoted by Poebel, Historical and Grammatical Texts, p. 126. 57 E.g., Genesis 4, 20, Jabal, “the father of tent-dwelling and cattle holding;” Jubal (4, 21), “the father of harp and pipe striking.” 58 See particularly the plays (in the J.
Document) upon the names of the twelve sons of Jacob, which are brought forward either as tribal characteristics, or as suggested by some incident or utterance by the mother at the birth of each son. 59 The designation is variously explained by Arabic writers. See Beidhawi’s Commentary (ed. Fleischer), to Súra 18, 82. 60 The writing Gish-gi-mash as an approach to the pronunciation Gilgamesh would thus represent the beginning of the artificial process which seeks to interpret the first syllable as “hero.” 61 See above, p. 27. 62 Poebel, Historical Texts, p. 115 seq. 63 Many years ago (BA III, p. 376) I equated Etana with Ethan in the Old Testament—therefore a West Semitic name. 64 See Clay, The Empire of the Amorites, p. 80. 65 Professor Clay strongly favors an Amoritic origin also for Gilgamesh. His explanation of the name is set forth in his recent work on The Empire of the Amorites, page 89, and is also referred to in his work on Amurru, page 79, and in his volume of Miscellaneous Inscriptions in the Yale Babylonian Collection, page 3, note. According to Professor Clay the original form of the hero’s name was West Semitic, and was something like Bilga-Mash, the meaning of which was perhaps “the offspring of Mash.” For the first element in this division of the name cf. Piliḳam, the name of a ruler of an early dynasty, and Balaḳ of the Old Testament. In view of the fact that the axe figures so prominently in the Epic as an instrument wielded by Gilgamesh, Professor Clay furthermore thinks it reasonable to assume that the name was interpreted by the Babylonian scribe as “the axe of Mash.” In this way he would account for the use of the determinative for weapons, which is also the sign Gish, in the name. It is certainly noteworthy that the ideogram Gish-Tún in the later form of Gish-Tún-mash = pašu, “axe,” CT XVI, 38:14b, etc. Tun also = pilaḳu “axe,” CT xii, 10:34b. 
Names with similar element (besides Piliḳam) are Belaḳu of the Hammurabi period, Bilaḳḳu of the Cassite period, etc. It is only proper to add that Professor Jastrow assumes the responsibility for the explanation of the form and etymology of the name Gilgamesh proposed in this volume. The question is one in regard to which legitimate differences of opinion will prevail among scholars until through some chance a definite decision, one way or the other, can be reached. 66 me-iḫ-rù (line 191). 67 Tablet I, 5, 23. Cf. I, 3, 2 and 29. 68 Tablet IV, 4, 7 and I, 5, 3. 69 Assyrian version, Tablet II, 3b 34, in an address of Shamash to Enkidu. 70 So Assyrian version, Tablet VIII, 3, 11. Also supplied VIII, 5, 20 and 21; and X, 1, 46–47 and 5, 6–7. 71 Tablet XII, 3, 25. 72 Ward, Seal Cylinders of Western Asia, Chap. X, and the same author’s Cylinders and other Ancient Oriental Seals—Morgan collection Nos. 19–50. 73 E.g., Ward No. 192, Enkidu has human legs like Gilgamesh; also No. 189, where it is difficult to say which is Gilgamesh, and which is Enkidu. The clothed one is probably Gilgamesh, though not infrequently Gilgamesh is also represented as nude, or merely with a girdle around his waist. 74 E.g., Ward, Nos. 173, 174, 190, 191, 195 as well as 189 and 192. 75 On the other hand, in Ward Nos. 459 and 461, the conflict between the two heroes is depicted with the heroes distinguished in more conventional fashion, Enkidu having the hoofs of an animal, and also with a varying arrangement of beard and hair. 76 See Jastrow, Religion of Babylonia and Assyria (Boston, 1898), p. 468 seq. 77 Ungnad-Gressmann, Das Gilgamesch-Epos, p. 90 seq. 78 Pennsylvania tablet, l. 198 = Assyrian version, Tablet IV, 2, 37. 79 “Enkidu blocked the gate” (Pennsylvania tablet, line 215) = Assyrian version Tablet IV, 2, 46: “Enkidu interposed his foot at the gate of the family house.” 80 Pennsylvania tablet, lines 218 and 224. 81 Yale tablet, line 198; also to be supplied lines 13–14. 
82 Yale tablet, lines 190 and 191. 83 PSBA 1914, 65 seq. = Jensen III, 1a, 4–11, which can now be completed and supplemented by the new fragment. 84 I.e., Enkidu will save Gilgamesh. 85 These two lines impress one as popular sayings—here applied to Enkidu. 86 King’s fragment, col. I, 13–27, which now enables us to complete Jensen III, 1a, 12–21. 87 Yale tablet, lines 252–253. 88 Yale tablet, lines 143–148 = Assyrian version, Tablet IV, 6, 26 seq. 89 Assyrian version, Tablet III, 2a, 13–14. 90 Lines 215–222. 91 Assyrian version, Tablet V, Columns 3–4. We have to assume that in line 13 of column 4 (Jensen, p. 164), Enkidu takes up the thread of conversation, as is shown by line 22: “Enkidu brought his dream to him and spoke to Gilgamesh.” 92 Assyrian version, Tablet VI, lines 146–147. 93 Lines 178–183. 94 Lines 176–177. 95 Tablet VII, Column 6. 96 Assyrian version, Tablet VI, 200–203. These words are put into the mouth of Gilgamesh (lines 198–199). It is, therefore, unlikely that he would sing his own praise. Both Jensen and Ungnad admit that Enkidu is to be supplied in at least one of the lines. 97 Lines 109 and 112. 98 Assyrian version, Tablet IX, 1, 8–9. 99 Tablet VIII, 5, 2–6. 100 So also Gressmann in Ungnad-Gressmann, Das Gilgamesch-Epos, p. 97, regards Enkidu as the older figure. 101 See Jastrow, Adam and Eve in Babylonian Literature, AJSL, Vol. 15, pp. 193–214. 102 Assyrian version, Tablet I, 2, 31–36. 103 It will be recalled that Enkidu is always spoken of as “born in the field.” 104 Note the repetition ibtani “created” in line 33 of the “man of Anu” and in line 35 of the offspring of Ninib. The creation of the former is by the “heart,” i.e., by the will of Aruru, the creation of the latter is an act of moulding out of clay. 105 Tablet I, Column 3. 106 Following as usual the enumeration of lines in Jensen’s edition. 
107 An analogy does not involve a dependence of one tale upon the other, but merely that both rest on similar traditions, which may have arisen independently. 108 Note that the name of Eve is not mentioned till after the fall (Genesis 3, 20). Before that she is merely ishsha, i.e., “woman,” just as in the Babylonian tale the woman who guides Enkidu is ḫarimtu, “woman.” 109 “And he drank and became drunk” (Genesis 9, 21). 110 “His heart became glad and his face shone” (Pennsylvania Tablet, lines 100–101). 111 That in the combination of this Enkidu with tales of primitive man, inconsistent features should have been introduced, such as the union of Enkidu with the woman as the beginning of a higher life, whereas the presence of a hunter and his father shows that human society was already in existence, is characteristic of folk-tales, which are indifferent to details that may be contradictory to the general setting of the story. 112 Pennsylvania tablet, lines 102–104. 113 Line 105. 114 Tablet I, 1, 9. See also the reference to the wall of Erech as an “old construction” of Gilgamesh, in the inscription of An-Am in the days of Sin-gamil (Hilprecht, Old Babylonian Inscriptions, I, No. 26.) Cf. IV R² 52, 3, 53. 115 The invariable designation in the Assyrian version as against Uruk ribîtim, “Erech of the plazas,” in the old Babylonian version. 116 In Ungnad-Gressmann, Das Gilgamesch-Epos, p. 123 seq. 117 See Jensen, p. 266. Gilgamesh is addressed as “judge,” as the one who inspects the divisions of the earth, precisely as Shamash is celebrated. In line 8 of the hymn in question, Gilgamesh is in fact addressed as Shamash. 118 The darkness is emphasized with each advance in the hero’s wanderings (Tablet IX, col. 5). 119 This tale is again a nature myth, marking the change from the dry to the rainy season. The Deluge is an annual occurrence in the Euphrates Valley through the overflow of the two rivers.
Only the canal system, directing the overflow into the fields, changed the curse into a blessing. In contrast to the Deluge, we have in the Assyrian creation story the drying up of the primeval waters so that the earth makes its appearance with the change from the rainy to the dry season. The world is created in the spring, according to the Akkadian view which is reflected in the Biblical creation story, as related in the P. document. See Jastrow, Sumerian and Akkadian Views of Beginnings (JAOS, Vol. 36, p. 295 seq.). 120 Aš-am in Sumerian corresponding to the Akkadian Šabaṭu, which conveys the idea of destruction. 121 The month is known as the “Mission of Ishtar” in Sumerian, in allusion to another nature myth which describes Ishtar’s disappearance from earth and her mission to the lower world. 122 Historical Texts No. 1. The Sumerian name of the survivor is Zi-ū-gíd-du or perhaps Zi-ū-sū-du (cf. King, Legends of Babylon and Egypt, p. 65, note 4), signifying “He who lengthened the day of life,” i.e., the one of long life, of which Ut-napishtim (“Day of Life”) in the Assyrian version seems to be an abbreviated Akkadian rendering, with the omission of the verb. So King’s view, which is here followed. See also CT XVIII, 30, 9, and Langdon, Sumerian Epic of Paradise, p. 90, who, however, enters upon further speculations that are fanciful. 123 See the translation in Ungnad-Gressmann, Das Gilgamesch-Epos, pp. 69, seq. and 73. 124 According to Professor Clay, quite certainly Amurru, just as in the case of Enkidu. 125 Gressmann in Ungnad-Gressmann, Das Gilgamesch-Epos, p. 100 seq. touches upon this motif, but fails to see the main point that the companions are also twins or at least brothers. Hence such examples as Abraham and Lot, David and Jonathan, Achilles and Patroclus, Eteokles and Polyneikes, are not parallels to Gilgamesh-Enkidu, but belong to the enlargement of the motif so as to include companions who are not regarded as brothers. 126 Or Romus.
See Rendell Harris, l. c., p. 59, note 2. 127 One might also include the primeval pair Yama-Yami with their equivalents in Iranian mythology (Carnoy, Iranian Mythology, p. 294 seq.). 128 Becoming, however, a triad and later increased to seven. Cf. Rendell Harris, l. c., p. 32. 129 I am indebted to my friend, Professor A. J. Carnoy, of the University of Louvain, for having kindly gathered and placed at my disposal material on the “twin-brother” motif from Indo-European sources, supplemental to Rendell Harris’ work. 130 On the other hand, Uruk mâtum for the district of Erech, i.e., the territory over which the city holds sway, appears in both versions (Pennsylvania tablet, l. 10 = Assyrian version I, 5, 36). 131 “My likeness” (line 27). It should be noted, however, that lines 32–44 of I, 5, in Jensen’s edition are part of a fragment K 9245 (not published, but merely copied by Bezold and Johns, and placed at Jensen’s disposal), which may represent a duplicate to I, 6, 23–34, with which it agrees entirely except for one line, viz., line 34 of K 9245 which is not found in column 6, 23–34. If this be correct, then there is lacking after line 31 of column 5, the interpretation of the dream given in the Pennsylvania tablet in lines 17–23. 132 ina šap-li-ki, literally, “below thee,” whereas in the old Babylonian version we have ana ṣi-ri-ka, “towards thee.” 133 Repeated I, 6, 28. 134 ul-tap-rid ki-is-su-šú-ma. The verb is from parâdu, “violent.” For kissu, “strong,” see CT XVI, 25, 48–49. Langdon (Gilgamesh Epic, p. 211, note 5) renders the phrase: “he shook his murderous weapon!!”, another illustration of his haphazard way of translating texts. 135 Shown by the colophon (Jeremias, Izdubar-Nimrod, Plate IV.) 136 Lines 42–43 must be taken as part of the narrative of the compiler, who tells us that after the woman had informed Enkidu that Gilgamesh already knew of Enkidu’s coming through dreams interpreted by Ninsun, Gilgamesh actually set out and encountered Enkidu.
137 Tablet I, col. 4. See also above, p. 19. 138 IV, 2, 44–50. The word ullanum (l. 43), “once” or “since,” points to the following being a reference to a former recital, and not an original recital. 139 Only the lower half (Haupt’s edition, p. 82) is preserved. 140 “The eyes of Enkidu were filled with tears,” corresponding to IV, 4, 10. 141 Unless indeed the number “seven” is a slip for the sign ša. See the commentary to the line.

Pennsylvania Tablet

The 240 lines of the six columns of the text are enumerated in succession, with an indication on the margin where a new column begins. This method, followed also in the case of the Yale tablet, seems preferable to Langdon’s breaking up of the text into Obverse and Reverse, with a separate enumeration for each of the six columns. In order, however, to facilitate a comparison with Langdon’s edition, a table is added:

Obverse
Col. I, 1 = Line 1 of our text
Col. I, 5 = Line 5
Col. I, 10 = Line 10
Col. I, 15 = Line 15
Col. I, 20 = Line 20
Col. I, 25 = Line 25
Col. I, 30 = Line 30
Col. I, 35 = Line 35
Col. II, 1 = Line 41
Col. II, 5 = Line 45
Col. II, 10 = Line 50
Col. II, 15 = Line 55
Col. II, 20 = Line 60
Col. II, 25 = Line 65
Col. II, 30 = Line 70
Col. II, 35 = Line 75
Col. III, 1 = Line 81
Col. III, 5 = Line 85
Col. III, 10 = Line 90
Col. III, 15 = Line 95
Col. III, 20 = Line 100
Col. III, 25 = Line 105
Col. III, 30 = Line 110
Col. III, 35 = Line 115

Reverse
Col. I, 1 (= Col. IV) = Line 131 of our text
Col. I, 5 = Line 135
Col. I, 10 = Line 140
Col. I, 15 = Line 145
Col. I, 20 = Line 150
Col. I, 25 = Line 155
Col. I, 30 = Line 160
Col. II, 1 (= Col. V) = Line 171
Col. II, 5 = Line 175
Col. II, 10 = Line 180
Col. II, 15 = Line 185
Col. II, 20 = Line 190
Col. II, 25 = Line 195
Col. II, 30 = Line 200
Col. III, 1 (= Col. VI) = Line 208
Col. III, 5 = Line 212
Col. III, 10 = Line 217
Col. III, 15 = Line 222
Col. III, 20 = Line 227
Col. III, 25 = Line 232
Col. III, 30 = Line 237
Col. III, 33 = Line 240

[62] Pennsylvania Tablet. Transliteration.

Col. I.
1 it-bi-e-ma dGiš šú-na-tam i-pa-áš-šar
2 iz-za-kàr-am a-na um-mi-šú
3 um-mi i-na šá-at mu-ši-ti-ia
4 šá-am-ḫa-ku-ma at-ta-na-al-la-ak
5 i-na bi-ri-it it-lu-tim
6 ib-ba-šú-nim-ma ka-ka-bu šá-ma-i
7 [ki]-iṣ-rù šá A-nim im-ḳu-ut a-na ṣi-ri-ia
8 áš-ši-šú-ma ik-ta-bi-it e-li-ia
9 ú-ni-iš-šú-ma nu-uš-šá-šú ú-ul il-ti-’i
10 Urukki ma-tum pa-ḫi-ir e-li-šú
11 it-lu-tum ú-na-šá-ku ši-pi-šú
12 ú-um-mi-id-ma pu-ti
13 i-mi-du ia-ti
14 áš-ši-a-šú-ma ab-ba-la-áš-šú a-na ṣi-ri-ki
15 um-mi dGiš mu-di-a-at ka-la-ma
16 iz-za-kàr-am a-na dGiš
17 mi-in-di dGiš šá ki-ma ka-ti
18 i-na ṣi-ri i-wa-li-id-ma
19 ú-ra-ab-bi-šú šá-du-ú
20 ta-mar-šú-ma [kima Sal(?)] ta-ḫa-du at-ta
21 it-lu-tum ú-na-šá-ku ši-pi-šú
22 tí-iṭ-ṭi-ra-áš-[šú tu-ut]-tu-ú-ma
23 ta-tar-ra-[as-su] a-na ṣi-[ri]-ia
24 [uš]-ti-nim-ma i-ta-mar šá-ni-tam
[63]
25 [šú-na]-ta i-ta-wa-a-am a-na um-mi-šú
26 [um-mi] a-ta-mar šá-ni-tam
27 [šú-na-tu a-ta]-mar e-mi-a i-na su-ḳi-im
28 [šá Uruk]ki ri-bi-tim
29 ḫa-aṣ-ṣi-nu na-di-i-ma
30 e-li-šú pa-aḫ-ru
31 ḫa-aṣ-ṣi-nu-um-ma šá-ni bu-nu-šú
32 a-mur-šú-ma aḫ-ta-du a-na-ku
33 a-ra-am-šú-ma ki-ma áš-šá-tim
34 a-ḫa-ab-bu-ub el-šú
35 el-ki-šú-ma áš-ta-ka-an-šú
36 a-na a-ḫi-ia
37 um-mi dGiš mu-da-at [ka]-la-ma
38 [iz-za-kàr-am a-na dGiš]
39 [dGiš šá ta-mu-ru amêlu]
40 [ta-ḫa-ab-bu-ub ki-ma áš-šá-tim el-šú]

Col. II.
41 áš-šum uš-[ta]-ma-ḫa-ru it-ti-ka
42 dGiš šú-na-tam i-pa-šar
43 dEn-ki-[dũ wa]-ši-ib ma-ḫar ḫa-ri-im-tim
44 ur-[šá ir]-ḫa-mu di-da-šá(?) ip-tí-[e]
45 [dEn-ki]-dũ im-ta-ši a-šar i-wa-al-du
46 ûm, 6 ù 7 mu-ši-a-tim
47 dEn-[ki-dũ] ti-bi-i-ma
48 šá-[am-ka-ta] ir-ḫi
49 ḫa-[ri-im-tum pa-a]-šá i-pu-šá-am-ma
50 iz-za-[kàr-am] a-na dEn-ki-dũ
51 a-na-tal-ka dEn-ki-dũ ki-ma ili ta-ba-áš-ši
52 am-mi-nim it-ti na-ma-áš-te-e
53 ta-at-ta-[na-al]-ak ṣi-ra-am
[64]
54 al-kam lu-úr-di-ka
55 a-na libbi [Urukki] ri-bi-tim
56 a-na bît [el]-lim mu-šá-bi šá A-nim
57 dEn-ki-dũ ti-bi lu-ru-ka
58 a-na Ê-[an]-na mu-šá-bi šá A-nim
59 a-šar [dGiš gi]-it-ma-[lu] ne-pi-ši-tim
60 ù at-[ta] ki-[ma Sal ta-ḫa]-bu-[ub]-šú
61 ta-[ra-am-šú ki-ma] ra-ma-an-ka
62 al-ka ti-ba i-[na] ga-ag-ga-ri
63 ma-a-ag-ri-i-im
64 iš-me a-wa-as-sa im-ta-ḫar ga-ba-šá
65 mi-il-[kum] šá aššatim
66 im-ta-ḳu-ut a-na libbi-šú
67 iš-ḫu-ut li-ib-šá-am
68 iš-ti-nam ú-la-ab-bi-iš-sú
69 li-ib-[šá-am] šá-ni-a-am
70 ši-i it-ta-al-ba-áš
71 ṣa-ab-tat ga-as-su
72 ki-ma [ili] i-ri-id-di-šú
73 a-na gu-up-ri šá-ri-i-im
74 a-šar tar-ba-ṣi-im
75 i-na [áš]-ri-šú [im]-ḫu-ruri-ia-ú
76 [ù šú-u dEn-ki-dũ i-lit-ta-šú šá-du-um-ma]
77 [it-ti ṣabâti-ma ik-ka-la šam-ma]
78 [it-ti bu-lim maš-ḳa-a i-šat-ti]
79 [it-ti na-ma-áš-te-e mê i-ṭab lib-ba-šú]
(Perhaps one additional line missing.)

Col. III.
81 ši-iz-ba šá na-ma-áš-te-e
82 i-te-en-ni-ik
83 a-ka-lam iš-ku-nu ma-ḫar-šú
84 ib-tí-ik-ma i-na-at-tal
85 ù ip-pa-al-la-as
[65]
86 ú-ul i-di dEn-ki-dũ
87 aklam a-na a-ka-lim
88 šikaram a-na šá-te-e-im
89 la-a lum-mu-ud
90 ḫa-ri-im-tum pi-šá i-pu-šá-am-ma
91 iz-za-kàr-am a-na dEn-ki-dũ
92 a-ku-ul ak-lam dEn-ki-dũ
93 zi-ma-at ba-la-ṭi-im
94 šikaram ši-ti ši-im-ti ma-ti
95 i-ku-ul a-ak-lam dEn-ki-dũ
96 a-di ši-bi-e-šú
97 šikaram iš-ti-a-am
98 7 aṣ-ṣa-am-mi-im
99 it-tap-šar kab-ta-tum i-na-an-gu
100 i-li-iṣ libba-šú-ma
101 pa-nu-šú [it]-tam-ru
102 ul-tap-pi-it [lùŠÚ]-I
103 šú-ḫu-ra-am pa-ga-ar-šú
104 šá-am-nam ip-ta-šá-áš-ma
105 a-we-li-iš i-we
106 il-ba-áš li-ib-šá-am
107 ki-ma mu-ti i-ba-áš-ši
108 il-ki ka-ak-ka-šú
109 la-bi ú-gi-ir-ri
110 uš-sa-ak-pu re’ûti mu-ši-a-tim
111 ut-tap-pi-iš šib-ba-ri
112 la-bi uk-ta-ši-id
113 it-ti-[lu] na-ki-[di-e] ra-bu-tum
114 dEn-ki-dũ ma-aṣ-ṣa-ar-šú-nu
115 a-we-lum giš-ru-um
116 iš-te-en it-lum
117 a-na [na-ki-di-e(?) i]-za-ak-ki-ir
(About five lines missing.)

Col. IV.
(About eight lines missing.)
131 i-ip-pu-uš ul-ṣa-am
132 iš-ši-ma i-ni-i-šú
133 i-ta-mar a-we-lam
[66]
134 iz-za-kàr-am a-na ḫarimtim
135 šá-am-ka-at uk-ki-ši a-we-lam
136 a-na mi-nim il-li-kam
137 zi-ki-ir-šú lu-uš-šú
138 ḫa-ri-im-tum iš-ta-si a-we-lam
139 i-ba-uš-su-um-ma i-ta-mar-šú
140 e-di-il e-eš ta-ḫi-[il-la]-am
141 lim-nu a-la-ku ma-na-aḫ-[ti]-ka
142 e-pi-šú i-pu-šá-am-ma
143 iz-za-kàr-am a-na dEn-[ki-dũ]
144 bi-ti-iš e-mu-tim ik ……
145 ši-ma-a-at ni-ši-i-ma
146 tu-a-(?)-ar e-lu-tim
147 a-na âli(?) dup-šak-ki-i e-ṣi-en
148 uk-la-at âli(?) e-mi-sa a-a-ḫa-tim
149 a-na šarri šá Urukki ri-bi-tim
150 pi-ti pu-uk epiši(-ši) a-na ḫa-a-a-ri
151 a-na dGiš šarri šá Urukki ri-bi-tim
152 pi-ti pu-uk epiši(-ši)
153 a-na ḫa-a-a-ri
154 áš-ša-at ši-ma-tim i-ra-aḫ-ḫi
155 šú-ú pa-na-nu-um-ma
156 mu-uk wa-ar-ka-nu
157 i-na mi-il-ki šá ili ga-bi-ma
158 i-na bi-ti-iḳ a-bu-un-na-ti-šú
159 ši-ma-as-su
160 a-na zi-ik-ri it-li-im
161 i-ri-ku pa-nu-šú
(About three lines missing.)
[67]

Col. V.
(About six lines missing.)
171 i-il-la-ak [dEn-ki-dũ i-na pa-ni]
172 u-šá-am-ka-at [wa]-ar-ki-šú
173 i-ru-ub-ma a-na libbi Urukki ri-bi-tim
174 ip-ḫur um-ma-nu-um i-na ṣi-ri-šú
175 iz-zi-za-am-ma i-na su-ḳi-im
176 šá Urukki ri-bi-tim
177 pa-aḫ-ra-a-ma ni-šú
178 i-ta-wa-a i-na ṣi-ri-šú
179 a-na ṣalam dGiš ma-ši-il pi-it-tam
180 la-nam šá-pi-il
181 si-ma …. [šá-ki-i pu]-uk-ku-ul
182 ............. i-pa-ka-du
183 i-[na mâti da-an e-mu]-ki i-wa
184 ši-iz-ba šá na-ma-aš-te-e
185 i-te-en-ni-ik
186 ka-a-a-na i-na [libbi] Urukki kak-ki-a-tum
187 it-lu-tum ú-te-el-li-lu
188 šá-ki-in ur-šá-nu
189 a-na itli šá i-šá-ru zi-mu-šú
190 a-na dGiš ki-ma i-li-im
191 šá-ki-iš-šum me-iḫ-rù
192 a-na dIš-ḫa-ra ma-a-a-lum
193 na-di-i-ma
194 dGiš it-[ti-il-ma wa-ar-ka-tim]
195 i-na mu-ši in-ni-[ib-bi]-it
196 i-na-ag-šá-am-ma
197 it-ta-[zi-iz dEn-ki-dũ] i-na sûḳim
198 ip-ta-ra-[aṣ a-la]-ak-tam
199 šá dGiš
200 [a-na e-pi-iš] da-na-ni-iš-šú
(About three lines missing.)
[68]

Col. VI.
(About four lines missing.)
208 šar(?)-ḫa
209 dGiš …
210 i-na ṣi-ri-[šú il-li-ka-am dEn-ki-dũ]
211 i-ḫa-an-ni-ib [pi-ir-ta-šú]
212 it-bi-ma [il-li-ik]
213 a-na pa-ni-šú
214 it-tam-ḫa-ru i-na ri-bi-tum ma-ti
215 dEn-ki-dũ ba-ba-am ip-ta-ri-ik
216 i-na ši-pi-šú
217 dGiš e-ri-ba-am ú-ul id-di-in
218 iṣ-ṣa-ab-tu-ma ki-ma li-i-im
219 i-lu-du
220 zi-ip-pa-am ’i-bu-tu
221 i-ga-rum ir-tu-tu
222 dGiš ù dEn-ki-dũ
223 iṣ-ṣa-ab-tu-ú-ma
224 ki-ma li-i-im i-lu-du
225 zi-ip-pa-am ’i-bu-tu
226 i-ga-rum ir-tu-tú
227 ik-mi-is-ma dGiš
228 i-na ga-ag-ga-ri ši-ip-šú
229 ip-ši-iḫ uz-za-šú-ma
230 i-ni-iḫ i-ra-as-su
231 iš-tu i-ra-su i-ni-ḫu
232 dEn-ki-dũ a-na šá-ši-im
233 iz-za-kàr-am a-na dGiš
234 ki-ma iš-te-en-ma um-ma-ka
235 ú-li-id-ka
236 ri-im-tum šá su-pu-ri
237 dNin-sun-na
238 ul-lu e-li mu-ti ri-eš-ka
239 šar-ru-tú šá ni-ši
240 i-ši-im-kum dEn-lil
241 duppu 2 kam-ma
242 šú-tu-ur e-li …………………
243 4 šú-ši

[62] Translation.

Col. I.
1 Gish sought to interpret the dream;
2 Spoke to his mother:
3 “My mother, during my night
4 I became strong and moved about
5 among the heroes;
6 And from the starry heaven
7 A meteor(?) of Anu fell upon me:
8 I bore it and it grew heavy upon me,
9 I became weak and its weight I could not endure.
10 The land of Erech gathered about it.
11 The heroes kissed its feet.1
12 It was raised up before me.
13 They stood me up.2
14 I bore it and carried it to thee.”
15 The mother of Gish, who knows all things,
16 Spoke to Gish:
17 “Some one, O Gish, who like thee
18 In the field was born and
19 Whom the mountain has reared,
20 Thou wilt see (him) and [like a woman(?)] thou wilt rejoice.
21 Heroes will kiss his feet.
22 Thou wilt spare [him and wilt endeavor]
23 To lead him to me.”
24 He slept and saw another
[63]
25 Dream, which he reported to his mother:
26 [“My mother,] I have seen another
27 [Dream.] My likeness I have seen in the streets
28 [Of Erech] of the plazas.
29 An axe was brandished, and
30 They gathered about him;
31 And the axe made him angry.
32 I saw him and I rejoiced,
33 I loved him as a woman,
34 I embraced him.
35 I took him and regarded him
36 As my brother.”
37 The mother of Gish, who knows all things,
38 [Spoke to Gish]:
39 [“O Gish, the man whom thou sawest,]
40 [Whom thou didst embrace like a woman].

Col. II.
41 (means) that he is to be associated with thee.”
42 Gish understood the dream.
43 [As] Enki[du] was sitting before the woman,
44 [Her] loins(?) he embraced, her vagina(?) he opened.
45 [Enkidu] forgot the place where he was born.
46 Six days and seven nights
47 Enkidu continued
48 To cohabit with [the courtesan].
49 [The woman] opened her [mouth] and
50 Spoke to Enkidu:
51 “I gaze upon thee, O Enkidu, like a god art thou!
52 Why with the cattle
53 Dost thou [roam] across the field?
[64]
54 Come, let me lead thee
55 into [Erech] of the plazas,
56 to the holy house, the dwelling of Anu,
57 O, Enkidu arise, let me conduct thee
58 To Eanna, the dwelling of Anu,
59 The place [where Gish is, perfect] in vitality.
60 And thou [like a wife wilt embrace] him.
61 Thou [wilt love him like] thyself.
62 Come, arise from the ground
63 (that is) cursed.”
64 He heard her word and accepted her speech.
65 The counsel of the woman
66 Entered his heart.
67 She stripped off a garment,
68 Clothed him with one.
69 Another garment
70 She kept on herself.
71 She took hold of his hand.
72 Like [a god(?)] she brought him
73 To the fertile meadow,
74 The place of the sheepfolds.
75 In that place they received food;
76 [For he, Enkidu, whose birthplace was the mountain,]
77 [With the gazelles he was accustomed to eat herbs,]
78 [With the cattle to drink water,]
79 [With the water beings he was happy.]
(Perhaps one additional line missing.)

Col. III.
81 Milk of the cattle
82 He was accustomed to suck.
83 Food they placed before him,
84 He broke (it) off and looked
85 And gazed.
[65]
86 Enkidu had not known
87 To eat food.
88 To drink wine
89 He had not been taught.
90 The woman opened her mouth and
91 Spoke to Enkidu:
92 “Eat food, O Enkidu,
93 The provender of life!
94 Drink wine, the custom of the land!”
95 Enkidu ate food
96 Till he was satiated.
97 Wine he drank,
98 Seven goblets.
99 His spirit was loosened, he became hilarious.
100 His heart became glad and
101 His face shone.
102 [The barber(?)] removed
103 The hair on his body.
104 He was anointed with oil.
105 He became manlike.
106 He put on a garment,
107 He was like a man.
108 He took his weapon;
109 Lions he attacked,
110 (so that) the night shepherds could rest.
111 He plunged the dagger;
112 Lions he overcame.
113 The great [shepherds] lay down;
114 Enkidu was their protector.
115 The strong man,
116 The unique hero,
117 To [the shepherds(?)] he speaks:
(About five lines missing.)

Col. IV.
(About eight lines missing.)
131 Making merry.
132 He lifted up his eyes,
133 He sees the man.
[66]
134 He spoke to the woman:
135 “O, courtesan, lure on the man.
136 Why has he come to me?
137 His name I will destroy.”
138 The woman called to the man
139 Who approaches to him3 and he beholds him.
140 “Away! why dost thou [quake(?)]
141 Evil is the course of thy activity.”4
142 Then he5 opened his mouth and
143 Spoke to Enkidu:
144 “[To have (?)] a family home
145 Is the destiny of men, and
146 The prerogative(?) of the nobles.
147 For the city(?) load the workbaskets!
148 Food supply for the city lay to one side!
149 For the King of Erech of the plazas,
150 Open the hymen(?), perform the marriage act!
151 For Gish, the King of Erech of the plazas,
152 Open the hymen(?),
153 Perform the marriage act!
154 With the legitimate wife one should cohabit.
155 So before,
156 As well as in the future.6
157 By the decree pronounced by a god,
158 From the cutting of his umbilical cord
159 (Such) is his fate.”
160 At the speech of the hero
161 His face grew pale.
(About three lines missing.)
[67]

Col. V.
(About six lines missing.)
171 [Enkidu] went [in front],
172 And the courtesan behind him.
173 He entered into Erech of the plazas.
174 The people gathered about him.
175 As he stood in the streets
176 Of Erech of the plazas,
177 The men gathered,
178 Saying in regard to him:
179 “Like the form of Gish he has suddenly become;
180 shorter in stature.
181 [In his structure high(?)], powerful,
182 .......... overseeing(?)
183 In the land strong of power has he become.
184 Milk of cattle
185 He was accustomed to suck.”
186 Steadily(?) in Erech .....
187 The heroes rejoiced.
188 He became a leader.
189 To the hero of fine appearance,
190 To Gish, like a god,
191 He became a rival to him.7
192 For Ishḫara a couch
193 Was stretched, and
194 Gish [lay down, and afterwards(?)]
195 In the night he fled.
196 He approaches and
197 [Enkidu stood] in the streets.
198 He blocked the path
199 of Gish.
200 At the exhibit of his power,
(About three lines missing.)
[68]

Col. VI.
(About four lines missing.)
208 Strong(?) …
209 Gish
210 Against him [Enkidu proceeded],
211 [His hair] luxuriant.
212 He started [to go]
213 Towards him.
214 They met in the plaza of the district.
215 Enkidu blocked the gate
216 With his foot,
217 Not permitting Gish to enter.
218They seized (each other), like oxen, 219They fought. 220The threshold they demolished; 221The wall they impaired. 222Gish and Enkidu 223Seized (each other). 224Like oxen they fought. 225The threshold they demolished; 226The wall they impaired. 227Gish bent 228His foot to the ground,8 229His wrath was appeased, 230His breast was quieted. 231When his breast was quieted, 232Enkidu to him 233Spoke, to Gish: 234“As a unique one, thy mother 235bore thee. 236The wild cow of the stall,9 237Ninsun, 238Has exalted thy head above men. 239Kingship over men 240Enlil has decreed for thee. 241Second tablet, 242enlarged beyond [the original(?)]. 243240 lines. [69] 1 I.e., paid homage to the meteor. 2 I.e., the heroes of Erech raised me to my feet, or perhaps in the sense of “supported me.” 3 I.e., Enkidu. 4 I.e., “thy way of life.” 5 I.e., the man. 6 I.e., an idiomatic phrase meaning “for all times.” 7 I.e., Enkidu became like Gish, godlike. Cf. col. 2, 11. 8 He was thrown and therefore vanquished. 9 Epithet given to Ninsun. See the commentary to the line. Commentary on the Pennsylvania Tablet. Line 1. The verb tibû with pašâru expresses the aim of Gish to secure an interpretation for his dream. This disposes of Langdon’s note 1 on page 211 of his edition, in which he also erroneously speaks of our text as “late.” Pašâru is not a variant of zakâru. Both verbs occur just as here in the Assyrian version I, 5, 25. Line 3. ina šât mušitia, “in this my night,” i.e., in the course of this night of mine. A curious way of putting it, but the expression occurs also in the Assyrian version, e.g., I, 5, 26 (parallel passage to ours) and II, 4a, 14. In the Yale tablet we find, similarly, mu-ši-it-ka (l. 262), “thy night,” i.e., “at night to thee.” Line 5. Before Langdon put down the strange statement of Gish “wandering about in the midst of omens” (misreading id-da-tim for it-lu-tim), he might have asked himself the question, what it could possibly mean. How can one walk among omens? 
Line 6. ka-ka-bu šá-ma-i must be taken as a compound term for “starry heaven.” The parallel passage in the Assyrian version (Tablet I, 5, 27) has the ideograph for star, with the plural sign as a variant. Literally, therefore, “The starry heaven (or “the stars in heaven”) was there,” etc. Langdon’s note 2 on page 211 rests on an erroneous reading. Line 7. kiṣru šá Anim, “mass of Anu,” appears to be the designation of a meteor, which might well be described as a “mass” coming from Anu, i.e., from the god of heaven who becomes the personification of the heavens in general. In the Assyrian version (I, 5, 28) we have kima ki-iṣ-rù, i.e., “something like a mass of heaven.” Note also I, 3, 16, where in a description of Gilgamesh, his strength is said to be “strong like a mass (i.e., a meteor) of heaven.” Line 9. For nuššašu ûl iltê we have a parallel in the Hebrew phrase נִלְאֵיתִי נְשֹׂא (Isaiah 1, 14). Line 10. Uruk mâtum, as the designation for the district of Erech, occurs in the Assyrian version, e.g., I, 5, 31, and IV, 2, 38; also to be supplied, I, 6, 23. For paḫir the parallel in the Assyrian version has iz-za-az (I, 5, 31), but VI, 197, we find paḫ-ru and paḫ-ra. Line 17. mi-in-di does not mean “truly” as Langdon translates, but “some one.” It occurs also in the Assyrian version X, 1, 13, mi-in-di-e ma-an-nu-ṵ, “this is some one who,” etc. [70] Line 18. Cf. Assyrian version I, 5, 3, and IV, 4, 7, ina ṣiri âlid—both passages referring to Enkidu. Line 21. Cf. Assyrian version II, 3b, 38, with malkê, “kings,” as a synonym of itlutum. Line 23. ta-tar-ra-as-sú from tarâṣu, “direct,” “guide,” etc. Line 24. I take uš-ti-nim-ma as III, 2, from išênu (יָשֵׁן), the verb underlying šittu, “sleep,” and šuttu, “dream.” Line 26. Cf. Assyrian version I, 6, 21—a complete parallel. Line 28. Uruk ri-bi-tim, the standing phrase in both tablets of the old Babylonian version, for which in the Assyrian version we have Uruk su-pu-ri. 
The former term suggests the “broad space” outside of the city or the “common” in a village community, while supûri, “enclosed,” would refer to the city within the walls. Dr. W. F. Albright (in a private communication) suggests “Erech of the plazas” as a suitable translation for Uruk ribîtim. A third term, Uruk mâtum (see above, note to line 10), though designating rather the district of which Erech was the capital, appears to be used as a synonym to Uruk ribîtim, as may be concluded from the phrase i-na ri-bi-tum ma-ti (l. 214 of the Pennsylvania tablet), which clearly means the “plaza” of the city. One naturally thinks of רְחֹבֹת עִיר in Genesis 10, 11—the equivalent of Babylonian ri-bi-tu âli—which can hardly be the name of a city. It appears to be a gloss, as is הִיא הָעִיר הַגְּדֹלָה at the end of v. 12. The latter gloss is misplaced, since it clearly describes “Nineveh,” mentioned in v. 11. Inasmuch as רְחֹבֹת עִיר immediately follows the mention of Nineveh, it seems simplest to take the phrase as designating the “outside” or “suburbs” of the city, a complete parallel, therefore, to ri-bi-tu mâti in our text. Nineveh, together with the “suburbs,” forms the “great city.” Uruk ribîtim is, therefore, a designation for “greater Erech,” proper to a capital city, which by its gradual growth would take in more than its original confines. “Erech of the plazas” must have come to be used as a honorific designation of this important center as early as 2000 B. C., whereas later, perhaps because of its decline, the epithet no longer seemed appropriate and was replaced by the more modest designation of “walled Erech,” with an allusion to the tradition which ascribed the building of the wall of the city to Gilgamesh. At all [71]events, all three expressions, “Erech of the plazas,” “Erech walled” and “Erech land,” are to be regarded as synonymous. The position once held by Erech follows also from its ideographic designation (Brünnow No. 
4796) by the sign “house” with a “gunufied” extension, which conveys the idea of Unu = šubtu, or “dwelling” par excellence. The pronunciation Unug or Unuk (see the gloss u-nu-uk, VR 23, 8a), composed of unu, “dwelling,” and ki, “place,” is hardly to be regarded as older than Uruk, which is to be resolved into uru, “city,” and ki, “place,” but rather as a play upon the name, both Unu + ki and Uru + ki conveying the same idea of the city or the dwelling place par excellence. As the seat of the second oldest dynasty according to Babylonian traditions (see Poebel’s list in Historical and Grammatical Texts No. 2), Erech no doubt was regarded as having been at one time “the city,” i.e., the capital of the entire Euphrates Valley. Line 31. A difficult line for which Langdon proposes the translation: “Another axe seemed his visage”!!—which may be picturesque, but hardly a description befitting a hero. How can a man’s face seem to be an axe? Langdon attaches šá-ni in the sense of “second” to the preceding word “axe,” whereas šanî bunušu, “change of his countenance” or “his countenance being changed,” is to be taken as a phrase to convey the idea of “being disturbed,” “displeased” or “angry.” The phrase is of the same kind as the well-known šunnu ṭêmu, “changing of reason,” to denote “insanity.” See the passages in Muss-Arnolt, Assyrian Dictionary, pp. 355 and 1068. In Hebrew, too, we have the same two phrases, e.g., וַיְשַׁנּוֹ אֶת־טַעְמוֹ (I Sam. 21, 14 = Ps. 34, 1), “and he changed his reason,” i.e., feigned insanity and מְשַׁנֶּה פָּנָיו (Job 14, 20), “changing his face,” to indicate a radical alteration in the frame of mind. There is a still closer parallel in Biblical Aramaic: Dan. 3, 19, “The form of his visage was changed,” meaning “he was enraged.” Fortunately, the same phrase occurs also in the Yale tablet (l. 
192), šá-nu-ú bu-nu-šú, in a connection which leaves no doubt that the aroused fury of the tyrant Ḫuwawa is described by it: ”Ḫuwawa heard and his face was changed” precisely, therefore, as we should say—following Biblical usage—“his countenance fell.” Cf. also the phrase pânušu arpu, “his countenance [72]was darkened” (Assyrian version I, 2, 48), to express “anger.” The line, therefore, in the Pennsylvania tablet must describe Enkidu’s anger. With the brandishing of the axe the hero’s anger was also stirred up. The touch was added to prepare us for the continuation in which Gish describes how, despite this (or perhaps just because of it), Enkidu seemed so attractive that Gish instantly fell in love with him. May perhaps the emphatic form ḫaṣinumma (line 31) against ḫaṣinu (line 29) have been used to indicate “The axe it was,” or “because of the axe?” It would be worth while to examine other texts of the Hammurabi period with a view of determining the scope in the use and meaning of the emphatic ma when added to a substantive. Line 32. The combination amur ù aḫtadu occurs also in the El-Amarna Letters, No. 18, 12. Line 34. In view of the common Hebrew, Syriac and Arabic חָבַב “to love,” it seems preferable to read here, as in the other passages in the Assyrian versions (I, 4, 15; 4, 35; 6, 27, etc.), a-ḫa-ab-bu-ub, aḫ-bu-ub, iḫ-bu-bu, etc. (instead of with p), and to render “embrace.” Lines 38–40, completing the column, may be supplied from the Assyrian version I, 6, 30–32, in conjunction with lines 33–34 of our text. The beginning of line 32 in Jensen’s version is therefore to be filled out [ta-ra-am-šú ki]-i. Line 43. The restoration at the beginning of this line En-ki-[dũ wa]-ši-ib ma-ḫar ḫa-ri-im-tim enables us to restore also the beginning of the second tablet of the Assyrian version (cf. the colophon of the fragment 81, 7–27, 93, in Jeremias, Izdubar-Nimrod, plate IV = Jensen, p. 134), [dEn-ki-dũ wa-ši-ib] ma-ḫar-šá. Line 44. 
The restoration of this line is largely conjectural, based on the supposition that its contents correspond in a general way to I, 4, 16, of the Assyrian version. The reading di-da is quite certain, as is also ip-ti-[e]; and since both words occur in the line of the Assyrian version in question, it is tempting to supply at the beginning ur-[šá] = “her loins” (cf. Holma, Namen der Körperteile, etc., p. 101), which is likewise found in the same line of the Assyrian version. At all events the line describes the fascination exercised [73]upon Enkidu by the woman’s bodily charms, which make him forget everything else. Lines 46–47 form a parallel to I, 4, 21, of the Assyrian version. The form šamkatu, “courtesan,” is constant in the old Babylonian version (ll. 135 and 172), as against šamḫatu in the Assyrian version (I, 3, 19, 40, 45; 4, 16), which also uses the plural šam-ḫa-a-ti (II, 3b, 40). The interchange between ḫ and k is not without precedent (cf. Meissner, Altbabylonisches Privatrecht, page 107, note 2, and more particularly Chiera, List of Personal Names, page 37). In view of the evidence, set forth in the Introduction, for the assumption that the Enkidu story has been combined with a tale of the evolution of primitive man to civilized life, it is reasonable to suggest that in the original Enkidu story the female companion was called šamkatu, “courtesan,” whereas in the tale of the primitive man, which was transferred to Enkidu, the associate was ḫarimtu, a “woman,” just as in the Genesis tale, the companion of Adam is simply called ishshâ, “woman.” Note that in the Assyrian parallel (Tablet I, 4, 26) we have two readings, ir-ḫi (imperf.) and a variant i-ri-ḫi (present). The former is the better reading, as our tablet shows. Lines 49–59 run parallel to the Assyrian version I, 4, 33–38, with slight variations which have been discussed above, p. 58, and from which we may conclude that the Assyrian version represents an independent redaction. 
Since in our tablet we have presumably the repetition of what may have been in part at least set forth in the first tablet of the old Babylonian version, we must not press the parallelism with the first tablet of the Assyrian version too far; but it is noticeable nevertheless (1) that our tablet contains lines 57–58 which are not represented in the Assyrian version, and (2) that the second speech of the “woman” beginning, line 62, with al-ka, “come” (just as the first speech, line 54), is likewise not found in the first tablet of the Assyrian version; which on the other hand contains a line (39) not in the Babylonian version, besides the detailed answer of Enkidu (I 4, 42–5, 5). Line 6, which reads “Enkidu and the woman went (il-li-ku) to walled Erech,” is also not found in the second tablet of the old Babylonian version. Line 63. For magrû, “accursed,” see the frequent use in Astrological texts (Jastrow, Religion Babyloniens und Assyriens II, page [74]450, note 2). Langdon, by his strange error in separating ma-a-ag-ri-im into two words ma-a-ak and ri-i-im, with a still stranger rendering: “unto the place yonder of the shepherds!!”, naturally misses the point of this important speech. Line 64 corresponds to I, 4, 40, of the Assyrian version, which has an additional line, leading to the answer of Enkidu. From here on, our tablet furnishes material not represented in the Assyrian version, but which was no doubt included in the second tablet of that version of which we have only a few fragments. Line 70 must be interpreted as indicating that the woman kept one garment for herself. Ittalbaš would accordingly mean, “she kept on.” The female dress appears to have consisted of an upper and a lower garment. Line 72. The restoration “like a god” is favored by line 51, where Enkidu is likened to a god, and is further confirmed by l. 190. Line 73. gupru is identical with gu-up-ri (Thompson, Reports of the Magicians and Astrologers, etc., 223 rev. 2 and 223a rev. 
8), and must be correlated to gipâru (Muss-Arnolt, Assyrian Dictionary, p. 229a), “planted field,” “meadow,” and the like. Thompson’s translation “men” (as though a synonym of gabru) is to be corrected accordingly. Line 74. There is nothing missing between a-šar and tar-ba-ṣi-im. Line 75. ri-ia-ú, which Langdon renders “shepherd,” is the equivalent of the Arabic riʿy and Hebrew רְעִי “pasturage,” “fodder.” We have usually the feminine form ri-i-tu (Muss-Arnolt, Assyrian Dictionary, p. 990b). The break at the end of the second column is not serious. Evidently Enkidu, still accustomed to live like an animal, is first led to the sheepfolds, and this suggests a repetition of the description of his former life. Of the four or five lines missing, we may conjecturally restore four, on the basis of the Assyrian version, Tablet I, 4, 2–5, or I, 2, 39–41. This would then join on well to the beginning of column 3. Line 81. Both here and in l. 52 our text has na-ma-áš-te-e, as against nam-maš-ši-i in the Assyrian version, e.g., Tablet I, 2, 41; 4, 5, etc.,—the feminine form, therefore, as against the masculine. Langdon’s note 3 on page 213 is misleading. In astrological texts we also find nam-maš-te; e.g., Thompson, Reports of the Magicians and Astrologers, etc., No. 200, Obv. 2. [75] Line 93. zi-ma-at (for simat) ba-la-ṭi-im is not “conformity of life” as Langdon renders, but that which “belongs to life” like si-mat pag-ri-šá, “belonging to her body,” in the Assyrian version III, 2a, 3 (Jensen, page 146). “Food,” says the woman, “is the staff of life.” Line 94. Langdon’s strange rendering “of the conditions and fate of the land” rests upon an erroneous reading (see the corrections, Appendix I), which is the more inexcusable because in line 97 the same ideogram, Kàš = šikaru, “wine,” occurs, and is correctly rendered by him. Šimti mâti is not the “fate of the land,” but the “fixed custom of the land.” Line 98. 
aṣ-ṣa-mi-im (plural of aṣṣamu), which Langdon takes as an adverb in the sense of “times,” is a well-known word for a large “goblet,” which occurs in Incantation texts, e.g., CT XVI, 24, obv. 1, 19, mê a-ṣa-am-mi-e šú-puk, “pour out goblets of water.” Line 18 of the passage shows that aṣammu is a Sumerian loan word. Line 99. it-tap-šar, I, 2, from pašâru, “loosen.” In combination with kabtatum (from kabitatum, yielding two forms: kabtatum, by elision of i, and kabittu, by elision of a), “liver,” pašâru has the force of becoming cheerful. Cf. ka-bit-ta-ki lip-pa-šir (ZA V., p. 67, line 14). Line 100, note the customary combination of “liver” (kabtatum) and “heart” (libbu) for “disposition” and “mind,” just as in the standing phrase in penitential prayers: “May thy liver be appeased, thy heart be quieted.” Line 102. The restoration [lùŠÚ]-I = gallabu “barber” (Delitzsch, Sumer. Glossar, p. 267) was suggested to me by Dr. H. F. Lutz. The ideographic writing “raising the hand” is interesting as recalling the gesture of shaving or cutting. Cf. a reference to a barber in Lutz, Early Babylonian Letters from Larsa, No. 109, 6. Line 103. Langdon has correctly rendered šuḫuru as “hair,” and has seen that we have here a loan-word from the Sumerian Suḫur = kimmatu, “hair,” according to the Syllabary Sb 357 (cf. Delitzsch, Sumer. Glossar., p. 253). For kimmatu, “hair,” more specifically hair of the head and face, see Holma, Namen der Körperteile, page 3. The same sign Suḫur or Suḫ (Brünnow No. 8615), with Lal, i.e., “hanging hair,” designates the “beard” (ziḳnu, cf. Brünnow, No. 8620, and Holma, l. c., p. 36), and it is interesting to [76]note that we have šuḫuru (introduced as a loan-word) for the barbershop, according to II R, 21, 27c (= CT XII, 41). Ê suḫur(ra) (i.e., house of the hair) = šú-ḫu-ru. In view of all this, we may regard as assured Holma’s conjecture to read šú-[ḫur-ma-šú] in the list 93074 obv. (MVAG 1904, p. 203; and Holma, Beiträge z. Assyr. Lexikon, p. 
36), as the Akkadian equivalent to Suḫur-Maš-Ḫa and the name of a fish, so called because it appeared to have a double “beard” (cf. Holma, Namen der Körperteile). One is tempted, furthermore, to see in the difficult word שכירה (Isaiah 7, 20) a loan-word from our šuḫuru, and to take the words אֶת־הָרֹאשׁ וְשַׂעַר הָרַגְלַיִם “the head and hair of the feet” (euphemistic for the hair around the privates), as an explanatory gloss to the rare word שכירה for “hair” of the body in general—just as in the passage in the Pennsylvania tablet. The verse in Isaiah would then read, “The Lord on that day will shave with the razor the hair (השכירה), and even the beard will be removed.” The rest of the verse would represent a series of explanatory glosses: (a) “Beyond the river” (i.e., Assyria), a gloss to יְגַלַּח (b) “with the king of Assyria,” a gloss to בְּתַעַר “with a razor;” and (c) “the hair of the head and hair of the feet,” a gloss to השכירה. For “hair of the feet” we have an interesting equivalent in Babylonian šu-ḫur (and šú-ḫu-ur) šêpi (CT XII, 41, 23–24 c-d). Cf. also Boissier, Documents Assyriens relatifs aux Présages, p. 258, 4–5. The Babylonian phrase is like the Hebrew one to be interpreted as a euphemism for the hair around the male or female organ. To be sure, the change from ה to כ in השכירה constitutes an objection, but not a serious one in the case of a loan-word, which would aim to give the pronunciation of the original word, rather than the correct etymological equivalent. The writing with aspirated כ fulfills this condition. (Cf. šamkatum and šamḫatum, above p. 73). The passage in Isaiah being a reference to Assyria, the prophet might be tempted to use a foreign word to make his point more emphatic. To take השכירה as “hired,” as has hitherto been done, and to translate “with a hired razor,” is not only to suppose a very wooden metaphor, but is grammatically difficult, since השכירה would be a feminine adjective attached to a masculine substantive. 
Coming back to our passage in the Pennsylvania tablet, it is to [77]be noted that Enkidu is described as covered “all over his body with hair” (Assyrian version, Tablet I, 2, 36) like an animal. To convert him into a civilized man, the hair is removed. Line 107. mutu does not mean “husband” here, as Langdon supposes, but must be taken as in l. 238 in the more general sense of “man,” for which there is good evidence. Line 109. la-bi (plural form) are “lions”—not “panthers” as Langdon has it. The verb ú-gi-ir-ri is from gâru, “to attack.” Langdon by separating ú from gi-ir-ri gets a totally wrong and indeed absurd meaning. See the corrections in the Appendix. He takes the sign ú for the copula (!!) which of course is impossible. Line 110. Read uš-sa-ak-pu, III, 1, of sakâpu, which is frequently used for “lying down” and is in fact a synonym of ṣalâlu. See Muss-Arnolt, Assyrian Dictionary, page 758a. The original has very clearly Síb (= rê’u, “shepherd”) with the plural sign. The “shepherds of the night,” who could now rest since Enkidu had killed the lions, are of course the shepherds who were accustomed to watch the flocks during the night. Line 111. ut-tap-pi-iš is II, 2, napâšu, “to make a hole,” hence “to plunge” in connection with a weapon. Šib-ba-ri is, of course, not “mountain goats,” as Langdon renders, but a by-form to šibbiru, “stick,” and designates some special weapon. Since on seal cylinders depicting Enkidu killing lions and other animals the hero is armed with a dagger, this is presumably the weapon šibbaru. Line 113. Langdon’s translation is again out of the question and purely fanciful. The traces favor the restoration na-ki-[di-e], “shepherds,” and since the line appears to be a parallel to line 110, I venture to suggest at the beginning [it-ti]-lu from na’âlu, “lie down”—a synonym, therefore, to sakâpu in line 110. The shepherds can sleep quietly after Enkidu has become the “guardian” of the flocks. 
In the Assyrian version (tablet II, 3a, 4) Enkidu is called a na-kid, “shepherd,” and in the preceding line we likewise have lùNa-Kid with the plural sign, i.e., “shepherds.” This would point to nakidu being a Sumerian loan-word, unless it is vice versa, a word that has gone over into the Sumerian from Akkadian. Is perhaps the fragment in question (K 8574) in the Assyrian version (Haupt’s ed. No. 25) the parallel to our passage? If in line 4 of this fragment we could read šú for sa, i.e., na-kid-šú-nu, “their shepherd,” we would have a [78]parallel to line 114 of the Pennsylvania tablet, with na-kid as a synonym to maṣṣaru, “protector.” The preceding line would then be completed as follows: [it-ti-lu]-nim-ma na-kidmeš [ra-bu-tum] (or perhaps only it-ti-lu-ma, since the nim is not certain) and would correspond to line 113 of the Pennsylvania tablet. Inasmuch as the writing on the tiny fragment is very much blurred, it is quite possible that in line 2 we must read šib-ba-ri (instead of bar-ba-ri), which would furnish a parallel to line 111 of the Pennsylvania tablet. The difference between Bar and Šib is slight, and the one sign might easily be mistaken for the other in the case of close writing. The continuation of line 2 of the fragment would then correspond to line 112 of the Pennsylvania tablet, while line 1 of the fragment might be completed [re-e]-u-ti(?) šá [mu-ši-a-tim], though this is by no means certain. The break at the close of column 3 (about 5 lines) and the top of column 4 (about 8 lines) is a most serious interruption in the narrative, and makes it difficult to pick up the thread where the tablet again becomes readable. We cannot be certain whether the “strong man, the unique hero” who addresses some one (lines 115–117) is Enkidu or Gish or some other personage, but presumably Gish is meant. In the Assyrian version, Tablet I, 3, 2 and 29, we find Gilgamesh described as the “unique hero” and in l. 
234 of the Pennsylvania tablet Gish is called “unique,” while again, in the Assyrian version, Tablet I, 2, 15 and 26, he is designated as gašru as in our text. Assuming this, whom does he address? Perhaps the shepherds? In either case he receives an answer that rejoices him. If the fragment of the Assyrian version (K 8574) above discussed is the equivalent to the close of column 3 of the Pennsylvania tablet, we may go one step further, and with some measure of assurance assume that Gish is told of Enkidu’s exploits and that the latter is approaching Erech. This pleases Gish, but Enkidu when he sees Gish(?) is stirred to anger and wants to annihilate him. At this point, the “man” (who is probably Gish, though the possibility of a third personage must be admitted) intervenes and in a long speech sets forth the destiny and higher aims of mankind. The contrast between Enkidu and Gish (or the third party) is that between the primitive [79]savage and the civilized being. The contrast is put in the form of an opposition between the two. The primitive man is the stronger and wishes to destroy the one whom he regards as a natural foe and rival. On the other hand, the one who stands on a higher plane wants to lift his fellow up. The whole of column 4, therefore, forms part of the lesson attached to the story of Enkidu, who, identified with man in a primitive stage, is made the medium of illustrating how the higher plane is reached through the guiding influences of the woman’s hold on man, an influence exercised, to be sure, with the help of her bodily charms. Line 135. uk-ki-ši (imperative form) does not mean “take away,” as Langdon (who entirely misses the point of the whole passage) renders, but on the contrary, “lure him on,” “entrap him,” and the like. The verb occurs also in the Yale tablet, ll. 183 and 186. Line 137. Langdon’s note to lu-uš-šú had better be passed over in silence. The form is II. 1, from ešû, “destroy.” Line 139. 
Since the man whom the woman calls approaches Enkidu, the subject of both verbs is the man, and the object is Enkidu; i.e., therefore, “The man approaches Enkidu and beholds him.” Line 140. Langdon’s interpretation of this line again is purely fanciful. E-di-il cannot, of course, be a “phonetic variant” of edir; and certainly the line does not describe the state of mind of the woman. Lines 140–141 are to be taken as an expression of amazement at Enkidu’s appearance. The first word appears to be an imperative in the sense of “Be off,” “Away,” from dâlu, “move, roam.” The second word e-eš, “why,” occurs with the same verb dâlu in the Meissner fragment: e-eš ta-da-al (column 3, 1), “why dost thou roam about?” The verb at the end of the line may perhaps be completed to ta-ḫi-il-la-am. The last sign appears to be am, but may be ma, in which case we should have to complete simply ta-ḫi-il-ma. Taḫîl would be the second person present of ḫîlu. Cf. i-ḫi-il, frequently in astrological texts, e.g., Virolleaud, Adad No. 3, lines 21 and 33. Line 141. The reading lim-nu at the beginning, instead of Langdon’s mi-nu, is quite certain, as is also ma-na-aḫ-ti-ka instead of what Langdon proposes, which gives no sense whatever. Manaḫtu in the sense of the “toil” and “activity of life” (like עָמָל throughout the Book of Ecclesiastes) occurs in the introductory lines to [80]the Assyrian version of the Epic I, 1, 8, ka-lu ma-na-aḫ-ti-[šu], “all of his toil,” i.e., all of his career. Line 142. The subject of the verb cannot be the woman, as Langdon supposes, for the text in that case, e.g., line 49, would have said pi-šá (“her mouth”) not pi-šú (“his mouth”). The long speech, detailing the function and destiny of civilized man, is placed in the mouth of the man who meets Enkidu. In the Introduction it has been pointed out that lines 149 and 151 of the speech appear to be due to later modifications of the speech designed to connect the episode with Gish. 
Assuming this to be the case, the speech sets forth the following five distinct aims of human life: (1) establishing a home (line 144), (2) work (line 147), (3) storing up resources (line 148), (4) marriage (line 150), (5) monogamy (line 154); all of which is put down as established for all time by divine decree (lines 155–157), and as man’s fate from his birth (lines 158–159). Line 144. bi-ti-iš e-mu-ti is for bîti šá e-mu-ti, just as ḳab-lu-uš Ti-a-ma-ti (Assyrian Creation Myth, IV, 65) stands for ḳablu šá Tiamti. Cf. bît e-mu-ti (Assyrian version, IV, 2, 46 and 48). The end of the line is lost beyond recovery, but the general sense is clear. Line 146. tu-a-ar is a possible reading. It may be the construct of tu-a-ru, of frequent occurrence in legal texts and having some such meaning as “right,” “claim” or “prerogative.” See the passages given by Muss-Arnolt, Assyrian Dictionary, p. 1139b. Line 148. The reading uk-la-at, “food,” and then in the wider sense “food supply,” “provisions,” is quite certain. The fourth sign looks like the one for “city.” E-mi-sa may stand for e-mid-sa, “place it.” The general sense of the line, at all events, is clear, as giving the advice to gather resources. It fits in with the Babylonian outlook on life to regard work and wealth as the fruits of work and as a proper purpose in life. Line 150 (repeated lines 152–153) is a puzzling line. To render piti pûk epši (or epiši), as Langdon proposes, “open, addressing thy speech,” is philologically and in every other respect inadmissible. The word pu-uk (which Langdon takes for “thy mouth”!!) can, of course, be nothing but the construct form of pukku, which occurs in the Assyrian version in the sense of “net” (pu-uk-ku I, 2, 9 and 21, and also in the colophon to the eleventh tablet furnishing the [81]beginning of the twelfth tablet (Haupt’s edition No. 56), as well as in column 2, 29, and column 3, 6, of this twelfth tablet). 
In the two last named passages pukku is a synonym of mekû, which from the general meaning of “enclosure” comes to be a euphemistic expression for the female organ. So, for example, in the Assyrian Creation Myth, Tablet IV, 66 (synonym of ḳablu, “waist,” etc.). See Holma, Namen der Körperteile, page 158. Our word pukku must be taken in this same sense as a designation of the female organ—perhaps more specifically the “hymen” as the “net,” though the womb in general might also be designated as a “net” or “enclosure.” Kak-(ši) is no doubt to be read epši, as Langdon correctly saw; or perhaps better, epiši. An expression like ip-ši-šú lul-la-a (Assyrian version, I, 4, 13; also line 19, i-pu-us-su-ma lul-la-a), with the explanation šipir zinništi, “the work of woman” (i.e., after the fashion of woman), shows that epêšu is used in connection with the sexual act. The phrase pitî pûk epiši a-na ḫa-a-a-ri, literally “open the net, perform the act for marriage,” therefore designates the fulfillment of the marriage act, and the line is intended to point to marriage with the accompanying sexual intercourse as one of the duties of man. While the general meaning is thus clear, the introduction of Gish is puzzling, except on the supposition that lines 149 and 151 represent later additions to connect the speech, detailing the advance to civilized life, with the hero. See above, p. 45 seq. Line 154. aššat šimâtim is the “legitimate wife,” and the line inculcates monogamy as against promiscuous sexual intercourse. We know that monogamy was the rule in Babylonia, though a man could in addition to the wife recognized as the legalized spouse take a concubine, or his wife could give her husband a slave as a concubine. Even in that case, according to the Hammurabi Code, §§145–146, the wife retained her status. The Code throughout assumes that a man has only one wife—the aššat šimâtim of our text. 
The phrase “so” (or “that”) before “as afterwards” is to be taken as an idiomatic expression—“so it was and so it should be for all times”—somewhat like the phrase maḫriam ù arkiam, “for all times,” in legal documents (CT VIII, 38c, 22–23). For the use of mûk see Behrens, Assyrisch-Babylonische Briefe, p. 3. Line 158. i-na bi-ti-iḳ a-bu-un-na-ti-šú. Another puzzling line, for which Langdon proposes “in the work of his presence,” which [82] is as obscure as the original. In a note he says that apunnâti means “nostrils,” which is certainly wrong. There has been considerable discussion about this term (see Holma, Namen der Körperteile, pages 150 and 157), the meaning of which has been advanced by Christian’s discussion in OLZ 1914, p. 397. From this it appears that it must designate a part of the body which could acquire a wider significance so as to be used as a synonym for “totality,” since it appears in a list of equivalents for Dur = nap-ḫa-ru, “totality,” ka-lu-ma, “all,” a-bu-un-na-tum e-ṣi-im-tum, “bony structure,” and kul-la-tum, “totality” (CT XII, 10, 7–10). Christian shows that it may be the “navel,” which could well acquire a wider significance for the body in general; but we may go a step further and specify the “umbilical cord” (tentatively suggested also by Christian) as the primary meaning, then the “navel,” and from this the “body” in general. The structure of the umbilical cord as a series of strands would account for designating it by a plural form abunnâti, as also for the fact that one could speak of a right and left side of the abunnâti. To distinguish between the “umbilical cord” and the “navel,” the ideograph Dur (the common meaning of which is riksu, “bond” [Delitzsch, Sumer. Glossar., p. 150]), was used for the former, while for the latter Li Dur was employed, though the reading in Akkadian in both cases was the same. 
The expression “with (or at) the cutting of his umbilical cord” would mean, therefore, “from his birth”—since the cutting of the cord which united the child with the mother marks the beginning of the separate life. Lines 158–159, therefore, in concluding the address to Enkidu, emphasize in a picturesque way that what has been set forth is man’s fate for which he has been destined from birth. [See now Albright’s remarks on abunnatu in the Revue d’Assyriologie 16, pp. 173–175, with whose conclusion, however, that it means primarily “backbone” and then “stature,” I cannot agree.] In the break of about three lines at the bottom of column 4, and of about six at the beginning of column 5, there must have been set forth the effect of the address on Enkidu and the indication of his readiness to accept the advice; as in a former passage (line 64), Enkidu showed himself willing to follow the woman. At all events the two now proceed to the heart of the city. Enkidu is in front [83]and the woman behind him. The scene up to this point must have taken place outside of Erech—in the suburbs or approaches to the city, where the meadows and the sheepfolds were situated. Line 174. um-ma-nu-um are not the “artisans,” as Langdon supposes, but the “people” of Erech, just as in the Assyrian version, Tablet IV, 1, 40, where the word occurs in connection with i-dip-pi-ir, which is perhaps to be taken as a synonym of paḫâru, “gather;” so also i-dip-pir (Tablet I, 2, 40) “gathers with the flock.” Lines 180–182 must have contained the description of Enkidu’s resemblance to Gish, but the lines are too mutilated to permit of any certain restoration. See the corrections (Appendix) for a suggested reading for the end of line 181. Line 183 can be restored with considerable probability on the basis of the Assyrian version, Tablet I, 3, 3 and 30, where Enkidu is described as one “whose power is strong in the land.” Lines 186–187. 
The puzzling word, to be read apparently kak-ki-a-tum, can hardly mean “weapons,” as Langdon proposes. In that case we should expect kakkê; and, moreover, to so render gives no sense, especially since the verb ú-te-el-li-lu is without much question to be rendered “rejoiced,” and not “purified.” Kakkiatum—if this be the correct reading—may be a designation of Erech like ribîtim. Lines 188–189 are again entirely misunderstood by Langdon, owing to erroneous readings. See the corrections in the Appendix. Line 190. i-li-im in this line is used like Hebrew Elohîm, “God.” Line 191. šakiššum = šakin-šum, as correctly explained by Langdon. Line 192. With this line a new episode begins which, owing to the gap at the beginning of column 6, is somewhat obscure. The episode leads to the hostile encounter between Gish and Enkidu. It is referred to in column 2 of the fourth tablet of the Assyrian version. Lines 35–50—all that is preserved of this column—form in part a parallel to columns 5–6 of the Pennsylvania tablet, but in much briefer form, since what on the Pennsylvania tablet is the incident itself is on the fourth tablet of the Assyrian version merely a repeated summary of the relationship between the two heroes, leading up to the expedition against Ḫu(m)baba. Lines 38–40 of [84] column 2 of the Assyrian version correspond to lines 174–177 of the Pennsylvania tablet, and lines 44–50 to lines 192–221. It would seem that Gish proceeds stealthily at night to go to the goddess Ishḫara, who lies on a couch in the bît êmuti, the “family house” (Assyrian version, Tablet IV, 2, 46–48). He encounters Enkidu in the street, and the latter blocks Gish’s path, puts his foot in the gate leading to the house where the goddess is, and thus prevents Gish from entering. Thereupon the two have a fierce encounter in which Gish is worsted. The meaning of the episode itself is not clear. Does Enkidu propose to deprive Gish, here viewed as a god (cf. 
line 190 of the Pennsylvania tablet = Assyrian version, Tablet I, 4, 45, “like a god”), of his spouse, the goddess Ishḫara—another form of Ishtar? Or are the two heroes, the one a counterpart of the other, contesting for the possession of a goddess? Is it in this scene that Enkidu becomes the “rival” (me-iḫ-rù, line 191 of the Pennsylvania tablet) of the divine Gish? We must content ourself with having obtained through the Pennsylvania tablet a clearer indication of the occasion of the fight between the two heroes, and leave the further explanation of the episode till a fortunate chance may throw additional light upon it. There is perhaps a reference to the episode in the Assyrian version, Tablet II, 3b, 35–36. Line 196. For i-na-ag-šá-am (from nagâšu), Langdon proposes the purely fanciful “embracing her in sleep,” whereas it clearly means “he approaches.” Cf. Muss-Arnolt, Assyrian Dictionary, page 645a. Lines 197–200 appear to correspond to Tablet IV, 2, 35–37, of the Assyrian version, though not forming a complete parallel. We may therefore supply at the beginning of line 35 of the Assyrian version [ittaziz] Enkidu, corresponding to line 197 of the Pennsylvania tablet. Line 36 of IV, 2, certainly appears to correspond to line 200 (dan-nu-ti = da-na-ni-iš-šú). Line 208. The first sign looks more like šar, though ur is possible. Line 211 is clearly a description of Enkidu, as is shown by a comparison with the Assyrian version I, 2, 37: [pi]-ti-ik pi-ir-ti-šú uḫ-tan-na-ba kima dNidaba, “The form of his hair sprouted like wheat.” We must therefore supply Enkidu in the preceding line. Tablet IV, 4, 6, of the Assyrian version also contains a reference to the flowing hair of Enkidu. [85] Line 212. For the completion of the line cf. Harper, Assyrian and Babylonian Letters, No. 214. Line 214. For ribîtu mâti see the note above to line 28 of column 1. Lines 215–217 correspond almost entirely to the Assyrian version IV, 2, 46–48. 
The variations ki-ib-su in place of šêpu, and kima lîm, “like oxen,” instead of ina bâb êmuti (repeated from line 46), ana šurûbi for êribam, are slight though interesting. The Assyrian version shows that the “gate” in line 215 is “the gate of the family house” in which the goddess Ishḫara lies. Lines 218–228. The detailed description of the fight between the two heroes is only partially preserved in the Assyrian version. Line 218. li-i-im is evidently to be taken as plural here as in line 224, just as su-ḳi-im (lines 27 and 175), ri-bi-tim (lines 4, 28, etc.), tarbaṣim (line 74), aṣṣamim (line 98) are plural forms. Our text furnishes, as does also the Yale tablet, an interesting illustration of the vacillation in the Hammurabi period in the twofold use of im: (a) as an indication of the plural (as in Hebrew), and (b) as a mere emphatic ending (lines 63, 73, and 232), which becomes predominant in the post-Hammurabi age. Line 227. Gilgamesh is often represented on seal cylinders as kneeling, e.g., Ward Seal Cylinders Nos. 159, 160, 165. Cf. also Assyrian version V, 3, 6, where Gilgamesh is described as kneeling, though here in prayer. See further the commentary to the Yale tablet, line 215. Line 229. We must of course read uz-za-šú, “his anger,” and not uṣ-ṣa-šú, “his javelin,” as Langdon does, which gives no sense. Line 231. Langdon’s note is erroneous. He again misses the point. The stem of the verb here as in line 230 (i-ni-iḫ) is the common nâḫu, used so constantly in connection with pašâḫu, to designate the cessation of anger. Line 234. ištên applied to Gish designates him of course as “unique,” not as “an ordinary man,” as Langdon supposes. Line 236. On this title “wild cow of the stall” for Ninsun, see Poebel in OLZ 1914, page 6, to whom we owe the correct view regarding the name of Gilgamesh’s mother. Line 238. mu-ti here cannot mean “husband,” but “man” in [86]general. See above note to line 107. 
Langdon’s strange misreading ri-eš-su for ri-eš-ka (“thy head”) leads him again to miss the point, namely that Enkidu comforts his rival by telling him that he is destined for a career above that of the ordinary man. He is to be more than a mere prize fighter; he is to be a king, and no doubt in the ancient sense, as the representative of the deity. This is indicated by the statement that the kingship is decreed for him by Enlil. Similarly, Ḫu(m)baba or Ḫuwawa is designated by Enlil to inspire terror among men (Assyrian version, Tablet IV, 5, 2 and 5), i-šim-šú dEnlil = Yale tablet, l. 137, where this is to be supplied. This position accorded to Enlil is an important index for the origin of the Epic, which is thus shown to date from a period when the patron deity of Nippur was acknowledged as the general head of the pantheon. This justifies us in going back several centuries at least before Hammurabi for the beginning of the Gilgamesh story. If it had originated in the Hammurabi period, we should have had Marduk introduced instead of Enlil. Line 242. As has been pointed out in the corrections to the text (Appendix), šú-tu-ur can only be III, 1, from atâru, “to be in excess of.” It is a pity that the balance of the line is broken off, since this is the first instance of a colophon beginning with the term in question. In some way šutûr must indicate that the copy of the text has been “enlarged.” It is tempting to fill out the line šú-tu-ur e-li [duppi labiri], and to render “enlarged from an original,” as an indication of an independent recension of the Epic in the Hammurabi period. All this, however, is purely conjectural, and we must patiently hope for more tablets of the Old Babylonian version to turn up. The chances are that some portions of the same edition as the Yale and Pennsylvania tablets are in the hands of dealers at present or have been sold to European museums. 
The war has seriously interfered with the possibility of tracing the whereabouts of groups of tablets that ought never to have been separated. [87] Yale Tablet. Transliteration. (About ten lines missing.) Col. I. 11.................. [ib]-ri(?) 12[mi-im-ma(?) šá(?)]-kú-tu wa(?)-ak-rum 13[am-mi-nim] ta-aḫ-ši-iḫ 14[an-ni]-a-am [e-pi]-šá-am 15...... mi-im[-ma šá-kú-tu(?)]ma- 16di-iš 17[am-mi]-nim [taḫ]-ši-iḫ 18[ur(?)]-ta-du-ú [a-na ki-i]š-tim 19ši-ip-ra-am it-[ta-šú]-ú i-na [nišê] 20it-ta-áš-šú-ú-ma 21i-pu-šú ru-ḫu-tam 22.................. uš-ta-di-nu 23............................. bu 24............................... (About 17 lines missing.) 40.............. nam-........ 41.................... u ib-[ri] ..... 42.............. ú-na-i-du ...... 43[zi-ik]-ra-am ú-[tí-ir]-ru 44[a-na] ḫa-ri-[im]-tim 45[i]-pu(?)-šú a-na sa-[ka]-pu-ti Col. II. (About eleven lines missing.) 57... šú(?)-mu(?) ............... 58ma-ḫi-ra-am [šá i-ši-šú] 59šú-uk-ni-šum-[ma] ............... 60la-al-la-ru-[tu] .................. 61um-mi d-[Giš mu-di-a-at ka-la-ma] 62i-na ma-[ḫar dŠamaš i-di-šá iš-ši][88] 63šá ú 64i-na- an(?)-[na am-mi-nim] 65ta-[aš-kun(?) a-na ma-ri-ia li-ib-bi la] 66ṣa-[li-la te-mid-su] 67............................. (About four lines missing.) 72i-na [šá dEn-ki-dũ im-la-a] di-[im-tam] 73il-[pu-ut li]-ib-ba-šú-[ma] 74[zar-biš(?)] uš-ta-ni-[iḫ] 75[i-na šá dEn]-ki-dũ im-la-a di-im-tam 76[il-pu-ut] li-ib-ba-šú-ma 77[zar-biš(?)] uš-ta-ni-[iḫ] 78[dGiš ú-ta]-ab-bil pa-ni-šú 79[iz-za-kar-am] a-na dEn-ki-dũ 80[ib-ri am-mi-nim] i-na-ka 81[im-la-a di-im]-tam 82[il-pu-ut li-ib-bi]-ka 83[zar-biš tu-uš-ta]-ni-iḫ 84[dEn-ki-dũ pi-šú i-pu-šá]-am-ma 85iz-za-[kàr-am] a-na dGiš 86ta-ab-bi-a-tum ib-ri 87uš-ta-li-pa da-1da-ni-ia 88a-ḫa-a-a ir-ma-a-ma 89e-mu-ki i-ni-iš 90dGiš pi-šú i-pu-šá-am-ma 91iz-za-kàr-am a-na dEn-ki-dũ (About four lines missing.) Col. III. 96..... [a-di dḪu]-wa-wa da-pi-nu 97.................. ra-[am(?)-ma] 98................ 
[ú-ḫal]- li-ik 99[lu-ur-ra-du a-na ki-iš-ti šá] iserini[89] 100............ lam(?) ḫal-bu 101............ [li]-li-is-su 102.............. lu(?)-up-ti-šú 103dEn-ki-dũ pi-šú i-pu-šá-am-ma 104iz-za-kàr-am a-na dGiš 105i-di-ma ib-ri i-na šadî(-i) 106i-nu-ma at-ta-la-ku it-ti bu-lim 107a-na ištên(-en) kas-gíd-ta-a-an nu-ma-at ki-iš-tum 108[e-di-iš(?)] ur-ra-du a-na libbi-šá 109d[Ḫu-wa]-wa ri-ig-ma-šú a-bu-bu 110pi-[šú] dBil-gi-ma 111na-pi-iš-šú mu-tum 112am-mi-nim ta-aḫ-ši-iḫ 113an-ni-a-am e-pi-šá-am 114ga-[ba]-al-la ma-ḫa-ar 115[šú]-pa-at dḪu-wa-wa 116(d)Giš pi-šú i-pu-šá-am-ma 117[iz-za-k]àr-am a-na dEn-ki-dũ 118....... su(?)-lu-li a-šá-ki2-šá 119............. [i-na ki-iš]-tim 120............................... 121ik(?) ......................... 122a-na .......................... 123mu-šá-ab [dḪu-wa-wa] ....... 124ḫa-aṣ-si-nu ................. 125at-ta lu(?) ................. 126a-na-ku lu-[ur-ra-du a-na ki-iš-tim] 127dEn-ki-dũ pi-šú i-pu-[šá-am-ma] 128iz-za-kàr-am a-na [dGiš] 129ki-i ni[il]-la-ak [iš-te-niš(?)] 130a-na ki-iš-ti [šá iṣerini] 131na-ṣi-ir-šá dGiš muḳ-[tab-lu] 132da-a-an la ṣa[-li-lu(?)] 133dḪu-wa-wa dpi-ir-[ḫu ša (?)][90] 134dAdad iš .......... 135šú-ú .................. Col. IV. 136áš-šúm šú-ul-lu-m[u ki-iš-ti šáiṣerini] 137pu-ul-ḫi-a-tim 7 [šú(?) i-šim-šú dEnlil] 138dGiš pi-šú i-pu [šá-am-ma] 139iz-za-kàr-am a-na [dEn-ki-dũ] 140ma-an-nu ib-ri e-lu-ú šá-[ru-ba(?)] 141i-ṭib-ma it-ti dŠamaš da-ri-iš ú-[me-šú] 142a-we-lu-tum ba-ba-nu ú-tam-mu-šá-[ma] 143mi-im-ma šá i-te-ni-pu-šú šá-ru-ba 144at-ta an-na-nu-um-ma ta-dar mu-tam 145ul iš-šú da-na-nu ḳar-ra-du-ti-ka 146lu-ul-li-ik-ma i-na pa-ni-ka 147pi-ka li-iš-si-a-am ṭi-ḫi-e ta-du-ur 148šum-ma am-ta-ḳu-ut šú-mi lu-uš-zi-iz 149dGiš mi3-it-ti dḪu-wa-wa da-pi-nim 150il(?)-ḳu-ut iš-tu 151i-wa-al-dam-ma tar-bi-a i-na šam-mu(?) Il(?) 152iš-ḫi-it-ka-ma la-bu ka-la-ma ti-di 153it- ku(?) ..... [il(?)]-pu-tu-(?) ma ..... 154.............. ka-ma 155.............. ši pi-ti 156............ ki-ma re’i(?) 
na-gi-la sa-rak-ti 157.... [ta-šá-s]i-a-am tu-lim-mi-in li-ib-bi 158[ga-ti lu]-uš-ku-un-ma 159[lu-u-ri]-ba-am iṣerini[91] 160[šú-ma sá]-ṭa-ru-ú a-na-ku lu-uš-ta-ak-na 161[pu-tu-ku(?)] ib-ri a-na ki-iš-ka-tim lu-mu-ḫa 162[be-le-e li-iš-]-pu-ku i-na maḫ-ri-ni 163[pu-tu]-ku a-na ki-iš-ka-ti-i i-mu-ḫu 164wa-áš-bu uš-ta-da-nu um-mi-a-nu 165pa-ši iš-pu-ku ra-bu-tim 166ḫa-aṣ-si-ni 3 biltu-ta-a-an iš-tap-ku 167pa-aṭ-ri iš-pu-ku ra-bu-tim 168me-še-li-tum 2 biltu-ta-a-an 169ṣi-ip-ru 30 ma-na-ta-a-an šá a-ḫi-ši-na 170išid(?) pa-aṭ-ri 30 ma-na-ta-a-an ḫuraṣi 171[d]Giš ù [dEn-ki-]dũ 10 biltu-ta-a-an šá-ak-nu] 172.... ul-la . .[Uruk]ki 7 i-di-il-šú 173...... iš-me-ma um-ma-nu ib-bi-ra 174[uš-te-(?)]-mi-a i-na sûḳi šá Urukki ri-bi-tim 175...... [u-še(?)]-ṣa-šú dGis 176[ina sûḳi šá(?) Urukki] ri-bi-tim 177[dEn-ki-dũ(?) ú]-šá-ab i-na maḫ-ri-šú 178..... [ki-a-am(?) i-ga]-ab-bi 179[........ Urukki ri]-bi-tim 180 [ma-ḫa-ar-šú] Col. V. 181dGiš šá i-ga-ab-bu-ú lu-mu-ur 182šá šú-um-šú it-ta-nam-ma-la ma-ta-tum 183lu-uk-šú-su-ma i-na ki-iš-ti iṣerini 184ki-ma da-an-nu pi-ir-ḫu-um šá Urukki[92] 185lu-ši-eš-mi ma-tam 186ga-ti lu-uš-ku-un-ma lu-uk-[šú]4-su-ma iṣerini 187šú-ma šá-ṭa-ru-ú a-na-ku lu-uš-tak-nam 188ši-bu-tum šá Urukki ri-bi-tim 189zi-ik-ra ú-ti-ir-ru a-na dGiš 190ṣi-iḫ-ri-ti-ma dGiš libbi-ka na-ši-ka 191mi-im-ma šá te-te-ni-pu-šú la ti-di 192ni-ši-im-me-ma dḪu-wa-wa šá-nu-ú bu-nu-šú 193ma-an-nu-um [uš-tam]-ḫa-ru ka-ak-ki-šú 194a-na ištên(-en) [kas-gíd-ta-a]-an nu-ma-at kišti 195ma-an-nu šá [ur-ra]-du a-na libbi-šá 196dḪu-wa-wa ri-ig-ma-šú a-bu-bu 197pi-šú dBil-gi-ma na-pi-su mu-tum 198am-mi-nim taḫ-ši-iḫ an-ni-a-am e-pi-šá 199ga-ba-al-la ma-ḫa-ar šú-pa-at dḪu-wa-wa 200iš-me-e-ma dGiš zi-ki-ir ma-li-[ki]-šú 201ip-pa-al-sa-am-ma i-ṣi-iḫ a-na ib-[ri-šú] 202i-na-an-na ib-[ri] ki-a-am [a-ga-ab-bi] 203a-pa-al-aḫ-šú-ma a-[al-la-ak a-na kišti] 204[lu]ul-[lik it-ti-ka a-na ki-iš-ti iṣerini(?)] (About five lines missing.) 210........................ -ma 211li ............... 
-ka[93] 212ilu-ka li(?) ..............-ka 213ḫarrana li-šá-[tir-ka a-na šú-ul-mi] 214a-na kar šá [Urukki ri-bi-tim] 215ka-mi-is-ma dGiš [ma-ḫa-ar dŠamaš(?)] 216a-wa-at i-ga-ab- [bu-šú-ma] 217a-al-la-ak dŠamaš katâ-[ka a-ṣa-bat] 218ul-la-nu lu-uš-li-ma na-pi-[iš-ti] 219te-ir-ra-an-ni a-na kar i-[na Urukki] 220ṣi-il-[la]m šú-ku-un [a-na ia-a-ši(?)] 221iš-si-ma dGiš ib-[ri.....] 222te-ir-ta-šú .......... 223is(?) .............. 224tam ................ 225........................ 226i-nu(?)-[ma] .................. (About two lines missing.) Col. VI. 229[a-na-ku] dGiš [i-ik]-ka-di ma-tum 230........... ḫarrana šá la al-[kam] ma-ti-ma 231.... a-ka-lu ..... la(?) i-di 232[ul-la-nu] lu-uš-li-[mu] a-na-ku 233[lu-ud-lul]-ka i-na [ḫ]u-ud li-ib-bi 234...... [šú]-ḳu-ut-[ti] la-li-ka 235[lu-še-šib(?)] - ka i-na kussêmeš 236....................... ú-nu-su 237[bêlêmeš(?)ú-ti-ir]-ru ra-bu-tum 238[ka-aš-tum] ù iš-pa-tum 239[i-na] ga-ti iš-ku-nu 240[il-]te-ki pa-ši 241....... -ri iš-pa-as-su[94] 242..... [a-na] ili šá-ni-tam 243[it-ti pa(?)] - tar-[šú] i-na ši-ip-pi-šú 244........ 
i-ip-pu-šú a-la-kam 245[ša]-niš ú-ga-ra-bu dGiš 246[a-di ma]-ti tu-ut-te-ir a-na libbi Urukki 247[ši-bu]-tum i-ka-ra-bu-šú 248[a-na] ḫarrani i-ma-li-ku dGiš 249[la t]a-at-kal dGiš a-na e-[mu]-ḳi-ka 250[a-]ka-lu šú-wa-ra-ma ú-ṣur ra-ma-an-ka 251[li]-il-lik dEn-ki-dũ i-na pa-ni-ka 252[ur-ḫa]-am a-we-ir a-lik ḫarrana(-na) 253[a-di] šá kišti ni-ri-bi-tim 254[šá(?)] [d]Ḫu-wa-wa ka-li-šú-nu ši-ip-pi-iḫ(?)-šú 255[ša(?)a-lik] maḫ-ra tap-pa-a ú-šá-lim 256[ḫarrana](-na)-šú šú-wa-ra-[ma ú-ṣur ra-ma-na-ka] 257[li-šak-šid]-ka ir-[ni-ta]-ka dŠamaš 258[ta]-ak-bi-a-at pi-ka li-kal-li-ma i-na-ka 259li-ip-ti-ḳu pa-da-nam pi-ḫi-tam 260ḫarrana li-iš-ta-zi-ik a-na ki-ib-si-ka 261šá-di-a li-iš-ta-zi-ik a-na šêpi-ka 262mu-ši-it-ka aw-a-at ta-ḫa-du-ú 263li-ib-la-ma dLugal-ban-da li-iz-zi-iz-ka[95] 264i-na ir-ni-ti-ka 265ki-ma ṣi-iḫ-ri ir-ni-ta-ka-ma luš-mida(-da) 266i-na na-ri šá dḪu-wa-wa šá tu-ṣa-ma-ru 267mi-zi ši-pi-ka 268i-na bat-ba-ti-ka ḫi-ri bu-ur-tam 269lu-ka-a-a-nu mê ellu i-na na-di-ka 270[ka-]su-tim me-e a-na dŠamaš ta-na-di 271[li-iš]ta-ḫa-sa-as dLugal-ban-da 272[dEn-ki-]dũ pi-su i-pu-šá-am-ma, iz-za-kàr a-na dGiš 273[is(?)]-tu(?) ta-áš-dan-nu e-pu-uš a-la-kam 274[la pa]la-aḫ libbi-ka ia-ti tu-uk-la-ni 275[šú-ku-]un i-di-a-am šú-pa-as-su 276[ḫarrana(?)]šá dḪu-wa-wa it-ta-la-ku 277.......... ki-bi-ma te-[ir]-šú-nu-ti (Three lines missing.) L.E. 281.............. nam-ma-la 282............... il-li-ku it-ti-ia 283............... ba-ku-nu-ši-im 284......... [ul]-la(?)-nu i-na ḫu-ud li-ib-bi 285[i-na še-me-e] an-ni-a ga-ba-šú 286e-diš ḫarrana(?) uš-te-[zi-ik] 287a-lik dGiš lu-[ul-lik a-na pa-ni-ka] 288li-lik il-ka .......... 289li-šá-ak-lim-[ka ḫarrana] ...... 290dGiš ù[dEn-ki-dũ] ....... 291mu-di-eš .......... 292bi-ri-[su-nu] ........ [87] Translation. (About ten lines missing.) Col. I. 11.................. (my friend?) 12[Something] that is exceedingly difficult, 13[Why] dost thou desire 14[to do this?] 15.... something (?) 
that is very [difficult (?)],
16 [Why dost thou] desire
17 [to go down to the forest]?
18 A message [they carried] among [men]
19 They carried about.
20 They made a ....
21 .............. they brought
22 ..............................
23 ..............................
(About 17 lines missing.)
40 .............................
41 ................... my friend
42 ................ they raised .....
43 answer [they returned.]
44 [To] the woman
45 They proceeded to the overthrowing
Col. II.
(About eleven lines missing.)
57 .......... name(?) .............
58 [The one who is] a rival [to him]
59 subdue and ................
60 Wailing ................
61 The mother [of Gish, who knows everything]
62 Before [Shamash raised her hand]
[88]
63 Who
64 Now(?) [why]
65 hast thou stirred up the heart for my son,
66 [Restlessness imposed upon him (?)]
67 ............................
(About four lines missing.)
72 The eyes [of Enkidu filled with tears].
73 [He clutched] his heart;
74 [Sadly(?)] he sighed.
75 [The eyes of En]kidu filled with tears.
76 [He clutched] his heart;
77 [Sadly(?)] he sighed.
78 The face [of Gish was grieved].
79 [He spoke] to Enkidu:
80 [“My friend, why are] thy eyes
81 [Filled with tears]?
82 Thy [heart clutched]
83 Dost thou sigh [sadly(?)]?”
84 [Enkidu opened his mouth] and
85 spoke to Gish:
86 “Attacks, my friend,
87 have exhausted my strength(?).
88 My arms are lame,
89 my strength has become weak.”
90 Gish opened his mouth and
91 spoke to Enkidu:
(About four lines missing.)
Col. III.
96 ..... [until] Ḫuwawa, [the terrible],
97 ........................
98 ............ [I destroyed].
99 [I will go down to the] cedar forest,
[89]
100 ................... the jungle
101 ............... tambourine (?)
102 ................ I will open it.
103 Enkidu opened his mouth and
104 spoke to Gish:
105 “Know, my friend, in the mountain,
106 when I moved about with the cattle
107 to a distance of one double hour into the heart of the forest,
108 [Alone?] 
I penetrated within it,
109 [To] Ḫuwawa, whose roar is a flood,
110 whose mouth is fire,
111 whose breath is death.
112 Why dost thou desire
113 To do this?
114 To advance towards
115 the dwelling(?) of Ḫuwawa?”
116 Gish opened his mouth and
117 [spoke] to Enkidu:
118 “... [the covering(?)] I will destroy.
119 ....[in the forest]
120 ....................
121 ....................
122 To .................
123 The dwelling [of Ḫuwawa]
124 The axe ..........
125 Thou ..........
126 I will [go down to the forest].”
127 Enkidu opened his mouth and
128 spoke to [Gish:]
129 “When [together(?)] we go down
130 To the [cedar] forest,
131 whose guardian, O warrior Gish,
132 a power(?) without [rest(?)],
133 Ḫuwawa, an offspring(?) of ....
[90]
134 Adad ......................
135 He ........................
Col. IV.
136 To keep safe [the cedar forest],
137 [Enlil has decreed for it] seven-fold terror.”
138 Gish [opened] his mouth and
139 spoke to [Enkidu]:
140 “Whoever, my friend, overcomes (?) [terror(?)],
141 it is well (for him) with Shamash for the length of [his days].
142 Mankind will speak of it at the gates.
143 Wherever terror is to be faced,
144 Thou, forsooth, art in fear of death.
145 Thy prowess lacks strength.
146 I will go before thee.
147 Though thy mouth calls to me; “thou art afraid to approach.”
148 If I fall, I will establish my name.
149 Gish, the corpse(?) of Ḫuwawa, the terrible one,
150 has snatched (?) from the time that
151 My offspring was born in ......
152 The lion restrained (?) thee, all of which thou knowest.
153 ........................
154 .............. thee and
155 ................ open (?)
156 ........ like a shepherd(?) .....
157 [When thou callest to me], thou afflictest my heart.
158 I am determined
159 [to enter] the cedar forest.
[91]
160 I will, indeed, establish my name.
161 [The work(?)], my friend, to the artisans I will entrust.
162 [Weapons(?)] let them mould before us.”
163 [The work(?)] to the artisans they entrusted.
164 A dwelling(?) they assigned to the workmen. 
165 Hatchets the masters moulded:
166 Axes of 3 talents each they moulded.
167 Lances the masters moulded;
168 Blades(?) of 2 talents each,
169 A spear of 30 mina each attached to them.
170 The hilt of the lances of 30 mina in gold
171 Gish and [Enki]du were equipped with 10 talents each
172 .......... in Erech seven its ....
173 ....... the people heard and ....
174 [proclaimed(?)] in the street of Erech of the plazas.
175 ..... Gish [brought him out(?)]
176 [In the street (?)] of Erech of the plazas
177 [Enkidu(?)] sat before him
178 ..... [thus] he spoke:
179 “........ [of Erech] of the plazas
180 ............ [before him]
Col. V.
181 Gish of whom they speak, let me see!
182 whose name fills the lands.
183 I will lure him to the cedar forest,
184 Like a strong offspring of Erech.
[92]
185 I will let the land hear (that)
186 I am determined to lure (him) in the cedar (forest)5.
187 A name I will establish.”
188 The elders of Erech of the plazas
189 brought word to Gish:
190 “Thou art young, O Gish, and thy heart carries thee away.
191 Thou dost not know what thou proposest to do.
192 We hear that Ḫuwawa is enraged.
193 Who has ever opposed his weapon?
194 To one [double hour] in the heart of the forest,
195 Who has ever penetrated into it?
196 Ḫuwawa, whose roar is a deluge,
197 whose mouth is fire, whose breath is death.
198 Why dost thou desire to do this?
199 To advance towards the dwelling (?) of Ḫuwawa?”
200 Gish heard the report of his counsellors.
201 He saw and cried out to [his] friend:
202 “Now, my friend, thus [I speak].
203 I fear him, but [I will go to the cedar forest(?)];
204 I will go [with thee to the cedar forest].
(About five lines missing.)
210 ..............................
211 May ................... thee
[93]
212 Thy god may (?) ........ thee;
213 On the road may he guide [thee in safety(?)].
214 At the rampart of [Erech of the plazas],
215 Gish kneeled down [before Shamash(?)],
216 A word then he spoke [to him]:
217 “I will go, O Shamash, [thy] hands [I seize hold of]. 
218 When I shall have saved [my life],
219 Bring me back to the rampart [in Erech].
220 Grant protection [to me ?]!”
221 Gish cried, “[my friend] ......
222 His oracle ..................
223 ........................
224 ........................
225 ........................
226 When (?)
(About two lines missing.)
Col. VI.
229 “[I(?)] Gish, the strong one (?) of the land.
230 ...... A road which I have never [trodden];
231 ........ food ...... do not (?) know.
232 [When] I shall have succeeded,
233 [I will praise] thee in the joy of my heart,
234 [I will extol (?)] the superiority of thy power,
235 [I will seat thee] on thrones.”
236 .................. his vessel(?)
237 The masters [brought the weapons (?)];
238 [bow] and quiver
239 They placed in hand.
240 [He took] the hatchet.
241 ................. his quiver.
[94]
242 ..... [to] the god(?) a second time
243 [With his lance(?)] in his girdle,
244 ......... they took the road.
245 [Again] they approached Gish!
246 “[How long] till thou returnest to Erech?”
247 [Again the elders] approached him.
248 [For] the road they counselled Gish:
249 “Do [not] rely, O Gish, on thy strength!
250 Provide food and save thyself!
251 Let Enkidu go before thee.
252 He is acquainted with the way, he has trodden the road
253 [to] the entrance of the forest.
254 of Ḫuwawa all of them his ......
255 [He who goes] in advance will save the companion.
256 Provide for his [road] and [save thyself]!
257 (May) Shamash [carry out] thy endeavor!
258 May he make thy eyes see the prophecy of thy mouth.
259 May he track out (for thee) the closed path!
260 May he level the road for thy treading!
261 May he level the mountain for thy foot!
262 During thy night6 the word that wilt rejoice
263 may Lugal-banda convey, and stand by thee
[95]
264 in thy endeavor!
265 Like a youth may he establish thy endeavor!
266 In the river of Ḫuwawa as thou plannest,
267 wash thy feet!
268 Round about thee dig a well!
269 May there be pure water constantly for thy libation
270 Goblets of water pour out to Shamash! 
271 [May] Lugal-banda take note of it!”
272 [Enkidu] opened his mouth and spoke to Gish:
273 “[Since thou art resolved] to take the road.
274 Thy heart [be not afraid,] trust to me!
275 [Confide] to my hand his dwelling(?)!”
276 [on the road to] Ḫuwawa they proceeded.
277 ....... command their return
(Three lines missing.)
L.E.
281 ............... were filled.
282 .......... they will go with me.
283 ...............................
284 .................. joyfully.
285 [Upon hearing] this word of his,
286 Alone, the road(?) [he levelled].
287 “Go, O Gish [I will go before thee(?)].
288 May thy god(?) go .........
289 May he show [thee the road!] .....
290 Gish and [Enkidu]
291 Knowingly ....................
292 Between [them] ................
[96] Lines 13–14 (also line 16). See for the restoration, lines 112–13. Line 62. For the restoration, see Jensen, p. 146 (Tablet III, 2a, 9). Lines 64–66. Restored on the basis of the Assyrian version, ib. line 10. Line 72. Cf. Assyrian version, Tablet IV, 4, 10, and restore at the end of this line di-im-tam as in our text, instead of Jensen’s conjecture. Lines 74, 77 and 83. The restoration zar-biš, suggested by the Assyrian version, Tablet IV, 4, 4. Lines 76 and 82. Cf. Assyrian version, Tablet VIII, 3, 18. Line 78. ú-ta-ab-bil from abâlu, “grieve” or “darkened.” Cf. uš-ta-kal (Assyrian version, ib. line 9), where, perhaps, we are to restore it-ta-[bil pa-ni-šú]. Line 87. uš-ta-li-pa from elêpu, “exhaust.” See Muss-Arnolt, Assyrian Dictionary, p. 49a. Line 89. Cf. Assyrian version, ib. line 11, and restore the end of the line there to i-ni-iš, as in our text. Line 96. For dapinu as an epithet of Ḫuwawa, see Assyrian version, Tablet III, 2a, 17, and 3a, 12. Dapinu occurs also as a description of an ox (Rm 618, Bezold, Catalogue of the Kouyunjik Tablets, etc., p. 1627). Line 98. The restoration on the basis of ib. III, 2a, 18. Lines 96–98 may possibly form a parallel to ib. 
lines 17–18, which would then read about as follows: “Until I overcome Ḫuwawa, the terrible, and all the evil in the land I shall have destroyed.” At the same time, it is possible that we are to restore [lu-ul]-li-ik at the end of line 98. Line 101. lilissu occurs in the Assyrian version, Tablet IV, 6, 36. Line 100. For ḫalbu, “jungle,” see Assyrian version, Tablet V, 3, 39 (p. 160). Lines 109–111. These lines enable us properly to restore Assyrian version, Tablet IV, 5, 3 = Haupt’s edition, p. 83 (col. 5, 3). No doubt the text read as ours mu-tum (or mu-u-tum) na-pis-su. Line 115. šupatu, which occurs again in line 199 and also line 275.šú-pa-as-su (= šupat-su) must have some such meaning as [97]“dwelling,” demanded by the context. [Dhorme refers me to OLZ 1916, p. 145]. Line 129. Restored on the basis of the Assyrian version, Tablet IV, 6, 38. Line 131. The restoration muḳtablu, tentatively suggested on the basis of CT XVIII, 30, 7b, where muḳtablu, “warrior,” appears as one of the designations of Gilgamesh, followed by a-lik pa-na, “the one who goes in advance,” or “leader”—the phrase so constantly used in the Ḫuwawa episode. Line 132. Cf. Assyrian version, Tablet I, 5, 18–19. Lines 136–137. These two lines restored on the basis of Jensen IV, 5, 2 and 5. The variant in the Assyrian version, šá niše (written Ukumeš in one case and Lumeš in the other), for the numeral 7 in our text to designate a terror of the largest and most widespread character, is interesting. The number 7 is similarly used as a designation of Gilgamesh, who is called Esigga imin, “seven-fold strong,” i.e., supremely strong (CT XVIII, 30, 6–8). Similarly, Enkidu, ib. line 10, is designated a-rá imina, “seven-fold.” Line 149. A difficult line because of the uncertainty of the reading at the beginning of the following line. The most obvious meaning of mi-it-tu is “corpse,” though in the Assyrian version šalamtu is used (Assyrian version, Tablet V, 2, 42). On the other hand, it is possible—as Dr. 
Lutz suggested to me—that mittu, despite the manner of writing, is identical with miṭṭú, the name of a divine weapon, well-known from the Assyrian creation myth (Tablet IV, 130), and other passages. The combination miṭ-ṭu šá-ḳu-ú-, “lofty weapon,” in the Bilingual text IV, R², 18 No. 3, 31–32, would favor the meaning “weapon” in our passage, since [šá]-ḳu-tu is a possible restoration at the beginning of line 150. However, the writing mi-it-ti points too distinctly to a derivative of the stem mâtu, and until a satisfactory explanation of lines 150–152 is forthcoming, we must stick to the meaning “corpse” and read the verb il-ḳu-ut. Line 152. The context suggests “lion” for the puzzling la-bu. Line 156. Another puzzling line. Dr. Clay’s copy is an accurate reproduction of what is distinguishable. At the close of the line there appears to be a sign written over an erasure. Line 158. [ga-ti lu-]uš-kun as in line 186, literally, “I will place my hand,” i.e., I purpose, I am determined. [98] Line 160. The restoration on the basis of the parallel line 187. Note the interesting phrase, “writing a name” in the sense of acquiring “fame.” Line 161. The kiškattê, “artisans,” are introduced also in the Assyrian version, Tablet VI, 187, to look at the enormous size and weight of the horns of the slain divine bull. See for other passages Muss-Arnolt Assyrian Dictionary, p. 450b. At the beginning of this line, we must seek for the same word as in line 163. Line 162. While the restoration belê, “weapon,” is purely conjectural, the context clearly demands some such word. I choose belê in preference to kakkê, in view of the Assyrian version, Tablet VI, 1. Line 163. Putuku (or putukku) from patâku would be an appropriate word for the fabrication of weapons. Line 165. The rabûtim here, as in line 167, I take as the “master mechanics” as contrasted with the ummianu, “common workmen,” or journeymen. 
A parallel to this forging of the weapons for the two heroes is to be found in the Sumerian fragment of the Gilgamesh Epic published by Langdon, Historical and Religious Texts from the Temple Library of Nippur (Munich, 1914), No. 55, 1–15. Lines 168–170 describe the forging of the various parts of the lances for the two heroes. The ṣipru is the spear point Muss-Arnolt, Assyrian Dictionary, p. 886b; the išid paṭri is clearly the “hilt,” and the mešelitum I therefore take as the “blade” proper. The word occurs here for the first time, so far as I can see. For 30 minas, see Assyrian version, Tablet VI, 189, as the weight of the two horns of the divine bull. Each axe weighing 3 biltu, and the lance with point and hilt 3 biltu we would have to assume 4 biltu for each pašu, so as to get a total of 10 biltu as the weight of the weapons for each hero. The lance is depicted on seal cylinders representing Gilgamesh and Enkidu, for example, Ward, Seal Cylinders, No. 199, and also in Nos. 184 and 191 in the field, with the broad hilt; and in an enlarged form in No. 648. Note the clear indication of the hilt. The two figures are Gilgamesh and Enkidu—not two Gilgameshes, as Ward assumed. See above, page 34. A different weapon is the club or mace, as seen in Ward, Nos. 170 and 173. This appears also to be the weapon which Gilgamesh holds in his hand on the colossal figure from the palace of Sargon (Jastrow, Civilization of [99]Babylonia and Assyria, Pl. LVII), though it has been given a somewhat grotesque character by a perhaps intentional approach to the scimitar, associated with Marduk (see Ward, Seal Cylinders, Chap. XXVII). The exact determination of the various weapons depicted on seal-cylinders merits a special study. Line 181. Begins a speech of Ḫuwawa, extending to line 187, reported to Gish by the elders (line 188–189), who add a further warning to the youthful and impetuous hero. Line 183. lu-uk-šú-su (also l. 
186), from akâšu, “drive on” or “lure on,” occurs on the Pennsylvania tablet, line 135, uk-ki-ši, “lure on” or “entrap,” which Langdon erroneously renders “take away” and thereby misses the point completely. See the comment to the line of the Pennsylvania tablet in question. Line 192. On the phrase šanû bunu, “change of countenance,” in the sense of “enraged,” see the note to the Pennsylvania tablet, l.31. Line 194. nu-ma-at occurs in a tablet published by Meissner, Altbabyl. Privatrecht, No. 100, with bît abi, which shows that the total confine of a property is meant; here, therefore, the “interior” of the forest or heart. It is hardly a “by-form” of nuptum as Muss-Arnolt, Assyrian Dictionary, p. 690b, and others have supposed, though nu-um-tum in one passage quoted by Muss-Arnolt, ib. p. 705a, may have arisen from an aspirate pronunciation of the p in nubtum. Line 215. The kneeling attitude of prayer is an interesting touch. It symbolizes submission, as is shown by the description of Gilgamesh’s defeat in the encounter with Enkidu (Pennsylvania tablet, l. 227), where Gilgamesh is represented as forced to “kneel” to the ground. Again in the Assyrian version, Tablet V, 4, 6, Gilgamesh kneels down (though the reading ka-mis is not certain) and has a vision. Line 229. It is much to be regretted that this line is so badly preserved, for it would have enabled us definitely to restore the opening line of the Assyrian version of the Gilgamesh Epic. The fragment published by Jeremias in his appendix to his Izdubar-Nimrod, Plate IV, gives us the end of the colophon line to the Epic, reading ……… di ma-a-ti (cf. ib., Pl. I, 1. … a-ti). Our text evidently reproduces the same phrase and enables us to supply ka, as well as [100]the name of the hero Gišh of which there are distinct traces. The missing word, therefore, describes the hero as the ruler, or controller of the land. But what are the two signs before ka? 
A participial form from pakâdu, which one naturally thinks of, is impossible because of the ka, and for the same reason one cannot supply the word for shepherd (nakidu). One might think of ka-ak-ka-du, except that kakkadu is not used for “head” in the sense of “chief” of the land. I venture to restore [i-ik-]ka-di, “strong one.” Our text at all events disposes of Haupt’s conjecture iš-di ma-a-ti (JAOS 22, p. 11), “Bottom of the earth,” as also of Ungnad’s proposed [a-di pa]-a-ti, “to the ends” (Ungnad-Gressmann, Gilgamesch-Epos, p. 6, note), or a reading di-ma-a-ti, “pillars.” The first line of the Assyrian version would now read šá nak-ba i-mu-ru [dGis-gi(n)-maš i-ik-ka]-di ma-a-ti, i.e., “The one who saw everything, Gilgamesh the strong one (?) of the land.” We may at all events be quite certain that the name of the hero occurred in the first line and that he was described by some epithet indicating his superior position. Lines 229–235 are again an address of Gilgamesh to the sun-god, after having received a favorable “oracle” from the god (line 222). The hero promises to honor and to celebrate the god, by erecting thrones for him. Lines 237–244 describe the arming of the hero by the “master” craftsman. In addition to the pašu and paṭru, the bow (?) and quiver are given to him. Line 249 is paralleled in the new fragment of the Assyrian version published by King in PSBA 1914, page 66 (col. 1, 2), except that this fragment adds gi-mir to e-mu-ḳi-ka. Lines 251–252 correspond to column 1, 6–8, of King’s fragment, with interesting variations “battle” and “fight” instead of “way” and “road,” which show that in the interval between the old Babylonian and the Assyrian version, the real reason why Enkidu should lead the way, namely, because he knows the country in which Ḫuwawa dwells (lines 252–253), was supplemented by describing Enkidu also as being more experienced in battle than Gilgamesh. Line 254. 
I am unable to furnish a satisfactory rendering for this line, owing to the uncertainty of the word at the end. Can it [101]be “his household,” from the stem which in Hebrew gives us מִשְׁפָּחָה “family?” Line 255. Is paralleled by col. 1, 4, of King’s new fragment. The episode of Gišh and Enkidu proceeding to Ninsun, the mother of Gish, to obtain her counsel, which follows in King’s fragment, appears to have been omitted in the old Babylonian version. Such an elaboration of the tale is exactly what we should expect as it passed down the ages. Line 257. Our text shows that irnittu (lines 257, 264, 265) means primarily “endeavor,” and then success in one’s endeavor, or “triumph.” Lines 266–270. Do not appear to refer to rites performed after a victory, as might at a first glance appear, but merely voice the hope that Gišh will completely take possession of Ḫuwawa’s territory, so as to wash up after the fight in Ḫuwawa’s own stream; and the hope is also expressed that he may find pure water in Ḫuwawa’s land in abundance, to offer a libation to Šhamašh. Line 275. On šú-pa-as-su = šupat-su, see above, to l. 115. [Note on Sabitum (above, p. 11) In a communication before the Oriental Club of Philadelphia (Feb. 10, 1920), Prof. Haupt made the suggestion that sa-bi-tum (or tu), hitherto regarded as a proper name, is an epithet describing the woman who dwells at the seashore which Gilgamesh in the course of his wanderings reaches, as an “innkeeper”. It is noticeable that the term always appears without the determinative placed before proper names; and since in the old Babylonian version (so far as preserved) and in the Assyrian version, the determinative is invariably used, its consistent absence in the case of sabitum (Assyrian Version, Tablet X, 1, 1, 10, 15, 20; 2, 15–16 [sa-bit]; Meissner fragment col. 2, 11–12) speaks in favor of Professor Haupt’s suggestion. 
The meaning “innkeeper”, while not as yet found in Babylonian-Assyrian literature is most plausible, since we have sabū as a general name for ’drink’, though originally designating perhaps more specifically sesame wine (Muss-Arnolt, Assyrian Dictionary, p. 745b) or distilled brandy, according to Prof. Haupt. Similarly, in the Aramaic dialects, sebha is used for “to drink” and in the Pael to “furnish drink”. Muss-Arnolt in [102]his Assyrian Dictionary, 746b, has also recognized that sabitum was originally an epithet and compares the Aramaic sebhoyâthâ(p1) “barmaids”. In view of the bad reputation of inns in ancient Babylonia as brothels, it would be natural for an epithet like sabitum to become the equivalent to “public” women, just as the inn was a “public” house. Sabitum would, therefore, have the same force as šamḫatu (the “harlot”), used in the Gilgamesh Epic by the side of ḫarimtu “woman” (see the note to line 46 of Pennsylvania Tablet). The Sumerian term for the female innkeeper is Sal Geštinna “the woman of the wine,” known to us from the Hammurabi Code §§108–111. The bad reputation of inns is confirmed by these statutes, for the house of the Sal Geštinna is a gathering place for outlaws. The punishment of a female devotee who enters the “house of a wine woman” (bît Sal Geštinna §110) is death. It was not “prohibition” that prompted so severe a punishment, but the recognition of the purpose for which a devotee would enter such a house of ill repute. The speech of the sabitum or innkeeper to Gilgamesh (above, p. 12) was, therefore, an invitation to stay with her, instead of seeking for life elsewhere. Viewed as coming from a “public woman” the address becomes significant. The invitation would be parallel to the temptation offered by the ḫarimtu in the first tablet of the Enkidu, and to which Enkidu succumbs. The incident in the tablet would, therefore, form a parallel in the adventures of Gilgamesh to the one that originally belonged to the Enkidu cycle. 
Finally, it is quite possible that sabitum is actually the Akkadian equivalent of the Sumerian Sal Geštinna, though naturally until this equation is confirmed by a syllabary or by other direct evidence, it remains a conjecture. See now also Albright’s remarks on Sabitum in the A. J. S. L. 36, pp. 269 seq.] [103] 1 Scribal error for an. 2 Text apparently di. 3 Hardly ul. 4 Omitted by scribe. 5 Kišti omitted by scribe. 6 I.e., at night to thee, may Lugal-banda, etc. Corrections to the Text of Langdon’s Edition of the Pennsylvania Tablet.1 Column 1. 5. Read it-lu-tim (“heroes”) instead of id-da-tim (“omens”). 6. Read ka-ka-bu instead of ka-ka-’a. This disposes of Langdon’s note 2 on p. 211. 9 Read ú-ni-iš-šú-ma, “I became weak” (from enêšu, “weak”) instead of ilam iš-šú-ma, “He bore a net”(!). This disposes of Langdon’s note 5 on page 211. 10. Read Urukki instead of ad-ki. Langdon’s note 7 is wrong. 12. Langdon’s note 8 is wrong. ú-um-mid-ma pu-ti does not mean “he attained my front.” 14. Read ab-ba-la-áš-šú instead of at-ba-la-áš-šú. 15. Read mu-di-a-at instead of mu-u-da-a-at. 20. Read ta-ḫa-du instead of an impossible [sa]-ah-ḫa-ta—two mistakes in one word. Supply kima Sal before taḫadu. 22. Read áš-šú instead of šú; and at the end of the line read [tu-ut]-tu-ú-ma instead of šú-ú-zu. 23. Read ta-tar-ra-[as-su]. 24. Read [uš]-ti-nim-ma instead of [iš]-ti-lam-ma. 28. Read at the beginning šá instead of ina. 29. Langdon’s text and transliteration of the first word do not tally. Read ḫa-aṣ-ṣi-nu, just as in line 31. 32. Read aḫ-ta-du (“I rejoiced”) instead of aḫ-ta-ta. Column 2. 4. Read at the end of the line di-da-šá(?) ip-tí-[e] instead of Di-?-al-lu-un (!). 5. Supply dEn-ki-dū at the beginning. Traces point to this reading. 19. Read [gi]-it-ma-[lu] after dGiš, as suggested by the Assyrian version, Tablet I, 4, 38, where emûḳu (“strength”) replaces nepištu of our text. 20. Read at-[ta kima Sal ta-ḫa]-bu-[ub]-šú. 21. Read ta-[ra-am-šú ki-ma]. [104] 23. 
Read as one word ma-a-ag-ri-i-im (“accursed”), spelled in characteristic Hammurabi fashion, instead of dividing into two words ma-a-ak and ri-i-im, as Langdon does, who suggests as a translation “unto the place yonder(?) of the shepherd”(!). 24. Read im-ta-ḫar instead of im-ta-gar. 32. Supply ili(?) after ki-ma. 33. Read šá-ri-i-im as one word. 35. Read i-na [áš]-ri-šú [im]-ḫu-ru. 36. Traces at beginning point to either ù or ki (= itti). Restoration of lines 36–39 (perhaps to be distributed into five lines) on the basis of the Assyrian version, Tablet I, 4, 2–5. Column 3. 14. Read Kàš (= šikaram, “wine”) ši-ti, “drink,” as in line 17, instead of bi-iš-ti, which leads Langdon to render this perfectly simple line “of the conditions and the fate of the land”(!). 21. Read it-tam-ru instead of it-ta-bir-ru. 22. Supply [lùŠú]-I. 29. Read ú-gi-ir-ri from garû (“attack), instead of separating into ú and gi-ir-ri, as Langdon does, who translates “and the lion.” The sign used can never stand for the copula! Nor is girru, “lion!” 30. Read Síbmeš, “shepherds,” instead of šab-[ši]-eš! 31. šib-ba-ri is not “mountain goat,” nor can ut-tap-pi-iš mean “capture.” The first word means “dagger,” and the second “he drew out.” 33. Read it-ti-[lu] na-ki-[di-e], instead of itti immer nakie which yields no sense. Langdon’s rendering, even on the basis of his reading of the line, is a grammatical monstrosity. 35. Read giš instead of wa. 37. Read perhaps a-na [na-ki-di-e i]- za-ak-ki-ir. Column 4. 4. The first sign is clearly iz, not ta, as Langdon has it in note 1 on page 216. 9. The fourth sign is su, not šú. 10. Separate e-eš (“why”) from the following. Read ta-ḫi-[il], followed, perhaps, by la. The last sign is not certain; it may be ma. [105] 11. Read lim-nu instead of mi-nu. In the same line read a-la-ku ma-na-aḫ-[ti]-ka instead of a-la-ku-zu(!) na-aḫ … ma, which, naturally, Langdon cannot translate. 16. Read e-lu-tim instead of pa-a-ta-tim. 
The first sign of the line, tu, is not certain, because apparently written over an erasure. The second sign may be a. Some one has scratched the tablet at this point. 18. Read uk-la-at âli (?) instead of ug-ad-ad-lil, which gives no possible sense! Column 5. 2. Read [wa]-ar-ki-šú. 8. Read i-ta-wa-a instead of i-ta-me-a. The word pi-it-tam belongs to line 9! The sign pi is unmistakable. This disposes of note 1 on p. 218. 9. Read Mi = ṣalmu, “image.” This disposes of Langdon’s note 2 on page 218. Of six notes on this page, four are wrong. 11. The first sign appears to be si and the second ma. At the end we are perhaps to supply [šá-ki-i pu]-uk-ku-ul, on the basis of the Assyrian version, Tablet IV, 2, 45, šá-ki-i pu-[uk-ku-ul]. 12. Traces at end of line suggest i-pa(?)-ka-du. 13. Read i-[na mâti da-an e-mu]-ki i-wa. 18. Read ur-šá-nu instead of ip-šá-nu. 19. Read i-šá-ru instead of i-tu-ru. 24. The reading it-ti after dGiš is suggested by the traces. 25. Read in-ni-[ib-bi-it] at the end of the line. 28. Read ip-ta-ra-[aṣ a-la]-ak-tam at the end of the line, as in the Assyrian version, Tablet IV, 2, 37. 30. The conjectural restoration is based on the Assyrian version, Tablet IV, 2, 36. Column 6. 3. Read i-na ṣi-ri-[šú]. 5. Supply [il-li-ik]. 21. Langdon’s text has a superfluous ga. 22. Read uz-za-šú, “his anger,” instead of uṣ-ṣa-šú, “his javelin” (!). 23. Read i-ni-iḫ i-ra-as-su, i.e., “his breast was quieted,” in the sense of “his anger was appeased.” 31. Read ri-eš-ka instead of ri-eš-su. [106] In general, it should be noted that the indications of the number of lines missing at the bottom of columns 1–3 and at the top of columns 4–6 as given by Langdon are misleading. Nor should he have drawn any lines at the bottom of columns 1–3 as though the tablet were complete. Besides in very many cases the space indications of what is missing within a line are inaccurate. Dr. 
Langdon also omitted to copy the statement on the edge: 4 šú-ši, i.e., “240 lines;” and in the colophon he mistranslates šú-tu-ur, “written,” as though from šaṭâru, “write,” whereas the form is the permansive III, 1, of atâru, “to be in excess of.” The sign tu never has the value ṭu! In all, Langdon has misread the text or mistransliterated it in over forty places, and of the 204 preserved lines he has mistranslated about one-half.

1 The enumeration here is according to Langdon’s edition.

Plates I–VII. The Yale Tablet.

      Compared with other versions of the Epic of Gilgamesh, this version focuses more on Gilgamesh's search for immortality after Enkidu's death. The "us" in this instance is Gilgamesh and his search for a cure for death, while the "them" is the enemies and forces he encounters that try to stop him. The text creates this distinction by presenting Gilgamesh as the main character, the one in need of a cure because he struggles to come to terms with the fact that he will die one day. Moreover, Enkidu turned Gilgamesh into a noble figure who used his power for good, making him more likeable, which is why the reader roots for him to find a cure. Gilgamesh as a figure shows that, in his time period, men were seen as the leaders who had strength, because the female characters in all versions of the text do not have dynamic roles that showcase their personality or their endearing qualities. There are more political and nationalistic themes than in the Sumerian versions, which illustrates how language can shape how a culture is perceived. By drawing on the strong characteristics of Gilgamesh, the text is ultimately able to portray the civilization of Uruk and create a sense of identity as a result. CC BY Ajey Sasimugunthan (contact)

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary:

      I really enjoyed this manuscript from Torsekar et al. on "Contrasting responses to aridity by different-sized decomposers cause similar decomposition rates across a precipitation gradient". The authors aimed to examine how climate interacts with decomposers of different size categories to influence litter decomposition. They proposed a new hypothesis: "The opposing climatic dependencies of macrofauna and that of microorganisms and mesofauna should lead to similar overall decomposition rates across precipitation gradients".

      This study emphasizes the importance and the contribution of different groups of organisms (micro, meso, macro, and whole community) across different seasons (a hot summer with no precipitation and a cooler, wetter winter) along a precipitation gradient. The authors used 1050 litter baskets with different mesh sizes to capture the decomposers' contributions. They proposed a new hypothesis aimed at understanding the "dryland decomposition conundrum". They combined their decomposition experiment with sampling of decomposers using pitfall traps across both experimental seasons. This study was carried out in Israel and was based on a single litter species that is native to all seven sites. The authors found that the microorganism contribution dominated in winter, while macrofauna decomposition dominated overall decomposition in summer. These seasonal differences, combined with the differences in how the decomposer groups fluctuated along the precipitation gradient, resulted in similar overall decomposition rates across sites.

      I believe this manuscript has the potential to advance our knowledge on litter decomposition.

      Strengths:

      Well-designed study with a combination of different approaches (methods) and consideration of seasonality to generalize patterns.

      The study expands the current understanding of litter decomposition and of the interactions between factors affecting the process (here, climate and decomposers).

      Weaknesses:

      The study was only based on a single litter species.

      We now discuss the advantages and limitations of this approach in the methods and devote a completely new paragraph to this important point in the discussion (lines 394-401).

      Reviewer #2 (Public Review):

      Summary: Torsekar et al. use a leaf litter decomposition experiment across seasons, and in an aridity gradient, to provide a careful test of the role of different-sized soil invertebrates in shaping the rates of leaf litter decomposition. The authors found that large-sized invertebrates are more active in the summer and small-sized invertebrates in the winter. The summed effects of all invertebrates then translated into similar levels of decomposition across seasons. The system breaks down in hyper-arid sites.

      Strengths: This is a well-written manuscript that provides a complete statistical analysis of a nice dataset. The authors provide a complete discussion of their results in the current literature.

      Weaknesses:

      I have only three minor comments. Please standardize the color across ALL figures (use the same color always for the same thing, and be friendly to color-blind people).

      Thank you for this important suggestion. We have now changed all figures to standardize all colors and chose a more color-blind-friendly palette.

      Fig 1 may benefit from separating the orange line (micro and meso) into two lines that reflect your experimental setup and results. I would mention the dryland decomposition conundrum earlier in the Introduction.

      We based our novel hypotheses on a thorough literature search. Accordingly, decomposition is expected to be positively associated with moisture, regardless of the decomposer body size. Our contribution to theory was to suggest that macro-detritivores may respond very differently to climatic conditions and dominate litter decomposition in warm arid lands (we listed the reasons in the text). Consequently, we did not distinguish between microorganisms and mesofauna. We assumed that both groups inhabit the litter substrate and have limited adaptation to dry conditions. Our results provide strong evidence that this presumption is likely wrong and that mesofauna respond to climate very differently from micro-decomposers. Yet, we cannot use hindsight to improve our original hypothesis. We now emphasize this point in the discussion as an important future direction.

      Although we are very appreciative of, and pleased with, the reviewer's enthusiasm to highlight the importance of our work as a possible solution to the longstanding dryland decomposition conundrum, we decided not to move it to the introduction. This is because we think that our work is not centred on resolving the DDC but provides more general principles that may lead to a paradigm shift in the way ecologists study nutrient cycling across ecosystems.

      And the manuscript is full of minor grammatical errors. Some careful reading and fixing of all these minor mistakes here and there would be needed.

      We apologize and have done our best to find and fix those mistakes.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      I really enjoyed this manuscript from Torsekar et al on "Contrasting responses to aridity by different-sized decomposers cause similar decomposition rates across a precipitation gradient". The authors aimed to examine how climate interacts with decomposers of different size categories to influence litter decomposition. They proposed a new hypothesis: "The opposing climatic dependencies of macrofauna and that of microorganisms and mesofauna should lead to similar overall decomposition rates across precipitation gradients".

      This study emphasizes the importance and the contribution of different groups of organisms (micro, meso, macro, and whole community) across different seasons (a hot summer with no precipitation and a cooler, wetter winter) along a precipitation gradient. The authors used 1050 litter baskets with different mesh sizes to capture the decomposers' contributions. They proposed a new hypothesis aimed at understanding the "dryland decomposition conundrum". They combined their decomposition experiment with sampling of decomposers using pitfall traps across both experimental seasons. This study was carried out in Israel and was based on a single litter species that is native to all seven sites. The authors found that the microorganism contribution dominated in winter, while macrofauna decomposition dominated overall decomposition in summer. These seasonal differences, combined with the differences in how the decomposer groups fluctuated along the precipitation gradient, resulted in similar overall decomposition rates across sites.

      I believe this manuscript has the potential to advance our knowledge on litter decomposition. Below I provide my general and specific comments.

      General comments:

      (1) The study in general is well designed and well thought out beforehand.

      (2) The study aims to expand the current understanding of the dryland decomposition conundrum.

      (3) The authors should add a caveat about the fact that they used only one litter species and call for examining litter mixtures along the same gradient.

      (4) Please check the way you reduce the random effects from your initial model. I have provided a better way to do so in my specific comments.

      (5) For Figure 1, authors can check my comment on this and see if they could revise the figure.

      Thank you for the positive feedback and your valuable comments. We have tried to best address all comments and suggestions for improvement and clarification

      Specific comments

      Line # 57 Please write "Theory suggests" instead of "Theory suggest"

      We changed the text as suggested

      Line # 70, please write "Indeed, handful evidence shows" instead of "Indeed, handful evidence show"

      We changed the text as suggested

      Figure 1: I like this conceptual framework. I have a simple question: why is the slope of the whole community at the beginning (between Hyperarid and Arid) the same as that of the macrofauna? I would think the slope should be higher, as this is adding up, right? The same goes for the decomposition of the whole community later on. To me, this should reflect the adding or summing up (if I am right), so the authors should think about how this could be reflected in the figure.

      We agree with your interpretation that the whole community decomposition reflects the addition by constituent decomposers. The slope of the whole community decomposition between hyper-arid and arid is slightly higher than the one of macro decomposition to reflect the additive effect of macro with meso+micro decomposition. We have now changed the figure slightly to make this point more visible (Line 106).
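      The additive reasoning in this reply can be made concrete with a small sketch. All numbers below are hypothetical illustration values, not data from the study, and the helper `slope` is an assumed name. The sketch only shows that when whole-community decomposition is the sum of macro and meso+micro decomposition, its slope between two sites equals the sum of the component slopes and is therefore steeper than the macro slope alone whenever the meso+micro slope is positive.

```python
def slope(y0, y1, x0, x1):
    """Change in decomposition per unit change in precipitation."""
    return (y1 - y0) / (x1 - x0)


# Hypothetical precipitation (mm) at a hyper-arid and an arid site.
precip = (25.0, 100.0)

# Hypothetical decomposition (% mass loss) attributed to each group
# at the two sites; these are illustration values only.
macro = (5.0, 20.0)
meso_micro = (2.0, 5.0)

# Whole-community decomposition as the sum of its constituents.
whole = tuple(m + s for m, s in zip(macro, meso_micro))

macro_slope = slope(*macro, *precip)
meso_micro_slope = slope(*meso_micro, *precip)
whole_slope = slope(*whole, *precip)
```

      Under these assumed values the whole-community line is only slightly steeper than the macro line, which is the visual difference the revised figure is meant to convey.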

      Line # 111 Please make "Methods" bold as well to be consistent with others headings.

      We changed the formatting as suggested

      Line #125 and in other lines as well please replace "X" by "x" to denote multiplication.

      We changed the formatting as suggested

      Table 1 Please add "*" to climate like this "Climate*" so that the end note of the table could make sense

      Thank you for this suggestion. We have now added the asterisk referring to the note below the Table.

      Figure 2: please consider writing mean annual precipitation (MAP) at line # 133, so that at line # 135 you can directly say "The precipitation map ....".

      We made both changes as suggested.

      Line # 138 I would not use different units for the same values. I do understand that you want to emphasize the accuracy, but I would write 3 ± 0.001 g instead.

      We changed the units as suggested.

      Line # 145, how is the litter basket customized to rest at 1 cm above ground level?

      We have now clarified –that we cut-open windows one centimeter above the cage floor. The cages were positioned on the soil (line 144).

      Lines # 181-183, I like the approach of checking the necessity of having the random effects. However, it has been reported that likelihood ratio test (LRT) are not really reliable to test for random effects. I will suggest you rather use permutations instead. I think the function is confint(MODEL) you need to specify the number of permutation the higher the better but you should start with 99 first and see how the results look like if promising then you can even go to 9999. But it will need computation power and and time.

      Thank you for the suggestion. We now use a simulation-based exact test, instead of an LRT, to examine the random effect, as recommended by the authors of the “lme4” package. As recommended, we used 9999 simulations. The simulation test yielded results similar to those originally reported (see lines 181-183).
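
      To illustrate the general idea of testing a grouping effect by resampling, a minimal Python sketch follows. It is not the analysis reported in the manuscript: it assumes a single grouping factor and substitutes a simple permutation null (shuffling group labels and recomputing the variance of the group means) for the simulation-based exact test. The function names and the toy data are hypothetical.

```python
import random
from statistics import mean, pvariance


def between_group_variance(values, groups):
    """Variance of the group means: large when groups differ systematically."""
    by_group = {}
    for v, g in zip(values, groups):
        by_group.setdefault(g, []).append(v)
    return pvariance([mean(vs) for vs in by_group.values()])


def permutation_test_group_effect(values, groups, n_perm=9999, seed=0):
    """Permutation p-value for 'the grouping factor explains no variance'.

    Hypothetical helper for illustration; not the exact test used in the study.
    """
    rng = random.Random(seed)
    observed = between_group_variance(values, groups)
    shuffled = list(groups)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if between_group_variance(values, shuffled) >= observed:
            exceed += 1
    # Add one to numerator and denominator so the p-value is never exactly zero.
    return (exceed + 1) / (n_perm + 1)


# Toy data: three groups with clearly different means (made-up numbers).
values = [1.0, 1.1, 0.9, 2.0, 2.1, 1.9, 3.0, 3.1, 2.9]
groups = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
p_value = permutation_test_group_effect(values, groups, n_perm=999)
```

      Starting with a small number of permutations, as the reviewer suggests, keeps the run time short; the smallest attainable p-value is 1 / (n_perm + 1), so more permutations are needed only when finer resolution matters.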

      Line # 187, 188, 188, please do not use capital letter to start mesofauna, macrofauna and whole-community

      We changed the formatting as suggested.

      Line # 205 Please add the version number of R in the text.

      We now included the version number as suggested.

      Line # 209-211, could you please check whether "then" is the word you want to use or "than"

      Our bad; we indeed meant “than” and have made the appropriate changes.

      Line # 227 and in other places as well please provide the second degree of freedom of the F test.

      Thank you for this important comment. We have now added the second degree of freedom to the relevant results (lines 229, 232).

      Figure 3 and Figure 4 show some results that are negative, can you please explain what might be the reasons behind this?

      We now explain this important point in the figures’ captions.

      Figure 5 Please add label to the x-axis.

      Thank you. We have now included a label.

      Line # 357, the sentence "... meso-decomposition, like microbial decomposition,...", I don't understand which criteria authors used to classify microbial decomposition as "meso-decomposition"?

      We now remove this potential cause of confusion by using the term ‘meso-decomposition’ to distinguish it from microbial decomposition (Line 366).

      Line # 380 Kindly put "per se" in italic.

      We changed the formatting as suggested.

      References

      The reference format is not consistent. For example, for the same journal (say, Trends in Ecology and Evolution), the authors sometimes wrote the full name, as at line # 36 (note also that "vol" should not be written as such), but used the abbreviation at line # 42.

      Our bad; we apologize and have carefully checked all references to make sure the style is consistent.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      Strengths: 

      Overall the work is novel and moves the field of Alzheimer's disease forward in a significant way. The manuscript reports a novel concept of aberrant activity in VIP interneurons during the early stages of AD thus contributing to dysfunctions of the CA1 microcircuit. This results in the enhancement of the inhibitory tone on the primary cells of CA1. Thus, the disinhibition by VIP interneurons of Principal Cells is dampened. The manuscript was skillfully composed, and the study was of strong scientific rigor featuring well-designed experiments. Necessary controls were present. Both sexes were included.

      We express our gratitude to the reviewer for their keen appreciation of our efforts and their enthusiasm for the outcomes of this research.

      Limitations:

      (1) The authors attributed aberrant circuit activity to the accumulation of "Abeta intracellularly" inside IS-3 cells. That is problematic. The 6E10 antibody recognizes amyloid plaques in addition to Amyloid Precursor Protein (APP) as well as the C99 fragment. There are no plaques at the ages at which 3xTg mice were examined. Thus, the staining shown in Figure 1a is of APP/C99 inside neurons, not abeta accumulations in neurons. At the ages of 3-6 months, 3xTg starts producing abeta oligomers and potentially tau oligomers as well (Takeda et al., 2013 PMID: 23640054; Takeda et al., 2015 PMID: 26458742 and others). Emerging literature suggests that abeta and tau oligomers disrupt circuit function. Thus, a more likely explanation is that abeta and tau oligomers disrupt the activity of VIP neurons.

      The Reviewer correctly points out that 3xTg-AD mice typically do not exhibit plaques before 6 months of age, with limited amounts even up to 12 months, particularly in the hippocampus. To the best of our knowledge, the 6E10 antibody binds to an epitope in APP (682-687) that is also present in the Abeta (3-8) peptide. Consequently, 6E10 detects full-length APP, α-APP (soluble alpha-secretase-cleaved APP), and Abeta (LaFerla et al., 2007). Nonetheless, we concur with the Reviewer's observation that the detected signal includes Abeta oligomers and the C99 fragment, which is currently considered an early marker of AD pathology (Takasugi et al., 2023; Tanuma et al., 2023). Studies have demonstrated intracellular accumulation of C99 in 3-month-old 3xTg mice (Lauritzen et al., 2012), and its binding to the Kv7 potassium channel family, which results in inhibiting their activity (Manville and Abbott, 2021). If a similar mechanism operates in IS-3 cells, it could explain the changes in their firing properties observed in our study. Consequently, we have revised the manuscript to include this crucial information in both the Results and Discussion sections.

      (2) Authors suggest that their animals do not exhibit loss of synaptic connections and show Figure 3d in support of that suggestion. However, imaging with confocal microscopy of 70-micron-thick sections would not allow the resolution of pre- and post-synaptic terminals. More sensitive measures such as electron microscopy or array tomography are the appropriate techniques to pursue. It is important for the authors to either remove that data from the manuscript or address the limitations of their technique in the discussion section. There is a possibility of loss of synaptic connections in their mouse model at the ages examined.

      We appreciate the Reviewer’s perspective on the techniques used for imaging synaptic connections. While we acknowledge the limitations of confocal microscopy for resolving pre- and post-synaptic structures in thick sections, we respectfully disagree regarding the exclusive suitability of electron microscopy (EM). Our approach involved confocal 3D image acquisition using a 63x objective at 0.2 µm lateral resolution and 0.25 Z-step, providing valuable quantitative insights into synaptic bouton density. Despite the challenges posed by thick sections, this method, together with automatic analysis, allows for careful quantification. Although EM offers unparalleled resolution, it presents challenges in quantification. We have included the important details regarding image acquisition and analysis in the revised manuscript.

      Reviewer #2 (Public Review):

      Summary:

      The submitted manuscript by Michaud and Francavilla et al., is a very interesting study describing early disruptions in the disinhibitory modulation exerted by VIP+ interneurons in CA1, in a triple transgenic model of Alzheimer's disease. They provide a comprehensive analysis at the cellular, synaptic, network, and behavioral level on how these changes correlate and might be related to behavioral impairments during these early stages of the disease.

      Main findings:

      - 3xTg mice show early Aß accumulation in VIP-positive interneurons.

      - 3xTg mice show deficits in a spatially modified version of the novel object recognition test.

      - 3xTg mice VIP cells present slower action potentials and diminished firing frequency upon current injection.

      - 3xTg mice show diminished spontaneous IPSC frequency with slower kinetics in Oriens / Alveus interneurons.

      - 3xTg mice show increased O/A interneuron activity during specific behavioral conditions.

      - 3xTg mice show decreased pyramidal cell activity during specific behavioral conditions.

      Strengths:

      This study is very important for understanding the pathophysiology of Alzheimer's disease and the crucial role of interneurons in the hippocampus in healthy and pathological conditions.

      We are thankful to the reviewer for their insightful recognition of our efforts and their enthusiasm for the results of this research.

      Weaknesses:

      Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality.

      We completely agree with the reviewer's observation regarding the lack of demonstration of causality in our results. Investigating causality in the relationship between deficits in VIP physiological properties and differences in network activity is indeed a crucial aspect of this project. However, achieving this goal will require a significant amount of time and dedicated manipulations in a new mouse model (VIP-Cre-3xTg). We appreciate the importance of this line of investigation and consider it as a priority for our future research endeavors.

      Recommendations for the authors:

      Reviewer #1 (Recommendations for the authors):

      Limitations:

      (1) The authors should describe their model and state the age at which these mice start depositing amyloid plaques and neurofibrillary tangles. Readers might not be familiar with this model. It is also important to mention that circuit disruptions are assessed prior to plaque and tangle formation.

      We have included a detailed description of the 3xTg-AD mouse model in the Introduction section, including information on the age at which amyloid plaques and neurofibrillary tangles begin to appear. Additionally, we have clarified that circuit disruptions were assessed before the formation of plaques and tangles. These details have been added to both the Introduction and the Results sections to ensure clarity for readers unfamiliar with the model.

      (2) Ns are presented in Supplemental Table 1. Units are presented in a note to Supplementary Table 1. It would be advisable to specify Ns and units as the data is being presented in the results section or figure legends for easy access.

      We have now included the Ns (sample sizes), specifying the number of cells or sections and the number of experimental animals, directly within the Results section and in the figure legends. This ensures that readers have immediate access to this information without needing to refer to the supplementary materials.

      (3) Several typos require correction:

      a. "mamory" - Line 22, page 5.

      b. The term "Interneurons" is abbreviated as both "INs" and "IN" throughout the manuscript. The author should consistently choose one abbreviation.

      We have corrected the typo "mamory" to "memory" on line 22, page 5. Additionally, we have standardized the abbreviation for "Interneurons" to "INs" throughout the manuscript for consistency.

      (4) Note 2 in Supplementary Table 1 states that animals of both sexes with equal distribution were used throughout the study. It would be best for the reader to assess the data distribution based on sex. Thus, it is advisable for the authors to depict male and female data points as distinct symbols throughout the figures.

      Unfortunately, we do not have detailed sex-disaggregated data for all datasets, which limits our ability to depict male and female data points separately across all figures. Therefore, we have opted to pool data from both sexes for a more comprehensive analysis. We believe this approach maintains the robustness of our findings.

      Reviewer #2 (Recommendations for the authors):

      Major Points:

      - To keep the logical line of reasoning and to be able to interpret the results, it would be important to use the same metrics when comparing the population activity of O/A interneurons and principal cells in the different behavioral conditions.

      We have revised Figures 4 and 5 to enhance the coherence in data presentation. This includes using consistent metrics for comparing the population activity of both O/A interneurons and principal cells across different behavioral conditions. These changes ensure a clearer and more logical interpretation of the results.

      - Although results nicely suggest that deficits in VIP physiological properties are related to the differences in network activity, there is no demonstration of causality. Would it be possible to test if manipulating VIP neurons one could obtain such specific results? Alternatively, it could be discussed more in detail how the decrease in disinhibition could lead to the changes in network activity demonstrated here.

      We agree with the reviewer that establishing causality between VIP neuron deficits and changes in network activity would be very important. However, demonstrating causality would require a new line of investigation, involving the use of specific mouse models to selectively manipulate VIP neurons. This is an exciting direction that we plan to prioritize in our future research. For this study, we have included a discussion on the potential mechanisms by which decreased disinhibition might lead to the observed changes in network activity. Specifically, we propose that in young adult 3xTg-AD mice, the altered firing of I-S3 cells may lead to enhanced inhibition of principal cells. This could shift the excitation/inhibition balance, input integration and firing output of principal cells thereby impacting overall network activity. These points are discussed in detail in the revised Discussion section.

      - Along the same lines, the correlations shown in the manuscript would be more robust if there were an in vivo demonstration that 3xTg mice indeed show decreased activity. The same experiments could also clarify if VIP cells in control animals are more active at the time of decision-making and during object exploration as suggested in the manuscript.

      Thank you for your comment. In response to the point raised, we would like to highlight that we have recently documented the increased activity of VIP-INs in the D-zone of the T-maze and during object exploration in a study published in Cell Reports (Tamboli et al., 2024). This publication is now referenced in our manuscript to support our findings. Regarding the in vivo activity of 3xTg mice, our observations indicated no significant differences in major behavioral patterns such as locomotion, rearing, and exploration of the T-maze when comparing Tg and non-Tg mice. These findings are presented in detail in Figure 4c and Supplementary Fig. 5. We believe these data support the robustness of our correlations by demonstrating that the overall behavioral activity of 3xTg mice is comparable to that of non-transgenic controls, thus focusing attention on the specific roles of VIP-INs in the early prodromal state of AD pathology.

      Minor Points:

      - Figure 1c: Heading of VIP-Tg should have capital letters.

      Thank you for pointing that out. We have corrected the heading to "VIP-Tg" with capital letters in Figure 1c.

      - Figure 1d: The finding that no change was observed in the percentage of VIP+/CR+ is based on three animals and 3-4 slices per mouse. However, the result of VIP+CR+ in tg-mice has an outlier that might bias the results. I would suggest increasing the number of animals to confirm these results.

      Thank you for your insightful suggestion. We addressed the potential impact of the outlier in the VIP+/CR+ cell density analysis by recalculating the results after removing the outlier using the interquartile range method. This reanalysis revealed a statistically significant difference in the VIP+/CR+ cell density between non-Tg and Tg mice, which we have now detailed in the Results section. Despite this, we have chosen to retain the outlier in our final presentation to accurately represent the biological variability observed in our sample. We agree that increasing the number of animals would further validate these findings and will consider this in future studies.
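
      For readers unfamiliar with the interquartile range method mentioned above, a minimal Python sketch follows. It assumes the common Tukey convention of flagging values more than 1.5 × IQR beyond the quartiles; the multiplier, the quartile definition and the example numbers are assumptions for illustration and are not taken from the study's data.

```python
from statistics import quantiles


def iqr_outliers(data, k=1.5):
    """Return (kept, flagged) using Tukey's fences: [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [x for x in data if low <= x <= high]
    flagged = [x for x in data if x < low or x > high]
    return kept, flagged


# Made-up percentages; 95 is the deliberately extreme value.
kept, flagged = iqr_outliers([40, 42, 44, 45, 47, 48, 50, 95])
```

      With very small samples the quartiles themselves are unstable, so whether a point is flagged can depend on the quartile definition chosen.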

      - Figure 3d: Would it be possible to identify the recorded interneurons? Is it expected that most of those are OLM cells?

      Thank you for your question. We were unable to fully recover all recorded cells using biocytin staining. However, for those cells with preserved axonal structures, we identified both OLM and bistratified cells, which are the primary targets of I-S3 cells. We have now included this information in the Results section to clarify the types of interneurons identified.

      - Figure 3: Why quantify VGat terminals instead of quantification of VIP-GFP terminals? Combined with the calretinin labeling, it would be more useful to indicate that no changes were observed at the morphological bouton level specifically in disinhibitory interneurons. Please also describe which ImageJ plugin was used for the quantification.

      Thank you for your question. Our primary objective was to quantify the synaptic terminals of CR+ INs in the CA1 O/A region, which are predominantly formed by I-S3 cells. Therefore, VGaT and CR co-localization was used to guide this analysis. GFP expression in axonal boutons can sometimes be inconsistent and less reliable for precise quantification. For this analysis, we utilized the “Analyze Particles” function in ImageJ, combined with watershed segmentation, which is now specified in the Methods section.

      - Figure 4g: How was the statistical test performed? If data was averaged across mice, please add error bars and data points in the figure.

      Thank you for your question. To compare the alternation percentage between non-Tg and Tg mice, we used Fisher’s Exact test as detailed in Supplementary Table 1. In this analysis, we considered each animal's choice individually, comparing the preference for correct versus incorrect choices between the two groups. Since Fisher’s Exact test is designed for analyzing qualitative data rather than quantitative data, averaging across mice was not applicable, and therefore, we did not include error bars or data points in the figure.
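
      To make the logic of this test concrete, the sketch below computes a two-sided Fisher's exact test on a 2 × 2 table of correct versus incorrect choices by genotype. It is a minimal, standard-library Python illustration and not the software used for the analysis in Supplementary Table 1; the function name is hypothetical, and the counts in the example are illustrative and should not be read as the data behind Figure 4g.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables with the same margins
    that are no more likely than the observed one.
    """
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def weight(k):
        # Number of ways to get k first-column outcomes in row 1 given fixed margins.
        return comb(row1, k) * comb(row2, col1 - k)

    k_min, k_max = max(0, col1 - row2), min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(k) for k in range(k_min, k_max + 1) if weight(k) <= observed)
    return Fraction(extreme, comb(total, col1))


# Rows: non-Tg, Tg; columns: correct, incorrect (illustrative counts).
p_value = fisher_exact_two_sided(5, 4, 5, 1)
```

      Working with integer counts and an exact fraction avoids floating-point ties when deciding which tables are "as extreme" as the observed one.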

      - Figure 4h: To conclude that the increase in activity is larger in the 3xTg mice, there should be a statistical comparison for the magnitude of change between the decision and the stem zone for control and 3xTg mice. To show that there is no significant difference in this measurement in the control mice is insufficient.

      Thank you for your suggestion. We performed a statistical comparison of the magnitude of change in activity between the stem zone and the D-zone for non-Tg and 3xTg mice, as recommended. Our analysis showed no significant difference in this magnitude of change between the two genotypes. These results have now been included in the Results section. However, we would like to highlight an important finding regarding the nature of these changes. In the 3xTg mice, there was a consistent increase in the activity of O/A INs when entering the D-zone. In contrast, non-Tg mice displayed a range of responses, including both increases and decreases in activity. This indicates a higher reliability in the firing of O/A INs in the D-zone of 3xTg mice. Our recent study suggests that VIP-INs are particularly active in the D-zone (Tamboli et al., 2024). Therefore, the absence or reduced input from VIP-INs in 3xTg mice may lead to the observed higher engagement of O/A INs in this zone. We believe this observation is crucial for understanding the differential yet nuanced changes in neural dynamics in these mice.

      - In the methods, it is stated that there was a pre-selection of animals depending on learning performance. Would it be possible to also show the data from animals that did not properly learn? Alternatively, it would be useful to plot the correlation between performance in this test and the difference between activity in the stem and the decision-making zone. The reason to ask for this is that there is a trend for control animals to show reduced alternations (50 vs 80%, although not significant, it is a big difference). Considering that there is also a trend in control animals to show increased activity in the decision-making zone, it would be important to confirm that this is not only due to differences in performance. The current statistical procedure does not allow discarding this.

      In this study, we excluded from the analysis the animals that refused to explore the T-maze and spent all their time in the stem corner, or refused to explore the objects and stayed in the open field maze (OFM) corner. These exclusions applied to both non-Tg (n = 6) and Tg (n = 5) groups, indicating that low exploratory activity is not necessarily linked to AD-related mutations. During the T-maze test, we also observed several animals that made incorrect choices (4 out of 9 non-Tg and 1 out of 6 Tg mice). However, due to the low number of animals making incorrect choices, we were unable to form a separate group for analysis based on incorrect choices. These details are now provided in the Methods section.

      - Figure 4i. It is not clear when exactly cell activity was measured. If it was during the entire recording time, I think it would be interesting to see if the activity of O/A interneurons is different specifically during interaction with the object in 3xTg mice.

      Cell activity was indeed measured throughout the entire recording session and analyzed in relation to animal behavior (immobility to walking; Fig. 4d,e), and periods specifically related to interaction with objects were extracted for analysis (Figure 4i).

      - Why was the object modulation measured during a different task in which both objects were the same? The figure is misleading in that sense, as it suggests the experiment was the same as for the other panels with two different objects. It would be important to correct this if the authors want to correlate the deficits in NOR in 3xTg mice and changes in IN activity.

      The study specifically investigated object-modulated neural activity during the Sampling phase. Therefore, two identical objects were placed in the arena for animal exploration. As mentioned above, due to several animals failing to explore the OFM and objects on the second day, they were excluded from the analysis, preventing the conduct of the novel-object exploration Test Trial. Both non-Tg and Tg mice showed a lack of exploration in the OFM and T-maze, for reasons that remain unclear. Consequently, we opted to present robust data on neural activity during the initial sampling of two identical objects. However, further investigation is needed to understand how this activity relates to deficits observed in the classical NOR test.

      - Figure 5c-f. I would strongly suggest performing the same quantification and displaying similar figures for the fiber photometry experiments in interneurons and principal cells. It would help to interpret the data.

      We have taken the reviewer's suggestion into account and standardized the data analysis and presentation. Figures 4d, e and 5c, d now depict the walk-induced activity in INs and PCs, respectively. Figures 4h and 5f compare activity between the stem and D-zone in the T-maze. Additionally, Figures 4j and 5h illustrate the object modulation of INs and PCs, respectively.

      - Although velocity and mobility were quantified, it would be important to show also that they are not different during those times when activity was dissimilar, as in the decision zone.

      We have analyzed these data and found no significant differences between the two genotypes in terms of velocity and mobility during these periods. This analysis is now presented in Supplementary Figure 5e, f and detailed in the Results section.

      - Figure 5g-h. Similarly, I would suggest using the same metrics in order to correlate the results from interneuron and principal cell activity photometry.

      We have updated this figure to align with the presentation of interneurons (Figure 4j) and included RMS analysis to emphasize lower variance in object modulation of PCs as an indicator of increased network inhibition.
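
      As a brief illustration of why RMS can serve as a measure of variability, the sketch below computes the root mean square of a mean-centred trace, which equals its population standard deviation. This is a generic Python example under that assumption; whether the RMS in the manuscript was computed on mean-centred signals is not stated here, and the traces and function names are invented.

```python
from math import sqrt
from statistics import mean, pstdev


def rms(signal):
    """Root mean square: square root of the mean of the squared samples."""
    return sqrt(mean([x * x for x in signal]))


def centred_rms(signal):
    """RMS after subtracting the mean; equals the population standard deviation."""
    m = mean(signal)
    return rms([x - m for x in signal])


# Invented traces: the second fluctuates less around the same mean.
variable_trace = [0.0, 2.0, -2.0, 4.0, -4.0]
damped_trace = [0.0, 1.0, -1.0, 2.0, -2.0]
```

      In this toy case `centred_rms(damped_trace)` is smaller than `centred_rms(variable_trace)`, which is the sense in which a lower RMS indicates lower variance.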

      - Was object modulation variance also different for INs depending on the mouse phenotype?

      We conducted this additional analysis but did not find any significant difference.

      - Figure S4: would it be possible to identify the postsynaptic partners?

      As mentioned above, for those cells with preserved axonal structures, we identified both OLM and bistratified cells. We have now included this information in the Results section to clarify the types of interneurons identified.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment 

      The authors present 16 new well-preserved specimens from the early Cambrian Chengjiang biota. These specimens potentially represent a new taxon which could be useful in sorting out the problematic topology of artiopodan arthropods - a topic of interest to specialists in Cambrian arthropods. Because the anatomic features in the new specimens were neither properly revealed nor correctly interpreted, the evidence for several conclusions is inadequate. 

      We thank the Senior Editor, Reviewing Editor and three reviewers for their work, and for their comments aimed at improving this project and manuscript. We have engaged with all the comments in detail, in order to strengthen our work. This includes adding additional data to support that all Acanthomeridion specimens belong to a single species, running further phylogenetic analyses including more trilobite terminals to test the specific hypothesis and interpretation raised by Reviewer 2, and visualising our results in treespace in order to determine support for the different interpretations of the ventral structures and their implications for the evolution of Artiopoda. We have also greatly expanded the introduction, which we feel adds clarity to areas misunderstood by some reviewers in the previous version of the manuscript.

      Our point-by-point responses to the reviewers' public reviews are outlined below. We have also made changes resulting from the additional suggestions which are not public, which we have not reproduced below. We submit a new version of the main text, and can provide a tracked changes version if required. The new main text includes 9 figures and is 8624 words including captions and reference list.

      Public Reviews: 

      Reviewer #1 (Public Review): 

      Summary: 

      Du et al. report 16 new well-preserved specimens of artiopodan arthropods from the Chengjiang biota, which demonstrate both dorsal and ventral anatomies of a potential new taxon of artiopodans that are closely related to trilobites. The authors assigned their specimens to Acanthomeridion serratum and proposed A. anacanthus as a junior subjective synonym of Acanthomeridion serratum. Critically, the presence of ventral plates (interpreted as cephalic librigenae), together with phylogenetic results, leads the authors to conclude that the cephalic sutures originated multiple times within the Artiopoda.

      We thank Reviewer 1 for their comments on the strengths and weaknesses of the previous version of the manuscript. We hope that the revised version strengthens our conclusions that Acanthomeridion anacanthus is a junior synonym of A. serratum.

      Strengths: 

      New specimens are highly qualified and informative. The morphology of the dorsal exoskeleton, except for the supposed free cheek, was well illustrated and described in detail, which provides a wealth of information for taxonomic and phylogenic analyses. 

      Weaknesses: 

      The weaknesses of this work are obvious in a number of aspects. Technically, ventral morphology is less well revealed and is poorly illustrated. Additional diagrams are necessary to show the trunk appendages and suture lines. Taxonomically, I am not convinced by the authors' placement. The specimens are markedly different from either Acanthomeridion serratum Hou et al. 1989 or A. anacanthus Hou et al. 2017. The ontogenetic description is extremely weak and the morphological continuity is not established. Geometric and morphometric analyses might be helpful to resolve the taxonomic and ontogenetic uncertainties.

      We appreciate that the reviewer was not convinced by our synonymisation in the first version of the manuscript. The recommendation of the reviewer to provide linear morphometric support for our synonymisation was much appreciated. We have provided measurements of the length and width of the thorax (Figure 6 in the new version), visualising the position of specimens previously assigned to A. anacanthus, to show this morphological continuity. These act as a complement to Figure 5, which shows the fossils in an ontogenetic trend.

      I am confused by the authors' description of the free cheek (librigena) and ventral plate. Are they the same object? How do they connect with other parts of the cephalic shield, e.g. hypostome and fixigena? Critically, the homology of cephalic slits (eye slits, eye notch, dorsal suture, facial suture) is not extensively discussed either morphologically or functionally.

      We appreciate that the brevity of the introduction in the previous version led to some misunderstandings and some confusion. We have provided a greatly expanded introduction, including a new Figure 1, which outlines the possible homologies of the ventral plates and the three hypotheses considered in this study. The functions of the cephalic and dorsal sutures are now discussed in more detail in both the Introduction and the Discussion.

      Finally, the authors claimed that phylogenetic results support two separate origins rather than a deep origin. However, the results in Figure 4 can explain a deep homology of the cephalic suture at the molecular level and multiple co-options within the Artiopoda.

      A deep molecular origin is difficult to demonstrate using solely fossil material from an extinct group such as Artiopoda. Thus our study focuses on morphological origins. The number of losses required for a deep morphological origin means that we favour multiple independent morphological origins.

      Reviewer #2 (Public Review): 

      Overall: This paper describes new material of Acanthomeridion serratum that the authors claim supports its synonymy with Acanthomeridion anacanthus. The material is important and the description is acceptable after some modification. In addition, the paper offers thoughts and some exploration of the possibility of multiple origins of the dorsal facial suture among artiopods, at least once within Trilobita and also among other non-trilobite artiopods. Although this possibility is real and apparently correct, the suggestions presented in this paper are both surprising and, in my opinion, unlikely to be true because the potential homologies proposed with regard to Acanthomeridion and trilobite-free cheeks are unconventional and poorly supported. 

      What to do? I can see two possibilities. One, which I recommend, is to concentrate on improving the descriptive part of the paper and omit discussion and phylogenetic analysis of dorsal facial suture distribution, leaving that for more comprehensive consideration elsewhere. The other is to seek to improve both simultaneously. That may be possible but will require extensive effort. 

      We thank the reviewer for their detailed comments and suggestions for multiple ways in which we might revise the manuscript. We have taken the option that is more effort, but we hope more reward, in interrogating the larger question alongside improving the descriptive part of the paper. This has taken a long time and incorporation of new techniques, but has in our opinion greatly strengthened the work.

      Major concerns 

      Concern 1 - Ventral sclerites as free cheek homolog, marginal sutures, and the trilobite doublure 

      Firstly, a couple of observations that bear on the arguments presented - the eyes of A. serratum are almost marginal and it is not clear whether a) there is a circumocular suture in this animal and b) if there was, whether it merged with the marginal suture. These observations are important because this animal is not one in which an impressive dorsal facial suture has been demonstrated - with eyes that near marginal it simply cannot do so. Accordingly, the key argument of this paper is not quite what one would expect. That expectation would be that a non-trilobite artiopod, such as A. serratum, shows a clear dorsal facial suture. But that is not the case, at least with A. serratum, because of its marginal eyes. Rather, the argument made is that the ventral doublure of A. serratum is the homolog of the dorsal free cheeks of trilobites. This opens up a series of issues. 

      We appreciate that the reviewer disagrees with both interpretations we offered for the ventral plates, and has offered a third interpretation for the homology of this feature with the doublure of trilobites. Support for our original interpretation comes from the position of the eye stalks in Acanthomeridion, which fall very close to the suture between the ventral plate and the rest of the cephalon. However, we appreciate that the reviewer has a valid interpretation, that the ventral plates might be homologues of the doublure alone.

      To clarify the (two, now three) hypotheses of homology for the ventral plates considered in this study, we provide a new summary figure (Figure 1). In addition, the introduction has been greatly lengthened with further discussion of the different suture types in trilobites, their importance for trilobite classification schemes, and extensive references to older literature are now included. Further, we add background to the hypotheses around the origins of dorsal ecdysial sutures. 

      We add that the interpretation of A. serratum as having features homologous to the dorsal sutures of trilobites is already present in the literature, and so while the reviewer may disagree with it, it is certainly a hypothesis that requires testing.

      The paper's chief claim in this regard is that the "teardrop" shaped ventral, lateral cephalic plates in Acanthomeridion serratum are potential homologs of the "free cheeks" of those trilobites with a dorsal facial suture. There is no mention of the possibility that these ventral plates in A. serratum could be homologs of the lateral cephalic doublure of olenelloid trilobites, which is bound by an operative marginal suture or, in those trilobites with a dorsal facial suture, that it is a homolog of only the doublure portions of the free cheeks and not with their dorsal components. 

      We include this third possibility in our revised analyses and manuscript. To test this properly required adding an olenelloid trilobite to our matrix, as we needed a terminal that had both a marginal and a circumocular suture that were not fused. We chose Olenellus getzi for this purpose, as it is the only Olenellus with some appendages known (the antennae). We also added further characters to the morphological matrix, and additional trilobites from which soft tissues are known, in order to better resolve this part of the tree. Trilobites in the final analyses were: Anacheirurus adserai, Cryptolithus tesselatus, Eoredlichia intermedia, Olenoides serratus, Olenellus getzi, Triarthrus eatoni.

      However, addition of these trilobites added a further complication. Under unconstrained analysis, Olenellus getzi was resolved with Eoredlichia intermedia as a clade sister to all other trilobites.

      Thus the topology of Paterson et al. 2019 (PNAS) was not recovered, and so the hypothesis of Reviewer 2 could not be robustly tested. In order to achieve a topology comparable to Paterson et al., we ran a further three analyses, where we constrained a clade of all trilobites except for O. getzi. This recovered a topology where the earliest diverging trilobites had unfused sutures, and thus one suitable for considering the role of Acanthomeridion serratum ventral plates as homologues of the doublure of trilobites.

      Unfortunately, for these analyses (both constrained and unconstrained), Acanthomeridion was not resolved as sister to trilobites, but instead elsewhere in the tree (see Table 1 in main text, Fig. 9, and SFig 9). Thus our analyses do not find support for the reviewer’s hypothesis, as multiple origins of this feature are still required.

      It was still an excellent point that we should consider this hypothesis, and we have retained it, and discussion surrounding it, in our manuscript.

      The introduction to the paper does not inform the reader that all olenelloids had a marginal suture - a circumcephalic suture that was operative in their molting and that this is quite different from the situation in, say, "Cedaria" woosteri in which the only operative cephalic exoskeletal suture was circumocular. The conservative position would be that the olenelloid marginal suture is the homolog of the marginal suture in A. serratum: the ventral plates thus being homolog of the trilobite cephalic doublure, not only potential homolog to the entire or dorsal only part of the free cheeks of trilobites with a dorsal facial suture. As the authors of this paper decline to discuss the doublure of trilobites (there is a sole mention of the word in the MS, in a figure caption) and do not mention the olenelloid marginal suture, they give the reader no opportunity to assess support for this alternative. 

      At times the paper reads as if the authors are suggesting that olenelloids, which had a marginal cephalic suture broadly akin to that in Limulus, actually lacked a suture that permitted anterior egression during molting. The authors are right to stress the origin of the dorsal cephalic suture in more derived trilobites as a character seemingly of taxonomic significance but lines such as 56 and 67 may be taken by the non-specialist to imply that olenelloids lacked a forward egression-permitting suture. There is a notable difference between not knowing whether sutures existed (a condition apparently quite common among soft-bodied artiopods) and the well-known marginal suture of olenelloids, but as the MS currently reads most readers will not understand this because it remains unexplained in the MS. 

      As noted in response to a previous point (above) we now have a greatly expanded introduction which should give the reader an opportunity to assess support for this alternative hypothesis. We now include Olenellus getzi in our analyses, and have added characters to the morphological matrix to make this clear.

      A reference to the case of ‘Cedaria’ woosteri is made in the introduction to highlight further the variability of trilobites, as is a reference to Foote’s analysis of cranidial shapes and the support this provides for a single origin of the dorsal suture.

      With that in mind, it is also worth further stressing that the primary function of the dorsal sutures in those which have them is essentially similar to the olenelloid/limulid marginal suture mentioned above. It is notable that the course of this suture migrated dorsally up from the margin onto the dorsal shield and merged with the circumocular suture, but this innovation does not seem to have had an impact on its primary function - to permit molting by forward egression. Other trilobites completely surrendered the ability to molt by forward egression, and there are even examples of this occurring ontogenetically within species, suggesting a significant intraspecific shift in suture functionality and molting pattern. The authors mention some of this when questioning the unique origin of the dorsal facial suture of trilobites, although I don't understand their argument: why should the history of subsequent evolutionary modification of a character bear on whether its origin was unique in the group? 

      We include reference to evolutionary modification and loss of this character as it is important to stress that if a character is known to have been lost multiple times it is possible that it had a deeper root (in an earlier diverging member of Artiopoda than Trilobita) and was lost in olenelloids. This is the question that we seek to address in our manuscript.

      The bottom line here is that for the ventral plates of A. serratum to be strict homologs of only the dorsal portion of the dorsal free cheeks, there would be no homolog of the trilobite doublure in A. serratum. The conventional view, in contrast, would be that the ventral plates are a homolog of the ventral doublure in all trilobites and ventral plates in artiopods. I do not think that this paper provides a convincing basis for preferring their interpretation, nor do I feel that it does an adequate job of explaining issues that are central to the subject. 

      We stress that our interpretations – that the ventral plates are not homologous to any artiopodan feature or that they are homologous to the free cheeks of trilobites – have both been raised in the literature before. By contrast, we could not find any mention of the reviewer’s ‘conventional view’ in relation to Acanthomeridion. We appreciate that this view is still valid and worth investigating, which we have done in the further analyses conducted. However, we did not find support for it. Instead we find some support both for the ventral plates as homologues of free cheeks, and for their being unique structures within Artiopoda.

      Concern 2. Varieties of dorsal sutures and the coexistence of dorsal and marginal sutures 

      The authors do not clarify or discuss connections between the circumocular sutures (a form of dorsal suture that separates the visual surface from the rest of the dorsal shield) and the marginal suture that facilitates forward egression upon molting. Both structures can exist independently in the same animal - in olenelloids for example. Olenelloids had both a suture that facilitated forward egression in molting (their marginal suture) and a dorsal suture (their circumocular suture). The condition in trilobites with a dorsal facial suture is that these two independent sutures merged - the formerly marginal suture migrating up the dorsal pleural surface to become confluent with the circumocular suture. (There are also interesting examples of the expansion of the circumocular suture across the pleural fixigena.) The form of the dorsal facial suture has long figured in attempts at higher-level trilobite taxonomy, with a number of character states that commonly relate to the proximity of the eye to the margin of the cephalic shield. The form of the dorsal facial suture that they illustrate in Xanderella, which is barely a strip crossing the dorsal pleural surface linking marginal and circumocular suture, is comparable to that in the trilobites Loganopeltoides and Entomapsis but that is a rare condition in that clade as a whole. The paper would benefit from a clear discussion of these issues at the beginning - the dorsal facial suture that they are referring to is a merged circumcephalic suture and circumocular suture - it is not simply the presence of a molt-related suture on the dorsal side of the cephalon. 

      We have added in an expanded introduction where these points are covered in detail. We appreciate that this was not clear in the earlier version, and this suggestion has greatly improved our work.

      Concern 3. Phylogenetics 

      While I appreciate that the phylogenetic database is a little modified from those of other recent authors, still I was surprised not to find a character matrix in the supplementary information (unless it was included in some way I overlooked), which I would consider a basic requirement of any paper presenting phylogenetic trees - after all, there's no space limit. It is not possible for a reviewer to understand the details of their arguments without seeing the character states and the matrix of state assignments. 

      A link to a morphobank project was included in the first submission. This project has been updated for the current submission, including an additional matrix to treat the reviewer’s hypothesis for the ventral plates. Morphobank Project #P4290. Email address: P4290, reviewer password: Acanthomeridion2023, accessible at morphobank.org. We have added in additional details for the reviewer and others to help them access the project:

      The project can be accessed at morphobank.org, using the below credentials to log in:  Email address: P4290, Password: Acanthomeridion 2023.

      The section "phylogenetic analyses" provides a description of how tree topology changes depending on whether sutures are considered homologous or not using the now standard application of both parsimony and maximum likelihood approaches but, considering that the broader implications of this paper rest on the phylogenetic interpretation, I also found the absence of detailed discussion of the meaning and implications of these trees to be surprising, because I anticipated that this was the main reason for conducting these analyses. The trees are presented and briefly described but not considered in detail. I am troubled by "Circles indicate presence of cephalic ecdysial sutures" because it seems that in "independent origin of sutures" trilobites are considered to have two origins (brown color dot) of cephalic ecdysial sutures - this may be further evidence that the team does not appreciate that olenelloids have cephalic ecdysial sutures, as the basal condition in all trilobites. Perhaps I'm misunderstanding their views, but from what's presented it's not possible to know that. Similarly, in the "sutures homologous" analyses why would there be two independent green dots for both Acanthomeridion and Trilobita, rather than at the base of the clade containing them both, as cephalic ecdysial sutures are basal to both of them? Here again, we appear to see evidence that the team considers dorsal facial sutures and cephalic ecdysial sutures to be synonymous - which is incorrect.

      We appreciate that the reviewer misunderstood the meaning of the dots, leading to confusion. The dots indicated how features were coded in the phylogenetic analysis. In our revised version of this figure (Figure 8 in the new version), these dots are now clearly labelled as indicating ‘coding in phylogenetic matrix’. Further, with the revised character list, we now can provide additional detail for the types of sutures (relevant as we now include more trilobite terminals).

      This point aside, and at a minimum, the team needs to do a more thorough job of characterizing and considering the variety of conditions of dorsal sutures among artiopods, their relationships to the marginal suture and to the circumocular suture, the number, and form of their branches, etc. 

      We thank the reviewer for this summary, and appreciate their concerns and thorough review. Our revised version takes into account all these points raised, and they have greatly improved the clarity, scope and thoroughness of the work.

      Reviewer #3 (Public Review): 

      Summary:

      Well-illustrated new material is documented for Acanthomeridion, a formerly incompletely known Cambrian arthropod. The formerly known facial sutures are shown to be associated with ventral plates that the authors very reasonably homologise with the free cheeks of trilobites. A slight update of a phylogenetic dataset developed by Du et al, then refined slightly by Chen et al, then by Schmidt et al, and again here, permits another attempt to optimise the number of origins of dorsal ecdysial sutures in trilobites and their relatives. 

      Strengths:

      Documentation of an ontogenetic series makes a sound case that the proposed diagnostic characters of a second species of Acanthomeridion are variations within a single species. New microtomographic data shed some light on appendage morphology that was not formerly known. The new data on ventral plates and their association with the ecdysial sutures are valuable in underpinning homologies with trilobites. 

      We thank Reviewer 3 for their positive comments about the manuscript. We appreciate the constructive comments for improvements, and detailed corrections, which we have incorporated into our revised work.

      Weaknesses:

      The main conclusion remains clouded in ambiguity because of a poorly resolved Bayesian consensus and is consistent with work led by the lead author in 2019 (thus compromising the novelty of the findings). The Bayesian trees being majority rules consensus trees, optimising characters onto them (Figure 7b, d) is problematic. Optimising on a consensus tree can produce spurious optimisations that inflate tree length or distort other metrics of fit. Line 264 refers to at least three independent origins of cephalic sutures in artiopodans but the fully resolved Figure 7c requires only two origins. 

      We thank the reviewer for pointing this out. However, now that the analyses have been re-run, we have new results to consider. The results still support multiple origins of sutures. We also note that the dots indicated how terminals were coded. This is now clearer in the revised version of this figure (Figure 8 in the new version).

      We have extended our interrogation of the trees by incorporating treespace analyses. These add support for the nodes of interest (around the base of trilobites), showing that the coding of Acanthomeridion ventral plate homologies impacts its position in the tree, and thus has implications for our understanding of the evolution of sutures in trilobites.

      The question of how many times dorsal ecdysial sutures evolved in Artiopoda was addressed by Hou et al (2017), who first documented the facial sutures of Acanthomeridion and optimised them onto a phylogeny to infer multiple origins, as well as in a paper led by the lead author in Cladistics in 2019. Du et al. (2019) presented a phylogeny based on an earlier version of the current dataset wherein they discussed how many times sutures evolved or were lost based on their presence in Zhiwenia/Protosutura, Acanthomeridion, and Trilobita. To their credit, the authors acknowledge this (lines 62-65). The answer here is slightly different (because some topologies unite Acanthomeridion and trilobites). 

      The following points are not meant to be "Weaknesses" but rather are refinements: 

      I recommend changing the title of the paper from "cephalic sutures" to "dorsal ecdysial sutures" to be more precise about the character that is being tracked evolutionarily. Lots of arthropods have cephalic sutures (e.g., the ventral marginal suture of xiphosurans; the Y-shaped dorsomedian ecdysial line in insects). The text might also be updated to change other instances of "cephalic sutures" to a more precise wording. 

      We appreciate this point and have changed the title as suggested. 

      The authors have provided (but not explicitly identified) support values for nodes in their Bayesian trees but not in their parsimony ones. Please do the jackknife or bootstrap for the parsimony analyses and make it clear that the Bayesian values are posterior probabilities. 

      With the addition of further trilobite terminals to our parsimony analyses, the results became poor. Specifically, the internal relationships of trilobites did not conform to any previous study, and Olenellus getzi was not resolved as an early diverging member of the group. This meant that these analyses could not be used for addressing the hypothesis of Reviewer 2. We decided to exclude the parsimony results from this version to avoid confusion.

      We have added a note that the values reported at the nodes are posterior probabilities to figures S8, S9 and S10 where we show the full Bayesian results.

      In line 65 or somewhere else, it might be noted that a single origin of the dorsal facial sutures in trilobites has itself been called into question. Jell (2003) proposed that separate lineages of Eutrilobita evolved their facial sutures independently from separate sister groups within Olenellina. 

      We have added this to the introduction (Line 98). Thank you for raising this point.

      I have provided minor typographic or terminological corrections to the authors in a list of recommendations that may not be publicly available. 

      We appreciate the points made by the reviewer and their detailed corrections, which we have incorporated in the revised version.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this paper the authors provide a characterisation of auditory responses (tones, noise, and amplitude modulated sounds) and bimodal (somatosensory-auditory) responses and interactions in the higher order lateral cortex (LC) of the inferior colliculus (IC) and compare these characteristics with the higher order dorsal cortex (DC) of the IC - in awake and anaesthetised mice. Dan Llano's group have previously identified GABAergic patches (modules) in the LC distinctly receiving inputs from somatosensory structures, surrounded by matrix regions receiving inputs from auditory cortex. They here use 2P calcium imaging combined with an implanted prism to - for the first time - get functional optical access to these subregions (modules and matrix) in the lateral cortex of IC in vivo, in order to also characterise the functional difference in these subparts of LC. They find that both DC and LC, in both awake and anaesthetised mice, appear to be more responsive to more complex sounds (amplitude modulated noise) compared to pure tones and that under anesthesia the matrix of LC is more modulated by specific frequency and temporal content compared to the GABAergic modules in LC. However, while both LC and DC appear to have low frequency preferences, this preference for low frequencies is more pronounced in DC. Furthermore, in both awake and anesthetized mice somatosensory inputs are capable of driving responses on their own in the modules of LC, but very little in the matrix. The authors now compare bimodal interactions under anaesthesia and awake states and find that effects are different in some cases under awake and anesthesia - particularly related to bimodal suppression and enhancement in the modules.

      The paper provides new information about how subregions with different inputs and neurochemical profiles in the higher order auditory midbrain process auditory and multisensory information, and is useful for the auditory and multisensory circuits neuroscience community.

      The manuscript is improved by the response to reviewers. The authors have addressed my comments by adding new figures and panels, streamlining the analysis between awake and anaesthetised data (which has led to a more nuanced, and better supported conclusion), and adding more examples to better understand the underlying data. In streamlining the analyses between anaesthetised and awake data I would probably have opted for bringing these results into merged figures to avoid repetitiveness and aid comparison, but I acknowledge that that may be a matter of style. The added discussions of differences between awake and anaesthesia in the findings and the discussion of possible reasons why these differences are present help broaden the understanding of what the data looks like and how anaesthesia can affect these circuits.

      As mentioned in my previous review, the strength of this study is in its demonstration of using prism 2p imaging to image the lateral shell of IC to gain access to its neurochemically defined subdivisions, and they use this method to provide a basic description of the auditory and multisensory properties of lateral cortex IC subdivisions (and compare it to dorsal cortex of IC). The added analysis, information and figures provide a more convincing foundation for the descriptions and conclusions stated in the paper. The description of the basic functionality of the lateral cortex of the IC is useful for researchers interested in basic multisensory interactions and auditory processing and circuits. The paper provides a technical foundation for future studies (as the authors also mention), exploring how these neurochemically defined subdivisions receiving distinct descending projections from cortex contribute to auditory and multisensory based behaviour.

      Minor comment:

      - The authors have now added statistics and figures to support their claims about tonotopy in DC and LC. This is what I asked for, and I think it allows readers to better understand the tonotopical organisation in these areas. One of the conclusions by the authors is that the quadratic fit is a better fit than a linear fit in DCIC. Given the new plots shown and previous studies this is likely true, though it is worth highlighting that adding parameters to a fitting procedure (as in the case when moving from linear to quadratic fit) will likely lead to a better fit due to the increased flexibility of the fitting procedure.

      Thank you for the suggestion. We have highlighted that the quadratic function allowed the regression model to include the cells tuned to higher frequencies at the rostromedial part of the DC and resulted in a better fit, which is consistent with the tonotopic organization that was previously described, as shown in the text (lines 208-211).
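
      The reviewer's point about added parameters can be illustrated with a minimal sketch on synthetic data. This is not the authors' data or analysis: the variable names, the noise level, and the use of AIC as a complexity penalty are all assumptions made for illustration. Because a straight line is a special case of a quadratic, the quadratic fit can never have a larger residual sum of squares, so a penalised criterion is one common way to ask whether the extra parameter is justified.

```python
import numpy as np

def fit_rss(x, y, degree):
    """Least-squares polynomial fit; returns the residual sum of squares."""
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    return float(np.sum(residuals ** 2))

def aic(rss, n, k):
    """Gaussian AIC up to an additive constant: n * ln(RSS / n) + 2k."""
    return n * np.log(rss / n) + 2 * k

# Synthetic, hypothetical data: a truly linear trend plus noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 60)                 # e.g. normalised position along an axis
y = 2.0 * x + rng.normal(0.0, 0.3, x.size)    # e.g. a tuning value per cell

rss_linear = fit_rss(x, y, 1)
rss_quadratic = fit_rss(x, y, 2)

# k counts the fitted polynomial coefficients (degree + 1).
aic_linear = aic(rss_linear, x.size, 2)
aic_quadratic = aic(rss_quadratic, x.size, 3)

print(f"RSS linear={rss_linear:.3f}, quadratic={rss_quadratic:.3f}")
print(f"AIC linear={aic_linear:.2f}, quadratic={aic_quadratic:.2f}")
```

      The quadratic RSS is guaranteed not to exceed the linear RSS even though the synthetic data are linear by construction, which is why a lower RSS alone does not show that the quadratic model is the better description.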

      Reviewer #2 (Public Review):

      Summary:

      The study describes differences in responses to sounds and whisker deflections as well as combinations of these stimuli in different neurochemically defined subsections of the lateral and dorsal cortex of the inferior colliculus in anesthetised and awake mice.

      Strengths:

      A major achievement of the work lies in obtaining the data in the first place as this required establishing and refining a challenging surgical procedure to insert a prism that enabled the authors to visualise the lateral surface of the inferior colliculus. Using this approach, the authors were then able to provide the first functional comparison of neural responses inside and outside of the GABA-rich modules of the lateral cortex. The strongest and most interesting aspects of the results, in my opinion, concern the interactions of auditory and somatosensory stimulation. For instance, the authors find that a) somatosensory-responses are strongest inside the modules and b) somatosensory-auditory suppression is stronger in the matrix than in the modules. This suggests that, while somatosensory inputs preferentially target the GABA-rich modules, they do not exclusively target GABAergic neurons within the modules (given that the authors record exclusively from excitatory neurons we wouldn't expect to see somatosensory responses if they targeted exclusively GABAergic neurons) and that the GABAergic neurons of the modules (consistent with previous work) preferentially impact neurons outside the modules, i.e. via long-range connections.

      Weaknesses:

      While the findings are of interest to the subfield they have only rather limited implications beyond it and the writing is not quite as precise as it could be.

      Reviewer #3 (Public Review):

      The lateral cortex of the inferior colliculus (LC) is a region of the auditory midbrain noted for receiving both auditory and somatosensory input. Anatomical studies have established that somatosensory input primarily impinges on "modular" regions of the LC, which are characterized by high densities of GABAergic neurons, while auditory input is more prominent in the "matrix" regions that surround the modules. However, how auditory and somatosensory stimuli shape activity, both individually and when combined, in the modular and matrix regions of the LC has remained unknown.

      The major obstacle to progress has been the location of the LC on the lateral edge of the inferior colliculus where it cannot be accessed in vivo using conventional imaging approaches. The authors overcame this obstacle by developing methods to implant a microprism adjacent to the LC. By redirecting light from the lateral surface of the LC to the dorsal surface of the microprism, the microprism enabled two-photon imaging of the LC via a dorsal approach in anesthetized and awake mice. Then, by crossing GAD-67-GFP mice with Thy1-jRGECO1a mice, the authors showed that they could identify LC modules in vivo using GFP fluorescence while assessing neural responses to auditory, somatosensory, and multimodal stimuli using Ca2+ imaging. Critically, the authors also validated the accuracy of the microprism technique by directly comparing results obtained with a microprism to data collected using conventional imaging of the dorsal-most LC modules, which are directly visible on the dorsal IC surface, finding good correlations between the approaches.

      Through this innovative combination of techniques, the authors found that matrix neurons were more sensitive to auditory stimuli than modular neurons, modular neurons were more sensitive to somatosensory stimuli than matrix neurons, and bimodal, auditory-somatosensory stimuli were more likely to suppress activity in matrix neurons and enhance activity in modular neurons. Interestingly, despite their higher sensitivity to somatosensory stimuli than matrix neurons, modular neurons in the anesthetized prep were overall more responsive to auditory stimuli than somatosensory stimuli (albeit with a tendency to have offset responses to sounds). This suggests that modular neurons should not be thought of as primarily representing somatosensory input, but rather as being more prone to having their auditory responses modified by somatosensory input. However, this trend was different in the awake prep, where modular neurons became more responsive to somatosensory stimuli. Thus, to this reviewer, one of the most intriguing results of the present study is the extent to which neural responses in the LC changed in the awake preparation. While this is not entirely unexpected, the magnitude and stimulus specificity of the changes caused by anesthesia highlight the extent to which higher-level sensory processing is affected by anesthesia and strongly suggests that future studies of LC function should be conducted in awake animals.

      Together, the results of this study expand our understanding of the functional roles of matrix and module neurons by showing that responses in LC subregions are more complicated than might have been expected based on anatomy alone. The development of the microprism technique for imaging the LC will be a boon to the field, finally enabling much-needed studies of LC function in vivo. The experiments were well-designed and well-controlled, the limitations of two-photon imaging for tracking neural activity are acknowledged, and appropriate statistical tests were used.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      - Increase font size of scale bars on figure 6.

      Thank you for the suggestion. We have increased the font size of the scale bar.

      Reviewer #2 (Recommendations For The Authors):

      Line 505: typo: 'didtinction'

      Thank you for the suggestion and we do apologize for the typo. We have fixed the word as shown in the text (line 506).

      No further comments.

      Reviewer #3 (Recommendations For The Authors):

      Line 543: Change "contripute" to "contribute"

      Thank you for the suggestion and we do apologize for the typo. We have fixed the word as shown in the text (line 544).

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      (1) In the first paragraph of the result section it is not clear why the authors introduce the function of p53ΔAS/ΔAS in thymocytes and then they mention fibroblasts. The authors should clarify this point. The authors should also explain based on what rationale they use doxorubicin and nutlin to analyze p53 activity (Figure 1 and figure S1). 

      We thank the reviewer for this comment. In the revised manuscript, we corrected this by mentioning, at the beginning of the Results section: “We analyzed cellular stress responses in thymocytes, known to undergo a p53-dependent apoptosis upon irradiation (Lowe et al., 1993), and in primary fibroblasts, known to undergo a p53-dependent cell cycle arrest in response to various stresses - e.g. DNA damage caused by irradiation or doxorubicin (Kastan et al., 1992), and the Nutlin-mediated inhibition of Mdm2, a negative regulator of p53 (Vassilev et al., 2004).”

      (2) The authors should provide quantification for the western blot in figure 2D because the reduction of p53 protein level in mutant vs wt tumors is not striking. 

      In the previous version of the manuscript, the quantification of p53 bands had been included, but quantification results were mentioned below the actin bands, rather than the p53 bands, and this was probably confusing. We have corrected this in the revised version of the manuscript. The quantification results are now provided just below the p53 bands in Figs. 1B and 2D, which should clarify this point. For Figure 2D, the quantifications show a strong decrease in p53 levels for 3 out of 4 analyzed mutant tumors. For consistency purposes, in the revised manuscript the quantification results also appear below Myc bands in Fig. 2C.

      (3) In the discussion section, the authors propose that a difference in Ackr4 expression may have prognostic value and that measuring ACKR4 gene expression in male patients with Burkitt lymphoma could be useful to identify the patients at higher risk. However, the authors perform a lot of correlative analysis, both in mice and in patients, but the manuscript lacks functional experiments that could help to functionally characterize Ackr4 and Mt2 in the etiology of B-cell lymphomas in males (both in mouse and in human models).

      In the previous version of the manuscript, we proposed that Ackr4 might act as a suppressor of B-cell lymphomagenesis by attenuating Myc signaling. This hypothesis relied on studies showing that Ackr4 impairs the Ccr7 signaling cascade, which may lead to decreased Myc activity (Ulvmar et al., 2014; Shi et al., 2015; Bastow et al., 2021) and that the loss of Ccr7 may delay Myc-driven lymphomagenesis (Rehm et al., 2011). Furthermore, we proposed that the increased expression of Mt2 in p53ΔAS/ΔAS Eμ-Myc male splenic cells reflected an increase in Myc activity, because Mt2 is known to be regulated by Myc (Qin et al., 2021) and because the Mt2 promoter is bound by Myc in B cells according to experiments reported in the ChIP-Atlas database. However, in the first version of the manuscript this hypothesis might have appeared only partially supported by our data because an increase in Myc activity could be expected to have a more general impact, i.e. an impact not only on the expression of Mt2, but also on the expression of many canonical Myc target genes. In the revised manuscript, we show that this is indeed the case. We performed a gene set enrichment analysis (GSEA) comparing the RNAseq data from p53ΔAS/ΔAS Eμ-Myc and p53+/+ Eμ-Myc male splenic cells and found an enrichment of hallmark Myc targets in p53ΔAS/ΔAS Eμ-Myc cells. These new data, which strengthen our hypothesis of differences in Myc signaling intensity, are presented in Fig. 3K and Table S2.

      Importantly, we now go beyond correlative analyses by providing direct experimental evidence that ACKR4 affects the behavior of Burkitt lymphoma cells. We used a CRISPR-Cas9 approach to knock out ACKR4 in Raji Burkitt lymphoma cells and found that ACKR4 KO cells exhibited a 4-fold increase in chemokine-guided cell migration. These new data are presented in Figure 4F and the supplemental Figures S5-S7.

      Finally, following a suggestion of Reviewer#2, we now also point out that “Ackr4 regulates B cell differentiation (Kara et al., 2018), which raises the possibility that an altered p53-Ackr4 pathway in p53ΔAS/ΔAS Eμ-Myc male splenic cells might contribute to increase the pools of pre-B and immature B cells that may be prone to lymphomagenesis.”

      In sum, we now mention in the Discussion that a decrease in Ackr4 expression might promote B-cell lymphomagenesis through three non-exclusive mechanisms.

      Reviewer #2 (Recommendations For The Authors): 

      (1) A great addition would be to demonstrate how p53AS specifically contributes to the regulation of Ackr4. In particular, is there evidence that p53AS might be preferentially recruited on p53 RE within that gene as compared to WT? The availability of specific antibodies that distinguish between AS and WT p53 might help to address this (experimentally complex) question. As a note, usage of such antibodies would also strengthen Fig 1B, in which the AS isoform appears as a mere faint shadow under p53, thus making its "disappearance" in trp53ΔAS/ΔAS difficult to evaluate. 

      We agree with the referee that efficient antibodies against p53-AS isoforms would have been useful. In fact, we tried a non-commercial antibody developed for that purpose, but it produced many nonspecific bands in western blots and did not appear reliable. Importantly, however, our luciferase assays clearly show that both p53-α and p53-AS can transactivate Ackr4, a result that might be expected because these isoforms share the same DNA binding domain. Furthermore, because p53-α isoforms appear more abundant than p53-AS isoforms at the protein and RNA levels (Figs. 1B and S1A), and because the loss of p53-AS isoforms leads to a significant decrease in p53-α protein levels (Figs. 1B and 2D), we think that in p53ΔAS/ΔAS cells the reduction in p53-α levels might be the main reason for a decreased transactivation of Ackr4. This is now more clearly discussed in the revised manuscript.

      (2) A most interesting observation is in Fig 3A and Fig S3, showing that spleen cells of p53ΔAS Eμ-Myc males (but not females) were enriched in pre-B and immature B cells as compared to WT counterparts. This observation points to a possible defect in the B cell maturation process. It would be most interesting to determine whether this particular defect is directly mediated by a p53AS-Ackr4 axis. The hypothesis raised by the authors in the Discussion section is that increased Ackr4 expression may delay lymphomagenesis, but data in Fig 3A and S3 actually suggest that ΔAS increases the pool of immature B cells that may be prone to lymphomagenesis.

      We thank the reviewer for this useful comment, which we integrated in the Discussion of the revised manuscript. Ackr4 was shown to regulate B cell differentiation (Kara et al. (2018) J Exp Med 215, 801–813), so this is indeed one of the possible mechanisms by which a deregulation of the p53-Ackr4 axis might promote lymphomagenesis. We now mention: “Ackr4 regulates B cell differentiation (Kara et al., 2018), which raises the possibility that an altered p53-Ackr4 pathway in p53ΔAS/ΔAS Eμ-Myc male splenic cells might contribute to increase the pools of pre-B and immature B cells that may be prone to lymphomagenesis.” This is presented as one of three possible mechanisms by which decreased Ackr4 levels may promote tumorigenesis, the two others being the impact of Ackr4 on the chemokine-guided migration of lymphoma cells and its apparent effect on Myc signalling.

      (3) The concordance with a male-specific prognostic effect of Ackr4 is most interesting in itself but is only of correlative evidence with respect to the study. Is there any information on whether p53AS expression is also a prognostic factor in BL? And is there evidence that Ackr4 may also be a male-specific prognostic factor in other B-cell malignancies, e.g. Multiple Myeloma?

      We have now performed the CRISPR-mediated knock-out of ACKR4 in Burkitt lymphoma cells and found that it leads to a dramatic increase in chemokine-guided cell migration, which goes beyond correlation. This significant new result is mentioned in the revised abstract and presented in detail in Figures 4F and S5-S7.

      The p53-AS isoforms are murine-specific (Marcel et al. (2011) Cell Death Diff 18, 1815-1824), so there is no information on p53-AS expression in Burkitt lymphoma. The human p53 isoforms with alternative C-terminal domains are the p53β and p53γ isoforms, but the datasets we analyzed did not provide any information on the relative levels of the p53α (canonical), p53β or p53γ isoforms. We agree with the referee that this is an interesting question, but it cannot be answered with currently available datasets.

      Regarding the different types of B-cell malignancies, we had already shown that Ackr4 is a male-specific prognostic factor in Burkitt lymphomas but not in Diffuse Large B cell lymphomas, which indicated that it is not a prognostic factor in all types of B cell lymphomas. For this revision, we also searched for its potential prognostic value in multiple myeloma, and found that, as for DLBCL, it is not a prognostic factor in this cancer type. This new analysis is presented in Figure S4C.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review): 

      Summary: This article explores the role of Ecdysone in regulating female sexual receptivity in Drosophila. The researchers found that PTTH, through its role as a positive regulator of ecdysone production, negatively affects the receptivity of adult virgin females. Indeed, loss of larval PTTH before metamorphosis significantly increases female receptivity right after adult eclosion and also later. However, during metamorphic neurodevelopment, Ecdysone, primarily through its receptor EcR-A, is required to properly develop the P1 neurons since its silencing led to morphological changes associated with a reduction in adult female receptivity. Nonetheless, the result shown in this manuscript sheds light on how Ecdysone plays a dual role in female adult receptivity, inhibiting it during larval development and enhancing it during metamorphic development. Unfortunately, this dual and opposite effect in two temporally different developmental stages has not been highlighted or explained.

      Strengths: This paper exhibits multiple strengths in its approach, employing a well-structured experimental methodology that combines genetic manipulations, behavioral assays, and molecular analysis to explore the impact of Ecdysone on regulating virgin female receptivity in Drosophila. The study provides clear and substantial findings, highlighting that removing PTTH, a positive Ecdysone regulator, increases virgin female receptivity. Additionally, the research expands into the temporal necessity of PTTH and Ecdysone function during development. 

      Weaknesses: 

      There are two important caveats with the data that are reflecting a weakness: 

      (1) Contradictory Effects of Ecdysone and PTTH: One notable weakness in the data is the contrasting effects observed between Ecdysone and its positive regulator PTTH. PTTH loss of function increases female receptivity, while ecdysone loss of function reduces it. Given that PTTH positively regulates Ecdysone, one would expect that the loss of function of both would result in a similar phenotype or at least a consistent directional change. 

      A1. As newly formed prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression similar to those of genetic control flies in response to a high-titer ecdysone pulse. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A. We quantified the EcR-A mRNA level of PTTH -/- and PTTH -/+ in the whole body of newly formed prepupae. Indeed, PTTH -/- flies showed increased EcR-A expression in the whole body of newly formed prepupae compared with PTTH -/+ flies. Because EcR-A functions in gene expression, this suggests that PTTH -/- disturbs the regulation of a series of genes during metamorphosis. However, it is uncertain whether EcR-A expression in pC1 neurons is increased relative to genetic controls when PTTH is deleted. Furthermore, PTTH -/- must affect the development of other neurons, not only pC1 neurons. So, the feedforward relationship between PTTH and EcR-A at the start of the prepupal stage is one possible cause of the contradictory effects of PTTH -/- and EcR-A RNAi in pC1 neurons.

      (2) Discordant Temporal Requirements for Ecdysone and PTTH: Another weakness lies in the different temporal requirements for Ecdysone and PTTH. The data from the manuscript suggest that PTTH is necessary during the larval stage, as shown in Figure 2 E-G, while Ecdysone is required during the pupal stage, as indicated in Figure 5 I-K. Ecdysone is a crucial developmental hormone with precisely regulated expression throughout development, exhibiting several peaks during both larval and pupal stages. PTTH is known to regulate Ecdysone during the larval stage, specifically by stimulating the kinetics of Ecdysone peaking at the wandering stage. However, it remains unclear whether pupal PTTH, expressed at higher levels during metamorphosis, can stimulate Ecdysone production during the pupal stage. Additionally, given the transient nature of the Ecdysone peak produced at wandering time, which disappears shortly before the end of the prepupal stage, it is challenging to infer that larval PTTH will regulate Ecdysone production during the pupal stage based on the current state of knowledge in the neuroendocrine field.  

      Considering these two caveats, the results suggest that the authors are witnessing distinct temporal and directional effects of Ecdysone on virgin female receptivity.  

      A2. First of all, it is necessary to clarify the precise timing of the manipulations of the Ptth gene and PTTH neurons. In Figure 3, activation of PTTH neurons during stage 2 inhibited female receptivity. “Stage 2” runs from six hours before the 3rd-instar larvae to the end of the wandering larvae (the start of prepupae). In Figure 5, the “pupal stage” runs from the prepupal stage to the end of the pupal stage. This “pupal stage” includes the formation of prepupae, when the ecdysone peak has not yet disappeared. The periods during which Ptth and EcR-A in pC1 neurons were manipulated are therefore continuous. In addition, the pC1-Gal4-expressing neurons also appear at the start of the prepupal stage. So, it is possible that PTTH regulates female receptivity through the function of EcR-A in pC1 neurons.

      Reviewer #1 (Recommendations For The Authors): 

      In light of the significant caveat previously discussed, I will just make a few general suggestions: 

      (1) The paper primarily focuses on robust phenotypes, particularly in PTTH mutants, with a well-detailed execution of several experiments, resulting in thorough and robust outcomes. However, due to the caveat previously presented (opposite effect in larva and pupa), consider splitting the paper into two parts: Figures 1 to 4 deal with the negative effect of PTTH-Ecdysone on early virgin female receptivity, while Figures 5 to 7 focus on the positive metamorphic effect of Ecdysone in P1 metamorphic neurodevelopment. However, in this scenario, the mechanism by which PTTH loss of function increases female receptivity should be addressed.

      A3. Splitting the paper into two parts, one on PTTH function and one on EcR function in pC1 neurons, would be a good suggestion if it were impossible for PTTH to act on female receptivity through EcR-A in pC1 neurons. However, because of the feedforward relationship between PTTH and EcR-A in newly formed prepupae, and because the periods during which Ptth and EcR-A in pC1 neurons were manipulated are continuous, it is possible that these two functions are not independent of each other. We have therefore kept the original structure.

      (2) Validate the PTTH mutants by examining homozygous mutant phenotypes and the dose-dependent heterozygous mutant phenotype using existing PTTH mutants. This could also be achieved using RNAi techniques.

      A4. We were unable to obtain other existing PTTH mutants. Instead, we decreased PTTH expression in PTTH neurons and dsx+ neurons, but did not detect a phenotype similar to that of PTTH -/-. Similarly, overexpression through PTTH-Gal4>UAS-PTTH was not sufficient to change female receptivity. It is possible that neither decreasing nor increasing PTTH expression is sufficient to change female receptivity.

      (3) Clarify if elav-Gal4 is not expressed in PTTH neurons and discuss how the rescue mechanisms work (hormonal, paracrine, etc.) in the text.

      A5. We tested for overlap between the elav-Gal4>GFP signal and PTTH stained with a PTTH antibody, and detected none. This suggests that elav-Gal4 is not expressed in PTTH neurons. However, we detected PTTH (with the PTTH antibody) in the CNS when PTTH was overexpressed using elav-Gal4>UAS-PTTH in the PTTH -/- background. Furthermore, this rescued the female receptivity phenotype of PTTH -/-. Insect PTTH isoforms share a similar probable signal peptide for secretion. Indeed, in addition to acting through axonal projections to the PG, PTTH also has an endocrine function, acting on its receptor Torso in light sensors to regulate light avoidance in larvae. PTTH overexpressed in other neurons through elav-Gal4>UAS-PTTH may act on the PG through this endocrine function and then induce ecdysone synthesis and release. Thus, although elav-Gal4 is not expressed in PTTH neurons, ecdysone synthesis triggered by PTTH from the hemolymph may account for the rescue of the PTTH -/- phenotype in female receptivity.

      (4) Consider renaming the new PTTH mutant to avoid confusion with the existing PTTHDelta allele. 

      A6. We have renamed our new PTTH mutant as PtthDelete.

      (5) Include the age of virgin females in each figure legend, especially for Figures 2 to 7, to aid in interpretation. This is essential information since wild-type early virgins (day 1) show no receptivity. In contrast, they reach a typical 80% receptivity later, and the mechanism regulating the first phase might differ from the one occurring later.

      A7. We have included the age of virgin females in each figure legend. 

      (6) Explain the relevance of observing that PTTH adult neurons are dsx-positive, as it's unclear why this observation is significant, considering that these neurons are not responsible for the observed receptivity effect in virgin females. Alternatively, address this in the context of the third instar larva or clarify its relevance.  

      A8. We decreased DsxF expression in PTTH neurons and did not detect a significant change in female receptivity. Almost all neurons regulating female receptivity, including pC1 neurons, express DsxF. We suppose that PTTH neurons have some relationship with other DsxF-positive neurons that regulate female receptivity. Indeed, we detected overlap of dsx-LexA>LexAop-RFP and torso-Gal4>UAS-GFP during the larval stage. Furthermore, decreasing Torso expression in pC1 neurons significantly inhibited female receptivity.

      These results suggest that PTTH regulates female receptivity not only through ecdysone, but possibly also by directly regulating other neurons, especially DsxF-positive neurons associated with female receptivity.

      Reviewer #2 (Public Review): 

      Summary: The authors tried to identify novel adult functions of the classical Drosophila juvenile-adult transition axis (i.e. ptth-ecdysone). Surprisingly, larval ptth-expressing neurons expressed the sex-specific doublesex gene, thus belonging to the sexual dimorphic circuit. Lack of ptth during late larval development caused enhanced female sexual receptivity, an effect rescued by supplying ecdysone in the food. Among many other cellular players, pC1 neurons control receptivity by encoding the mating status of females. Interestingly, during metamorphosis, a subtype of pC1 neurons required Ecdysone Receptor A in order to regulate such female receptivity. A transcriptomic analysis using pC1-specific Ecdysone signaling down-regulation gives some hints of possible downstream mechanisms.

      Strengths: the manuscript showed solid genetic evidence that lack of ptth during development caused enhanced copulation rate in female flies, which includes ptth mutant rescue experiments by overexpressing ptth as well as by adding ecdysone-supplemented food. They also present elegant data dissecting the temporal requirements of ptth-expressing neurons by shifting animals from non-permissive to permissive temperatures, in order to inactivate neuronal function (although not exclusively ptth function). By combining different drivers together with a EcR-A RNAi line authors also identified the Ecdysone receptor requirements of a particular subtype of pC1 neurons during metamorphosis. Convincing live calcium imaging showed no apparent effect of EcR-A in neural activity, although some effect on morphology is uncovered. Finally, bulk RNAseq shows differential gene expression after EcR-A down-regulation. 

      Weaknesses: the paper has three main weaknesses. The first one refers to temporal requirements of ptth and ecdysone signaling. Whereas ptth is necessary during larval development, the ecdysone effect appears during pupal development. ptth induces ecdysone synthesis during larval development but there is no published evidence about a similar role for ptth during pupal stages. Furthermore, larval and pupal ecdysone functions are different (triggering metamorphosis vs tissue remodeling). The second caveat is the fact that ptth and ecdysone loss-of-function experiments render opposite effects (enhancing and decreasing copulation rates, respectively). The most plausible explanation is that both functions are independent of each other, also suggested by differential temporal requirements. Finally, in order to identify the effect in the transcriptional response of down-regulating EcR-A in a very small population of neurons, a scRNAseq study should have been performed instead of bulk RNAseq. 

      In summary, despite the authors providing convincing evidence that ptth and ecdysone signaling pathways are involved in female receptivity, the main claim that ptth regulates this process through ecdysone is not supported by results. More likely, they'd rather be independent processes. 

      B1. Clarification: in Figure 3, activation of PTTH neurons during stage 2 inhibited female receptivity. “Stage 2” runs from six hours before the 3rd-instar larvae to the end of the wandering larvae (the start of prepupae). In Figure 5, the “pupal stage” runs from the start of the prepupal stage to the end of the pupal stage. This “pupal stage” includes the formation of prepupae, when the ecdysone peak has not yet disappeared. The periods during which Ptth and EcR-A in pC1 neurons were manipulated are therefore continuous. In addition, the pC1-Gal4-expressing neurons also appear at the start of the prepupal stage. So, it is possible that PTTH regulates female receptivity through the function of EcR-A in pC1 neurons.

      B2. During prepupa formation, ptth-Gal4>UAS-Grim flies display changes in gene expression similar to those of genetic control flies in response to a high-titer ecdysone pulse. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A. We quantified the EcR-A mRNA level of PTTH -/- and PTTH -/+ in the whole body of newly formed prepupae. Indeed, PTTH -/- flies showed increased EcR-A expression compared with PTTH -/+ flies. Because EcR-A functions in gene expression, this suggests that PTTH -/- disturbs the regulation of a series of genes during metamorphosis. However, it is uncertain whether EcR-A expression in pC1 neurons is increased relative to genetic controls when PTTH is deleted. Furthermore, PTTH -/- must affect the development of other neurons, not only pC1 neurons. So, the feedforward relationship between PTTH and EcR-A at the start of the prepupal stage is one possible cause of the contradictory effects of PTTH -/- and EcR-A RNAi in pC1 neurons.

      B3. In the future, we will perform single-cell sequencing of pC1 neurons to explore the detailed molecular mechanism of female receptivity.

      Reviewer #2 (Recommendations For The Authors): 

      Additional experiments and suggestions: 

      - torso LOF in the PG to determine whether or not the ecdysone peak regulated by ptth (there is a 1-day delay in pupation) is responsible for the ptth effect in L3. In the same line, what happens if torso is downregulated in the pC1 neurons? Is there any effect on copulation rates? 

      B4. Because of the loss of the phm-Gal4 line, we could not test female receptivity when decreasing the expression of Torso in the PG. However, decreasing Torso expression in pC1 neurons significantly inhibited female receptivity. This suggests that PTTH regulates female receptivity not only through ecdysone but also by directly regulating dsx+ pC1 neurons.

      - What is the effect of down-regulating ptth in the dsx+ neurons? No ptth RNAi experiments are shown in the paper. 

      B5. We decreased PTTH expression in dsx+ neurons but did not detect a change in female receptivity. We also decreased PTTH expression in PTTH neurons using PTTH-Gal4 and again did not detect a change in female receptivity. Similarly, overexpression through PTTH-Gal4>UAS-PTTH was not sufficient to change female receptivity. It is possible that neither decreasing nor increasing PTTH expression is sufficient to change female receptivity.

      - Why are most copulation rate experiments performed between 4-6 days after eclosion? ptth LOF effect only lasts until day 3 after eclosion (but very weak-fig 1). Again, this supports the idea that ptth and ecdysone effects are unrelated.

      B6. Most behavioral experiments were performed between 4-6 days after eclosion, as in most other studies in flies, because female receptivity reaches its peak at that time. Ptth LOF enhanced female receptivity from the first day after eclosion, which resembles precocious puberty. Wild-type females reach high receptivity at 2 days after eclosion (about 75% within 10 min). We suppose that the Ptth LOF effect only lasts until day 3 after eclosion because the receptivity of control flies is by then too high to exceed.

      It is uncertain whether the effect of PTTH -/- on female receptivity disappears after the third day of adulthood. It is therefore also uncertain whether the PTTH and EcR-A effects in pC1 neurons are unrelated.

      - The fact that pC1d neuronal morphology changes (and not pC1b) does not explain the effect of EcR-A LOF. Although it is highlighted in the discussion, the data do not support the hypothesis. What do these pC1 neurons look like in a ptth mutant animal regarding calcium imaging and/or morphology?

      B7. We examined the pattern of pC1 neurons when PTTH is deleted. Consistent with the feedforward relationship between PTTH and EcR-A expression in newly formed prepupae, PTTH deletion induced less established pC1-d neurons, the opposite of the effect induced by EcR-A reduction in pC1 neurons. However, it is uncertain whether EcR-A expression in pC1 neurons is increased when PTTH is deleted. Furthermore, manipulation of PTTH has a general effect on neurodevelopment and does not only affect pC1 neurons. In addition, the detailed pattern of pC1-b neurons, the key subtype regulating female receptivity, could not be seen clearly when EcR-A was decreased in pC1 neurons or when PTTH was deleted. So, abnormal development of pC1-b neurons, if confirmed, is just one of the possible reasons for the effect of PTTH deletion on female receptivity.

      - The discussion is incomplete, especially the link between ptth and ecdysone; discuss why the phenotype is the opposite (ptth as a negative regulator of ecdysone in the pupa, for instance); the difference in size due to ptth LOF might be related to differential copulation rates.  

      B8. We have revised the discussion. We could not exclude an effect of body size on female receptivity when PTTH was deleted or PTTH neurons were manipulated, although there was not enough evidence for such an effect.

      - scheme of pC neurons may help. 

      B9. We tried to label pC1 neurons with GFP and sort them by flow cytometry, but were unsuccessful. This may be because the number of pC1 neurons in one brain is too low. We will try single-cell sequencing in the future.

      - Immunofluorescence images are too small.

      B10. We have resized the small images.

      Reviewer #3 (Public Review): 

      Summary: 

      This manuscript shows that mutations that disable the gene encoding PTTH cause an increase in female receptivity (they mate more quickly), a phenotype that can be reversed by feeding these mutants the molting hormone, 20-hydroxyecdysone (20E). The use of an inducible system reveals that inhibition or activation of PTTH neurons during the larval stages increases and decreases female receptivity, respectively, suggesting that PTTH is required during the larval stages to affect the receptivity of the (adult) female fly. Showing that these neurons express the sex-determining gene dsx leads the authors to show that interfering with 20E actions in pC1 neurons, which are dsx-positive neurons known to regulate female receptivity, reduces female receptivity and increases the arborization pattern of pC1 neurons. The work concludes by showing that targeted knockdown of EcRA in pC1 neurons causes 527 genes to be differentially expressed in the brains of female flies, of which 123 passed a false discovery rate cutoff of 0.01; interestingly, the gene showing the greatest down-regulation was the gene encoding dopamine beta-monooxygenase.

      Strengths 

      This is an interesting piece of work, which may shed light on the basis for the observation noted previously that flies lacking PTTH neurons show reproductive defects ("... females show reduced fecundity"; McBrayer, 2007; DOI 10.1016/j.devcel.2007.11.003). 

      Weaknesses: 

      There are some results whose interpretation seems ambiguous and findings whose causal relationship is implied but not demonstrated.

      (1) At some level, the findings reported here are not at all surprising. Since 20E regulates the profound changes that occur in the central nervous system (CNS) during metamorphosis, it is not surprising that PTTH would play a role in this process. Although animals lacking PTTH (rather paradoxically) live to adulthood, they do show greatly extended larval instars and a corresponding great delay in the 20E rise that signals the start of metamorphosis. For this reason, concluding that PTTH plays a SPECIFIC role in regulating female receptivity seems a little misleading, since the metamorphic remodeling of the entire CNS is likely altered in PTTH mutants. Since these mutants produce overall normal (albeit larger--due to their prolonged larval stages) adults, these alterations are likely to be subtle. Courtship has been reported as one defect expressed by animals lacking PTTH neurons, but this behavior may stand out because reduced fertility and increased male-male courtship (McBrayer, 2007) would be noticeable defects to researchers handling these flies. By contrast, detecting defects in other behaviors (e.g., optomotor responses, learning and memory, sleep, etc) would require closer examination. For this reason, I would ask the authors to temper their statement that PTTH is SPECIFICALLY involved in regulating female receptivity.  

      C1. We agree that it is not surprising that PTTH regulates, through ecdysone, the profound changes that occur in the CNS during metamorphosis. Also, the behavioral changes induced by PTTH mutants are not limited to female receptivity. We will temper the statement about the function of PTTH in female receptivity.

      We think there are two new points in our text, although more evidence is needed in the future. First, PTTH deletion and the reduction of EcR-A in pC1 neurons during metamorphosis have opposite effects on female receptivity. Second, the development of pC1-b neurons regulated by EcR-A during metamorphosis is important for female receptivity.

      (2) The link between PTTH and the role of pC1 neurons in regulating female receptivity is not clear. Again, since 20E controls the metamorphic changes that occur in the CNS, it is not surprising that 20E would regulate the arborization of pC1 neurons. And since these neurons have been implicated in female receptivity, it would therefore be expected that altering 20E signaling in pC1 neurons would affect this phenotype. However, this does not mean that the defects in female receptivity expressed by PTTH mutants are due to defects in pC1 arborization. For this, the authors would at least have to show that PTTH mutants show the changes in pC1 arborization shown in Fig. 6. And even then the most that could be said is that the changes observed in these neurons "may contribute" to the observed behavioral changes. Indeed, the changes observed in female receptivity may be caused by PTTH/20E actions on different neurons.

      C2. As newly formed prepupae, ptth-Gal4>UAS-Grim flies display changes in gene expression similar to those of the genetic control flies in response to a high-titer ecdysone pulse. These include the repression of EcR (McBrayer et al., 2007). We tested whether there is a similar feedforward relationship between PTTH and EcR-A. We quantified the EcR-A mRNA level in the whole body of newly formed PTTH -/- and PTTH -/+ prepupae. Indeed, EcR-A was upregulated in the whole body of newly formed PTTH -/- prepupae compared with PTTH -/+ flies. We also examined the pattern of pC1 neurons when PTTH is deleted. Consistent with the feedforward relationship between PTTH and the expression of EcR-A in newly formed prepupae, PTTH deletion resulted in less established pC1-d neurons, the opposite of the effect of EcR-A reduction in pC1 neurons.

      However, it is not certain that the expression of EcR-A in pC1 neurons increases relative to genetic controls when PTTH is deleted. Furthermore, manipulation of PTTH has a general effect on neurodevelopment, and the detailed pattern of pC1-b neurons, the key subtype regulating female receptivity through EcR-A function in pC1 neurons, could not be seen clearly. So the abnormal development of pC1-b neurons, if confirmed, is just one of the possible reasons for the effect of PTTH deletion on female receptivity.

      (3) Some of the results need commenting on, or refining, or revising:  a- For some assays PTTH behaves sometimes like a recessive gene and at other times like a semidominant, and yet at others like a dominant gene. For instance, in Fig. 1D-G, PTTH[-]/+ flies behave like wildtype (D), express an intermediate phenotype (E-F), or behave like the mutant (G). This may all be correct but merits some comment.

      C3. Female receptivity increases with age after eclosion, not only in wild-type flies but also in PTTH mutants. On the first day after eclosion (Figure 1D), the loss of PTTH in PTTH[-]/+ flies may not be enough to cause the sexual precocity seen in PTTH -/- flies. From the second day after eclosion onward (Figure 1E-G), the loss of PTTH in PTTH[-]/+ flies is sufficient to enhance female receptivity compared with wild-type flies. However, after the 2nd day of adulthood, the female receptivity of all genotypes increases sharply. From the 3rd day of adulthood onward, the female receptivity of PTTH -/- flies reaches its peak, and the receptivity of PTTH[-]/+ flies approaches that of PTTH -/- flies as the flies get older.

      b - Some of the conclusions are overstated. i) Although Fig. 2E-G does show that silencing the PTTH neurons during the larval stages affects copulation rate (E) the strength of the conclusion is tempered by the behavior of one of the controls (tub-Gal80[ts]/+, UAS-Kir2.1/+) in panels F and G, where it behaves essentially the same as the experimental group (and quite differently from the PTTH-Gal4/+ control; blue line).(Incidentally, the corresponding copulation latency should also be shown for these data.). ii) For Fig. 5I-K, the conclusion stated is that "Knock-down of EcR-A during pupal stage significantly decreased the copulation rate." Although strictly correct, the problem is that panel J is the only one for which the behavior of the control lacking the RNAi is not the same as that of the experimental group. Thus, it could just be that when the experiment was done at the pupal stage is the only situation when the controls were both different from the experimental. Again, the results shown in J are strictly speaking correct but the statement is too definitive given the behavior of one of the controls in panels I and K. Note also that panel F shows that the UAS-RNAi control causes a massive decrease in female fertility, yet no mention is made of this fact.

      C4. i) For all figures in the text, we describe the assay group as significantly different only when it differs significantly from all of the control groups. In Figure 2E-G, both control groups differed from the assay group only at the larval stage. The difference between the two control groups may be due to genetic background. We have described the statistical analysis in more detail in the legend. In addition, the corresponding copulation latency is now shown. ii) For Figure 5, we have revised the conclusion in the text as “when the experiment was done at the pupal stage is the only situation when the controls were both different from the experimental.” In addition, we now mention that the UAS-RNAi control causes a massive decrease in female fertility in panel F.
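
      The decision rule described in C4 i) can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the function name, the control labels, the p-values, and the 0.05 threshold are all assumptions introduced here, and the sketch does not address how the p-values were obtained or corrected for multiple comparisons.

```python
from typing import Mapping


def differs_from_all_controls(p_values: Mapping[str, float], alpha: float = 0.05) -> bool:
    """Return True only if the assay group differs from every control group.

    p_values maps a (hypothetical) control label to the p-value of the
    comparison between that control and the assay group.
    """
    if not p_values:
        raise ValueError("at least one control comparison is required")
    return all(p < alpha for p in p_values.values())


# Hypothetical p-values, for illustration only (not taken from the study).
larval_stage = {"Gal4_control": 0.004, "UAS_control": 0.012}
pupal_stage = {"Gal4_control": 0.003, "UAS_control": 0.40}

print(differs_from_all_controls(larval_stage))  # both controls differ
print(differs_from_all_controls(pupal_stage))   # one control does not differ
```

      Under this rule, a stage at which only one control differs from the assay group is not reported as a significant effect, which matches the conservative reading the reviewer asked for.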

      Reviewer #3 (Recommendations For The Authors): 

      (1) I am not sure that PTTH neurons should be referred to as "PG neurons". I am aware that this name has been used before but the PG is a gland that does not have neurons; it is not even innervated in all insects. 

      C5. Agree. “PG neurons” has been changed into “PTTH neurons”.

      (2) Fig. 1A warrants some explanation. One can easily imagine what it shows but a description is warranted. 

      C6. Explanation has been added.

      (3) When more than one genotype is compared it would be more useful to use letters to mark the genotypes that are not statistically different from each other rather than simply using asterisks. For instance, in the case of copulation latencies shown in Fig. 1E-G, which result does the comparison refer to? For example, since the comparisons are the result of ANOVAs, which comparison receives "*" in Fig. 1F? Is it PTTH[-]/+ vs PTTH[-]/PTTH[-] or vs. +/+? 

      C7. Referred genotypes and conditions were marked in all figure legends.

      (4) Fig. 1H: Why is copulation latency of PTTH[-]/PTTH[-]+elav-GAL4 significantly different from that of PTTH[-]/PTTH[-]? This merits a comment. Also, why was elav-GAL4 used to effect the rescue and not the PTTH-GAL4 driver? 

      C8. We cannot explain this phenomenon. It may be due to the different genetic backgrounds of the controls. We have mentioned this in the figure legend.

      (5) Fig. 2C, the genotype is written in a confusing order, GAL4+UAS should go together as should LexA+LexAop. 

      C9. We have revised the genotype order to avoid confusion.

      (6) In Fig. 2, is "larval stage" the same period that is shown in Fig. 3A? Please clarify.

      C10. We have clarified this in the text and legends.

      (7) Fig. 6. The fact that pC1 neurons can be labeled using the pC1-ss2-Gal4 at the start of the pupal stage does not mean that this is when these neurons appear (are born), only when they start expressing this GAL4. Other types of evidence would be needed to make a statement about the birthdate of these neurons. 

      C11. We have revised the description of the appearance of pC1-ss2-Gal4>GFP. The precise birth time of pC1 neurons will be tested in the future.

      (8) The results shown in Fig. 7 are not pursued further and thus appear like a prelude to the next manuscript. Unless the authors have more to add regarding the role of one of the differentially expressed genes (e.g., dopamine beta-monooxygenase, which they single out) I would suggest leaving this result out. 

      C12. We have left this out.

      (9) Female flies lacking PTTH neurons were reported to show lower fecundity by McBrayer et al. (2007) and should be cited. 

      C13. This important study was cited in the first version of the manuscript. In this revision, we cite it again when mentioning the lower fecundity of female flies lacking PTTH neurons.

      (10) Line 230: when were PTTH neurons activated? Since they are dead by 10h post-eclosion it isn't clear if this experiment even makes sense. 

      C14. Yes, we did this to confirm again that PTTH neurons do not affect female receptivity at the adult stage.

      (11) Line 338: the statements in the figures say that PTTH function is required during the larval stages, not during metamorphosis 

      C15. This has been revised as “The result suggested that EcR-A in pC1 neurons plays a role in virgin female receptivity during metamorphosis. This is consistent with that PTTH regulates virgin female receptivity before the start of metamorphosis.”

      (12) Did the authors notice any abnormal behavior in males? McBrayer et al. (2007) mention that males lacking PTTH neurons show male-male courtship. This may remit to the impact of 20E on other dsx[+] neurons. 

      C16. Yes, we have noticed that males lacking PTTH show male-male courtship. It is possible that PTTH deletion induces male-male courtship through the impact of 20E on other dsx+ or fru+ neurons. We have added the corresponding discussion.

      (13) Line 145: please define CCT at first use 

      C17. CCT has been defined.

      (14) Overall the manuscript is well written; however, it would still benefit from editing by a native English speaker. I have marked a few corrections that are needed, but I probably missed some. 

      + Line 77: "If female is not willing..." should say "If THE female is not willing..." 

      + Line 78 "...she may kick the legs, flick the wings," should say "...she may kick HER legs, flick HER wings," 

      + Lines 93-94 this sentence is unclear: "...while the neurons in that fru P1 promoter or dsx is expressed regulate some aspects..." 

      + Line 108 "...similar as the function of hypothalamic-pituitary-gonadal (HPG).." should say "...similar TO the function of hypothalamic-pituitary-gonadal (HPG).."

      + Line 152 "Due to that 20E functions through its receptor EcR.." should say "BECAUSE 20E ACTS through its receptor EcR.."

      + Lines 155, 354 "unnormal" is not commonly used (although it is an English word); "abnormal" is usually used instead. 

      + Line 273: "....we then asked that whether ecdysone regulates" delete "that"

      + Sentences lines 306-309 need to be revised.

      C18. Thank you for your suggestions. We have revised the text as you advised.

    1. Explain how the procedure benefits the students to build buy-in. Model good and bad execution. Practice, practice, practice.

      I totally agree with these three things because I think that the better you can get students to buy-in to what we are doing the better student outcomes will be. Modelling and explaining not just good but also bad execution is helpful because some students may not even realize what they are doing wrong. As for "practice, practice, practice" I believe that practice makes perfect.

    1. Author Response:

      We would like to thank the editors and reviewers for the careful consideration of our manuscript and their many helpful comments. We would like to provide provisional author responses to address the public reviews.

      Response to Reviewer 1:

      Weaknesses:

      While this study convincingly describes the phenotype seen upon Drp1 loss, my major concern is that the mechanism underlying these defects in zygotes remains unclear. The authors refer to mitochondrial fragmentation as the mechanism ensuring organelle positioning and partitioning into functional daughters during the first embryonic cleavage. However, could Drp1 have a role beyond mitochondrial fission in zygotes? I raise these concerns because, as opposed to other Drp1 KO models (including those in oocytes) which lead to hyperfused/tubular mitochondria, Drp1 loss in zygotes appears to generate enlarged yet not tubular mitochondria. Lastly, while the authors discard the role of mitochondrial transport in the clustering observed, more refined experiments should be performed to reach that conclusion.

      It would be difficult to answer from this study whether Drp1 has a role beyond mitochondrial fission in zygotes. However, there are several possible reasons why the Drp1 KO zygotes differ from the somatic cell Drp1 KO models.

      First, the reviewer mentions that the loss of Drp1 in oocytes leads to hyperfused/tubular mitochondria, but in fact, unlike in somatic cells, the EM images of Drp1 KO oocytes show enlarged mitochondria rather than tubular structures (Udagawa et al. Current Biology 2014, Fig. 2C and Fig. S1B-D), as in the case of zygotes in this study.

      These mitochondrial morphologies in Drp1-deficient oocytes/zygotes may be attributed to the unique mitochondrial architecture in these cells. Mitochondria in oocytes have the shape of a small sphere with irregular cristae located peripherally or transversely. These structural features might cause insensitivity or resistance to inner membrane fusion. In addition, in our previous study (Wakai et al., Molecular Human Reproduction 2014, Fig. 2), overexpression of the outer membrane fusion factors Mfn1/Mfn2 in oocytes resulted in mitochondrial aggregation, while overexpression of Opa1 did not cause any morphological changes. Thus, while mitochondria in oocytes/zygotes divide actively, complete fusion, including the inner membrane, as seen in somatic cells, is unlikely to occur.

      As for mitochondrial transport, we do not entirely discard its role. Although mitochondrial intrinsic dynamics such as fission are of primary importance for mitochondrial distribution and partitioning in embryos, the regulation of these dynamics by the cytoskeleton may be important and thus needs further study, as the reviewer pointed out.

      Response to Reviewer 2:

      Weaknesses:

      The authors first describe the redistribution of mitochondria during normal development, followed by alterations induced by Drp1 depletion. It would be useful to indicate the time post-hCG for imaging of fertilised zygotes (first paragraph of the results/Figure 1) to compare with subsequent Drp1 depletion experiments.

      We will indicate the time after hCG, as the reviewer suggested. The only problem is that in this experiment there may be a slight deviation from the actual change in mitochondrial distribution (Fig. S1A) due to the manipulation time for Trim-Away (since it was performed outside of the incubator). Also, no significant delay in pronuclear formation or embryonic development was observed in Drp1-depleted zygotes.

      It is noted that Drp1 protein levels were undetectable 5h post-injection, suggesting earlier times were not examined, yet in Figure 3A it would seem that aggregation has occurred within 2 hours (relative to Figure 1).

      As the reviewer pointed out, the depletion of Drp1 is likely to have occurred at an earlier stage. In this study, because various RNAs were injected to visualize organelles such as mitochondria and chromosomes, observations were started after about 5 hours of incubation so that the fluorescent proteins were sufficiently expressed. Therefore, for the western blotting analysis, samples were collected to reflect their condition at the start of the observation.

      Mitochondria appear to be slightly more aggregated in Drp1 fl/fl embryos than in control, though comparison with untreated controls does not appear to have been undertaken. There also appears to be some variability in mitochondrial aggregation patterns following Drp1 depletion (Figure 2-suppl 1 B) which are not discussed.

      We would like to add quantitative data on mitochondrial aggregation in Drp1-depleted embryos.

      The authors use western blotting to validate the depletion of Drp1, however do not quantify band intensity. It is also unclear whether pooled embryo samples were used for western blot analysis.

      We would like to add quantitative results for the band intensities in the western blot analysis. The number of embryos analyzed is described in the figure legends; pooled samples of between 20 (Fig. 4) and 30 (Fig. 2) embryos were used.

      Likewise, intracellular ROS levels are examined however quantification is not provided. It is therefore unclear whether 'highly accumulated levels' are of significance or related to Drp1 depletion.

      We will present quantitative results on the accumulation of ROS.

      In previous work, Drp1 was found to have a role as a spindle assembly checkpoint (SAC) protein. It is therefore unclear from the experiments performed whether aggregation of mitochondria separating the pronuclei physically (or other aspects of mitochondrial function) prevents appropriate chromosome segregation or whether Drp1 is acting directly on the SAC.

      It has been reported that Drp1 regulates the meiotic spindle through the spindle assembly checkpoint (SAC) (Zhou et al., Nature Communications 2022). We would like to mention the possibility raised by the reviewer in the Discussion.

      Response to Reviewer 3:

      Seemingly, there are few apparent shortcomings. Following are the specific comments to activate the further open discussion.

      - Line 246: Comments on cristae morphology of mitochondria in Drp1-depleted embryos would better be added.

      We would like to add a comment regarding cristae morphology.

      - Regarding Figure 2H: If possible, a representative picture of Ateam would better be included in the figure. As the authors discussed in line 458, Ateam may be able to detect whether any alterations of local energy demand occurred in the Drp1-depleted embryos.

      ATeam fluorescence is analyzed using a regular fluorescence microscope, not a confocal laser microscope, in order to analyze the intensity in the whole embryo (or the whole blastomere). Therefore, we are currently unable to obtain images of localized areas within the cell (e.g., around the spindle) as expected by the reviewer; as shown in the images in Figure 3-figure supplement 1C, there is a tendency to see high ATP levels at the cell periphery, but further analysis is needed for clear and definitive results.

      - Line 282: In Figure 3-Video 1, mitochondria were seemingly more aggregated around female pronucleus. Is it OK to understand that there is no gender preference of pronuclei being encircled by more aggregated mitochondria?

      Aggregated mitochondria are localized toward the cell center, but do not behave in such a way that they are preferentially concentrated near the female pronucleus.

      - Line 317: A little more explanation of the "variability" would be fine. Does that basically mean that the Ca2+ response in both Drp1-depleted blastomeres were lower than control and blastomere with more highly aggregated mitochondria show severer phenotype compared to the other blastomere with fewer mito?

      We assume that what the reviewer has pointed out is right. However, although we were able to show the bias in Ca2+ store levels between blastomeres of Drp1-depleted embryos, we did not stain mitochondria simultaneously, so we cannot say in detail whether, for example, there are more Ca2+ stores in the blastomere that inherited more mitochondria or fewer Ca2+ stores in the blastomere with more aggregated mitochondria.

      - Regarding Figure 5B (& Figure 1-figure supplement 1B): Do authors think that there would be less abnormalities in the embryos if Drp1 is trim-awayed after 2-cell or 4-cell, in which mitochondria are less involved in the spindle?

      The marked accumulation of mitochondria around the spindle is unique to the first cleavage and seems to coincide with the migration of the pronuclei toward the center. Since the assembly of the male and female pronuclei is also an event unique to the first cleavage, abnormalities such as binucleation due to mitochondrial misplacement are thought to be a phenomenon seen only in the first cleavage. Therefore, if Drp1 is depleted at the 2-cell or 4-cell stage, chromosome segregation errors may be less frequent. However, since unequal partitioning of mitochondria is thought to occur, some abnormalities in embryonic development are likely to be observed.

    1. Author response:

      The following is the authors’ response to the original reviews.

      Reviewer 1:

      Strengths

      We thank the reviewer for recognizing the strengths of our in vivo Ca2+ measurements, super resolution microscopy and assessment of the secretory dysfunction in the Sjögren's syndrome mouse model.

      Weaknesses

      Point 1: The less restricted Ca2+ signal to the apical region of the acinar cell is not really relevant to the reduced activation of TMEM16a by a local signal at the apical plasma membrane.

      We agree that the spatially averaged Ca2+ signal is not indicative of the local Ca2+ signal that activates TMEM16a. The description of the disordered Ca2+ signal in the disease model was intended to simply convey that the Ca2+ signal is altered in the model. Whether or indeed how the altered spatial characteristics of the signal are deleterious is not known but we speculate in the discussion that this contributes to the ultrastructural damage observed.

      Point 2. Secretion is decreased but the amplitude of the globally averaged Ca2+ signals are increased. No proof is offered that the greater distance between IP3R and TMEM16a is the reason for decreased secretion in the face of this increased peak signal.

      We have now added new data indicating that the local Ca2+ signal is indeed disrupted in the disease model. We show that in control animals, activation of TMEM16a by application of agonist occurs when the pipette is buffered with the slower buffer EGTA but not with the fast buffer BAPTA. In contrast, in cells isolated from DMXAA-treated animals, both EGTA and BAPTA abolish the agonist-induced currents (new Figure 6). These data are consistent with our super resolution data showing that the distance between IP3R and TMEM16a is greater, and thus presumably sufficient to allow buffering of Ca2+ released from IP3R such that it does not effectively activate TMEM16a. These data would also suggest that the increased amplitude of the spatially averaged Ca2+ signal is not sufficient to overcome this structural change.

      Point 3. Lack of evidence that the mitochondrial changes are associated with the defect in fluid secretion.

      We agree that a causal link between the decreased secretion and altered mitochondrial morphology and function is not established. Nevertheless, we feel it is reasonable to contend that profound changes in mitochondrial morphology observed at the light and EM level, together with changes in mitochondrial membrane potential and oxygen consumption are consistent with contributing to altered fluid secretion given that this is an energetically costly process. We have altered the discussion to reflect these caveats and ideas.

      Reviewer 2:

      We thank the reviewer for their assessment of our work and constructive comments.

      Reviewer 3:

      We thank the reviewer for their careful appraisal of our manuscript and insightful comments. 

      Point 1: Are all the effects of DMXAA mediated through the STING pathway?

      This is an important point because, as noted, DMXAA has been reported to inhibit NAD(P)H quinone oxidoreductase, which could contribute to the phenotype reported here. In future studies we intend to test other STING pathway agonists such as MSA-2 and perhaps antagonists of the STING pathway. We have added text to the discussion indicating that not all of the effects observed may be a result of activation of the STING pathway.

      Point 2: As noted, and clarified in the text, the driving force for ATP production is the electrochemical H+ gradient which establishes the mitochondrial membrane potential.

      Point 3: The reviewer suggested there was a decrease in mitochondrial membrane potential in the absence of a change in TMRE steady state.

      We apologize for the confusion generated by the presentation of the figure. We normalized TMRE fluorescence against MitoTracker Green fluorescence but, as shown, the figure does not reflect that the absolute TMRE fluorescence was indeed decreased. Supplemental figure 4 now shows the basal TMRE fluorescence.

      Point 4: Indications that the disruption to ER structure seen in Electron Micrographs contributes to the changes in Ca2+ signal and fluid secretion.

      We did not focus on the relative distance between ER and apical PM in the EMs primarily because the ER that projects towards the apical PM is a relatively minor component of the specialized ER expressing IP3R and is difficult to identify. We note that the disruption of the bulk ER as quantitated by altered ER-mitochondrial interfaces and fragmentation is consistent with our super resolution data and thus likely plays a role in the mechanism that results in dysregulated Ca2+ signals and reduced secretion.

      Recommendations to Authors:

      Reviewing Editor:

      (1) The Editor suggests that we should use the activity of TMEM16a to directly measure the [Ca2+] experienced by the channel.

      We now present additional data. First, we show an extended range of pipette [Ca2+], demonstrating identical Ca2+ sensitivity in DMXAA- vs vehicle-treated cells (Figure 5). Second, and importantly, we now present data evaluating the ability of muscarinic stimulation to activate TMEM16a in the presence of either EGTA (slow Ca2+ buffer) or BAPTA (fast Ca2+ buffer). Notably, currents can be stimulated in control cells when the pipette is buffered with EGTA, but not in DMXAA-treated cells. BAPTA inhibits activation in both situations (new Figure 6). These data are consistent with TMEM16a being activated by Ca2+ in a microdomain and with this being disrupted in the disease model.

      (2) The Editor asks whether a decrease in IP3R3 in a subset of the samples could account for the decreased fluid secretion.

      We think this is unlikely given, as noted by the Editor, that a reduction only occurred in a subset of the samples and statistically there was no significant difference to vehicle-treated animals. Moreover, we would note that there is also no difference in the expression of IP3R2 between experimental groups and in studies of transgenic mice where either IP3R2 or IP3R3 were knocked out individually, there was no effect on salivary fluid secretion, indicating that expression of a single subtype can support stimulus-secretion coupling.

      (3) Absolute values for changes in fluorescence (over time) should be included together with SD images.

      These have been added in Figure 3.

      (4) DMXAA has additional effects to STING activation and thus other STING pathway modulators should be used.

      We agree that additional STING agonists should be explored in the future but believe that this is beyond the scope of the present studies. Additional text has been added to the discussion acknowledging the additional targets of DMXAA and that they could contribute to the phenotype.

      (5) No causal link between the observed Ca2+ changes and mitochondrial dysfunction.

      We agree that no experimental evidence is offered to directly support this contention. Nevertheless, dysregulated Ca2+ signals are well-documented to lead to altered mitochondrial structure and function and thus we feel it not unreasonable to speculate that this is a possibility.

      (6) The paper would be improved by directly assessing mechanistic connections between altered Ca2+ signaling and TMEM16a activation.

      We agree, please refer to point 1 and new figure 6.

      Reviewer 1:

      (1) Standard Deviation images should be explained and the location of ROI identified.

      We contend that Standard Deviation images provide an effective visualization (in a single image) of both the magnitude of the Ca2+ increase and the degree of recruitment of cells in the field of view during the entire period of stimulation.  We have added text to describe the utility of this technique. Nevertheless, we now show kinetic traces of the changes in fluorescence over time in both apical and basal regions in Figure 3. We also clarify that the traces shown in Figure 2 are averaged over the entire cell. 

      (2) The Authors should consider that reduced secretion is because cells are dying.

      We believe this is unlikely given the lack of morphological changes in glandular structure and the minor lymphocyte infiltration observed in this model. Nevertheless, we now add data showing that the mass of SMG is not altered in the DMXAA-treated animals compared with vehicle-treated (Figure 1E).

      (3) The role of mitochondria in the DMXAA phenotype is unclear. What is the effect of acutely de-energizing mitochondria on fluid secretion.

      Since fluid secretion is an energetically expensive undertaking, it is not unreasonable to suggest that compromised mitochondrial function may impact secretion. That being said, this could occur at multiple levels: production of ATP to fuel the Na/K pump to establish membrane gradients, or to provide energy to sequester Ca2+, among a multitude of targets. This will be a subject of ongoing experiments. We contend that experiments to acutely disrupt salivary mitochondria in vivo while assessing fluid secretion would be difficult to perform and interpret, given that local administration of agents to SMG would not affect the other major salivary glands and systemic administration would be predicted to have wide-ranging off-target effects.

      (4) Could a subset of cells with low IP3R numbers contribute to reduced fluid secretion?

      Please see the response to the Reviewing Editor's point 2.

      (5) An attempt to estimate the effect of the spatial disruption of IP3R and TMEM16a localization should be made.

      Please see the response to the Reviewing Editor's point 1.

      Minor Points

      We have amended the statement from “Highly expressed” to “increased”.

      Regions of the cell have been labelled for orientation in the line scans.

      The molecular weight markers have been added in Figure 4.

      Reviewer 2:

      (1) Whether mitochondrial dysfunction is the initiator of the phenotype or a result of the dysregulated Ca2+ signal is unclear.

      We agree that our data do not resolve this classic “Chicken vs Egg” conundrum. We plan further experiments to address this issue. Future plans include repeating the mitochondrial and Ca2+ signaling experiments at earlier time points where we know fluid secretion is not yet impacted. This may potentially reveal the temporal sequence of events. Similarly, we plan experiments to mechanistically address why the global Ca2+ signal is augmented: reduced Ca2+ clearance or enhanced Ca2+ release/influx are possibilities. We speculate that reduced Ca2+ clearance, either because mitochondrial Ca2+ uptake is reduced or as a secondary consequence of reduced ATP levels on SERCA and PMCA, is a likely possibility.

      (2) Measurement of ECAR and direct measurements of ATP and Seahorse methods.

      In a separate series of experiments, we monitored ECAR. These data were unfortunately very variable and difficult to interpret, although no obvious compensatory increase was observed. We plan in the future to directly monitor ATP levels in acinar cells using Mg-Green. To normalize for cell numbers in the Seahorse experiments, following centrifugation, cell pellets of equal volume were resuspended in equal volumes of buffer. Acinar cells were seeded onto Cell-Tak-coated dishes. This information has been added to the Methods section.

    1. Note: This response was posted by the corresponding author to Review Commons. The content has not been altered except for formatting.


      Reply to the reviewers

      Reviewer #1 (Evidence, reproducibility and clarity (Required)): Summary: To explore the relationship between histone post-translational modifications (H3K4me3 and H3K27me3) and enhancer activation with gene expression during early embryonic development, the authors used a monolayer differentiation approach to convert mouse embryonic stem cells (ESCs) into Anterior Definitive Endoderm (ADE). They monitored differentiation stages using a dual reporter mESC line (B6), which has fluorescent reporters inserted at the Gsc (GFP) and Hhex (Redstar) loci. Their analyses indicate that the differentiating cells advanced through stages similar to those in the embryo, successfully converting into endoderm and ADE with high efficiency. This is elegant and well performed stem cell biology.

      Their subsequent genome-wide and nascent transcription analyses confirmed that the in vitro gene expression changes correlated with developmental stages and confirmed that transcriptional activation precedes mRNA accumulation. They then focussed on linking active enhancers and histone modifications (H3K4me3 and H3K27me3) with gene expression dynamics. Finally, they performed PRC2 inhibition and showed that, while it enhanced differentiation efficiency, it also induced ectopic expression of non-lineage specific genes.

      Major comments: In terms of mechanistic advances, they propose that transcriptional up-regulation does not require prior loss of H3K27me3, which they show appears to lag behind gene activation, but critically, on a likely mixed population level. I am sceptical of their interpretation of their data because they are looking at heterogeneous populations of cells. To explain, one could imagine a particular H3K27me3 coated gene that gets activated during differentiation. In a population of differentiating cells, while the major sub-population of cells could retain H3K27me3 on this particular gene when it is repressed, a minority sub-population of cells could have no H3K27me3 on the gene when it is actively transcribed. The ChIP and RNA-seq results in this mixed cell scenario would give the wrong impression that the gene is active while retaining H3K27me3, when in reality, it's much more likely that the gene is never expressed when its locus is enriched with the repressive H3K27me3 modification. Therefore, to support their claim, they would have to show that a particular gene is active when its locus is coated with H3K27me3. Personally, I don't feel this approach would be worth pursuing.

      They also report that inhibition of PRC2 using EZH2 inhibitor (EPZ6438) enhanced endoderm differentiation efficiency but led to ectopic expression of pluripotency and non-lineage genes. However, this is not surprising considering the established role of Polycomb proteins as repressors of lineage genes.

      Reviewer #1 (Significance (Required)): I feel that this is a solid and well conducted study in which the authors model early development in vitro. It should be of interest to researchers with an interest in more sophisticated in vitro differentiation systems, perhaps to knockout their gene of interest and study the consequences. However, I don't see any major mechanistic advances in this work.

      >Author Response

      We agree with the point regarding the delayed loss of H3K27me3 relative to gene activation, and indeed this same point has been raised by reviewer 3 (see below). Our cell-population based data do not allow us to directly test whether gene up-regulation in a small population of cells, from TSSs lacking H3K27me3, accounts for the observed result. Furthermore, there are currently no robust methods to determine cell- or allele-specific expression simultaneously with ChIP/Cut and Run for chromatin marks. However, we provide the following additional evidence that strongly supports our conclusions.

      Our FACS isolation strategy used to prepare cell populations for ChIP, microarray expression and 4sU-seq analysis is based on expression (or lack thereof) of a fluorescent GSC-GFP reporter. This means that every cell in the G+ populations expresses the Gsc fluorescent reporter, at least at the protein level, at the point of isolation. This is despite the presence of appreciable and invariant levels of H3K27me3 at the TSS of the Gsc gene in both G+ and G- populations at day 3 of differentiation. Comparable to our meta-analysis of all upregulated genes shown in the original manuscript (Figure 5 and S5), H3K27me3 levels are then subsequently reduced in the G+ relative to the G- populations at day 4. The transcriptional changes which correspond to the GSC-GFP reporter expression and associated ChIP-seq data are shown in the reviewer figure (Fig R1 A shown in revision plan). To further support our observations, we sought to rule out the possibility that the shifts in H3K27me3 and transcription arose from mutually exclusive gene sets, from nominal transcription levels or from sites with low-level H3K27me3. To do this with a gene set of sufficient size to yield a robust result, we selected upregulated TSSs that had a greater than median value for both transcription (4sU-seq) and H3K27me3 (n=49 of 159 genes; Fig R1 B shown in revision plan). Meta-analysis of these genes showed that, as for all upregulated gene TSSs (n=159), transcriptional activation occurred in the presence of substantial and invariant levels of H3K27me3 at day 3, followed by a subsequent reduction by day 4 of differentiation (Fig R1 C shown in revision plan). Importantly, many of these genes yielded high absolute 4sU-seq signal, comparable to that of Gsc, arguing against transcriptional activation being limited to a small subpopulation of cells.
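
      The "greater than median for both" selection described above can be illustrated with a short sketch. This is not the authors' analysis code: the record layout, the field names (`tss`, `tx`, `k27`) and the toy values are hypothetical and only show the filtering logic.

```python
from statistics import median

def above_median_both(records):
    """Return TSS names whose transcription AND H3K27me3 signals both
    exceed the median of the supplied set (hypothetical record layout)."""
    tx_med = median(r["tx"] for r in records)
    k27_med = median(r["k27"] for r in records)
    return [r["tss"] for r in records
            if r["tx"] > tx_med and r["k27"] > k27_med]

# Toy, made-up values for illustration only (not measured data).
toy = [
    {"tss": "geneA", "tx": 8.0, "k27": 6.5},
    {"tss": "geneB", "tx": 2.0, "k27": 6.0},
    {"tss": "geneC", "tx": 9.0, "k27": 1.0},
    {"tss": "geneD", "tx": 7.0, "k27": 7.0},
    {"tss": "geneE", "tx": 1.0, "k27": 0.5},
    {"tss": "geneF", "tx": 3.0, "k27": 2.0},
]

selected = above_median_both(toy)
```

      Requiring both signals to exceed their medians excludes genes that are weakly transcribed as well as genes that carry little H3K27me3 to begin with, which is the confounder the filter is meant to remove.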

      Reviewer #2 (Evidence, reproducibility and clarity (Required)): In this paper the authors profile gene expression, including active transcription, and histone modifications (k4 and k27me3) during a complex differentiation protocol from ES cells, which takes advantage of FACS sorting of appropriate fluorescent reporters. The data is of good quality and the experiments are well performed. The main conclusion, that the analyzed histone marks channel differentiation more than they directly allow/block it, is well supported by the data. The paper is interesting and will represent a good addition to an already extensive literature. I have however a few major concerns, described below:

      1/ K4me3 may show more changes than they interpret, at least over the +1 nucl. An alternative quantification to aggregate profiles should be used to more directly address the questions regarding the correlations between histone mods and gene expression.

      >Author Response

      Whilst we state that H3K4me3 levels are somewhat invariant at differentially expressed genes relative to H3K27me3, quantification of individual TSSs (+/- 500 bp) did show a direct correlation with gene expression (Figure 5 and S5). To explore this further in response to the reviewer’s comment, we will quantify the K4me3 signal at the +1 nucleosome to determine whether this yields more substantial differences than those observed more broadly across TSSs.

      2/ Related to the previous point, it appears clear in Fig.4 that the promoters of each gene expression cluster do not belong to a single chromatin configuration. I think it would be important to: 1/ cluster the genes based on promoter histone mods and interrogate gene expression and cluster allocation (basically the reverse to what is presented) 2/ order the genes in the heatmaps identically for K4me3 and K27me3 to more easily understand the respective chromatin composition per cluster

      >Author Response

      We thank the reviewer for these suggestions and will include these analyses in a revised manuscript.

      3/ Also, as it is apparent that not all promoters in every cluster are enriched for the studied marks, could the authors separately analyze these genes? What are they? Do they use alternative promoters?

      >Author Response

      Indeed, this is the case. Whilst there is significant enrichment of H3K27me3 at the TSS of developmentally regulated genes, not all genes whose expression changes during the differentiation will be polycomb targets. We will further stratify these clusters as suggested and determine what distinguishes the subsets. If informative, this data will be included in a revised manuscript.

      4/ The use of 4SU-seq to identify active enhancers is welcome; however, I have doubts it is working very efficiently: for instance, in the snapshots shown in Fig.2A, the very active Oct4 enhancers in ES cells are not apparent at all... More validation of the efficiency of the approach seems required.

      >Author Response

      The 4sU-seq data shown in Figure 2A were generated from samples isolated at days 3 and 4 of the ADE differentiation. It is therefore likely that the enhancers have been partly or wholly decommissioned at this point. Indeed, in a separate study we generated 4sU-seq data using the same protocol and conditions as presented here, but in ES cells and differentiated NPCs (day 3 to 7), and we do see transcription at Oct4 enhancers in ESCs (arrowed in the screenshot shown in revision plan), which is extinguished upon differentiation to neural progenitor cells (NPCs; data from PMID: 31494034).

      5/ The effects of the EZH2 inhibitor are quite minor regarding the efficiency of the differentiation as analyzed by FACS, despite significant gene expression changes. To the knowledge of this referee, this is at odds with results obtained with Ezh2 ko ES cells that display defects in mesoderm and endoderm differentiation. I have issues reconciling these results (uncited PMID: 19026780). Either the authors perform more robust assays (inducible KOs) or they more directly explain the limitations of the study and the controversies with published work.

      >Author Response

      We agree that this result appears to be at odds with the findings in PMID: 19026780. This is likely because we are acutely reducing H3K27me3 levels for a short period either during or immediately preceding the differentiation, rather than removing PRC2 function genetically. This likely produces a less pronounced defect in the ability to generate endodermal cells. However, we cannot address this without further experimentation, which is beyond the scope of this study. We will more fully discuss the results in the context of this and other studies and discuss the limitations of the study in this regard.

      Minor 1/ please add variance captured to PCA plots 2/ Fig1E add color scales to all heatmaps 3/ Fig4C,D are almost impossible to follow, please find a way to identify better the clusters/samples and make easier to correlate all the variables

      >Author Response

      We will address all of these points in a revised manuscript.

      Reviewer #2 (Significance (Required)):

      The paper is incremental in knowledge, and not by a big margin, as it is known already that histone mods rather channel than drive differentiation. Though, the authors do not clearly address inconsistencies with published work, especially regarding Ezh2 thought to be important to make endoderm. It is however a good addition to current knowledge, provided a better discussion of differences with published work is provided.

      >Author Response

      As outlined above, we will address this with a more complete discussion about the distinction between the studies and what can and can’t be concluded from our approach.

      Reviewer #3 (Evidence, reproducibility and clarity (Required)): This study investigates the role of chromatin-based regulation during cell fate specification. The authors use an ESC model of differentiation into anterior primitive streak and subsequently definitive endoderm, which they traced via a dual-reporter system that combines GSC-GFP and HHEX-RedStar. The authors mapped changes in (nascent) gene expression and histone modifications (H3K4me3/H3K27me3) at key timepoints and within different populations over six days of differentiation. Finally, the authors test the functional implications of H3K27me3 landscapes via PRC2 inhibition.

      The majority of data chart the descriptive changes in (epi)genomic and transcriptional dynamics coincident with cell differentiation. The use of nascent transcriptomics improves the temporal resolution of expression dynamics, and is an important strategy. By and large the data reinforce established paradigms. For example, that transcription is the dominant mechanism regulating mRNA levels, or that dynamic chromatin state changes occur and largely correlate with gene activity. They also identify putative enhancers with profiling data, albeit these are not validated, and confirm that PRC2 inhibition impacts cell fate processes - in this case promoting endodermal differentiation efficiency. Overall, the study is relatively well-performed and clearly written, with the omics profiling adding more datasets from in vitro cell types that can be difficult to characterise in vivo. Whilst the majority of the study may be considered incremental, the key finding is the authors' conclusion that H3K27me3 is subordinate to gene activity rather than an instructive repressor. If borne out, this would mark an important observation with broad implications. However, in my view this conclusion is subject to many confounders and alternative interpretations, and the authors have not ruled out other explanations. Given the centrality of this to the novelty of the study, I would encourage further analysis/stratification of existing data, and potentially further experiments to provide more confidence in this key conclusion.

      Primary issue 1.) The authors show that at the earliest timepoint (d3), nascent gene activation of a handful of genes between G+ and G- populations is not associated with a FC loss of H3K27me3. From this the authors extrapolate their key conclusion that H3K27me3 is subordinate. Causality of chromatin modifications in gene regulation is critical to decipher, and therefore this is an important observation to confirm. Below I go through the possible confounders and issues with the conclusion at this point.

      (i) Single-cell penetrance. A possible (likely?) possibility is that gene activation initially occurs in a relatively small subset of cells at d3. Because these genes are expressed lowly prior to this, they will register as a significant upregulation in bulk analysis. However, in this scenario H3K27me3 would only be lost from a small fraction of cells, which would not be detectable against a backdrop of most cells retaining the mark. In short, the authors have not ruled out heterogeneity driving the effect. Given the different dynamic range of mRNA and chromatin marks, and that a small gain from nothing (RNA) is easier to detect than a small loss from a pre-marked state (chromatin), investigating this further is critical to draw the conclusions the authors have.

      (ii) Initial H3K27me3 levels. The plots in Fig 5 show the intersect FC of H3K27me3 and gene expression. Genes that activate at d3 show no loss of H3K27me3. However, it is important to characterise (and quantitate) whether these genes are significantly marked by H3K27me3 in the first place, which I could not find in the manuscript. Many/several of the genes may not be polycomb marked or may have low levels to begin with. This would obviously confound the analysis, since an absence/low K27 cannot be significantly lost and is unlikely to be functional. Thus, the DEG geneset should be further stratified into H3K27me3+ and K27me3- promoter groups/bins, with significance and conclusions based on the former only (e.g. boxplot in 5F).

      (iii) Sample size. The conclusions are based on a relatively small number of genes that upregulate between G+ and G- (n=55 in figure by my count, text mentions n=52). Irrespective of the other confounders above, this is quite a small subset to make the sweeping general conclusion that "loss of the repressive polycomb mark H3K27me3 is delayed relative to transcriptional activation" in the abstract. Indeed, the small number of DEG suggests the cell types being compared are similar and perhaps therefore have specific genomic features (this could be looked at) that drive .

      >Author Response

      These are very good points and are also raised by reviewer 1 (see above). We have one example in our current data where we can definitively interrogate single-cell protein expression. Gsc (as monitored by GSC-GFP FACS and the bulk RNA analysis) meets the criteria of being robustly upregulated in all FACS-sorted cells in the presence of high levels of H3K27me3 in the D3G+ population. We believe that the additional analysis (Figure R1A shown in revision plan) and the discussion above address the reviewer’s concerns about both the levels of expression and the magnitude of H3K27me3. With respect to the third point, the numbers are low (although here I present data from the 4sU analysis with approximately three times more data points); however, the point here is not to say this happens in every instance of gene activation, but rather that it can happen, and not just at a small subset of outlier genes. This is important, as the reviewer notes, for our understanding of how polycomb repression is relieved during development. We will also look to see if there are sequence characteristics/motifs of these genes. In a revised manuscript we would include this data and further analysis as outlined above. The reviewer points out that the numbers vary a little between analyses. This arises from the annotation of multiple TSSs per gene in some cases. This will be rectified throughout and made clearer in the legends.

      Other comments: 2.) The authors show that promoter H3K4me3 correlates well with gene expression dynamics in their model. They conclude that "transcription itself is required for H3K4me3 deposition", or in other words is subordinate. This may well be the case but from their correlative data this cannot be inferred. Indeed, several recent and past papers have shown that H3K4me3 itself can directly modulate transcription, for example by triggering RNA Pol II pause-release, by preventing epigenetic silencing and/or by recruiting the PIC. The authors could point out or discuss these alternative possibilities to provide a more balanced discourse.

      >Author Response

      We agree and this will be discussed more thoroughly and both possibilities put forward in the revised manuscript.

      3.) The labelling of some figures is unclear. In Fig 4C and 4D (right) it is impossible to tell what sample each of the lines represents. It is also not clear what the blue zone corresponds to in genome view plots (the whole gene?). Moreover, the replicate numbers are not shown in figure legends.


      >Author Response

      We agree that the data presented in 4C and D is unclear. We will, as a minimum, collapse profiles into like populations (ESC / G- / G+ / G+H- / G+H+) which makes sense given the similarity of these populations across all analyses (see e.g. PCA analysis in Figure 1). We will also explore alternative ways of presenting the data to better highlight the dynamics and incorporate this with the changes suggested by reviewer 2. The blue shaded area represents the full extent of the key gene being discussed in the screenshot; this is mentioned in the legend but will be made clearer in a revised manuscript. Replication will also be added to the legend throughout (n=2 for ChIP-seq and n=3 for 4sU-seq).

      4.) It would be nice to provide more discussion to reconcile the conclusions that H3K27me3 in endoderm differentiation is subordinate and the final figure showing inhibiting H3K27me3 has a significant effect on differentiation, since the latter is the functional assessment.

      >Author Response

      We will build on the points already made that suggest that, whilst K27me3 is a passive repressor that serves to act against sub-threshold activating cues, it is nonetheless a critical regulator of developmental fidelity.

      Reviewer #3 (Significance (Required)): Overall, the study's strengths are in that it characterises epigenomic dynamics within a specific and relevant cell fate model. The nascent transcriptomics adds important resolution, and underpins the core conclusions. The weakness is that data is over-interpreted at this point, and other possibilities are not adequately tested. The conclusions should therefore either be scaled back (which reduces novelty) or further analysis and/or experiments should be performed to support the conclusion. If it proves correct, this would be a significant observation for the community.

      >Author Response

      In a revised manuscript, we will address the reviewer’s concerns with additional data and discussion as indicated above.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Recommendations for the authors:

      Reviewer #1 (Recommendations For The Authors):

      My main concern is still in place. It is unclear whether the proposed method can find actual goal states, and as a result it is unclear what states it finds. Table S1 mentions the model BIOMD0000000454, which is a small metabolic pathway with known equations given in "Example One" in "Metabolic Control Analysis: Rereading Reder". In this model the goal states can be calculated analytically.

      Regarding your statements below: I am not concerned that your method will be less efficient than random search (or any other search..) on small models, but I think it is important for the readers to have evidence that your method is able to discover true goal states at least in small networks, used in your study. You do show that your method scales to complex models. So, in my opinion, the missing part is to show that it is able to find true goal states.

      "...For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models..."

      We thank you for your response and for raising the lack of evidence that our method is able to re-discover the true goal states of simple models when these are known a priori. We acknowledge that adding these simple cases is useful for completeness. We did not include these simple models in our main study because in most cases a basic random search over the initial conditions will lead to the re-discovery of these goal states. For instance, for the mentioned model BIOMD0000000454 described in "Example One" of the "Metabolic Control Analysis: Rereading Reder" paper, several simplifying assumptions are made such that the system has only one steady state (x1=0.056, x2=0.769, x3=4.231), which can be found analytically as shown in the paper. In that simple case, this goal state is also straightforward to find with numerical simulation, as any valid initial condition will converge to it.

      To address the reviewer's concerns, we propose to add an additional "sanity check" figure in the supplementary material of the revised paper (Figure S4), as well as a “sanity check” subsection in the “Methods”, presenting additional experiments on simple models such as this one. The new figure and subsection can be viewed in the paper’s interactive version available online at https://developmentalsystems.org/curious-exploration-of-grn-competencies, and we plan to include them as such in the next revision. We have also included the full code to reproduce this sanity check as a ‘sanity_check.ipynb’ Jupyter notebook in the GitHub repository (https://github.com/flowersteam/curious-exploration-of-grn-competencies/blob/main/notebooks/sanity_check.ipynb).

      In the new Figure S4-b, we show the results of our exploration pipeline on the suggested model BIOMD0000000454 as described in "Example One" of the paper. These results provide evidence that the curiosity search recovers the correct unique goal state (x1=0.056, x2=0.769, x3=4.231), as expected.

      We also include a second sanity check on BIOMD0000000341, which models the dynamics of beta-cell mass, insulin and glucose. This model has two stable fixed points representing physiological (B=300, I=10, G=100) and pathological (B=0, I=0, G=600) steady states, which are the known ground-truth steady states as described in Figure 3 of the "A Model of β-Cell Mass, Insulin, and Glucose Kinetics: Pathways to Diabetes" paper. Again, as expected, curiosity search recovers those two steady states (Figure S4-a).
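
      The two reported fixed points can also be checked by hand. The sketch below is not taken from the authors' notebook: the three-equation form and the parameter values are assumptions (values commonly quoted for this beta-cell model and not stated in the text above), so they should be verified against the original model before being relied upon.

```python
import math

# ASSUMED parameter values (not given in the response above).
R0, EG0, SI = 864.0, 1.44, 0.72          # glucose production / uptake
SIGMA, ALPHA, K = 43.2, 20000.0, 432.0   # insulin secretion / clearance
D0, R1, R2 = 0.06, 0.84e-3, 0.24e-5      # beta-cell death / replication

def rhs(B, I, G):
    """Right-hand side of the assumed three-variable ODE system."""
    dG = R0 - (EG0 + SI * I) * G
    dI = SIGMA * B * G**2 / (ALPHA + G**2) - K * I
    dB = (-D0 + R1 * G - R2 * G**2) * B
    return dB, dI, dG

def fixed_points():
    """Solve rhs = 0 analytically; returns a list of (B, I, G) tuples."""
    points = [(0.0, 0.0, R0 / EG0)]       # branch with B = 0
    disc = R1**2 - 4.0 * R2 * D0          # branch with B != 0: quadratic in G
    for sign in (-1.0, 1.0):
        G = (R1 + sign * math.sqrt(disc)) / (2.0 * R2)
        I = (R0 / G - EG0) / SI
        B = K * I * (ALPHA + G**2) / (SIGMA * G**2)
        points.append((B, I, G))
    return points

points = fixed_points()
```

      Under these assumed values, the first two entries of `points` correspond to the pathological (B=0, I=0, G=600) and physiological (B=300, I=10, G=100) states quoted above. The quadratic also yields a third root, which the response does not list among the stable states.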

      As stated in our previous answer, our main study focuses on more complex models that are not limited to one or a few attractors that can easily be discovered with random initial conditions. Regarding the mentioned BIOMD0000000454, what may have confused the reviewer is that we did include it in our main study but, as specified in the caption of table S4, unlike what is done in "Example One" of the original paper, we let the metabolite concentrations y1,...,y5 evolve in time (instead of enforcing them as constants). When doing so, the resulting dynamics of the system are more complex and exhibit a spectrum of possible steady states (unknown a priori), in contrast to the previous case with a single steady state. In that case, the new attractors are not easy to find analytically and the proposed curiosity search becomes interesting, as it is able to uncover the distribution of possible steady states much more efficiently than a random search baseline, as shown in the new figures S4-c and S4-d.

      We hope that these new results will address the reviewer’s concerns and provide evidence to the readers on the validity of the approach on simple networks.

      eLife assessment

      This important study develops a machine learning method to reveal hidden unknown functions and behavior in gene regulatory networks by searching parameter space in an efficient way. The evidence for some parts of the paper is still incomplete and needs systematic comparison to other methods and to the ground truth, but the work will be of broad interest to anyone working in biology of all stripes since the ideas reach beyond gene regulatory networks to revealing hidden functions in any complex system with many interacting parts.

      We thank the editors and reviewers for their positive assessment and constructive suggestions. In our response, we acknowledge the importance of systematic comparison to other methods and to the ground truth, when available. However we also emphasize the challenges associated with evaluating such methods in the context of uncovering hidden behaviors in complex biological networks as the ground truth is often unknown. We hope that our explanations will clarify the potential of our approach in advancing the exploration of these systems.

      Public Reviews:

      Reviewer #1 (Public Review):

      Summary: This paper proposes applying intrinsically-motivated exploration to the discovery of robust goal states in gene regulatory networks.

      Strengths:

      The paper is well written. The biological motivation and the need for such methods are formulated extraordinarily well. The battery of experimental models is impressive.

      We thank the reviewer for sharing interest in the research problem and for recognizing the strengths of our work.

      Weaknesses:

      (1) The proposed method is compared to the random search. That says little about the performance with regard to the true steady-state goal sets. The latter could be calculated at least for a few simple ODE (e.g., BIOMD0000000454, `Metabolic Control Analysis: Rereading Reder'). The experiment with 'oscillator circuits' may not be directly interpolated to the other models.

      The lack of comparison to the ground truth goal set (attractors of ODE) from arbitrary initial conditions makes it hard to evaluate the true performance/contribution of the method. A part of the used models can be analyzed numerically using JAX, while there are models that can be analyzed analytically.

      "...The true versatility of the GRN is unknown and can only be inferred through empirical exploration and proxy metrics....": one could perform a sensitivity analysis of the ODEs, identifying stable equilibria. That could provide a proxy for the ground truth 'versatility'.

      We agree with the reviewer that one primary concern is to properly evaluate the effectiveness of the proposed method. However, as we move toward complex pathways, the “true” steady-state goal sets are often unknown, which is where the use of machine learning methods such as the one we propose is particularly interesting (but challenging to evaluate).

      For simple models whose true steady-state distribution can be derived numerically and/or analytically, it is very likely that their exploration will be much simpler and this is not where a lot of improvement over random search may be found, which explains our focus on more complex models. While we agree that it is still interesting to evaluate exploration methods on these simple models for checking their behavior, it is not clear how to scale this analysis to the targeted more complex systems.

      For systems whose true steady state distribution cannot be derived analytically or numerically, we believe that random search is a pertinent baseline as it is commonly used in the literature to discover the attractors/trajectories of a biological network. For instance, Venkatachalapathy et al. [1] initialize stochastic simulations at multiple randomly sampled starting conditions (which is called a kinetic Monte Carlo-based method) to capture the steady states of a biological system. Similarly, Donzé et al. [29] use a Monte Carlo approach to compute the reachable set of a biological network «when the number of parameters  is large and their uncertain range  is not negligible». For the considered models, the true steady-state goal set is unknown, which is why we chose comparison with random search. We added a “Statistics” subsection in the Methods section providing additional details about the statistical analyses we perform between our method and the random search baseline.
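
      The random search baseline described above can be sketched as follows. This is an illustration only: the one-dimensional system dx/dt = x - x^3 is a hypothetical stand-in for a biological network, and the function names, sampling range and step sizes are assumptions rather than the settings used in the paper.

```python
import random

def simulate(x0, dt=0.01, steps=5000):
    """Forward-Euler integration of the toy system dx/dt = x - x**3
    (a hypothetical stand-in for a GRN), returning the final state."""
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3)
    return x

def random_search(n_samples=50, low=-2.0, high=2.0, seed=0):
    """Sample random initial conditions, simulate each one, and collect
    the distinct end states (rounded so nearby values merge)."""
    rng = random.Random(seed)
    endpoints = {round(simulate(rng.uniform(low, high)), 3)
                 for _ in range(n_samples)}
    return sorted(endpoints)

attractors = random_search()
```

      On a system this simple, uniform sampling of initial conditions reaches both stable states (x = -1 and x = 1) almost immediately, which is the situation described above for simple models, where little improvement over random search can be expected.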

      (2) The proposed method is based on `Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning', which assumes state action trajectories [s_{t_0:t}, a_{t_0:t}] (`2.1 Notations and Assumptions' in the IMGEP paper). However, the models used in the current work do not include external control actions; only the initial conditions can be set. It is not clear from the methods whether IMGEP was adapted to this setting, and how the exploration policy was designed w/o actual time-dependent actions. What does "...generates candidate intervention parameters to achieve the current goal...." mean, considering that interventions 'Sets the initial state...' as explained in Table 2?

      We thank the reviewer for asking for clarification, as indeed the IMGEP methodology originates from developmental robotics scenarios which generally focus on the problem of robotic sequential decision-making, therefore assuming state action trajectories as presented in Forestier et al. [65]. However, in both cases, note that the IMGEP is responsible for sampling parameters which then govern the exploration of the dynamical system. In Forestier et al. [65], the IMGEP also only sets one vector at the start (denoted ) which was specifying parameters of a movement (like the initial state of the GRN), which was then actually produced with dynamic motion primitives which are dynamical system equations similar to GRN ODEs, so the two systems are mathematically equivalent. More generally, while in our case the “intervention” of the IMGEP (denoted ) only controls the initial state of the GRN, future work could consider more advanced sequential interventions simply by setting parameters of an action policy  at the start which could be called during the GRN’s trajectory to sample control actions  where  would be the state of the GRN. In practice this would also require setting only one vector at the start, so it would remain the same exploration algorithm and only the space of parameters would change, which illustrates the generality of the approach.
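
      The setting described above, in which the exploration algorithm only chooses one parameter vector (the initial state) per rollout, can be sketched as a minimal goal-exploration loop. This is a simplified illustration and not the authors' implementation: the two-variable toy dynamics, the uniform goal sampling, the nearest-neighbour selection and the Gaussian mutation are all assumptions chosen for brevity.

```python
import random

LOW, HIGH = -2.0, 2.0  # assumed bounds for both interventions and goals

def rollout(theta, dt=0.01, steps=3000):
    """Hypothetical stand-in for a GRN simulation: theta is the initial
    state, the outcome is the state reached at the end of the rollout."""
    x, y = theta
    for _ in range(steps):
        dx = x - x**3
        dy = -y + 0.5 * x
        x, y = x + dt * dx, y + dt * dy
    return (x, y)

def sq_dist(a, b):
    return (a[0] - b[0])**2 + (a[1] - b[1])**2

def goal_exploration(n_init=10, n_iter=40, sigma=0.2, seed=0):
    """Return a history of (intervention, outcome) pairs."""
    rng = random.Random(seed)
    history = []
    # Bootstrap phase: purely random interventions.
    for _ in range(n_init):
        theta = (rng.uniform(LOW, HIGH), rng.uniform(LOW, HIGH))
        history.append((theta, rollout(theta)))
    # Goal-directed phase: sample a goal, reuse the intervention whose
    # outcome was closest to it, perturb that intervention, and simulate.
    for _ in range(n_iter):
        goal = (rng.uniform(LOW, HIGH), rng.uniform(LOW, HIGH))
        parent, _outcome = min(history, key=lambda h: sq_dist(h[1], goal))
        theta = tuple(min(HIGH, max(LOW, p + rng.gauss(0.0, sigma)))
                      for p in parent)
        history.append((theta, rollout(theta)))
    return history

history = goal_exploration()
```

      Because the loop only ever manipulates one vector per rollout, replacing the initial state with the parameters of an action policy would change the search space but not the algorithm, which is the generality argument made above.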

      (3) Fig 2 shows the phase space for (ERK, RKIPP_RP) without mentioning the typical full scale of ERK, RKIPP_RP. It is unclear whether the path from (0, 0) to (~0.575, ~3.75) at t=1000 is significant on the typical scale of this phase space.

      The purpose of Figure 2 is to illustrate an example of GRN trajectory in transcriptional space, and to illustrate what “interventions” and “perturbations” can be in that context. To that end we have used the fixed initial conditions provided in the BIOMD0000000647, replicating Figure 5 of Cho et al. [56].

      While we are not sure what the reviewer means by the “typical” scale of this phase space, we would like to point the reviewer toward Figure 8, which shows examples of paths that reach more distant points in the same phase space (up to ~10 in RKIPP_RP levels and ~300 in ERK levels). However, while the paths displayed in Figure 8 are possible (and were discovered with the IMGEP), note that they may occur more rarely under natural conditions, in the sense that a large portion of the initial conditions tested with random search tend to converge toward smaller (ERK, RKIPP_RP) steady-state values similar to the ones displayed in Figure 2.

      (4) Table 2:

      a. Where is 'effective intervention' used in the method?

      b. In my opinion, 'controllability', 'trainability', and 'versatility' are different terms. If their correspondence is important, I would suggest extending/enhancing the column "Proposed Isomorphism". Otherwise, it may be confusing.

      a) We thank the reviewer for pointing out that “effective intervention” is not explicitly used in the method. The idea here is that as we are exploring a complex dynamical system (here the GRN), some of the sampled interventions will be particularly effective at revealing novel unseen outcomes, whereas others will fail to produce a qualitative change in the distribution of discovered outcomes. What we show in this paper, for instance in Figure 3a and Figure 4, is that the IMGEP method is particularly sample-efficient in finding those “effective interventions”, at least more so than random exploration. However, we agree that the term “effective intervention” is ambiguous (it does not say effective at what), and we have replaced it with “salient intervention” in the revised version.

      b) We thank the reviewer for highlighting some confusing terms in our chosen vocabulary, and we have clarified those terms in the revised version. We agree that controllability/trainability and versatility are not exactly equivalent concepts, as controllability/trainability typically refers to the extent to which a system is externally controllable/trainable, whereas versatility typically refers to the inherent adaptability or diversity of behaviors that a system can exhibit in response to inputs or conditions. However, both measure the range of states that the system can reach under a distribution of stimuli/conditions, whether natural or engineered, which is why we believe that their correspondence is relevant.

      I don't see how this table generalizes "concepts from dynamical complex systems and behavioral sciences under a common navigation task perspective".

      We have replaced the verb “generalize” with “investigate” in the revised version.

      Reviewer #2 (Public Review):

      Summary:

      Etcheverry et al. present two computational frameworks for exploring the functional capabilities of gene regulatory networks (GRNs). The first is a framework based on intrinsically-motivated exploration, here used to reveal the set of steady states achievable by a given gene regulatory network as a function of initial conditions. The second is a behaviorist framework, here used to assess the robustness of steady states to dynamical perturbations experienced along typical trajectories to those steady states. In Figs. 1-5, the authors convincingly show how these frameworks can explore and quantify the diversity of behaviors that can be displayed by GRNs. In Figs. 6-9, the authors present applications of their framework to the analysis and control of GRNs, but the support presented for their case studies is often incomplete.

      Strengths:

      Overall, the paper presents an important development for exploring and understanding GRNs/dynamical systems broadly, with solid evidence supporting the first half of their paper in a narratively clear way.

      The behaviorist point of view for robustness is potentially of interest to a broad community, and to my knowledge introduces novel considerations for defining robustness in the GRN context.

      We thank the reviewer for recognizing the strengths and novelty of the proposed experimental framework for exploring and understanding GRNs, and complex dynamical systems more generally. We agree that the results presented in the section “Possible Reuses of the Behavioral Catalog and Framework” (Fig 6-9) can be seen as incomplete in certain respects. We tried to make this as explicit as possible throughout the paper, which is why we state that these are “preliminary experiments”. Despite the discussed limitations, we believe that these experiments are still very useful to illustrate the variety of potential use-cases in which the community could benefit from such computational methods and experimental framework, and on which future work could build.

      Some specific weaknesses, mostly concerning incomplete analyses in the second half of the paper:

      (1) The analysis presented in Fig. 6 is exciting but preliminary. Are there other appropriate methods for constructing energy landscapes from dynamical trajectories in gene regulatory networks? How do the results in this particular case study compare to other GRNs studied in the paper?

      We are not aware of methods other than the one proposed by Venkatachalapathy et al. [1] for constructing an energy landscape given an input set of recorded dynamical trajectories, although others may exist. We want to emphasize that any such method would in any case depend on the input set of trajectories, and should therefore benefit from a set that is more representative of the diversity of behaviors that can be achieved by the GRN, which is why we believe the results presented in Figure 6 are interesting. As the IMGEP was able to find a higher diversity of reachable goal states (and corresponding trajectories) for many of the studied GRNs, we believe that similar effects should be observable when constructing the energy landscapes for these GRN models, with the discovery of additional or wider “valleys” of reachable steady states.

      Additionally, it is unclear whether the analysis presented in Fig. 6C is appropriate. In particular, if the pseudopotential landscapes are constructed from statistics of visited states along trajectories to the steady state, then the trajectories derived from dynamical perturbations do not only reflect the underlying pseudo-landscape of the GRN. Instead, they also include contributions from the perturbations themselves.

      We agree that the landscape displayed in Fig. 6C integrates contributions from the perturbations on the GRN’s behavior, and that these can shape the landscape in various ways, for instance by affecting which paths are accessible, the shape/depth of certain valleys, etc. But we believe that qualitatively or quantitatively analyzing the effect of these perturbations on the landscape is precisely what is interesting here: it might help 1) understand how a system responds to a range of perturbations and visualize which behaviors are robust to those perturbations, and 2) design better strategies for manipulating those systems to produce certain behaviors.

      (2) In Fig. 7, I'm not sure how much is possible to take away from the results as given here, as they depend sensitively on the cohort of 432 (GRN, Z) pairs used. The comparison against random networks is well-motivated. However, as the authors note, comparison between organismal categories is more difficult due to low sample size; for instance, the "plant" and "slime mold" categories each only have 1 associated GRN. Additionally, the "n/a" category is difficult to interpret.

      We acknowledge that this part is speculative as stated in the paper: “the surveyed database is relatively small with respect to the wealth of available models and biological pathways, so we can hardly claim that these results represent the true distribution of competencies across these organism categories”. However, when further data is available, the same methodology can be reused and we believe that the resulting statistical analyses could be very informative to compare organismal (or other) categories.

      (3) In Fig. 8, it is unclear whether the behavioral catalog generated is important to the intervention design problem of moving a system from one attractor basin to another. The authors note that evolutionary searches or SGD could also be used to solve the problem. Is the analysis somehow enabled by the behavioral catalog in a way that is complementary to those methods? If not, comparison against those methods (or others e.g. optimal control) would strengthen the paper.

      We thank the reviewer for asking us to clarify this point, which might not be clearly explained in the paper. Here the behavioral catalog is indeed used in a complementary way to the optimization method, by identifying a representative set of reachable attractors which are then used to define the optimization problem. For instance, thanks to the catalog, we 1) were able to identify a “disease” region and several possible reachable states in that region and 2) used several of these states as starting points of our optimization problem, where we want to find a single intervention that can successfully and robustly reset all those points, as illustrated in Figure 8. Please note that, given this problem formulation, a simple random search was used as the optimization strategy. When we mention more advanced techniques such as EA or SGD, it is to say that they might be more efficient optimizers than random search. However, we agree that in many cases optimizing directly will not work when starting from a random or bad initial guess, even with EA or SGD. In that case the discovered behavioral catalog can be useful to better initialize this local search and make it more efficient/useful, akin to what is done in Figure 9.

      (4) The analysis presented in Fig. 9 also is preliminary. The authors note that there exist many algorithms for choosing/identifying the parameter values of a dynamical system that give rise to a desired time-series. It would be a stronger result to compare their approach to more sophisticated methods, as opposed to random search and SGD. Other options from the recent literature include Bayesian techniques, sparse nonlinear regression techniques (e.g. SINDy), and evolutionary searches. The authors note that some methods require fine-tuning in order to be successful, but even so, it would be good to know the degree of fine-tuning which is necessary compared to their method.

      We agree that the analysis presented in Figure 9 is preliminary, and thank the reviewer for the suggestion. We would first like to refer to other papers from the ML literature that have analyzed this issue more thoroughly, such as Colas et al. [74] and Pugh et al. [34], and have shown the value of diversity-driven strategies as promising alternatives. Additionally, as suggested by the reviewer, we added a comparison to the CMA-ES algorithm in the revised version to complete our analysis. CMA-ES is an evolutionary algorithm that self-adapts its optimization steps and is known to be better suited than SGD to escaping local minima when the number of parameters is not too high (here we only have 15 parameters). However, our results showed that while CMA-ES explores the solution space more than SGD does at the beginning of optimization, it ultimately also converges to a local minimum, as SGD does. The best solution converges toward a constant signal (of the target b) but fails to maintain the target oscillations, similar to the solutions discovered by gradient descent. We tried this for a few hyperparameters (initial mean and standard deviation) but always found similar results. We have updated the Figure 9 image and caption, as well as the descriptive text, to include these new results in the revised version. We also added a reference to the CMA-ES paper in the citations.

      Reviewer #1 (Recommendations For The Authors):

      I would suggest conducting a more rigorous analysis of the performance by estimating/approximating the ground truth robust goal sets in important GRNs.

      Also, the use of terminology from different disciplines can be improved. Please see my comments above. Specifically, the connection between controllability in dynamical control systems and versatility used in this paper is unclear.

      We hope to have addressed the reviewer's concerns in our previous answers.

      Reviewer #2 (Recommendations For The Authors):

      Fig 4b: I'm not sure if DBSCAN is the appropriate method to use here, as the visual focus on the core elements of the clusters downplays the full convex hull of the points that random sampling achieves in Z space. An analysis based on convex hulls or the ball-coverage from Fig. 3b would presumably generate plots that were more similar between random sampling and curiosity search. If the goal is to highlight redundancy/non-linearity in the mapping between Z and I, another approach might be to simply bin Z-space in a grid, or to use a clustering algorithm that is less stringent about core/noise distinctions.

      We thank the reviewer for the suggestion. This plot is intended to convey to the reader why a method that uniformly samples goals in Z (what the IMGEP is doing) is more efficient than a method that uniformly samples parameters in I (what the random search is doing), in systems for which there is high redundancy/non-linearity in the mapping between I and Z. We agree that binning the Z-space in a grid and counting the number of achieved bins is a way to measure this quantitatively, and it is in fact very close to what we do in Figure 3 for measuring the achieved diversity. We believe, however, that the clustering and coloring provide additional intuition for why this is the case: they illustrate that large regions of the intervention space map to small regions in the outcome space and vice versa.
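
      The grid-binning measure discussed above can be sketched as follows. This is an illustrative reading of "binning the Z-space in a grid and counting the number of achieved bins", not the code used for Figure 3; the function name `occupied_bins`, the bounds, and the number of bins per dimension are assumptions.

```python
import numpy as np

def occupied_bins(points, low, high, bins_per_dim=20):
    """Count how many cells of a regular grid over [low, high] contain at least
    one point. `points` has shape (n_points, n_dims)."""
    points = np.asarray(points, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)
    # Rescale every coordinate to [0, 1], then convert to an integer cell index.
    scaled = (points - low) / (high - low)
    idx = np.clip((scaled * bins_per_dim).astype(int), 0, bins_per_dim - 1)
    # Each distinct index tuple corresponds to one occupied grid cell.
    return len({tuple(row) for row in idx})
```

      Applied to the outcomes reached by two exploration strategies with the same bounds and grid resolution, a higher count indicates that more distinct regions of the outcome space were reached.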

      Additional changes in the revised version:

      We added a sentence in the Methods section, as well as in the caption of Table S1, providing additional details about the way we simulate the biological models from the BioModels website.

      We corrected an erroneous reference to Figure 4 in the Methods “Sensitivity measure” subsection; it now refers to Figure 5.

    1. Author response:

      The following is the authors’ response to the original reviews.

      eLife assessment:

      This important study details an enrichment of the IL-6 signaling pathway in human tendinopathy and applies transcriptional profiling to an advanced in vitro model to test IL-6 specific phenotypes in tendinopathy. Overall, the strength of evidence is solid yet incomplete, as transcriptomic measurements provide clarity, though functional studies including analysis of proliferation are needed to confirm these findings. This work will be of interest to stem cell biologists and immunologists.

      To functionally assess the effect of IL-6 on Scx+ fibroblast proliferation in an acute injury, we repeated the in vivo studies with an EdU staining and a newly established IL-6 KO x ScxGFP+ mouse line. We found no evidence for this effect in acute injuries and acknowledge this in the revised manuscript.

      We further added data collected by combining fluorescence microscopy with human patient-derived tissue to strengthen the link between IL-6, IL-6R, and proliferation of CD90+ cells in chronic injuries.

      See comment 1.1.

      See comment 2.4.

      Changes:

      - Title

      - Abstract

      - Figure 2 and 3 (new data)

      - Figure 7 (new data)

      - Results

      - Discussion

      Reviewer 1

      (1.1) First, the experimental approach does not directly assess proliferation, as such the conclusions regarding proliferation are not well supported. In the ex-vivo model, the use of cell counting approaches is somewhat acceptable since the system is constrained by the absence of potential influx of new cells. However, given the nearly unlimited supply of extrinsically derived cells in vivo (vs. the explant model), assessment of actual proliferation (e.g. Edu, BrdU, Ki67) is critical to support this conclusion.

      To assess the effect of IL-6 on Scx+ fibroblast proliferation in an acute injury, we repeated the in vivo studies with an EdU staining and a newly established IL-6 KO x ScxGFP+ mouse line to combat the considerable background noise of currently available Scx antibodies.

      Under the improved design of these experiments, we could detect no effect of IL-6 on ScxGFP+ cells in an acute injury in vivo. We have therefore replaced figure 5 with the new results in figure 7 and moved figure 5F to the supplementary materials (Supplementary figure 9).

      We acknowledge and discuss this in the discussion section.

      See comment 2.4.

      See comment 2.11.

      Changes:

      - Title

      - Abstract

      - Figure 7 (new data)

      - Supplementary Figure 9

      - Results

      - Discussion

      (1.2) Second, the justification for the use of Scx-GFP+ cells as a progenitor population is not well supported. Indeed, in the discussion, Scx+ cells are treated as though they are uniformly a progenitor population, when the diversity of this population has been established by the cited studies, which do not suggest that these are progenitor populations. Additional definition/ delineation of these cells to identify the subset of these cells that may actually display other putative progenitor markers would support the conclusions. As it stands, the study currently provides important information on the impact of IL6 on Scx+ cells, but not tendon progenitors.

      We further delineated the extrinsic cell populations isolated from mouse Achilles tendons of ScxGFP+ mice using flow cytometric analysis and RT-qPCR. We used tendon population markers suggested by sc-RNA-seq of mouse Achilles tendons.

      (De Micheli et al., Am. J. Physiol. - Cell Physiol., 2020, 319(5), DOI: 10.1152/ajpcell.00372.2020)

      While a small subpopulation of these cells expressed typical progenitor markers (i.e. CD45 and CD146), we could detect no overlap with Scx+ cells. As suggested by the reviewer, we therefore replaced occurrences of “progenitor” in the manuscript with “fibroblast” and performed additional experiments with human patient-derived tissue sections and the fibroblast marker CD90.

      See comment 2.1.

      Changes:

      - Title

      - Abstract

      - Figure 2 (new data)

      - Figure 3 (new data)

      - Supplementary Figure 6 (new data)

      - Results

      - Discussion

      (1.3) Clarity regarding the relevance of the 'sheath-like' component of the assembloid would provide helpful context regarding which types of tendons are likely to have this type of communication vs. those that do not, and if there are differences in tendinopathy prevalence. Understanding why/how this communication between structures is relevant is important.

      Our assembloid concept is inspired by the structure of unsheathed tendons (i.e. biceps, semitendinosus, gracilis) and not sheathed tendons like the flexor tendons.

      We agree that clarity regarding the tendon type having this type of communication is important, so we sharpened previously blurry text passages in the revised manuscript.

      Text changes:

      - Introduction, page 3

      - Results, page 4

      - Results, page 8

      - Results, page 9

      - Results, page 11

      - Discussion, page 25

      - Discussion, page 26

      - Experimental section, page 28

      - Figure 1

      - Figure 2

      - Figure 3

      - Supplementary Table 1

      - Supplementary Figure 3

      - Supplementary Figure 4

      (1.4) Minor: in the text for Figure 6 (2nd paragraph), the comma in 19,694 is superscripted.

      Corrections were made throughout the manuscript.

      Text changes:

      - Results, page 4

      - Results, page 12

      - Results, page 19

      - Results, page 21

      (1.5) Minor: The inclusion of the Scx-GFP mouse should be included in the schematic Figure 5.

      The results presented in the previous draft did not feature tissues from ScxGFP mice but used a Scx-antibody to visually detect Scx+ cells. In anticipation of the revision process, we bred a new IL-6 KO x ScxGFP+ mouse line and repeated the experiment. As suggested by the reviewer, the new schematic figure 7 and the former figure 5 (now moved to the supplementary material) both include this mouse.

      Figure changes:

      - Supplementary Figure 9 (former figure 5)

      - Figure 7

      Reviewer 2

      (2.1) One question that comes to mind is whether the fibroblast progenitors in the extrinsic sheath of Achilles tendon is similar to those surrounding the tail tendon. The similarity of progenitors between different tendons is assumed with this model. I would consider this to be a minor issue.

      Tail tendon fascicles are thought to have a low number of reparative fibroblasts / progenitor cells because they lack a developed extrinsic compartment. Achilles tendons are supposed to have a higher number of reparative fibroblasts / progenitor cells, as their fascicles are surrounded by an extrinsic compartment.

      To verify this here, we added a better characterization and comparison of the cell populations isolated from the tail tendon fascicles and the Achilles tendons.

      First, we added representative light microscopy images of these cells at different timepoints after being cultured on tissue-culture plastic.

      Second, we performed flow cytometric analysis not only on the freshly digested tail tendon fascicles and Achilles tendons, but also on the cultured cells at the timepoint when they would have been embedded into the assembloids.

      Third, we compared the expression of population-specific markers in cells derived from tail tendon fascicle and Achilles tendons.

      As expected, tail tendon fascicle-derived cell populations appeared to be more elongated than Achilles tendon-derived populations shortly after isolation. Similarly, the “maintenance” fibroblasts in healthy tendons are more elongated than the reparative fibroblasts in diseased ones. After culture and priming in tendinopathic niche conditions, both populations assumed a more roundish, reparative phenotype.

      This was consistent with the flow cytometric analysis, which revealed a large difference between freshly isolated populations, that disappeared after extended culture and priming in tendinopathic niche conditions. Gene expression in tail tendon fascicle-derived and Achilles tendon-derived cells was similar after extended culture and priming in tendinopathic niche conditions.

      See comment 1.2.

      See comment 2.10.

      Changes:

      - Supplementary Figure 6 (new data)

      - Results, page 11

      (2.2) The authors use core tendons from IL-6 knockout mice and progenitors from wild-type mice. The reasoning behind this approach was a little confusing... is IL-6 expressed solely in the tendon core compared to the extrinsic sheath?

      Insights gained from human patient-derived tissues (Figure 2) suggest that in a healthy tendon, most of the IL-6 is located in the extrinsic compartment, whereas in tendinopathic tendons it is distributed across compartments.

      Our assembloid design mimics this by embedding wildtype fibroblasts into the extrinsic compartment. Our hypothesis was that a wildtype core in tendinopathic niche conditions attracts reparative fibroblasts through IL-6, while an IL-6 knock-out core does not. Therefore, it was important to establish IL-6 gradients close to what they seem to be in vivo.

      Nevertheless, we have to acknowledge that the amount of IL-6 secreted by extrinsic fibroblasts in isolation is quite small compared to what is secreted by a wildtype core (Supplementary Figure 7). Attributing IL-6 in the supernatant of a WT core // WT fibroblast assembloid to the correct cell population is challenging but could be part of future research.  

      Changes:

      - Figure 2 (new data)

      - Supplementary Figure 7 (new data)

      - Results, page 12

      (2.3) Is a co-culture system for 7 days appropriate to model tendinopathy without the supplementation of exogenous inflammatory compounds? The transcriptomic differences in Figure 3 seem to be subtle, and may perhaps suggest that it could be a model that more closely resembles steady state compared to tendinopathy. If so, is IL-6 still relevant during steady state?

      The collective experience in our lab is that core explants exposed to tendinopathic niche conditions (i.e. serum, 37°C, high oxygen, and high glucose levels) assume a disease-like phenotype (e.g. Wunderli et al., Matrix Biology, 2020, Volume 89, https://doi.org/10.1016/j.matbio.2019.12.003, and Blache et al., Sci. Rep., 2021, 11(1), DOI 10.1038/s41598-021-85331-1).

      Specifically for our core // fibroblast co-culture system, we have reported the emergence of exaggerated tendinopathic hallmarks in a previous publication (Stauber et al., Adv. Healthc. Mater., 2021, 10(20), https://doi.org/10.1002/adhm.202100741).

      We clarified the use of previously validated tendinopathic niche conditions in this manuscript.

      Changes:

      - Introduction, page 3

      - Results, page 12

      (2.4) The results presented in Figures 4 and 5 are impressive, demonstrating a link between IL-6 and fibroblast progenitor numbers and migration. Their experimental design in these figures show strong evidence, using Tocilizumab and recombinant IL-6 to rescue shown phenotypes. I would reduce the claims on proliferation, however, unless a proliferation-specific marker (e.g., Ki67, BrdU, EdU) is included in confocal analyses of Scx+ progenitors.

      As reviewer 1 pointed out as well, it is important to use a proliferation-specific marker “given the nearly unlimited supply of extrinsically derived cells in vivo (vs. the explant model)”.

      To assess the effect of IL-6 on Scx+ fibroblast proliferation in vivo, we repeated those experiments with a proliferation-specific EdU staining and a newly established IL-6 KO x ScxGFP+ mouse line.

      Under this improved design, we could not detect an effect of IL-6 on proliferation in an acute injury in vivo.

      We have therefore replaced figure 5 with the new results in figure 7 and moved figure 5F to the supplementary materials (Supplementary figure 9).

      We acknowledge and discuss this in the discussion section and softened our statements in the title and the abstract.

      See comment 1.1.

      See comment 2.11.

      Changes:

      - Title

      - Abstract

      - Figure 7 (new data)

      - Supplementary Figure 9

      - Results

      - Discussion

      (2.5) I think it would significantly strengthen the study if they could measure tendon healing in IL-6 knockouts or in wild-type mice treated with IL-6 inhibitors, since conventional ablation of IL-6 may lead to the elevation of compensatory IL-6 superfamily ligands that could activate STAT signaling. The authors claim that reducing IL-6 signaling decreases transcriptomic signatures of tendinopathy, but IL-6 may be necessary to promote normal healing of the tendon following injury. It is supposed that a lack of Scx+ progenitor migration would delay tendon healing.

      Indeed, another study using the same IL-6 knock-out strain showed that a lack of IL-6 signaling resulted in slightly inferior mechanical properties in healing patellar tendons (Lin et al., J. Biomech., 39(1), 2006, https://doi.org/10.1016/j.jbiomech.2004.11.009).

      Also, it might be due to the elevation of compensatory IL-6 superfamily ligands that we found no effect of IL-6 on the proliferation of Scx+ cells in an acute injury in vivo.

      Therefore, assessing the effects of IL-6 inhibitors on tendon healing following an acute injury would have been of great interest to us. Unfortunately, getting the necessary permission from the animal experimentation office for a new invasive treatment protocol was outside of our scope due to the severity degree and time limitations.

      We incorporated and acknowledged these important points in the discussion.

      Text changes:

      - Introduction, page 3

      - Discussion, page 26

      (2.6) Do IL-6 knockout mice and/or mice treated with IL-6 inhibitors have delayed healing following Achilles tendon resection? Please provide experimental evidence.

      See comment 2.5.

      (2.7) I would suggest reducing claims on proliferation, or include a proliferation specific marker (e.g., Ki67, BrdU, EdU) in confocal analyses of Scx+ progenitors.

      See comment 1.1.

      See comment 2.4.

      (2.8) Supplementary Figures 1 and 2: the authors removed outliers. Please specify exactly which outliers were removed in the figures, and provide additional information on the criteria used to identify these outliers.

      To address this comment, we sharpened our criteria for identifying outliers and re-did the analysis depicted in figure 1.

      Briefly, we excluded 5 normal and 5 tendinopathic samples from sheathed tendons which have a different compartmental structure than unsheathed tendons.

      A complete, separate analysis of the sheathed tendons would have been beyond the scope of this manuscript, but early screening suggested that IL-6 transcripts are not increased in sheathed tendinopathic tendons.

      We made text changes throughout the manuscript and to the supplementary table 1 and supplementary figure 2 to clearly state our criteria for excluding samples / outliers.

      Changes:

      - Introduction, page 3

      - Results, page 4

      - Results, page 8

      - Results, page 9

      - Results, page 11

      - Discussion, page 25

      - Discussion, page 26

      - Experimental section, page 28

      - Figure 1

      - Figure 2

      - Figure 3

      - Supplementary table 1

      - Supplementary figure 2

      - Supplementary figure 3

      - Supplementary figure 4

      (2.9) Whenever "positive enrichment" is mentioned in the text, please specify in what group. It is presumed that the enrichment, for example, in the first figure is associated with tendinopathy samples compared to controls, though it is a bit unclear.

      The direction of the enrichment was added to the text.

      Text changes:

      - Abstract, page 1

      - Introduction, page 3

      - Results, page 4

      - Results, page 6

      - Results, page 12

      - Results, page 14

      - Results, page 19

      - Results, page 21

      - Discussion, page 25

      - Discussion, page 26

      - Discussion, page 27

      - Figure 1

      - Figure 5

      - Figure 8

      - Figure 9

      - Supplementary figure 3

      - Supplementary figure 4

      - Supplementary figure 6

      - Supplementary figure 8

      - Supplementary figure 11

      - Supplementary figure 12

      - Supplementary figure 14

      (2.10) Are tail tendon progenitors similar to Achilles tendon progenitors? Please provide a statement that shows similarity (in function, transcriptome, etc.) to support the in vitro tendon model.

      See comment 1.2.

      See comment 2.1.

      (2.11) Are the results in Figure 5F significant? It seems that your pictures show a dramatic change in migration, but the quantification does not?

      We repeated the in vivo studies with a newly established IL-6 KO x ScxGFP+ mouse line to combat the considerable background noise of currently available Scx antibodies.

      Under the improved design of these experiments, we could not detect an effect of IL-6 on ScxGFP+ cell migration after an acute injury in vivo.

      We have therefore replaced figure 5 with the new results in figure 7 and moved figure 5F to the supplementary materials (Supplementary figure 9).

      We acknowledge and discuss this in the discussion section.

      See comment 1.1.

      See comment 2.4.

      Changes:

      - Title

      - Abstract

      - Figure 7 (new data)

      - Supplementary Figure 9

      - Results

      - Discussion

      (2.12) Please provide additional discussion points on cis- versus trans-IL6 signaling in your results found in mouse. Do you think researchers/clinicians would want to target trans-IL6 signaling based on your results? Please support these statements with the expression of IL6R on cells found in the tendon core and external sheath progenitors.

      To address this comment, we performed flow cytometric analysis on Achilles tendon-derived fibroblasts expanded in 2D and digested sub-compartments of the assembloids (Supplementary Figure 7).

      These data suggest that IL6R is expressed by neither core nor extrinsic fibroblasts, but mainly comes from core-resident CD45+ tenophages.

      Human samples co-stained for IL6R and CD68 (an established human macrophage marker) confirmed macrophages as a source of IL-6R in vivo. However, human samples co-stained for IL6R and CD90 (an established marker of reparative fibroblasts in humans) also detected IL6R on CD90+ cells, which have not yet been reported to express IL6R themselves.

      Overall, it is likely that trans-IL-6 signaling is more important for the activation of reparative fibroblasts than cis-IL-6 signaling. We added these statements to the manuscript.

      Changes:

      - Results, page 9

      - Results, page 12

      - Discussion, page 25

      - Discussion, page 26

      - Figure 3 (new data)

      - Supplementary figure 7 (new data)

      (2.13) Please provide more detail on collagen isolation from rat tail in the methods section.

      We provided more details on collagen isolation from rat tail in the experimental section (page 29).

      Changes:

      - Experimental section, page 29

      (2.14) Please comment on whether your in vitro system resembles tendinopathy or a steady state tendon. If it models more of a steady state system, would IL-6 still be relevant?

      See comment 2.3.

      Detailed feedback:

      Reviewer 1:

      This work by Stauber et al. is focused on understanding the signaling mechanisms that are associated with tendinopathy development, and by screening a panel of human tendinopathy samples, identified IL-6/JAK/STAT as a potential mediator of this pathology. Using an innovative explant model they delineated the requirement for IL-6 in the main body of the tendon to alter the dynamics of cells in the peritendinous synovial sheath space.

      The use of a publicly available existing dataset is considered a strength since this dataset includes expression data from several different human tendons experiencing tendinopathy. This facilitates the identification of potentially conserved regulators of the tendinopathy phenotype.

      The clear transcriptional shifts between WT and IL6-/- cores demonstrate the utility of the assembloid model, and support the importance of IL6 in potentiating the cell response to these stimuli.

      Reviewer 2:

      The authors of this study describe a goal of elucidating the signaling pathways that are upregulated in tendinopathy in order to target these pathways for effective treatments. Their goal is honorable, as tendinopathy is a common debilitating condition with limited treatments. The authors find that IL-6 signaling is upregulated in human tendinopathy samples with transcriptomic and GSEA analyses. The evidence for their initial findings is strong, providing a clinically-relevant phenotype that can be further studied using animal models.

      Along these lines, the authors continue with an advanced in vitro system using the mouse tail tendon as the core with progenitors isolated from the Achilles tendon as the external sheath embedded in a hydrogel matrix. One question that comes to mind is whether the fibroblast progenitors in the extrinsic sheath of the Achilles tendon are similar to those surrounding the tail tendon. The similarity of progenitors between different tendons is assumed with this model. I would consider this to be a minor issue, and would consider the in vitro system to be an additional strength of this study.

      In order to address the IL-6 signaling pathway, the authors use core tendons from IL-6 knockout mice and progenitors from wild-type mice. The reasoning behind this approach was a little confusing... is IL-6 expressed solely in the tendon core compared to the extrinsic sheath? Furthermore, is a co-culture system for 7 days appropriate to model tendinopathy without the supplementation of exogenous inflammatory compounds? The transcriptomic differences in Figure 3 seem to be subtle, and may perhaps suggest that it could be a model that more closely resembles steady state compared to tendinopathy. If so, is IL-6 still relevant during steady state?

      Nevertheless, the results presented in Figures 4 and 5 are impressive, demonstrating a link between IL-6 and fibroblast progenitor numbers and migration. Their experimental design in these figures shows strong evidence, using Tocilizumab and recombinant IL-6 to rescue shown phenotypes. I would reduce the claims on proliferation, however, unless a proliferation-specific marker (e.g., Ki67, BrdU, EdU) is included in confocal analyses of Scx+ progenitors. The Achilles tendon injury model provides a nice in vivo confirmation of Scx-progenitor migration to the neotendon.

      Given their goal to elucidate signaling pathways that could be targeted in the clinic, I think it would significantly strengthen the study if they could measure tendon healing in IL-6 knockouts or in wild-type mice treated with IL-6 inhibitors, since conventional ablation of IL-6 may lead to the elevation of compensatory IL-6 superfamily ligands that could activate STAT signaling. The authors claim that reducing IL-6 signaling decreases transcriptomic signatures of tendinopathy, but IL-6 may be necessary to promote normal healing of the tendon following injury. It is supposed that a lack of Scx+ progenitor migration would delay tendon healing.

      Overall, the authors of this study elucidated IL-6 signaling in tendinopathy and provided a strong level of evidence to support their conclusions at the transcriptomic level. However, functional studies are needed to confirm these phenotypes and fully support their aims and conclusions. With these additional studies, this work has the potential to significantly influence treatments for those suffering from tendinopathy.

    1. Author response:

      The following is the authors’ response to the previous reviews.

      Public Reviews:

      Reviewer #1 (Public Review):

      In this manuscript, Lee et al. compared encoding of odor identity and value by calcium signaling from neurons in the ventral pallidum (VP) in comparison to D1 and D2 neurons in the olfactory tubercle (OT).

      Strengths:

      They utilize a strong comparative approach, which allows the comparison of signals in two directly connected regions. First, they demonstrate that both D1 and D2 OT neurons project strongly to the VP, but not the VTA or other examined regions, in contrast to accumbal D1 neurons which project strongly to the VTA as well as the VP. They examine single unit calcium activity in a robust olfactory cue conditioning paradigm that allows them to differentiate encoding of olfactory identity versus value, by incorporating two different sucrose, neutral and air puff cues with different chemical characteristics. They then use multiple analytical approaches to demonstrate strong, low-dimensional encoding of cue value in the VP, and more robust, high-dimensional encoding of odor identity by both D1 and D2 OT neurons, though D1 OT neurons are still somewhat modulated by reward contingency/value. Finally, they utilize a modified conditioning paradigm that dissociates reward probability and lick vigor to demonstrate that VP encoding of cue value is not dependent on encoding of lick vigor during sucrose cues, and that separable populations of VP neurons encode cue value/sucrose probability and lick vigor. Direct comparisons of single unit responses between the two regions now utilize linear mixed effects models with random effects for subject,

      Weaknesses:

      The manuscript still includes mention of differences in effect size or differing "levels" of significance between VP and OT D1 neurons without reports of direct comparisons between the two populations. This is somewhat mitigated by the comprehensive statistical reporting in the supplemental information, but interpretation of some of these results is clouded by the inclusion of OT D2 neurons in these analyses, and the limited description or contextualization in the main text.

      We think the reviewer is mistaken and have clarified the text.  Each pairwise comparison between VP, OTD1 and OTD2, for each odor across days is shown as a heatmap in supplementary figure 8B, with further details in table 37. Absolute diff 3H no statistics

      Reviewer #2 (Public Review):

      We appreciate the authors' revision of this manuscript and toning down some of the statements regarding "contradictory" results. We still have some concerns about the major claims of this paper which lead us to suggest this paper undergo more revision as follows since, in its present form, we fear this paper is misleading for the field in two areas. Here is a brief outline:

      (1) Despite acknowledging that the injections only occurred in the anteromedial aspect of the tubercle, the authors still assert broad conclusions regarding where the tubercle projects and what the tubercle does. For instance, even the abstract states "both D1 and D2 neurons of the OT project primarily to the VP and minimally elsewhere" without mention that this is the "anteromedial OT". Every conclusion needs to specify this is stemming from evidence in just the anteromedial tubercle, as the authors do in some parts of the discussion.

      We have clarified in multiple locations that we recorded from the anteromedial OT, including the abstract, and further clarified this in the conclusions throughout the results and discussion. We refrain from stating “anteromedial OT” at every mention of the OT, but think we have now made it abundantly clear that our observations are from the anteromedial OT. It is worth noting that retrograde tracing from the VTA did not label any neuron in any part of the OT, suggesting that the conclusion may well extend beyond the anteromedial portion. However, we acknowledge that further work is needed to comprehensively characterize the OT outputs.

      (2) The authors now frame the 2P imaging data that D1 neuron activity reflects "increased contrast of identity or an intermediate and multiplexed encoding of valence and identity". I struggle to understand what the authors are actually concluding here. Later in discussion, the authors state that they saw that OT D1 and D2 neurons "encode odor valence" (line 510). 

      The point we aim to make is that valence encoding is different between the OT and VP. We do not think the reward modulated activity in OT is valence encoding, at least not as it is in the VP. We do observe some valence encoding at the population level, which is different from individual valence encoding neurons. The ability of classifiers to segregate population activity based on reward might be considered valence encoding, but we contrast it with that in VP where individual neurons signal reward prediction. This is more robust than that in the OT data where few neurons robustly encode valence. The increased response of the OTD1 neurons after reward association is more consistent with contrast enhancement than valence encoding. We believe this distinction is important and reflects a transformation between two reward-related brain areas. For clarification of the sentence in question we have changed it to reflects “increased contrast of identity or an intermediate encoding of valence that also encodes identity.” (line 488)

      We appreciate the authors' note that there is "poor standardization" when it comes to defining valence (line 521). We are ok with the authors speculating and think this revision is more forthcoming regarding the results and better caveats the conclusions. I suggest in abstract the authors adjust line 14/15 to conclude that, "While D1 OT neurons showed larger responses to rewarded odors, in line with prior work, we propose this might be interpreted as identity encoding with enhanced contrast." [eliminating "rather than valence encoding" since that is a speculation best reserved for discussion as the authors nicely do.]

      We accept this suggestion and have modified the abstract sentence to say, “Though D1 OT neurons showed larger responses to rewarded odors than other odors, consistent with prior findings, we interpret this as identity encoding with enhanced contrast.” We believe this is appropriately qualified as an interpretation, and should not be confusing.

      The above items stated, one issue comes to mind, and that is, why of all reasons would the authors find that the anteromedial aspect of the tubercle is not greatly reflecting valence? The anteromedial aspect of the tubercle, over all other aspects of the tubercle, is thought by many to more greatly partake in valence and other hedonic-driven behaviors given its dense reception of VTA DAergic fibers (as shown by Ikemoto, Kelsch, Zhang, and others). So this finding is paradoxical in contrast to what might be expected had the authors studied the anterolateral tubercle or posterior lateral tubercle, which gets less DA input.

      We agree that this seems surprising. This is why we focused on the anteromedial OT, expecting to find valence encoding. It remains possible that other parts of the OT, or more dorsal aspects of the anteromedial OT, encode valence, as has been reported by Murthy and colleagues. However, it remains unclear if their recordings are in the OT or VP. Nonetheless our findings indicate that more work is required to understand the contribution of the OT to valence encoding. It is also important to note that our conclusions are drawn in comparison to the VP, which has more robust valence encoding than the OT. Thus, in comparison, the OT sample in our recordings lacks robust valence signaling. We think this comparison is important, due to the lack of a clear framework for defining valence that may create misleading statements in past OT work.

      Reviewer #3 (Public Review):

      Summary:

      This manuscript describes a study of the olfactory tubercle in the context of reward representation in the brain. The authors do so by studying the responses of OT neurons to odors with various reward contingencies and compare systematically to the ventral pallidum. Through careful tracing, they present convincing anatomical evidence that the projection from the olfactory tubercle is restricted to the lateral portion of the ventral pallidum.

      Using a clever behavioral paradigm, the authors then investigate how D1 receptor- vs. D2 receptor-expressing neurons of the OT respond to odors as mice learn different contingencies. The authors find that, while the D1-expressing OT neurons are modulated marginally more by the rewarded odor than the D2-expressing OT neurons as mice learn the contingencies, this modulation is significantly less than is observed for the ventral pallidum. In addition, neither of the OT neuron classes shows conspicuous amount of modulation by the reward itself. In contrast, the OT neurons contained information that could distinguish odor identities. These observations have led the authors to conclude that the primary feature represented in the OT may not be reward.

      Strengths:

      The highly localized projection pattern from olfactory tubercle to ventral pallidum is a valuable finding and suggests that studying this connection may give unique insights into the transformation of odor by reward association.

      Comparison of olfactory tubercle vs. ventral pallidum is a good strategy to further clarify the olfactory tubercle's position in value representation in the brain.

      Weaknesses:

      The study comes to a different conclusion about the olfactory tubercle regarding reward representations from several other prior works. Whether this stems from a difference in the experimental configurations such as behavioral paradigms used or indeed points to a conceptually different role for the olfactory tubercle remains to be seen.

      We acknowledge that our results lead us to conclusions that are different from those of prior work. But we note that our results are not directly at odds, as we see similar reward modulation of D1 OT neurons as has been reported previously. Our conclusion is different because we contrast our OT responses with those in the VP, where valence is more robustly encoded at the single neuron level. We also note that many of the past studies do not define valence as stringently as we do. Thus, increased activity with reward, as observed in our data and past studies, seems more like reward modulation than valence.