  1. Jul 2023
    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Curtis et al. reports the interaction between CaMKII and alpha-actinin-2. The authors found that the interaction was elevated after NMDA receptor activation in dendritic spines. In addition, this study reveals that NMDA receptor binding to CaMKII facilitates alpha-actinin-2 access to the CaMKII regulatory segment, indicating that the NMDA receptor is involved in this interaction. The authors identified that the EF1-4 motifs mediate this interaction, and overexpression of this motif inhibited structural LTP. Moreover, biochemical measurements of affinities for various combinations of protein fragments, including autoinhibited CaMKII 1-315, regulatory segments of CaMKII, and the EF-hand motif, reveal that autoinhibited CaMKII has limited access to alpha-actinin-2. The authors also solved the structure of the interaction, supporting their finding in neurons at the molecular level. The authors claim that the interaction between CaMKII and alpha-actinin-2 is essential for structural LTP through cooperative action by the NMDA receptor and actin cytoskeleton.

      Overall, the experiments are well-designed and the results are largely convincing and well-interpreted. But some aspects of the experiments need to be clarified.

      1) Time resolution of the interaction analysis appears to be poor, as calcium elevation in a dendritic spine would be on the order of milliseconds. What is the time window for alpha-actinin-2 to interact with CaMKII during NMDA receptor activation or LTP?

      We have performed additional time-course experiments to determine how quickly interactions between alpha-actinin-2 and CaMKII are elevated following NMDAR activation. The results of these experiments are shown in Figure 2A and Figure 2-Figure Supplement 1. We found that the change in association was established rapidly after NMDAR activation (t50% = 22±1 s, Figure 2A), which is consistent with proposed time-courses for CaMKII interactions following the induction of LTP (see Yasuda, Hayashi & Hell, Nat Rev Neuroscience, 2022, PMID 36056211). We have included additional text in the results (lines 138-147), methods (lines 609-611 & 650-652), and discussion (lines 426-427) sections explaining these experiments, and figure legends are provided for the new figures on lines 1006-1009 and lines 1096-1101.

      2) The authors analyzed the binding of CaMKII and alpha-actinin-2 with partial fragments. It remains unknown whether CaMKII can form a protein complex with GluN2B and alpha-actinin-2 in a single CaMKII protomer.

      The reviewer is referring to experiments shown in Figure 5, in which we found that a fragment of GluN2B (1260-1492) increases pull-down of full-length CaMKIIα with a fusion of GST to the EF3-4 region of α-actinin-2. This region of GluN2B contains a CaMKII phosphorylation sequence (positions 1290-1309) that occupies the substrate binding groove of the kinase domain (Stratton et al., Cell Reports, 2023, PMID 35830796). Therefore, the most logical explanation for the results of the pull-down experiment is that GluN2B increases α-actinin-2 access to the regulatory segment by binding to the substrate binding groove of the same CaMKII protomer. Nevertheless, we discuss the difficulty of conceptualising and investigating interactions between oligomeric proteins within the PSD on lines 451-461.

      3) Besides synaptic localization, the effect of the interaction on the enzymatic activity of CaMKII is not known.

      The Colbran laboratory has previously examined the effect of α-actinin-2 on CaMKII activity. Jalan-Sakrikar and colleagues (JBC, 2012, PMID 22427672) showed that a fragment of α-actinin-2 corresponding to EF hands 3 and 4 is able to weakly activate CaMKII (~10% compared to Ca2+/CaM) towards the peptide substrates autocamtide-2 and GluN2B but not syntide-2 (see Figure 1B&C of this paper). An earlier study by Robison and colleagues (JBC, 2005, PMID 16172120) found that α-actinin-2 antagonises Ca2+/CaM-dependent activation of unphosphorylated CaMKII towards autocamtide-2, but does not affect the activity of pT286 auto-activated CaMKII (see Figure 4A of this paper). This work is referred to on lines 63-65 of the introduction.

      4) Although the authors quantify the effect of the EF-hand disruptor by measuring numbers of the dendritic spine by its shape, the specificity of the EF-hand disruptor needs to be clarified.

      There are two known interaction partners for the EF hand region of α-actinin-2: CaMKII and titin (Young et al., EMBO J, 1998, PMID 9501083; Atkinson et al., Nat Struct Biol, 2001, PMID 11573089). Titin is an extremely long sarcomeric protein that is expressed in striated muscle cells but not neurons. Therefore, the effects of the disruptor are highly likely to reflect disruption of interactions with CaMKII. We also performed control experiments with EF3-4 L854R, which does not bind CaMKII effectively (Figure 3-figure supplement 1C). We have added a sentence to clarify the specificity of the EF-hand disruptor on lines 182-184, as follows: “Furthermore, the only known interaction partner for the EF1-4 region of α-actinin-2 besides CaMKII is the muscle-specific protein titin (Young et al., 1998), so any effects of EF1-4 in neurons are likely to reflect destabilisation of native interactions between CaMKII and α-actinin-2”.

    1. Author Response

      Reviewer #1 (Public Review):

      This study uses electrophysiological techniques in vitro to address the role of the Na+ leak channel NALCN in various physiological functions in cartwheel interneurons of the dorsal cochlear nucleus. Comparing wild type and glycinergic neuron-specific knockout mice for NALCN, the authors show 1) that these channels are required for spontaneous firing, 2) that they are modulated by noradrenaline (NA, via alpha2 receptors) and GABA (through GABAB receptors), and 3) how the modulation by NA enhances IPSCs in these neurons.

      This work builds on previous results from the Trussell lab on the physiology of cartwheel cells, and from other labs on the role of NALCN channels, which have recently been characterized in a growing number of brain areas; for this reason, this study could also be of interest to researchers who work in other preparations. The general conclusions are strongly supported by results that are clearly and elegantly presented.

      I have a few comments that, in my opinion, might help clarify some aspects of the manuscript.

      1) It is mentioned throughout the manuscript, including the abstract, that the results suggest a close apposition of NALCN channels and alpha2 and GABAB receptors. From what I understand, this conclusion comes from the fact that GABAB receptors activate GIRK channels through a membrane-delimited mechanism. Is it possible that these receptors converge on other effectors, for example adenylate cyclase (see https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6374141/)?

      It will be of interest to test the roles of adenylyl cyclase modulation in the control of NALCN, as a complement to the studies we have presented here.

      2) In Figure 2G, the neurons from NALCN KO mice appear to reach a significantly higher frequency than those from WT (figure 2E, 110 vs. 70 spikes/s). Was this higher frequency a feature of all experiments? The results mention a rundown of peak firing rate due to whole-cell dialysis, but, from what I understand, the control conditions should be similar for all experiments.

      The peak firing rates in control solutions for WT and KO CWC are not statistically different.

      3) Also in Figure 2, the firing patterns for neurons from WT and NALCN KO mice appear to be quite different, with spikes appearing to be generated during the hyperpolarization of the bursts in the second half of the current step for WT neurons but always during the depolarization in KO neurons. Was this always the case? If so, could NALCN channels be involved in this type of firing? Along these lines, it would be interesting to show an example of a firing pattern of neurons from WT mice in the presence of NA, which inhibits NALCN channels.

      The specific pattern of spikes in CWC is quite variable from trial to trial and from cell to cell, as it depends on multiple CaV and calcium-dependent K channel subtypes, and it does not depend on the genotypes used here. The primary effects observed in the KO are in background firing and sensitivity to NA, both reflecting alterations in rheobase. The firing pattern example requested was shown in the raster plot of Fig. 2B2.

      4) It might be interesting to discuss how the hyperpolarization induced by the activation of GIRK channels and inhibition of NALCN channels could have different consequences due to their opposite effect on the input resistance.

      We considered this as a point of discussion, but decided that making sense of it would depend on assumptions about the location of the channels (dendritic vs somatic, distance to AIS) that we do not have data for. For example, a dendritic increase in resistance through NALCN block, leading to a hyperpolarization of the soma, might have actions similar to a somatic hyperpolarizing conductance increase by GIRK, as far as the voltage at the AIS is concerned.

      Reviewer #3 (Public Review):

      The study by Ngodup and colleagues describes the contribution of the sodium leak NALCN conductance to the effects of noradrenaline on cartwheel interneurons of the DCN. The manuscript is very well-written and the experiments are well-controlled. The scope of the study is of high biological relevance and recapitulates a primary finding of the Khaliq lab (Philippart et al., eLife, 2018) in ventral midbrain dopamine neurons, that Gi/o-coupled receptors inhibit NALCN current to reduce neuronal excitability. Together these studies provide unequivocal evidence for NALCN as a downstream target of these receptors. There are no major concerns. I have only minor suggestions:

      Minor

      1) As noted in the Introduction, NALCN is inhibited by extracellular calcium, which has led to some debate about the relevance of NALCN when recorded in 0.1 mM calcium. A strength of this study is that the effect of NA on NALCN is recorded in physiological levels of calcium (1.2 mM). I suggest including the concentration of extracellular calcium in the aCSF in the Results section instead of relying on the reader to look to the Methods.

      Will do.

      2) It would be interesting to include the basal membrane properties of the KO compared to wildtype, including membrane resistance and resting membrane potential. From the example recording in Figure 2, one might think that the KOs have lower membrane resistance, so it is interesting that the 2 mV hyperpolarization produced similar effects on rheobase. In addition, from the example in Figure 2G, it appears that NA has an effect on firing frequency with large current injection in the KO. Is this true in grouped data and if so, is there any speculation into how this occurs?

      Will do.

      3) Please expand on the rationale for why GABAB and alpha2 must be physically close to NALCN. To my knowledge, the mechanism by which these receptors inhibit NALCN is not known. Must it be membrane-delimited?

      Given the known membrane-delimited modulation of GIRK by GABAB, and given that alpha2 and GABAB receptors appear to share the same population of NALCN channels while alpha2 receptors do not appear to target GIRK channels, we felt the simplest explanation would be coupling through G-proteins, with spatial segregation of different receptor/channel pools providing the means for separating GIRK and NALCN effects. However, the involvement of an additional second messenger is testable.

    1. Author Response

      We wish to thank the Reviewers for the appreciation they have expressed for our work, and the constructive feedback that they offered. We agree that clarifying the interpretation of synergy and information decomposition in the context of macroscale BOLD signals and loss of consciousness will be a valuable addition to the manuscript, as will improving the quality of our figures, and we will endeavour to do both. Briefly, at this stage we just wish to clarify that it is not our intention to claim that Phi-R and synergy, as measured at the level of regional BOLD signals, represent a direct cause of consciousness, or are identical to it. Rather, our work is intended to use these measures similarly to the use of sample entropy and LZC for BOLD signals: as theoretically grounded macroscale indicators, whose empirical relationship to consciousness may reveal the relevant underlying phenomena. We will ensure that our updated manuscript reflects this additional nuance.

    1. Author Response

      We thank the reviewers for their very thorough and detailed comments as well as the overall positive reception of the work. Additionally, the reviewers provided excellent detailed suggestions for future work.

      Specific response to Reviewer 1:

      “Indeed, the major disappointment of this work is the clinical relevance that was highlighted in the Introduction but was never really studied in the end. iPSC from patients could be added to the study.”

      We completely agree that it would be very exciting to use patient-derived iPSCs in the platform that we describe in this manuscript. We recognize that extensive work to characterize and validate BMECs differentiated from patient-derived iPSCs would be required, including validating BBB-like properties, before retinol transport data could be collected and interpreted. This work is beyond the scope of the current manuscript. We hope that in the future the in vitro model we describe in this manuscript will be used for exactly this type of clinically relevant application.

      Specific response to Reviewer 2:

      “1) The authors assume that there is a significant fraction of free ROL, 20% for ROH/RBP and 7% for RBP/TTR complexes (summarized in Table 1). This implies that at the physiological concentration of ROH/RBP in the plasma of 2 uM, free ROL represents 0.4 uM. However, the concentration of free ROL is limited by its poor solubility in the aqueous phase, which is around 0.06 uM (Szuts EZ, 1991, Arch Biochem Biophys). Moreover, taking into account the large concentration of other potential nonspecific carriers for lipids, it is safe to assume that there is virtually no free ROH in the plasma. There is also an important physiological reason for the limited amount of free ROL. Its rapid and nonspecific partition into cells (also observed in this study) would work against the highly specific RBP/STRA6-dependent ROH uptake pathway, undermining its physiological function.”

      The reviewer raises an important point that we considered carefully during the design of the research. As the reviewer says, Szuts (1991) reported retinol (ROH) solubility of ~0.06 µM (range of 0.03 – 0.11 µM). Szuts defined ROH solubility as ‘the amount of dissolved solute in equilibrium with its solid state…includ[ing] all its dissolved forms (monomers, multimers, and micelles)’. We are using a definition of ‘free’ ROH as ‘ROH not bound to protein’; in our work ‘free’ ROH could include retinol multimers and micelles, which likely do exist under our experimental conditions. (We did not see any evidence of solid ROH.) That said, we calculate that the concentration of free ROH (ROH not bound to protein) is ~0.14 µM when both RBP and TTR are present. In more complex biological mixtures containing other ROH carriers, the concentration of unbound ROH is expected to be lower, in agreement with the reviewer.
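      The concentrations discussed in this exchange follow from simple arithmetic, sketched below in Python. The unbound fractions (20% and 7%) and the 2 µM total are the figures quoted by the reviewer, and the 0.06 µM solubility is the Szuts (1991) value; the function and variable names are hypothetical and are not taken from the manuscript.

```python
# Illustrative arithmetic only; all names here are hypothetical.
TOTAL_ROH_UM = 2.0     # ROH/RBP concentration in plasma quoted by the reviewer (µM)
SOLUBILITY_UM = 0.06   # aqueous ROH solubility reported by Szuts, 1991 (µM)


def unbound_roh_um(total_um: float, unbound_fraction: float) -> float:
    """Concentration of ROH not bound to protein, given the unbound fraction."""
    return total_um * unbound_fraction


# Unbound fractions quoted in the reviewer's comment (from Table 1 of the manuscript).
free_rbp_only = unbound_roh_um(TOTAL_ROH_UM, 0.20)  # RBP present, no TTR
free_rbp_ttr = unbound_roh_um(TOTAL_ROH_UM, 0.07)   # both RBP and TTR present

for label, value in [("RBP only", free_rbp_only), ("RBP + TTR", free_rbp_ttr)]:
    ratio = value / SOLUBILITY_UM
    print(f"{label}: {value:.2f} µM unbound ROH ({ratio:.1f}x the Szuts solubility)")
```

      The 7% case gives the ~0.14 µM value stated in the response, and the 20% case gives the 0.4 µM figure raised by the reviewer.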

      One key point is that the free ROH concentration depends on the experimental setup, and must be correctly accounted for. For example, in some of the literature investigating STRA6-mediated uptake and signaling in vitro, purified ROH-RBP is used as the retinol source and samples do not include TTR. In such a case, the unbound ROH concentration in an equilibrated sample is anticipated to be significantly higher than the physiological concentration. Our investigation demonstrates that unbound ROH can accumulate intracellularly; thus, failure to include TTR and/or to account for the action of unbound ROH could lead to errors in mechanistic interpretation of experimental studies on retinol transport into cells or across barriers such as the BBB.

      2) “However, a question remains: would the outcome of the experiment be different if the basolateral chamber contained an ROH acceptor (retinol-binding proteins) rather than Hank's balanced salt solution, to which the partition of ROL is limited by its water solubility?”

      We agree with the reviewer that it would be very interesting to determine whether retinol permeability changes in the presence of RBP and/or TTR on the basolateral side. This is a logical next step and can readily be performed in the Transwell setup. We chose not to do this for this project because we wanted to compare our setup with other in vitro models (e.g., with porcine BMECs) where no retinol-binding proteins were present basolaterally.

      3) “The authors claim that transthyretin (TTR) increases BMEC permeability when compared to ROH/RBP. However, the mechanistic explanation for this phenomenon remains unclear. Do the authors imply the presence of a putative TTR receptor whose signaling could affect the efflux of ROL at the basolateral side of BMECs? TTR is a ubiquitous plasma protein. The concentration of TTR is tightly regulated and maintained between 300 - 330 mg/L. Therefore, it is questionable how TTR can serve as a signaling molecule modulating retinoid homeostasis in the brain.”

      We disagree with the reviewer about the TTR concentration. Per Johnson et al (Clin Chem Lab Med 2007, 45:419-426), TTR concentration varies with age, gender, inflammation and nutritional status, with typical concentrations for adults ranging from 150-450 mg/L. We were surprised at our observations that TTR enhanced ROH permeability across BMECs and that LRAT expression increased in the presence of TTR. We do not currently have a mechanistic interpretation and agree with the reviewer that further exploration of these tantalizing observations is warranted.

      “Additional technical issues that could affect the experimental outcomes: The formation of the ROH/RBP-TTR complex should be confirmed and purified using gel filtration to separate free TTR and ROH/RBP. Only fractions containing the complex should be used in the experiments. Assuming that the complex is formed with 100% efficiency is overly optimistic.”

      We respectfully disagree with the reviewer regarding using gel filtration to isolate TTR/ROH/RBP complexes. Any such isolated complexes will fairly rapidly re-equilibrate so that some protein and some ROH is unbound. It is important to note that we do not assume that the complex is formed with 100% efficiency. In fact, on the contrary, we explicitly take into account the distribution of materials (free TTR, free RBP, free ROH, RBP-ROH, TTR-RBP-ROH) in any sample; values are reported in the manuscript. This issue is also relevant to the first point raised by the reviewer. We routinely validated binding of ROH to RBP by FRET and ROH-RBP to TTR by fluorescence anisotropy.

      “Reloading RBP with isotopically labeled ROH requires an additional purification step. Stripping ROL from the ROH/RBP complex with organic solvent (diethyl ether) is appropriate but relatively harsh, causing partial unfolding of a fraction of RBP. Therefore, assuming that 100% of stripped RBP remains functional and can be reloaded with ROH is inaccurate. Reloading apo-RBP with a stoichiometric amount of ROH without an additional purification step (e.g., ion exchanger) leads to an excess of free ROL and/or its nonspecific association with nonfunctional RBP fractions. Measuring absorbance at 330 nm is not sufficient proof of binding since free ROH also absorbs at the same wavelength.”

      We produced RBP by refolding guanidine-denatured RBP in an excess of ROH to ensure near 100% ROH loading. The quality of the refolded RBP can be assessed qualitatively from the A330/A280 absorbance ratio, which should be ~1.0. We then extracted ROH to completion with diethyl ether to produce pure apo-RBP (ROH-free). We used this diethyl ether-stripped apo-RBP stock for all subsequent characterizations, including binding to ROH and TTR. We found our stripped apo-RBP was a suitable replacement for serum sources in every biophysical assay performed. Reloaded ROH-RBP elutes as a single peak on ion exchange chromatography, indicating that the vast majority of stripped RBP is available for ROH binding. We provide detailed information about RBP characterization in Est and Murphy, Prot. Exp. Purif. (2020), to which the interested reader is referred.

    1. Author Response

      The following is the authors’ response to the current reviews.

      Comment 1: The descriptions about body weights should be matched.

      Regrettably, we did not monitor body weights throughout the study. We have now revised the description to resolve the confusion. Importantly, we evaluated the weights of the muscle (EDL and soleus) and heart tissues in 8-month-old mice (Fig. 1A).

      Comment 2: Quantitative data for figures.

      As stated in the manuscript, the presented images are representative of at least three mice per genotype. However, assessing specific measurements such as cell sizes, diameters, or mitochondria sizes in histological tissue sections and electron microscopy fields is not feasible due to practical limitations. Unfortunately, we do not have access to specialized software for such analyses. While semi-quantification of Western blot bands is possible, implementing this for all Western blots in the manuscript would result in a substantial increase in the number of bar graphs. Below are Western blots from two additional pairs of mice used in all figures.

      Comment 3: Confusion about “total mitochondrial content”.

      The mitochondria content in cells was assessed by quantitatively comparing the DNA level of the mitochondrial gene cytochrome B to that of the nuclear gene 18S using quantitative PCR. This method is commonly used to determine the relative number of mitochondria in cells. However, we have revised and provided a clearer description in the figure legend to avoid any potential confusion.
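      A ratio of this kind is commonly computed from qPCR threshold cycles (Ct) with the ΔCt method. The Python sketch below illustrates that general calculation under stated assumptions: the response does not give the authors' exact formula, both amplicons are assumed to amplify with roughly 100% efficiency (a doubling per cycle), and the Ct values and function names are hypothetical.

```python
# Hypothetical sketch of the ΔCt calculation; not the authors' stated analysis.
def relative_mtdna_content(ct_mito: float, ct_nuclear: float,
                           efficiency: float = 2.0) -> float:
    """mtDNA signal relative to the nuclear reference gene.

    ct_mito    : Ct of the mitochondrial gene (here, cytochrome B)
    ct_nuclear : Ct of the nuclear reference gene (here, 18S)
    efficiency : amplification factor per cycle (2.0 assumes perfect doubling)
    """
    return efficiency ** (ct_nuclear - ct_mito)


def fold_change(knockout: float, control: float) -> float:
    """Relative mitochondrial content of knockout versus control samples."""
    return knockout / control


# Made-up Ct values, purely for illustration.
control = relative_mtdna_content(ct_mito=18.0, ct_nuclear=20.0)
knockout = relative_mtdna_content(ct_mito=19.0, ct_nuclear=20.0)
print(f"control: {control}, knockout: {knockout}, "
      f"fold change: {fold_change(knockout, control)}")
```

      Because only the ratio between genotypes is compared, the absolute copy number of the reference gene cancels out, provided it is the same in both genotypes.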

      Comment 4: Suggestions on further analyses of PGC1-alpha and TFAM. LC3-I and -II.

      We evaluated LC3-I/II levels in PTPMT1 knockout muscles, and our findings did not indicate any signs of increased autophagic activity (Supplementary Figure S3). We will examine PGC1-alpha and TFAM levels in our future studies. It is worth noting that in our previous RNA-seq analyses of PTPMT1 knockout hematopoietic cells, we did not observe any significant alterations in the expression levels of these two genes.

      Comment 5: Description on fibrotic lesions.

      Quantifying fibrotic areas poses a significant challenge. Therefore, we were only able to describe this finding.

      Comment 6: Fig 6 is not well organized and aligned.

      In response to your suggestion, we have reorganized this figure accordingly. Panels C, D, and E display mitochondrial OCR data derived from three biological replicates/genotype. We feel that these changes are sufficient to demonstrate the differences in substrate utilization between PTPMT1 knockout and control mitochondria.

      Comment 7: Descriptions of glucose oxidation and glycolysis in different types of muscle fibers are confusing.

      We have followed the suggestions and revised the descriptions accordingly.

      Comment 8: A discussion about lactate utilization in cardiomyocytes would be helpful.

      Following this suggestion, we have now added a brief discussion.

      Comment 9: “Cropped” images were used in Fig 10.

      The images shown in Fig. 10 were not cropped images. In order to efficiently use the tissue and mitochondrial lysates, the Western blot membranes were intentionally cut into smaller fragments based on the molecular weights of the proteins to be detected. These smaller membrane sections were then employed for individual Western blotting purposes.

      Minor comment 1: The order of Fig 1 panels should be reorganized.

      Following this suggestion, we have now reorganized this figure.

      Minor comment 2: Suggestion for an Echocardiograph result table.

      These analyses were carried out by trained personnel at the Emory Animal Physiology Core. The data presented in our manuscript was provided by them. It is important to note that no additional parameters were measured beyond the data provided by the Core.

      Minor comment 3: Is ROS production increased in PTPMT1 knockout muscle cells?

      Yes, PTPMT1 knockout tissues showed elevated overall cellular ROS levels even at 3 months (Figure 6I).

      Minor comment 4: Typo in S10 legend.

      The typo has been corrected.


      The following is the authors’ response to the original reviews.

      Comment 1: The effects of PTPMT1 on the skeletal muscle and heart might be an embryonic defect. They might be mediated by significantly reduced mTOR signaling.

      We acknowledge the valid point made by this reviewer. While both CKMM-Cre and Myh6-Cre express Cre during the embryonic stage, we did not observe any developmental defects in skeletal muscle-specific (PTPMT1fl/fl/CKMM-Cre) or heart-specific (PTPMT1fl/fl/Myh6-Cre) knockout mice. These knockout mice appeared indistinguishable from their WT littermates until the age of 3-4 months.

      Morphologically, the skeletal muscle and heart dissected from these mice showed no abnormalities. Additionally, mitochondria isolated from these tissues did not exhibit any morphological/structural defects. Undoubtedly, the late-onset phenotypes observed in the knockout mice over time were attributable to the metabolic defects arising from the loss of PTPMT1 in the embryos. Although PTPMT1 knockout muscle cells and cardiomyocytes initially maintained energy homeostasis through enhanced fatty acid and glutamate oxidation, along with metabolic adaptations or activation of alternative energy-producing pathways in the first few months, they eventually encountered substantial energy deficits. These deficits were attributed to the subsequent occurrence of oxidative stress and mitochondrial damage. In response to this valuable feedback, we have included a brief discussion in the manuscript's discussion section to address this point.

      As mentioned in the manuscript, the late-onset phenotypes observed in our study were likely a result of subsequent damage induced by the prolonged metabolic substrate shift and lipid accumulation within the cells. We agree with the reviewer that decreased mTOR activity may also contribute to these late effects, and have included a brief discussion in the discussion section.

      Comment 2: Why are the effects of the loss of PTPMT1 similar in the skeletal muscle and heart?

      The depletion of PTPMT1 yields similar effects in both tissue types; however, the manifestations occur earlier in the skeletal muscle. Although mitochondria in the skeletal muscle and heart have distinct preferences for energy sources, prolonged forced utilization of fatty acids caused by PTPMT1 depletion eventually leads to lipid accumulation and cellular damage (lipotoxicity) in both tissue types. This phenomenon underscores the importance of maintaining a balance in substrate utilization to prevent adverse effects on cellular health in the skeletal muscle and heart.

      Comment 3: AMPK is activated in PTPMT1 knockout cardiomyocytes; this should have cardioprotective effects.

      AMPK can be activated through various mechanisms. In our study, AMPK activation occurred in response to energetic stress in late-stage PTPMT1 knockout tissues that displayed significantly reduced ATP levels, in line with its role as a bioenergetic stress sensor. It is possible that AMPK activation alone was insufficient to overcome the secondary damage induced by the prolonged metabolic switch from carbohydrate metabolism to fatty acid metabolism.

      Comment 4: Knockout skeletal muscles and hearts had lipid accumulation; why were knockout mice smaller than controls? Are there any changes in white fat, core temperature or browning of fat? Rescue experiments should be considered to prove that lipid accumulation is the cause of death in the knockout mice.

      We believe that the lipid accumulation observed in muscle cells and cardiomyocytes of the knockout mice does not necessarily imply that these tissue-specific knockout mice would be heavier or have increased body fat. We appreciate the suggestions regarding energy expenditure tests and rescue experiments. We will certainly consider incorporating these experiments into our future study.

      As stated in the manuscript, we did not observe any morphological changes in white or brown fat tissues in the adipocyte-specific PTPMT1 knockout mice. Furthermore, we assessed body temperature and its response to a cold environment (4°C), and no differences were detected between the knockout mice and the control mice.

      Comment 5: Are there sex differences in muscle and heart phenotypes in the tissue specific knockout mice?

      We did not observe significant differences in phenotypes between male and female knockout mice.

      Comment 6: What happens to UCP2 activity in PTPMT1 deleted cells and what is its function in mediating AMPK and/or mTOR regulation?

      Currently, there is a lack of direct methods available to measure UCP2 activity. The relationship between UCP2 and the regulation of AMPK and mTOR has not been extensively investigated.

      Comment 7: What is the effect of PTPMT1 deletion on cardiolipin synthesis?

      PTPMT1 has been implicated in both facilitating mitochondrial utilization of pyruvate and participating in the synthesis of cardiolipin. To investigate the impact of PTPMT1 knockout on cardiolipin levels, we plan to establish a mass spectrometry assay for the quantitative analysis of cardiolipin in knockout mitochondria. Completing these experiments might require a considerable amount of time. Nonetheless, we extensively addressed this point in the discussion section.

      Minor concerns:

      Comment 8: The title needs more specificity.

      As suggested, we have revised the title to "Loss of PTPMT1 restricts mitochondrial utilization of carbohydrates and induces muscle atrophy and heart failure in tissue-specific knockout mice".

      Comment 9: Heart and skeletal muscle weights in Fig 1A should be normalized against tibia length.

      Unfortunately, we did not perform normalization in this study. However, we appreciate the suggestion and will incorporate it into our future studies. It is important to note that the lengths of tibias in the knockout mice were only marginally shorter.

      Comment 10: Low magnification and longitudinal section of the muscle should be shown in Fig 1B and 2A.

      The histological images provide supporting evidence for the conclusion, despite not being optimal in quality. We acknowledge the suggested improvements and assure you that we will integrate them into our future studies. It is crucial to emphasize that each conclusion in this study was derived from multiple experimental designs, rather than solely relying on morphological changes.

      Comment 11: Fig 1F is mislabeled as 1G.

      We have conducted a thorough review and can confidently confirm that the labeling is correct.

      Comment 12: Fig 2F and 6B should be quantified.

      As indicated in the manuscript, the images presented are representative of at least three mice per genotype. While semi-quantification of Western blot bands is possible, implementing this for all Western blots in the manuscript would result in a substantial increase in the number of bar graphs. Below are Western blot images from two additional pairs of mice included in Fig. 2F and Fig. 6B. Furthermore, Western blot images from two additional pairs of mice in other figures are also provided below.

      Author response image 1

      Western blotting data from additional two pairs of mice in Fig. 2F.

      Author response image 2

      Western blotting data from additional two pairs of mice in Fig. 6B.

      Author response image 3

      Western blotting data from additional two pairs of mice in Supplementary Fig. 2G.

      Author response image 4

      Western blotting data from additional two pairs of mice in Supplementary Fig. 3A.

      Author response image 5

      Western blotting data from additional two pairs of mice in Supplementary Fig. 3C.

      Author response image 6

      Western blotting data from additional two pairs of mice in Supplementary Fig. 3D.

      Author response image 7

      Western blotting data from additional two pairs of mice in Supplementary Fig. 4F.

      Author response image 8

      Western blotting data from additional two pairs of mice in

      Author response image 9

      Western blotting data from additional two pairs of mice in Supplementary Fig. 7C.

      Comment 13: Knockout mice should be placed on HFD or keto diet to test for the effects of PTPMT1 depletion.

      We appreciate this thoughtful suggestion. We will certainly incorporate this suggestion into our future studies, expanding beyond the scope of the current initial report.

      Comment 14: Suggestions on Fig 4A.

      Please see our response to Comment 10.

      Comment 15: Suggestions for improving echocardiographs.

      These analyses were conducted by trained personnel at the Emory Animal Physiology Core, who provided the data presented in our manuscript. We appreciate the reviewer bringing these issues to our attention, and we will inform the Core accordingly.

      Comment 16: Comment on Fig 5B.

      The tissues were sectioned at comparable, if not identical, levels. WT and PTPMT1 knockout heart sections look dramatically different because of the dilated myopathy observed in the knockout hearts.

      Comment 17: Comment on Fig 5C.

      We believe the cell death occurred predominantly in cardiomyocytes.

    1. Author Response

      We thank the reviewers for their careful reading of our manuscript and for their constructive and positive comments. We will revise the manuscript to address their key points. Here, we address the reviewer’s scepticism of sleep-learning being mediated by the episodic memory system. We agree that the reported unconscious learning of novel verbal associations during sleep may not match textbook definitions of episodic memory. However, the traditional definitions of episodic memory have long been criticized (e.g., Henke, 2010; Hannula, Minor, Slabbekoorn, 2023; Shohamy & Turk-Browne, 2013; Dew & Cabeza, 2011; Reder et al., 2009). We stand by our claim that sleep-learning was of episodic nature. Here, we provide arguments for this claim:

      In the introduction and the discussion, we state that we use a computational definition of episodic memory (Cohen & Eichenbaum, 1993; Henke, 2010; O’Reilly et al., 2014; O’Reilly & Rudy, 2000), and not the traditional definition that ties episodic memory to wakefulness and conscious awareness (Gabrieli, 1998; Moscovitch, 2008; Schacter, 1998; Squire & Dede, 2015; Tulving, 2002). According to the computational definition, consciousness and wakefulness are not properties of episodic memory. Instead, its core computational features are 1) rapid learning, 2) association formation, and 3) a compositional and flexible representation of the associations in long-term memory. We designed the retrieval task in the current study to assess only the retention of sleep-formed, flexibly and compositionally stored word-word associations. Reviewer 3 suggests that sound-sound associations may have been formed during sleep and may have been reactivated at test, resulting in the translation of the sound pattern of the translation word to the meaning of the translation word and further to the correct superordinate semantic category of the translation word. Although these processing steps during sleep and during wake retrieval are possible, the rapid sound-sound associative encoding, long-term storage, and flexible sound retrieval would still require hippocampal processing and hence computations in the episodic memory system. The interpretation in terms of associative auditory learning with a double semantic translation at wake testing is laborious and inefficient and hence a less parsimonious interpretation of sleep-learning than conceptual associative encoding during sleep. Our view resonates with the findings by Andrillon et al. (2017) that mere auditory perceptual learning during slow-wave sleep was not possible at all or led to suppressive memory traces that could not be retrieved following awakening.

      Importantly, Züst et al. (Current Biology, 2019) had also presented pseudowords and translation words for paired-associative word encoding during slow-wave sleep. Retrieval testing was performed in the waking state following sleep by use of a cued-recall task, as in the current study. During retrieval testing, Züst et al. recorded brain blood oxygenation using functional magnetic resonance imaging. The hippocampus was activated during correctly, but not during incorrectly, retrieved memories that had been formed during sleep. Crucially, activation resulting from this contrast within the posterior and anterior hippocampus and within lexical-semantic storage sites in the left temporal pole correlated between participants with retrieval performance (Züst et al., 2019). These correlation results demonstrate that those participants who learned the vocabulary best during slow-wave sleep activated the hippocampus and lexical-semantic storage sites the most during wake retrieval testing. Because the learning and retrieval tasks in the current study were similar to those of Züst et al. (2019), the hippocampus was likely mediating the retrieval of the sleep-formed associations in the current study. We have also measured brain oxygenation using functional magnetic resonance imaging in five persons while they learned pairs of pseudowords and translation words during slow-wave sleep and found the hippocampus activated (besides language areas) in all persons (unpublished).

      For these reasons, we believe that vocabulary presentations during sleep had triggered a hippocampus-mediated rapid conceptual-associative encoding process that provided for flexible representations of combinations of pseudowords and translation words in episodic memory.

    1. Author Response

      We thank the reviewers for their insightful reviews of our work, including both its strengths and limitations. Below we present minor corrections to the preprint and responses to the main points brought up by each reviewer.

      Erratum:

      • Line 330 refers to Fig. 7F (instead of 7D).

      • Line 331 refers to Fig. 7G (instead of 7E).

      Reviewer #1 (Public Review):

      The experimental design presented cannot clearly show that the effect of passive exposure was due to the specific exposure to task-relevant stimuli since there is no control group exposed to irrelevant stimuli.

      We acknowledge the possibility that exposure to task-irrelevant stimuli could result in improvements in learning. Testing this possibility would be a worthwhile goal of future experiments, but it is outside the scope of our current study. We have been careful in our paper to only draw conclusions about the effects of exposure to task-relevant stimuli compared to no exposure. We will also add a discussion of this point and references to the literature pointed out by the reviewer to the final version of our manuscript.

      The conclusion that "passive exposure influences responses to sounds not used during training" (line 147) does not seem fully supported by the authors' analysis. The authors show that there is an increase in accuracy for intermediate sweep speeds despite the fact that this is the first time the animals encounter them in the active session. However, it seems impossible to exclude that this effect is simply due to the increased accuracy of the extreme sounds that the animals had been trained on.

      The conclusion that the reviewer quotes from our paper is drawn from Figure 3, in which we show that mice exhibit an improvement on non-extreme stimuli after training on extreme stimuli. Panel 3D illustrates that the observed improvements are not just changes in psychometric performance driven by the extreme sounds. In the context of this result, the conclusion relates to generalization in performance on task-relevant stimuli that are closely related to the training stimuli. In our view, it was not entirely obvious a priori that this result would have to occur, since it is possible that performance could improve at the extremes without improving at the intermediate stimuli.

      In the modelling section, the authors adjusted the hyper-parameters to maximize the difference between pure active and passive/active learning. This makes a comparison of learning rates between models somewhat confusing.

      We apologize for the confusion. None of our conclusions are based on comparisons of learning speed between models, but perhaps this was not pointed out sufficiently clearly. The relevant comparisons between conditions for each specific model are made using the same hyperparameters. We will clarify this in the updated version of our manuscript.

      The description of the sound does not state whether when reducing the slope of the sweeps the center or the onset frequency of the sounds is preserved.

      Frequency modulated sounds of different FM slopes were generated such that the center frequency was always the same. This will be clarified in the updated version of our manuscript.
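
      One way to hold the center frequency fixed while varying the FM slope can be sketched as follows. This is only an illustration under stated assumptions, not the stimulus code used in the study: it assumes a logarithmic sweep, a slope expressed in octaves per second, and a "center" defined on a log-frequency axis (reached at the temporal midpoint). The function names and all parameter values are hypothetical.

```python
import math

def fm_sweep_bounds(center_hz, slope_oct_per_s, duration_s):
    """Start and end frequencies of a log sweep centred on center_hz.

    Assumption: the centre is the geometric midpoint of the sweep, so the
    sweep spans +/- half of (slope * duration) octaves around it.
    """
    half_span_oct = slope_oct_per_s * duration_s / 2.0
    return (center_hz * 2.0 ** (-half_span_oct),
            center_hz * 2.0 ** half_span_oct)

def instantaneous_freq(center_hz, slope_oct_per_s, duration_s, t):
    """Instantaneous frequency (Hz) at time t (s) after sweep onset."""
    start_hz, _ = fm_sweep_bounds(center_hz, slope_oct_per_s, duration_s)
    return start_hz * 2.0 ** (slope_oct_per_s * t)

def fm_sweep(center_hz, slope_oct_per_s, duration_s, sample_rate_hz):
    """Waveform samples of the sweep; a zero slope yields a pure tone."""
    start_hz, _ = fm_sweep_bounds(center_hz, slope_oct_per_s, duration_s)
    k = slope_oct_per_s * math.log(2.0)  # exponential rate of f(t)
    n_samples = int(round(duration_s * sample_rate_hz))
    samples = []
    for i in range(n_samples):
        t = i / sample_rate_hz
        if k == 0.0:
            cycles = center_hz * t
        else:
            # Integral of start_hz * exp(k * t) from 0 to t.
            cycles = start_hz * (math.exp(k * t) - 1.0) / k
        samples.append(math.sin(2.0 * math.pi * cycles))
    return samples
```

      With this construction, changing the slope moves the onset and offset frequencies symmetrically (in octaves) around the same center, so steeper sweeps cover a wider band but always pass through the center frequency at the midpoint of the sound.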

      Reviewer #2 (Public Review):

      One limitation here is that the presented analysis is somewhat simplistic, does not include any detailed psychometric analysis (bias, lapse rates etc), and primarily focuses on learning speed.

      In our analyses of trials that included extreme and intermediate stimuli, we investigated some metrics of the type that the reviewer suggests here. However, since such additional psychometric analyses generally led to null results and would in any case be somewhat tangential to our main results, which are about learning speed and responses to sounds not included during training, we did not include these in our manuscript. A limitation of our study is that the available data does not allow for an analysis of psychometrics during the initial learning stages, since only the extreme stimuli were presented during the task.
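
      For readers unfamiliar with the metrics the reviewer lists, a common way to express bias and lapse rates is a sigmoid psychometric function with separate lower and upper asymptotes. The sketch below is a generic textbook parameterization given for illustration only; it is an assumption, not the analysis performed in the study, and the parameter names are hypothetical.

```python
import math

def psychometric(x, bias, slope, lapse_low, lapse_high):
    """Probability of one choice as a function of stimulus value x.

    bias       : stimulus value at the midpoint of the curve
    slope      : steepness of the sigmoid (sensitivity)
    lapse_low  : lower asymptote (errors on the easiest low stimuli)
    lapse_high : 1 - upper asymptote (errors on the easiest high stimuli)
    """
    # Logistic function written via tanh for numerical stability.
    core = 0.5 * (1.0 + math.tanh(0.5 * slope * (x - bias)))
    return lapse_low + (1.0 - lapse_low - lapse_high) * core
```

      Fitting such a curve requires trials at intermediate stimulus values, which is why it cannot be applied to the initial learning stage, where only the extreme stimuli were presented.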

      Reviewer #3 (Public Review):

      The first [major weakness] is that even Model 5 differs from their data. For example, the A+P (passive interleaved condition) learning curve in Figure 7 seems to be non-monotonic, and has some sort of complex eigenvalue in its decay to the steady state performance as trials increase. This wasn't present in their experimental data (Figure 2D), and implies a subtle but important difference. There also appear to be differences in how quickly the initial learning (during early trials) occurs for the A+P and A:P conditions. While both A+P and A:P conditions learn faster than A only in M5, A+P and A:P seem to learn in different ways, which isn't supported in their data.

      The reviewer is correct that there are subtle differences between the two learning curves produced by Model 5. Due to noise in the experimental data, however, it is possible that such subtle distinctions also appear in the learning curves of the mice. Further, the slight overshoot of the learning curve that the reviewer mentions is not constrained by the experimental data due to the fact that different mice reach asymptotic performance at different times, and many of them have not even reached asymptotic performance by the end of the training period.

      However, even if there are minor discrepancies between the learning curves produced by the final version of the model and by the mice, we do not see this as being especially surprising or problematic. As in any model, there are a large number of potentially important features that are not included in any of our models–for example, realistic spectrotemporal neural responses, nonlinearity in neural activations, heterogeneity across mice, and many others. The aim of our modeling was to choose a space of possible models (which is inevitably restricted) and show which model version within that space best captures our experimental observations. Expanding the space of possible models that we considered to capture further nuances in the data will be a task for future work.

      The second major weakness is that the authors also don't generate any predictions with M5. Can they test this model of learning somehow in follow-up behavioural experiments in mice? ... Without follow-up experiments to test their mechanism of why passive exposure helps in a schedule-independent way, the impact of this paper will be limited.

      Although testing behavioral predictions from our models was beyond the scope of the current study, we do generate specific predictions with M5. In particular, our model produces predictions about neural representations and the ways in which they evolve through learning, and we hope to test these predictions in future work.

      I believe the authors need to place this work in the context of a large amount of existing literature on passive (unsupervised) and active (supervised) learning interactions. This field is broad both experimentally and computationally. For example, there is an entire sub-field of machine learning, called semi-supervised learning that is not mentioned at all in this work.

      We thank the reviewer for pointing this out. The updated version of our manuscript will include a discussion on how our results fit in with this literature.

    1. Author Response

      The following is the authors’ response to the original reviews.

      First, the authors would like to thank the reviewers and editors for their thoughtful comments. The comments were used to guide our revision, which is substantially improved over our initial submission. We have addressed all comments in our responses below, through a combination of clarification, new analyses and new experimental data.

      Reviewer #1 (Public Review):

      In this manuscript, the authors identified and characterized the five C-terminus repeats and a 14aa acidic tail of the mouse Dux protein. They found that repeats 3 and 5, but not other repeats, contribute to transcriptional activation when combined with the 14aa tail. Importantly, they were able to narrow down to a 6 aa region that can distinguish "active" repeats from "inactive" repeats. Using proximal labeling proteomics, the authors identified candidate proteins that are implicated in Dux-mediated gene activation. They were able to showcase that the C-terminal repeat 3 binds to some proteins, including Smarcc1, a component of SWI/SNF (BAF) complex. In addition, by overexpressing different Dux variants, the authors characterized how repeats in different combinations, with or without the 14aa tail, contribute to Dux binding, H3K9ac, chromatin accessibility, and transcription. In general, the data is of high quality and convincing. The identification of the functionally important two C-terminal repeats and the 6 aa tail is enlightening. The work sheds light on the mechanism of DUX function.

      A few major comments that the authors may want to address to further improve the work:

      We thank the reviewer for their efforts and constructive comments, which have guided our revisions.

      1) The summary table for the Dux domain construct characteristics in Fig. 6a could be more accurate. For example, C3+14 clearly showed moderately weaker Dux binding and H3K9ac enrichment in Fig 3c and 3e. However, this is not illustrated in Fig. 6a. The authors may consider applying statistical tests to more precisely determine how the different Dux constructs contribute to DNA binding (Fig. 3c), H3K9ac enrichment (Fig. 3e), Smarcc1 binding (Fig. 5e), and ATAC-seq signal (Fig. 5f).

      We thank the reviewer for this comment, and agree that there were some modest differences in construct characteristics that were not captured in the Summary Table (6a). To better reflect the differences between constructs, we added additional dynamic range to our depiction/scoring, and believe that the new scoring system provides sufficient qualitative range to capture the difference without imposing a statistical approach.

      2) Another concern is that exogenous overexpressed Dux was used throughout the experiments. The authors may consider validating some of the protein-protein interactions using spontaneous or induced 2CLCs (where Dux is expressed).

      We agree that it would be helpful to determine endogenous DUX interaction with our BioID candidates. Here, we attempted co-IPs for endogenous DUX protein with the DUX antibody and were unsuccessful, which indicated that the DUX antibody is useful for detection but not efficient in the primary IP. This is why we utilized the mCherry tag for DUX IP experiments, which worked exceptionally well.

      3) It could be technically challenging, but the authors may consider validating the Dux and Smarcc1 interaction in a more biologically relevant context, such as mouse 2-cell embryos where both proteins are expressed. Will Smarcc1 binding be dramatically reduced in 4-cell embryos due to loss of Dux expression?

      While we agree that it would be interesting to validate the in vivo interaction of DUX and SMARCC1 in the early embryo, it is not technically feasible for us to conduct the experiment, as the IP would require thousands of two-cell embryos, and we have the issue of poor co-IP quality with the DUX antibody.

      Reviewer #2 (Public Review):

      In this manuscript, Smith et al. delineated novel mechanistic insights into the structure-function relationships of the C-terminal repeat domains within the mouse DUX protein. Specifically, they identified and characterised the transcriptionally active repeat domains, and narrowed down to a critical 6aa region that is required for interacting with key transcription and chromatin regulators. The authors further showed how the DUX active repeats collaborate with the C-terminal acidic tail to facilitate chromatin opening and transcriptional activation at DUX genomic targets.

      Although this study attempts to provide mechanistic insights into how DUX4 works, the authors will need to perform a number of additional experiments and controls to bolster their claims, as well as provide detailed analyses and clarifications.

      We thank this reviewer for their constructive comments, and have conducted several new analyses, additional experiments and clarifications – which have strengthened the manuscript in several locations. Highlights include a statistical approach to the similarity of mouse repeats to themselves and to orthologs (Figure S1d) and clarified interpretations, a wider dynamic range to better reflect changes in DUX construct behaviors (Figure 6a), and additional data on construct behavior, including ‘inactive’ constructs (e.g., C1+14aa in Figure 1a,d, new ATAC-seq in Figure S1g), and active constructs such as C3+C5+14aa and C3+C5Δ14aa (in Figure S1b).

      Reviewer #3 (Public Review):

      Dux (or DUX4 in human) is a master transcription factor regulating early embryonic gene activation and has garnered much attention also for its involvement in reprogramming pluripotent embryonic stem cells to totipotent "2C-like" cells. The presented work starts with the recognition that DUX contains five conserved c. 100-amino acid carboxy-terminal repeats (called C1-C5) in the murine protein but not in that of other mammals (e.g. human DUX4). Using state-of-the-art techniques and cell models (BioID, Cut&Tag; rescue experiments and functional reporter assays in ESCs), the authors dissect the activity of each repeat, concluding that repeats C3 and C5 possess the strongest transactivation potential in synergy with a short C-terminal 14 AA acidic motif. In agreement with these findings, the authors find that full-length and active (C3) repeat containing Dux leads to increased chromatin accessibility and active histone mark (H3K9Ac) signals at genomic Dux binding sites. A further significant conclusion of this mutational analysis is the proposal that the weakly activating repeats C2 and C4 may function as attenuators of C3+C5-driven activity.

      By next pulling down and identifying proteins bound to Dux (or its repeat-deleted derivatives) using BioID-LC/MS/MS, the authors find a significant number of interactors, notably chromatin remodellers (SMARCC1), a histone chaperone (CHAF1A/p150) and transcription factors (ZSCAN4D) previously implicated in embryonic gene activation.

      The experiments are of high quality, with appropriate controls, thus providing a rich compendium of Dux interactors for future study. Indeed, a number of these (SMARCC1, SMCHD1, ZSCAN4) make biological sense, both for embryonic genome activation and for FSHD (SMCHD1).

      A critical question raised by this study, however, concerns the function of the Dux repeats, apparently unique to mice. While it is possible, as the authors propose, that the weakly activating C1, C2 and C4 repeats may exert an attenuating function on activation (and thus may have been selected for under an "adaptationist" paradigm), it is also possible that they are simply the result of Jacobian evolutionary bricolage (tinkering) that happens to work in mice. The finding that Dux itself is not essential, and in fact appears to be redundant with (or to cooperate with) the OBOX4 factor, in addition to the absence of these repeats in the DUX protein of all other mammals (as pointed out by the authors), might indeed argue for the second, perhaps less attractive possibility.

      In summary, while the present work provides a valuable resource for future study of Dux and its interactors, it fails, however, to tell a compelling story that could link the obtained data together.

      We appreciated the reviewer’s views regarding the high quality of the work and our generation of an important dataset of DUX interactors. We also appreciate the comments provided to improve the work, and have performed and included in the revised version a set of clarifications, additional analyses and additional experiments that have served to reinforce our main points and provide additional mechanistic links. We also agree that more remains to be done to understand the function and evolution of repeats C1, C2 and C4.

      Reviewer #1 (Recommendations For The Authors):

      1) For immuno-blots, authors may indicate the expected bands to help readers better understand the results.

      Agreed, and we have included the predicted molecular weight of proteins in the Figure Legends. We note that our work shows that the C-terminal domains confer anomalous migration in SDS-PAGE.

      2) Fig. 5b, a blot missing for the mCherry group?

      Figure 5b is a volcano plot, so we believe the reviewer is referring to Figure 5d, which is a co-immunoprecipitation experiment between SMARCC1 and mCherry-tagged DUX constructs. However, we are unsure of the comment, as an anti-mCherry sample is present in that panel.

      3) Line 99-100, Fig. S1d, it seems that repeat2, but not repeat3, is more similar to human DUX4 C-terminal region.

      This comment and one by another reviewer have prompted us to re-examine the similarities of the DUX repeats, and we have new analyses (Figure S1d) and an alternative framing in the manuscript as a result. We have expanded on this in our response to Reviewer #2, point #1 – and direct the reviewer there for our expanded treatment.

      4) A few references are misplaced. For example, line 48, the studies that reported the role of Dux in inducing 2CLCs should be from Hendrickson et al., 2017, De Iaco et al., 2017, and Whiddon et al., 2017. The authors may want to double check all references.

      Thanks for pointing these out. These issues have been corrected in the manuscript.

      5) In the materials & methods section, a few potential errors are noticed. For example, concentrations of PD0325901 and CHIR99021 in mESC medium appear ~1000-fold higher than standards.

      Thanks – corrected.

      Reviewer #2 (Recommendations For The Authors):

      Major Points

      1) Line 99 - The authors claimed that the "human DUX4 C-terminal region is most similar to the 3rd repeat of mouse DUX", but based on Supp. Fig. 1d, the human DUX4 C-term should be most similar to the 2nd repeat of mouse DUX. If this is indeed the case, it will undermine the rest of this study, since the authors claim that the 3rd repeat is transcriptionally active, whereas the 2nd repeat is transcriptionally inactive, and the bulk of this study largely focused on how the active repeats, not the inactive repeats, are critical in recruiting key transcriptional and chromatin regulators to induce the embryonic gene expression program.

      We thank the reviewer for their comments here. Since submission, and as mentioned above for Reviewer #1, we have revisited the issue of similarity of the DUX4 C-terminal region to the mouse C-terminal repeats, with a BLAST-based approach that is more rigorous and informed by statistics – which is in Author response table 1 and now in the manuscript as Figure S1d, and has affected our interpretation. Our prior work involved a simple % identity comparison table, and we now appreciate that some of the similarity analyses did not meet statistical significance; therefore we are unable to draw certain conclusions. We make the appropriate modifications in the text. For example, we no longer state that the DUX4 C-terminus appears to be most similar to mouse repeats 3 and 5. This does not affect the main conclusions of the paper regarding interactions of the C-terminus with chromatin-related proteins, only our speculation on which repeat might have represented the original single repeat in the mouse – an issue we think is of some interest, but one that did not rise to the level of mentioning in the original or current abstract.

      Author response table 1.

      Parameters: PAM250 matrix. Gap costs of existence: 15 and extension: 3. Numbers represent e-value of each pairwise comparison

      *No significant similarities found (>0.05).

      2) In Supp Fig 1d, it seems that the rat DUX4 C-terminal region is most similar to the 4th repeat of mouse DUX, which according to the author is supposedly transcriptionally inactive. This weakens the authors justification that the 3rd or 5th repeat is likely the "parental repeat for the other four", and further echoes my concern in point 1 where the human DUX4 C-term is most similar to the 2nd (inactive) repeat of mouse DUX.

      The reviewer’s point is well taken and is addressed in point #1 above.

      3) In Fig. 1d, the authors showed that DUX4-containing C3 and C5, but lacking acidic tail, can promote MERVL::GFP expression, albeit to a slightly lower extent compared to FL. However, in Fig. 2b, C3 or C5 alone (lacking acidic tail) completely failed to promote MERVL::GFP expression. However, in the presence of the acidic tail, both versions were able to promote MERVL::GFP expression, similar to that of FL. The latter would suggest that it is the acidic tail that is crucial for MERVL::GFP expression, and this does not quite agree with Fig 1b, where C12345 (lacking acidic tail) was able to promote MERVL::GFP expression. Although C12345 did not activate MERVL to a similar level as FL, it is clearly proficient, compared to C3 or C5 alone (lacking acidic tail) where there is no increase in MERVL at all. Additional constructs will be helpful to clarify these points. For example, 'C3+C5 minus acidic tail' and 'HD1+HD2+acidic tail only' constructs.

      We agree that constructs such as those mentioned would add to the work. First, we have done the additional construct HD1+HD2+14aa tail, which is presented as ΔC12345+14aa in Figure 2a and in S2a. Additionally, we performed experiments on the requested C3+C5+14aa and C3+C5Δ14aa (see samples 6 and 7 in Author response image 1, which are now included in Supplemental Figure 2b). The results reinforce our hypothesis of an additive effect toward DUX target gene activation by increasing C-terminal repeats and including the 14aa tail.

      Author response image 1.

      4) Related to the above, the flow cytometry data for the MERVL::GFP reporter as presented in Figures 1 and 2, as well as in Supp. Fig. 2, show a considerably large difference in the %GFP|mCherry for the FL construct, ranging from ~6-26%. This makes it difficult to convince the reader which of the different DUX domain constructs cannot or can partially induce GFP|mCherry signal when compared to FL, and hence it is tough to definitively ascertain the exact contribution of each of the 5 C-terminal repeats with high confidence, as it appears that there exists a significant amount of variability in this MERVL::GFP reporter system. The authors need to address this issue since this is their primary method to elucidate the transcriptional activity of each of the mouse DUX repeat domains.

      We note that with the Dux-/- cell lines we used throughout the timeline of the study, the %GFP|mCherry expression progressively and slowly decreased – possibly due to slow/modest epigenetic silencing of the reporter. However, we always used the full-length DUX construct to establish the dynamic range. We emphasize that the relative differences between constructs over multiple cell line replicates remained relatively consistent. However, we elected to show absolute values in each experiment, rather than simply normalizing the full-length to 100% and showing relative values.
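
      The alternative presentation mentioned above, normalizing each experiment to its own full-length control, can be sketched as follows. This is an illustrative sketch only: the construct names and all numbers are hypothetical placeholders and are not data from the manuscript.

```python
def normalize_to_full_length(percent_gfp, reference="FL"):
    """Express each construct's %GFP|mCherry relative to the FL construct
    measured in the same experiment (FL = 100%)."""
    ref_value = percent_gfp[reference]
    if ref_value <= 0:
        raise ValueError("reference signal must be positive")
    return {name: 100.0 * value / ref_value
            for name, value in percent_gfp.items()}

# Hypothetical absolute values from an early and a late experiment, chosen
# so that the FL signal drifts while the ratios between constructs hold.
early_experiment = {"FL": 26.0, "construct_A": 13.0, "construct_B": 1.3}
late_experiment = {"FL": 6.0, "construct_A": 3.0, "construct_B": 0.3}

early_relative = normalize_to_full_length(early_experiment)
late_relative = normalize_to_full_length(late_experiment)
```

      In this made-up example the absolute FL signal drops from 26% to 6%, yet each construct keeps the same value relative to FL in both experiments. Normalization of this kind makes experiments easier to compare but hides the drift in absolute reporter signal, which is the trade-off behind showing absolute values.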

      5) Lines 140-142 - The authors claimed that the functional difference between the transcriptionally active and inactive repeats could be narrowed down to a "6aa region which is conserved between repeats C3 and C5, but not conserved in C1, C2 and C4". Assuming the 6aa sequence is DPLELF, why does C1C3a elicit almost twice the intensity of GFP|mCherry signal compared to C3C1c, despite both constructs having the exact same 6aa sequence?

      Indeed, C1C3a and C3C1c both contain the ‘active’ DPL sequence but show different relative levels of %GFP|mCherry. This is consistent with these sequences having a positive role in DUX target gene regulation – but likely in combination with other regions which potentiate their effect, possibly through interacting proteins or post-translational modifications.

      Why does DPLEPL (the intermediate C3C1b construct) induce a similar extent of GFP|mCherry signal as the FL construct, even though the former includes 3aa from a transcriptionally inactive repeat? In contrast, GSLELF (the other intermediate C1C3b construct) that also includes 3aa from a transcriptionally inactive repeat is almost completely deficient in inducing any GFP|mCherry signal. Why is that so? Is DPL the most crucial sequence? It will be important to mutate these 3 (or the above 6) residues on FL DUX4 to examine if its transcriptional activity is abolished.

      These are interesting points. DPL does appear to be the most important region in the mouse DUX repeats. However, DPL is not shared in the C-terminus of human DUX4. Notably, the DUX4 C-terminus is sufficient to activate the mouse MERVL::GFP reporter when cloned to mouse homeodomains (see Author response image 2, second sample) and other DUX target genes (initially published in Whiddon et al. 2017). One clear possibility is that the DPL region is helping to coordinate the additive effects of multiple DUX repeats, which only exist in the mouse protein.

      Author response image 2.

      6) Line 154 - The intermediate DUX domain construct C1C3b occupied a different position on the PCA plot from the C1C3c construct that does not contain any of the critical 6aa sequence, as shown in Fig. 2e. However, both these constructs appear to be similarly deficient in inducing any GFP|mCherry signal, as seen in Fig. 2c. Why is that so?

      The PCA plot assesses the impact on the whole transcriptome and not just the MERVL::GFP reporter, suggesting the 3aa region has transcriptional effects on the genome beyond what is detected in the MERVL::GFP reporter.

      7) To strengthen the claim that "Chromatin alterations at DUX bindings sites require a transcriptionally active DUX repeat", the authors should also perform CUT&Tag for constructs containing transcriptionally inactive DUX repeats (e.g. C1+14aa), and show that such constructs fail to occupy DUX binding sites, as well as are deficient in H3K9ac accumulation.

      This is a good comment. We elected to control this with constructs containing or lacking an active repeat. Although we have not pursued this by CUT&TAG, we have examined the impact of DUX constructs with inactive repeats (including the requested C1+14aa, new Figure S1g) by ATAC-seq (see #12, ATAC-seq section, below), and observe no chromatin opening, suggesting that the lack of transcriptional activity is rooted in the inability to open chromatin.

      8) It would be good if the authors could also include CUT&Tag data for some of the C1C3 chimeric constructs that were used in Fig. 2, since the authors argued that the minimal 6aa region is sufficient to activate many of the DUX target genes. This would also strengthen the authors’ case that the transcriptionally active, not inactive, repeats are critical for binding at DUX binding sites and ensuring H3K9ac occupancy.

      We agree that these would be helpful, and have examined the inactive repeats in transcription and ATAC-seq formats during revision (new data in Figures 1d and S1g), but not yet the CUT&TAG format.

      9) Line 213 - "SMARCA4" should have been "SMARCA5"? Based on Fig. 4d, SMARCA5 is picked up in the BirA*-DUX interactome, not SMARCA4.

      Thanks – corrected.

      10) Lines 250-252 - The authors compared the active BirA-C3 against the inactive BirA-C1 to elucidate the interactome of the transcriptionally active C3 repeat, as illustrated in Fig. 5c. They found 12 proteins more enriched in C1 and 154 proteins in C3. This information should be presented clearly as a separate tab in Supp Table 2. What are the proteins common to both constructs, i.e. enriched to a similar extent? Do they include chromatin remodellers too? Although the authors sought to identify differential interactors between the 2 constructs, it is also meaningful to perform 2 separate comparisons - active BirA-C3 against BirA alone control, and inactive BirA-C1 against BirA alone control - like in Fig. 4d, so as to more accurately define whether the active C3 repeat, and not the inactive C1 repeat, interacts with proteins involved in chromatin remodeling.

      We thank the reviewer for this comment, and we have modified the manuscript by adding a second sheet in Supplementary Table 2 including the results for enriched proteins in BirA-C1 vs. C3. However, because BirA alone and BirA*-C3 were sequenced in different mass spectrometry experiments, annotation differences make it difficult to compare the two datasets quantitatively in a pairwise fashion.

      11) Fig 5d: The authors mentioned in the legend that endogenous IP was performed for SMARCC1. However, in line 266, they stated Flag-tagged SMARCC1. Is SMARCC1 overexpressed? The reciprocal IP should also be presented. More importantly, C1 constructs (e.g. C1+14aa and C1Δ14aa) should also be included.

      To clarify, Figure 4e used exogenously overexpressed FLAG-SMARCC1 in HEK-293T cells to confirm the results of the full-length DUX BioID experiment. Figure 5d was performed with overexpressed DUX construct, but involved endogenous SMARCC1 in mESCs. This has now been made clearer in the revised manuscript.

      12) For both the SMARCC1 CUT&Tag and ATAC-seq experiments shown in Figures 5e and 5f respectively, the authors need to include DUX derivatives that contain transcriptionally inactive repeats with and without the 14aa acidic tail, i.e. C1+14aa and C1Δ14aa, and show that these constructs prevent the binding/recruitment of SMARCC1 to DUX genomic targets, and correspondingly display a decrease in chromatin accessibility. Only then can they assert the requirement of the transcriptionally active repeat domains for proper DUX protein interaction, occupancy and target activation.

      We agree that examination of an inactive repeat in certain approaches would improve the manuscript. Importantly, we have now included C1+14 in our ATAC-seq experiments; Author response image 3 shows two individual replicates, which constitute a new Figure S1g. In contrast to the transcriptionally active DUX constructs, which show opening at DUX binding sites, we do not see chromatin opening at DUX binding sites with transcriptionally inactive C1+14.

      Author response image 3.

      13) To prove that DUX-interactors are important for embryonic gene expression, it will be important to perform loss of function studies. For instance, will the knockdown/knockout of SMARCC1 in cells expressing the active DUX repeat(s) lead to a loss of DUX target gene occupancy and activation?

      We agree that it would be interesting to better understand SMARCC1 cooperation with DUX function in the embryo, but we believe this is beyond the scope of this paper.

      Minor Points

      1) Lines 124-126 - What is the reason/rationale for why the authors used one linker (GGGGS2) for constructs with a single internal deletion, but 2 different linkers (GGGGS2 and GAGAS2) for constructs with 2 internal deletions?

      With Gibson cloning, each PCR amplicon has homology overhang arms that must be specific for each overlap. Additionally, the PCR amplicons need to be distinct enough from one another that all inserts (up to 5 in this manuscript) are included and oriented in the right order. The linker sequences were included in the homology arm overlaps, so the nucleotide sequence of each linker needed to be distinct enough to allow all inserts to be included. This is a general rule of Gibson cloning. Additionally, both GGGGS2 and GAGAS2 are common linker sequences used in molecular biology and their amino acid structures are similar to one another, suggesting there is no functional difference between the linkers.

      2) Line 704 - 705: In the figure legend, the authors stated that 'Constructs with a single black line have the linker GGGGS2 and constructs with two black lines have linkers with GGGGS2 and GAGAS2, respectively.'. This was not obvious in the figures.

      Constructs used for flow and genomics experiments that are depicted in Figure 2, Supplementary Figure 2, Figure 3, Figure 4, and Figure 5 are drawn with black lines where deletions are present. At each of these deletions, a linker was included in order to preserve spacing and mobility for the protein.

      3) Line 160 - Clusters #1 and #2 are likely written in the wrong order. It should have been "activating the majority of DUX targets in cluster #2, not cluster #1" and "failed to activate those in cluster #1, not cluster #2", based on the RNA-seq heatmap in Fig. 2f.

      We thank the reviewer for this comment, and the error has been corrected in the manuscript.

      4) Line 188 - Delete the word "of" in the following sentence fragment: "DUX binding sites correlating with the of transcriptional".

      Thanks – corrected.

      5) Line 191 - Delete the word "aids" in the following sentence fragment: "important for conferring H3K9ac aids at bound".

      Thanks – corrected.

      6) Line 711 - "C1-C3 a,b,d" should be "C1-C3 a,b,c".

      Thanks – corrected.

      7) Lines 711-712 - The colors "pink to blue" and "blue to pink" are likely written in the wrong order. Based on Fig. 2c, the blue to pink bar graphs should represent C1-C3 a,b,c in that order, and likewise the pink to blue bar graphs should represent C3-C1 a,b,c in that order.

      Thanks – corrected.

      8) There is an overload of data presented in Fig. 2c, such that it is difficult to follow which part of the figure represents each data segment as written in the figure legend. It is recommended that the data presented here is split into 2 sub-figures.

      Figure 2c has a supporting figure in Supplementary Figure 2b. While the main panel of Figure 2c contains both a graphical depiction of the constructs and the data, we have presented it this way to make it as clear as possible for the reader to interpret the complexity and presence of amino acids in each of the constructs.

      9) Line 717 - "following" is misspelt.

      Thanks – corrected.

      10) Lines 720-721 - "(Top)" and "(Bottom)" should be replaced with "(Left)" and "(Right)", as the 2 bar graphs presented in Fig. 2d are placed side by side to each other, not on the top and bottom.

      Thanks – corrected.

      11) Lines 725 and 839 - "Principle" is misspelt. It should be "Principal".

      Thanks – corrected.

      12) In Figures 3d and 3e, the sample labeled "C3+14_1" should be re-labeled to "C3+14", in accordance with the other sub-figures. Additionally, for the sake of consistency, "aa" should be appended to the relevant constructs, e.g. "C3+14aa" and "C3Δ14aa".

      Thanks – corrected.

      13) Line 773 - Were the DUX domain constructs over-expressed for 12hr (as written in the figure legend) or 18hr (as labeled in Fig. 5d)?

      Thanks – corrected.

      14) Related to minor point 19 above, is there a reason/rationale for why some of the experiments used 12hr over-expression of DUX domain constructs (e.g. for CUT&TAG in Fig. 3), whereas in other experiments 18hr over-expression was chosen instead (e.g. flow cytometry for MERVL::GFP reporter in Figures 1 and 2, and co-IP validations of BirA*-DUX interactions in Fig. 4)?

      Thanks for the opportunity to explain. In this work, experiments that reported on proteins that are translated following DUX gene activation (e.g. MERVL::GFP via flow) were done at 18hr to allow enough time for transcription and translation of GFP (or other DUX target genes). For experiments that report on the impact of DUX on chromatin and transcription, such as RNA-seq, CUT&Tag, and ATAC-seq, we induced DUX domain constructs for 12 hours.

      15) Line 804 - "ΔHDs" is missing between "C2345+14aa" and "ΔHD1".

      Thanks – corrected.

      16) In Fig. 5c, "Chromatin remodelers" is misspelt.

      Thanks – corrected.

      17) There is no reference in the manuscript to the proposed model that is presented in Fig. 6b.

      Thanks – corrected.

      Reviewer #3 (Recommendations For The Authors):

      Given the uncertainty of the function of the Dux peptide repeats in mice, could it not also be possible that the underlying repeated nature of the (coding) DNA is what matters? That is, could these DNA repeats exert a regulatory function on Dux transcription itself (also given the dire consequences of misregulated DUX4 expression as seen in FSHD, for example)?

      Yes, it remains possible that the internal coding repeats within Dux are playing a role in locus regulation, and might be interesting to examine. However, we consider this question as being outside the scope of the current paper.

      Finally, it would be interesting to know whether these repeats are, in fact, present in all mouse species. Already no longer present in rat, do they exist, or not, in more "distant" mice, e.g. M. caroli?

      Determining whether all mouse strains contain C-terminal repeats in DUX is a question we also considered. However, Dux and its orthologs are present in long and very complex repeat arrays that are not present in the sequencing data or annotation for other mouse strains. Therefore, we are unable to answer this question from existing sequencing data. Answering it would require a considerable genome sequencing and bioinformatics effort, or alternatively a considerable effort aimed at cloning ortholog cDNAs from 2-cell embryos.

      Minor points:

      line 169: here it seems, in fact, that the 'inactive' C2, C4 repeats are more similar to each other (my calculation: 91 and 96% identity at the protein and DNA level, respectively) than the active C3 and C5 repeats (82 and 89% identity, resp.), the outlier being C1.

      Thanks for this comment, which was mentioned by other reviewers as well and has been addressed through new statistical analyses and interpretation (see new Figure S1d).
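
      For readers who wish to repeat this kind of pairwise comparison, the following is a minimal sketch of a percent-identity calculation over two pre-aligned sequences. The function name `percent_identity` and the gap handling are illustrative assumptions; this is not the statistical analysis used for Figure S1d. The example inputs are the 6aa motifs quoted in the review above (DPLELF and GSLELF), not the full C1-C5 repeat sequences.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two pre-aligned, equal-length sequences.

    Assumption: the sequences have already been aligned (gaps written as '-').
    Columns where both sequences carry a gap are ignored; a gap opposite a
    residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" and b == "-":
            continue  # shared gap column carries no information
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * matches / compared


# Motifs quoted in the review: four of six positions agree.
print(round(percent_identity("DPLELF", "GSLELF"), 1))  # 66.7
```

      The same function can be applied to aligned protein or nucleotide sequences, which is how identity can differ between the protein and DNA level for the same pair of repeats.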

      line 191: I'm not sure this sentence parses correctly ("...14AA tail is important for conferring H3K9Ac aids at bound sites...")

      We thank the reviewer for this comment, and we have corrected the sentence in the manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      This study presents an important finding on human m6A methyltransferase complex (including METTL3, METTL14 and WTAP). The evidence supporting the claims of the authors is convincing, although the model and assays need to be further modified. The work will be of interest to biologists working on RNA epigenetics and cancer biology.

      In mammals, a large methyltransferase complex (including METTL3, METTL14 and WTAP) deposits m6A across the transcriptome, and METTL3 serves as its catalytic core component. In this manuscript, the authors identified two cleaved forms of METTL3 and described the function of METTL3a (residues 239-580) in breast tumorigenesis. METTL3a mediates the assembly of METTL3-METTL14-WTAP complex, the global m6A deposition and breast cancer progression. Furthermore, the METTL3a-mTOR axis was uncovered to mediate the METTL3 cleavage, providing a potential therapeutic target for breast cancer. This study is properly performed and the findings are very interesting; however, some problems with the model and assays need to be modified. It is widely known that METTL3 and METTL14 form a stable heterodimer with the stoichiometric ratio of 1:1 (Wang X et al. Nature 534, 575-578 (2016), Su S et al. Cell Res 32(11), 982-994 (2022), Yan X et al. Cell Res 32(12), 1124-1127 (2022)), the numbers of METTL3 and METTL14 in the model of Fig 7P are not equivalent and need to be modified.

      We thank the reviewer for the good suggestion. We have modified the model in Fig. 7P.

      Reviewer #2 (Public Review):

      In this study, Yan et al. report that a cleaved form of METTL3 (termed METTL3a) plays an essential role in regulating the assembly of the METTL3-METTL14-WTAP complex. Depletion of METTL3a leads to reduced m6A level on TMEM127, an mTOR repressor, and subsequently decreased breast cancer cell proliferation. Mechanistically, METTL3a is generated via 26S proteasome in an mTOR-dependent manner.

      The manuscript follows a smooth, logical flow from one result to the next, and most of the results are clearly presented. Specifically, the molecular interaction assays are well-designed. If true, this model represents a significant addition to the current understanding of m6A-methyltransferase complex formation.

      A few minor issues detailed below should be addressed to make the paper even more robust. The specific comments are contained below.

      1) The existence of METTL3a and METTL3b.

      In this study, the author found the cleaved form of METTL3 in breast cancer patient tissues and breast cancer cell lines. Is it a specific event that only occurs in breast cancer? The author may examine the METTL3a in other cell lines if it is a common rule.

      We thank the reviewer for pointing this out. We discovered the cleaved form of METTL3 in breast cancer, and we further examined this cleaved METTL3 in other cell lines such as lung cancer cell lines, renal cancer cell lines, HCT116 and MEF (new Supplementary Figures 1A-1C); these data suggest that it is a common rule. Therefore, we speculate that METTL3a may be ubiquitously expressed. We have added this part in the revised manuscript, please see Line 118-120.

      2) Generation of METTL3a and METTL3b.

      1) Figure 1 shows that METTL3a and METTL3b were generated from the C-terminal of full-length METTL3. Because the sequence of METTL3a is involved in the sequences of METTL3b, can METTL3b be further cleaved to produce METTL3a?

      Although the sequence of METTL3a is involved in the sequences of METTL3b, overexpression of METTL3b in T47D, MDA-MB-231 and 293T cells did not show METTL3a expression in these cells (please see Figures 3A, 3C, 3G), suggesting that METTL3b cannot be further cleaved to produce METTL3a, and that the METTL3 cleavage may require its N-terminal region. We have added this in the discussion, please see Line 358 to 360.

      2) Based on current data, the generation of METTL3a and METTL3b are separated. Are there any factors that affect the cleavage ratio between METTL3a and METTL3b?

      We thank the reviewer for the excellent question. In this study, we show that both METTL3a and METTL3b are produced through proteasomal cleavage, and both of them are positively regulated by the mTOR pathway. On the other hand, we indeed observed differential cleavage ratios between METTL3a and METTL3b across different cell lines. For example, the METTL3a/METTL3b ratio was greater than 1 in MDA-MB-231 cells (see Figure 7C), less than 1 in T47D and 293T cell lines (see Figure 7A and 7B), and equal to 1 in MEF cells (see Figure 7O). Based on these results, we speculate that there may be some factors that control the cleavage ratio between METTL3a and METTL3b, which warrants further investigation. We have added this in the discussion, please see Line 374 to 379.

      3) In Figure 2G, the author shows the result that incubation of the Δ198+Δ238 METTL3 protein with T47D cell lysates cannot produce the METTL3a and METTL3b variants. The author may also show the results that Δ198 METTL3 protein or Δ238 METTL3 protein incubates with T47D cell lysates, respectively.

      Following the reviewer’s suggestion, we performed in vitro cleavage assays by incubating METTL3-Δ238 or METTL3-Δ198 with T47D cell lysates, and have incorporated this result in the revised manuscript. Please see our new Supplementary Figure 3A.

      4) As well as many results published in previous studies, the in vitro methylation assay shows that WT METTL3 is capable of methylating RNA probe (figure 2H). The main point of this study is that METTL3a is required for the METTL3-METTL14 assembly. However, the absence of METTL3a in the in vitro system did not inhibit METTL3-METTL14 methylation activity. Moreover, the presence of METTL3a even resulted in a weak m6A level.

      The main point of this study is that METTL3a is required for the METTL3-WTAP interaction, but dispensable for the METTL3-METTL14 assembly (see Figure 4A-4B). In these in vitro methylation assays, METTL3 and METTL14 are capable of methylating the RNA probe in the absence of WTAP. Under this condition, we found that METTL3 WT as well as its different variants (METTL3-Δ238, METTL3-Δ198, METTL3b and METTL3a), except the catalytically dead mutant METTL3 APPA, showed methylation activity in vitro.

      5) In Figure 4A, the author suggests that WTAP cannot be immunoprecipitated with METTL3a and 3b because WTAP interacted with the N-terminal of METTL3. If this assay is performed in WT cells, the endogenous full-length METTL3 may help to form the complex. In this case, WTAP is supposed to be co-immunoprecipitated.

      We thank the reviewer for pointing this out. METTL3 interacts with WTAP through its N-terminal region (1-33aa) (1). Consistently, we find that the two cleaved forms METTL3a and METTL3b, which lack the N-terminal region, are not able to bind WTAP. In Figure 4A, we overexpressed METTL3 WT as well as its different variants METTL3-Δ238, METTL3-Δ198, METTL3b and METTL3a respectively in WT cells, and compared the binding ability with WTAP or METTL14 across these overexpressed METTL3 variants. We acknowledge that the exogenous METTL3a and METTL3b interact with endogenous full-length METTL3, and that the endogenous full-length METTL3 may help them to form the complex with WTAP. However, the exogenous expression levels of METTL3a and METTL3b are much higher than that of endogenous full-length METTL3 (see Figure 3A and 3C). In this case, METTL3a or METTL3b predominantly interacts with itself, METTL3, METTL14 or other potential interacting proteins through its C-terminal region, which may greatly dilute the interaction between WTAP and endogenous full-length METTL3. Moreover, in Figure 4A, the comparison is among overexpressed METTL3 variants; the weak indirect interactions mediated by the much lower levels of endogenous protein are probably not comparable to the direct interactions between overexpressed METTL3 variants and WTAP.

      Reference:

      1) Schöller, E., Weichmann, F., Treiber, T., Ringle, S., Treiber, N., Flatley, A., Feederle, R., Bruckmann, A., and Meister, G. (2018). Interactions, localization, and phosphorylation of the m6A generating METTL3–METTL14–WTAP complex. RNA 24, 499-512.

      Reviewer #1 (Recommendations For The Authors):

      Major points:

      1) It is widely known that METTL3 and METTL14 form a stable heterodimer with the stoichiometric ratio of 1:1 (Wang X et al. Nature 534, 575-578 (2016), Su S et al. Cell Res 32(11), 982-994 (2022), Yan X et al. Cell Res 32(12), 1124-1127 (2022)), the numbers of METTL3 and METTL14 in the model of Fig 7P are not equivalent and need to be modified.

      We thank the reviewer for the good suggestion. We have modified the model in Fig. 7P.

      2) The in vitro methylation activity was detected by the m6A antibody, which has limited linear range. The MTase-Glo™ Methyltransferase Assay is a SAM-dependent enzyme assay with wide applications (Please refer to the references below).

      Could this assay be performed by authors?

      Wilkinson AW et al. Nature 565(7739), 372-376 (2019).

      Yu D et al. Nucleic Acids Res 49(20),11629-11642 (2021).

      Yan X et al. Cell Res 32(12), 1124-1127 (2022).

      Chen J et al. Nat Commun 13(1), 3257 (2022).

      We thank the reviewer for the good suggestion. We performed the in vitro methylation assay using the MTase-Glo kit, and the data are consistent with the dot blot results. Please see the new Figure 2H-J.

      3) When expressed alone in mammalian cell lines, METTL14 is unstable and is easily contaminated with endogenous METTL3 during purification (Yang W et al. Nat Cell Biol 16(2), p.191-8 (2014), Fig 1e). In Fig 2I, co-expressing METTL3 and METTL14 may be a good choice.

      We thank the reviewer for the good suggestion. In fact, we co-expressed METTL3 and METTL14 in this in vitro methylation assay in Fig 2I (new Figure 2J in the revised version): METTL3-Flag or its Flag-tagged mutant and METTL14-Flag were co-transfected into 293T cells and co-purified from the cell lysates using Flag M2 magnetic beads. We have added these details in the indicated method section, please see Line 574-585.

      Other minor points:

      1) In Fig 5D, the protein domain information of METTL3 and relevant references need to be added (Su S et al. Cell Res 32(11), 982-994 (2022), Fig 6g; Yan X et al. Cell Res 32(12), 1124-1127 (2022), Fig 1a).

      We have added these references in the revised manuscript.

      2) In Fig 5, would METTL3b contribute to the METTL3-METTL3 interaction?

      Our data showed that METTL3a but not METTL3b is responsible for the METTL3-WTAP interaction, breast cancer cell proliferation and the m6A modification. We then investigated the mechanism by which METTL3a regulates the METTL3-WTAP interaction, and found that METTL3a is essential for the METTL3-METTL3 interaction, which is a prerequisite step for WTAP recruitment in the MTC complex. In this case, we speculate that METTL3b is not required for the METTL3-METTL3 interaction. Indeed, through Co-IP assays, we found that METTL3b has no effect on the METTL3-METTL3 interaction (new supplementary Figure 4D), which is consistent with our above data showing that METTL3b is dispensable for the METTL3-WTAP interaction. We have added this comment in Page 6, Line 226 to 228.

      3) In Fig 3F, the color in the legend and figure is inconsistent.

      We have corrected the inconsistent color in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) In Figure 5D, the construction details of METTL3-HA and Flag should have been included in the method section. Are these tag sequences in the N-terminal of METTL3 protein?

      These tags are all in the C-terminal of METTL3. We have added the construction details of these plasmids in the method section. Please see Line 434.

      2) In Figure 7A, the labels of the inhibitors are overlapped with the figures.

      We have corrected the labels of the inhibitors in Figure 7A in the revised manuscript.

    1. Author Response

      We thank the reviewers and editors for their thoughtful evaluation of our preprint. We felt that the reviews were fair and that addressing them will improve the rigor and clarity of our presentation. We are working to address all of the comments, with intent to submit a revised manuscript in the near future.

    1. Author Response

      Reviewer #1 (Public Review):

      This cross-sectional study examined the results of a survey about cancer treatment disruption during June-August 2020 in 82 counties located in Missouri and Illinois in the U.S. The main outcome was disruption in cancer care. Authors reported that higher education, being a female, experiencing more discrimination in healthcare settings, and having scheduled a telehealth appointment were associated with higher odds of care disruption. Lack of a research focus, lack of following any conceptual framework, the cross-sectional nature of the study, and the small sample size were the noted shortcomings of the manuscript.

      We thank Reviewer 1 for their comments. We agree that it is important to understand COVID-related care disruptions using causal methods. However, this manuscript aimed to examine the local impact of COVID care disruptions. We focused on the Siteman Cancer Center’s (SCC) catchment area because the co-author team includes the SCC’s Associate Director of the Community Outreach and Engagement (COE) program, the SCC Associate Director for Diversity, Equity, and Inclusion, and multiple members of the SCC COE leadership team. Thus, we are uniquely positioned to mobilize and identify outreach opportunities and/or programs that address any gaps we discover. Moreover, this focus on our catchment area and the motivation for this survey align with the National Cancer Institute’s priorities of population health assessments to characterize cancer-relevant knowledge, attitudes, beliefs, and behaviors across cancer center catchment areas. While this is a cross-sectional study, this snapshot of care disruption will be helpful in planning local outreach strategies. Lastly, our catchment area is challenged with multiple cancer disparities patterned by social identities. Therefore, our analysis was guided by the theory that social identities related to race, ethnicity, class, and gender shape access to healthcare and disease processes and are the fundamental drivers of health. Thus, we included variables that impact health and are patterned by these social factors.

      Reviewer #2 (Public Review):

      Dr. Kia Davis and colleagues present a thoughtful analysis of disruptions to cancer care during COVID-19 in the article, "Understanding disruptions in cancer care to reduce increased cancer burden: a cross-sectional study." The article is based on an online survey of 680 residents in the Siteman Cancer Center catchment area in Summer 2020. The authors aim to characterize demographic differences in cancer care disruptions. Information about the causes and distribution of care disruption can help reduce the impacts of COVID-19 and guide the recovery of programs and services. The article provides a clear and detailed assessment of factors associated with care disruption and return to care during the first six months of the pandemic.

      A strength of the study is the focus on the catchment area of the cancer center during a period of dramatic change. The results would provide timely and actionable data to address emerging barriers to care and associated social or contextual factors. This information helps the Community Outreach and Engagement efforts to be responsive to community priorities despite rapidly evolving circumstances.

      The analysis would benefit from greater detail in three areas. First, it would be helpful to have more information about how the outcome measures were originally developed or tested. Second, for the regression analysis, it would be helpful to show the demographic characteristics of the two strata to better understand the sample composition. Third, the authors should demonstrate that the data do not violate the assumptions for conducting logistic regression to improve confidence in the findings.

      COVID-19 affected all aspects of the cancer continuum. The study reports factors associated with postponing or canceling cancer-related appointments during the pandemic. It will be of great interest to researchers and practitioners in cancer prevention and control.

      We thank Reviewer 2 for their thoughtful critique of our work. Their suggestions have strengthened our manuscript. Since our article was submitted, the questionnaire from which we derived our outcome measure has been published. The questions were drawn from validated measures assessing the impact of pandemics such as H1N1, and major life disruptions such as natural disasters. This language was updated in the manuscript, as were the references. Moreover, we added a supplemental Table 2 to show the demographic characteristics by race strata. Finally, we tested and can confirm that the data do not violate the assumptions of logistic regression. We believe that our results will aid in the understanding of how COVID impacted cancer care in our catchment area so that we can better mobilize resources. While we understand this is a cross-sectional study with the potential for unmeasured confounding, we believe this snapshot of cancer care during the pandemic will also be of interest to researchers, clinicians, and other practitioners in cancer prevention and control in locations like ours.

    1. Author Response

      Reviewer #3 (Public Review):

      In this manuscript, Castano et al generate and test a small molecule inhibitor of CDKL5, an X-linked kinase whose loss-of-function is the cause of a severe neurodevelopmental disorder. Since the current knowledge of CDKL5 functions mainly relies on genetic models, it is still unclear which effects are caused directly by CDKL5 loss and which can be ascribed to indirect effects. A specific inhibitor would therefore be an important tool for the field.

      Castano and colleagues therefore tested a panel of twenty kinase inhibitors for their capacity to block phosphorylation of EB2, a bona fide CDKL5 substrate, in rat neurons. Among the three that could inhibit EB2 phosphorylation at low concentrations, one was found to inhibit CDKL5 while not affecting GSK3 kinases, which share significant homology to CDKL5. Considering that genetic studies have previously linked CDKL5 to excitatory synaptic transmission, acute hippocampal slices were exploited to test the consequences of CDKL5 inhibition. While CDKL5 loss in the past was found to affect both AMPA- and NMDA-Rs, the small molecule-based inhibition affected only AMPA-R responses at the post-synaptic level. Since pharmacokinetic analyses showed that the inhibitor has a low capacity for brain penetration, the molecule remains limited to testing the acute inhibition of CDKL5 in vitro and ex vivo. Such a tool represents an important aspect in the CDKL5 field and the findings suggesting a direct role of CDKL5 in regulating AMPA-R functions are interesting. However, the manuscript could be improved to render it more readable.

      Thank you for this positive feedback and we hope that our adjustments improve the readability.

      The description of the binding and orthogonal assays, which are the basis for the selection of the small molecule inhibitor, is not straightforward to understand for non-expert readers and could be improved.

      We have added additional text to the Methods and Results to better explain the assays.

      While the in vitro and ex vivo assays are well presented, it is not clear why the myelin basic protein is used as a substrate for CDKL5 in the in vitro kinase assays. Does this protein contain a CDKL5 consensus site?

      To execute the in vitro kinase assays, myelin basic protein (MBP; Active Motif, 31314) was employed as a substrate for recombinant CDKL5. MBP is used as a substrate for multiple kinases, both serine/threonine and tyrosine kinases, in in vitro kinase assays because it contains multiple phosphorylation sites. As such, we and others have used this protein as a substrate for evaluating kinase activity[2, 4]. MBP does not contain the CDKL5 consensus site RPXS/T*, and could therefore be considered a less than ideal substrate for studying CDKL5 activity; however, MBP is still suitable for in vitro kinase assays because it can be phosphorylated by CDKL5. In addition, CDKL5 is known to phosphorylate substrates that do not contain a consensus motif[3].

    1. Author Response

      Reviewer #1 (Public Review):

      This study demonstrates that a hybrid measurement method increases the resolution of mouse USV localization threefold. This increased resolution makes it possible to revise previous occurrence frequency measures for female vocalizations and establishes the existence of vocal dominance in triadic interactions. The method is well described and its efficiency is carefully quantified. A limitation of the study is the absence of ground truth data, which could eventually have been generated with miniaturized loudspeakers in mouse puppets. However, a careful error estimation partially compensates for the absence of these likely challenging calibrations. In addition, the conclusions take into account this uncertainty. The gain in accuracy with respect to previous methods is clear and the impact of localisation accuracy on biological conclusions about vocalisation behavior is clearly exemplified. This study demonstrates the impact of the new method for understanding vocal interactions in the mouse model, which should be of tremendous interest for the growing community studying social interactions in mice.

      We have performed the requested additional ground truth measurements using a movable miniature speaker; for more details, see our response to point 2 of Reviewer 2 and the new supplementary figure.

      Reviewer #2 (Public Review):

      Past systems for identifying and tracking rodent vocalizations have relied on triangulating positions using only a few high-quality ultrasonic microphones. There are also large arrays of less sensitive microphones, called acoustic cameras, that don't capture the detail of the sounds, but do more accurately locate the sound in 3D space. Therefore the key innovation here is that the authors combine these two technologies by primarily using the acoustic camera to accurately find the emitter of each vocalization, and matching it to the high-resolution audio and video recordings. They show that this strategy (HyVL) is more accurate than other methods for identifying vocalizing mice and also has greater spatial precision. They go on to use this setup to make some novel and interesting observations. The technology and the study are timely, important, and have the potential to be very useful. As machine learning approaches to behavior become more widespread in use, it is easy to imagine this being incorporated and lowering entry costs for more investigators to begin looking at rodent vocalizations. I have a few comments.

      1) What is the relationship of the current manuscript to this: https://www.biorxiv.org/content/10.1101/2021.10.22.464496v1 which has a number of very similar figures and presents a SLIM-only method that reportedly has lower precision than the current HyVL approach. Is this superseded by the submitted paper?

      The referred manuscript (now published in Scientific Reports) is indeed related to the current work: The currently presented system is based on the integration between SLIM (based on 4 high quality microphones) and Beamforming (based on the 64-channel microphone array). The accuracy of SLIM is generally lower than that of HyVL, but it makes essential contributions to the overall accuracy of HyVL through the integration of the complementary strengths of the two methods/microphone arrays (see Fig. 3A, L-shape of errors). To our knowledge, SLIM was the previously most accurate technique (based on 4 microphones, see comparison in the Discussion), but HyVL exceeds this by a substantial margin. Some figures appear similar mostly due to related code in the underlying analysis pipeline and visualization scripts (e.g. the half-disc densities). However, the set of dyadic and triadic recordings was collected specifically for the present study, and all top-level analyses were performed separately. The single mouse (C57Bl/6 WT) ground truth dataset is shared between the two studies, where in the SLIM paper only the USM4/SLIM part was evaluated (leading to a correspondingly lower, single animal accuracy).

      We felt that the level of detail above would probably impede the reading of the manuscript, and we have therefore added a subset of the above clarifications to the methods and the first time the other study is mentioned.

      2) Can the authors provide any data showing the accuracy of their system in localizing sounds emitted from speakers as a function of position and amplitude? I am imagining that it would be relatively easy to place multiple speakers around the arena as ground truth emitting devices to quantify the capabilities of the system.

      Ground truth data is critical for any meaningful comparison. First, we would like to highlight that we already provided ground truth data in the previous version of the manuscript: in Fig. 3C, we analyzed vocalization data from (1) trials with just a single mouse and (2) vocalizations at times when all mice were far apart in relation to the accuracy of HyVL (>100 mm, i.e. >25x the accuracy of HyVL), where the chances of erroneous assignment are negligible. We think that these tests are the most relevant, as they are conducted with the relevant sounds, at their actual intensity, spectral profile and emitter acoustics.

      In addition, we have now conducted a series of tests with sounds produced by a miniature speaker placed in 25 different locations to demonstrate the lower bound of accuracy achievable with the system. The tests indicate an accuracy of MAE < 1 mm under these ideal conditions, i.e. without the absorption of the mouse bodies, varying direction of emission of the mouse snout, varying intensity, varying spectral content, duration, etc. Exploring the dependence on all these parameters is interesting, but requires a detailed study of its own. The detailed experimental conditions and results are now provided in Supplementary Fig. 4, including a quantification of the dependence on amplitude.

      3) How is the system's performance affected by overlapping vocalizations? It might be useful to compare the accuracy of caller identification for periods where only one animal is calling at a time vs. periods where multiple animals are simultaneously calling.

      This is an excellent question. Our current code for detecting vocalizations cannot automatically determine if one or multiple vocalizations are concurrently present. We have therefore manually checked all vocalizations for overlapping instances, including those in triadic recordings with two males, where this would be expected to occur most frequently.

      We considered vocalizations to be overlapping if the overlapping constituent time-frequency traces did not form a harmonic stack. Overall, overlaps were surprisingly rare. We did find a couple of cases (<0.1%) where our detection algorithm produced a longer vocalization interval that contained multiple, differently shaped vocalization traces that, when re-analyzed in shortened time-frequency bins with beamforming, belonged to two different males. Note here that beamforming is separately performed from the onset to the end of each vocalization, so the cumulative heatmap can change depending on these onset and end times, which are normally determined by our detection algorithm.

      However, although the identity of the assigned vocalizer could shift in these very rare cases depending on which time bin was re-analyzed, the system’s localization performance remained in principle unaffected: as mentioned above, shorter time bins on non-overlapping parts correctly show the origin of the vocalizations. A solution to this issue could therefore be a USV detection algorithm that detects the overlap based on the spectral shapes and parses the vocalizations apart. During beamforming, each vocalization can then be localized separately by restricting the beamforming to the corresponding time and frequency range. Further, the analysis could be refined so that multiple salient peaks can be detected in the soundfield estimate. This would, however, substantially change the analysis approach, i.e. rather than a single estimate per USV, a sequence of soundfield estimates would have to be computed and later fused again. Since such a procedure uses less data per single estimate, it also increases the possibility of false positives, which, in the current situation with very few overlaps in time, would likely reduce the overall accuracy of the system. We therefore decided not to modify the algorithm in this direction, but we agree that ideally a joint approach, combining separation on the spectrogram and soundfield level, should be pursued. For the present data, if a time window was analyzed such that the intensity map of the sound field contains multiple hotspots of approximately equal magnitude, the USV would likely remain unassigned, because the within-soundfield uncertainty would be higher than for a single peak, and this would reduce the MPI. However, given the rarity of these cases in our dataset, we do not think that their exclusion would change the results appreciably. This information was added as a paragraph to the Discussion.

      It is worth noting that HyVL is very robust: there were a number of cases (<5%) where environmental dampening in combination with harmonic stacking produced interesting time-frequency traces in some of the USM4 microphones, but our system did not have any issue spatially localizing what seems like a smeared vocalization trace. We provide a few examples of this kind in a short video (see Rebuttal Video 2 and the legend at the bottom of this document), where the overlap is also reflected in the intensity map of the sound field, overlaid onto the platform.

      4) Can the authors comment on how sound shadows cast by animals standing between the caller and a USM4 affect either the accuracy of identification or the fidelity of the vocal recording?

      An important point to raise. Sound scattering and dampening caused by the conspecifics of the vocalizing animal can impede the accuracy of any sound localization system, but can unfortunately not be avoided in a social setting. To address this issue, we raised all USM4 microphones by ~12 cm above the interaction platform to minimize the instances of sound blocked by the mice. Further, the Cam64 device should largely be unaffected by sound shadows as it is centrally located above the platform. We have added a modified version of the above comment to the discussion under the heading "Current limitations and future improvements of the presented system".

      5) I'm a bit confused about how the algorithm uses the information from the video camera. Reading through the methods, it seems like they primarily calculate competing location estimates by the two types of microphone data and then make sure that a mouse is in close proximity to one location, discarding the call if there isn't. Why did the authors choose this procedure rather than use the tracked position of the snouts as constrained candidate locations and use the microphone data to arbitrate between them? Do they think that their tracking data are not reliable or accurate enough?

      Thanks for this important suggestion, which we have actually grappled with a lot during the analysis. First of all, the visual tracking data, in particular the manual data, is in our opinion (based on human visual identification) near perfect (within the limits of the video resolution, pixel resolution = 0.8 mm), i.e. on the order of 1-2 mm, and is therefore not the source of any unattributable vocalizations. If we understand the reviewer correctly, then we indeed perform the attribution as they indicate, based on the tracked snouts of all mice, specifically by measuring the MPIs of both acoustic location estimates for all mice and then choosing the most reliable one. The attributions can be grouped into 3 cases: (i) estimated origin close to one snout, and snouts rather far apart, (ii) estimated origin close to one snout and snouts close, and (iii) estimated origin not close to either snout. Case (i) is easy to address and case (ii) is appropriately handled by the mouse probability index, but case (iii) is tricky. Since the vocalization has to come from one of the mice, this already indicates that the localization is not working well in this case. Therefore we found it prudent (similar to Neunuebel et al. 2015) not to assign in these cases. Interestingly, the MPI is not useful in these cases: due to the exponential dependence of the normal density on distance, a case with a distance of 50 mm to one snout and 60 mm to another snout could, for example, lead to an MPI close to 1, which is likely not trustworthy. We have described this in the Methods as follows:

      "This distance threshold mainly serves to compensate for a deficiency of the 𝑀𝑃𝐼: if all mice are far from the estimate, all 𝑃𝑘 are extremely small, however, the 𝑀𝑃𝐼𝑘 will often exceed 0.95."

      Due to the inherent limit for localizing very quiet, short USVs by any system, we think this kind of selection (introduced originally by Neunuebel et al 2015) is a valuable and necessary step in the processing to avoid confusions (which are of course already substantially reduced through HyVL here).
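To make the failure mode concrete, the following is a minimal sketch rather than the HyVL implementation. It assumes that each 𝑃𝑘 is an isotropic Gaussian density of the snout-to-estimate distance and that 𝑀𝑃𝐼𝑘 is 𝑃𝑘 normalized over all mice; the function names, the localization spread `sigma_mm` and the distance cutoff `max_distance_mm` are hypothetical values chosen only for illustration. The 0.95 criterion is taken from the Methods passage quoted above.

```python
import math

def mouse_probability_index(distances_mm, sigma_mm):
    """Return one MPI value per mouse from snout-to-estimate distances.

    Assumption: P_k is an unnormalized isotropic Gaussian in distance and
    MPI_k = P_k / sum_j P_j. Both are illustrative, not the published code.
    """
    log_p = [-(d ** 2) / (2.0 * sigma_mm ** 2) for d in distances_mm]
    # Subtract the largest log-density so distant snouts do not underflow to 0/0.
    shift = max(log_p)
    p = [math.exp(lp - shift) for lp in log_p]
    total = sum(p)
    return [pk / total for pk in p]

def assign_vocalizer(distances_mm, sigma_mm=5.0, mpi_threshold=0.95,
                     max_distance_mm=30.0):
    """Return the index of the assigned mouse, or None if unassigned.

    sigma_mm and max_distance_mm are hypothetical illustration values.
    """
    mpi = mouse_probability_index(distances_mm, sigma_mm)
    best = max(range(len(mpi)), key=mpi.__getitem__)
    # Distance threshold: reject estimates that are far from every snout,
    # even when the normalized MPI looks decisive.
    if distances_mm[best] > max_distance_mm:
        return None
    # MPI threshold: reject estimates that do not clearly favor one mouse.
    if mpi[best] < mpi_threshold:
        return None
    return best
```

With these assumed values, distances of 50 mm and 60 mm give an MPI above 0.95 for the nearer mouse even though neither snout is plausibly the source, so only the distance cutoff leaves the call unassigned. Distances of 4 mm and 5 mm are rejected by the MPI criterion instead.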

      6) I guess the authors have code that we can run, but I couldn't access it. The manuscript describes the algorithms and equations that are used to calculate the location, but this doesn't really give me a feel for how it works. If you want to have the broadest impact possible, I think you would do well to make the code user-friendly (maybe it is, I don't know). In pursuit of that goal, I would suggest that the authors devote some of the paper to a guided example of how to use it.

      While the code was made available to the reviewers via the link at the beginning of the manuscript (p2, before abstract), we completely agree that this method of distribution is not very accessible. We have therefore created a publicly available GitHub repository (https://github.com/benglitz/HyVL) which hosts the code and details its use on the basis of a sample data set (which is available to the reviewers in the repository link, and later to the public under https://doi.org/10.34973/7kgc-ta72). While we do provide a sample video and analysis workflow there, our data analysis pipeline is quite integrated and other labs will likely use different pipelines. We have therefore tried to make the core functions independent of our pipeline and thus easy to integrate by others into their analysis pipelines.

      Reviewer #3 (Public Review):

      The present manuscript describes a new method to identify the emitter of ultrasonic vocalisations during social interactions between 2 or 3 mice. The method combines two technologies (an "acoustic camera" and a set of four microphones) and succeeds in increasing the spatial precision and the attribution of USV emission to one of the mice. The manuscript describes the characteristics and advantages of each method and the advantages of using both to optimize the identification of USV emitter. The authors used the method to confirm that females are also vocalising during male-female interactions and that females emit USV mostly during nose-nose contact while this was not the case for males. Interestingly, the authors identified that the vocal behaviour of two competing males was strongly asymmetric when facing a female. This was not the case for two females facing one male.

      The method is really promising since the identification of the emitter of USVs during mouse social interactions is a necessary step to speed up our understanding of this communication modality. The increase in spatial precision and in the proportion of attributed vocalisations is non-negligible and will be of great utility in the future.

      We would like to thank the reviewer for this positive perspective on the future utility of our system.

      Generally, the statistical analyses should be adjusted. Indeed, the statistical analyses do not consider the fact that the same individuals were recorded several times (if we understood well the methods). Each point was considered independent (in non-parametric Wilcoxon tests), while this is not the case given the repetitions with the same individuals (the number of repeated encounters per individual should be given in the methods section, by the way). We strongly recommend revising the statistical analyses of the results in Figures 4 and 5. In addition, it could be interesting to check whether the vocal behaviour is stable within each individual (i.e., a male that is vocalising frequently in one situation vocalises always frequently in other situations).

      We generally agree with this suggestion: in order to properly conduct the analysis for individuals as you suggest, a balanced dataset should be used. We had initially collected such a balanced dataset, which was previously not detailed in the manuscript, as the focus was on USV localization/attribution and hence only the recordings containing USVs were analyzed (detailed now in the beginning of Results and Methods). However, overall, the probability of a recording containing vocalizations at all is low: in our balanced set only 23/112 recordings contained vocalizations. We had therefore collected additional recordings with the best vocalizers, which created the previously analyzed set of 83 recordings containing USVs recorded with all microphones. This dataset is therefore dominated by recordings from mice that are active vocalizers. While this does not raise any issue for the estimation of the accuracy of the method (Figure 3) or the female vocalizations (Figure 4, because recordings were always randomized across female mice), it precludes an encompassing analysis of individual differences in Figure 5, i.e. the dyadic-triadic comparison. In the new Figure 5, we address the reviewer's question for the dyadic recordings, finding that the current set of recordings does not provide sufficient evidence that individual male mice had significantly different vocalization rates. We would, however, like to point out that this is likely a consequence of the n=4 recordings that are compared here. For the female mice, we also did not find differences in vocalization rates, which is based on n=14 recordings and thus a more reliable result (p=0.16, 1-way ANOVA with factor individual).

      For the triadic recordings, however, a limitation in the execution of the experiment means that we unfortunately do not have complete information at the experiment level: the video stream was accidentally started after all mice had been placed on the platform, and since same-sex animals are not visually separable (while the female mice are separable from the males, based on a slightly shaved region on their head), we cannot completely assess this question in triadic recordings based on the available data. When additionally including the triadic recordings and assuming a single vocalizer (combining all male USVs, see below for why the males could not be assigned in the triadic condition), the male individual comparison can be approximately performed with n=8 recordings, and the dependence on individual then becomes borderline significant (p=0.028, 2-way ANOVA with factors individual and condition).

      For the comparison of vocalization rates in the previous Figure 5 that the reviewer was referring to, we cannot perform a rigorous analysis on the individual level, due to the lack of balance. While we thus agree that differences between individual mice can contribute to the differences observed, we do not think that this would change the conclusion that one of the mice dominates the vocal emissions. If the reviewers agree, we would thus leave Figures 6 (old Fig. 5) and new Figure 7 (behavioral confirmation of dominant/subordinate division) as part of the manuscript, with a clear cautioning about the possible contribution of individual differences to the observed differences. If the reviewers find it inappropriate to leave the results based on the unbalanced dataset in, all results after figure 5 could also be excluded (although we would find this unfortunate, given the additional time and effort we have invested in these).

      It is not easy to understand the rationale behind testing animals in pairs and in triads from the beginning of the manuscript. The authors should better introduce this aspect in the manuscript, especially given the fact that biological results deal with this aspect in Figure 5. The authors might strengthen the parts of the biological results extracted from their new method.

      Thank you for pointing out the need for clarification regarding the rationale behind testing animals in pairs and in triads. Because courtship interactions are particularly vocal and social, they are of interest to many fields, e.g. neurodevelopmental disorders.3,4 Due to the natural competitiveness between mice during courtship interactions, high accuracy is particularly beneficial in this regard because it allows disentangling USVs at close distances. We adapted the introduction to better reflect this reasoning and included an extra paragraph in the introduction and also where the biological results from old Fig. 5 / new Fig. 6 are summarized.

      More specifically, the fact that one male takes over the vocal behaviour within a triad is of high interest. Nevertheless, some behavioural data would be needed to strengthen these findings.

      We agree that this is an interesting finding and also agree that some additional behavioral analysis is useful to complement it. In order to arrive at this analysis, we performed all-frame, 3-animal tracking on the 14 triadic recordings with two males. This required switching to skeleton tracking with SLEAP5 in addition to manual post-processing to ensure that no identity switches occur. In each recording the dominant male was then defined as the one that emitted more vocalizations, and then the vocalization-independent spatial interaction histogram was computed, similar to the ones in Fig.4, but now separating between the dominant and the subordinate males (see new Figure 7). The results are consistent with the most typical location of vocalization of the male, in proximity to the female abdomen: The dominant male's spatial interaction histogram (Fig. 7A) was more clearly peaked in the location of the female abdomen very close to the male's snout, in comparison with the subordinate male's histogram (Fig. 7B), which shows up very clearly in the difference between the normalized histograms (Fig. 7C). Significance analysis was performed using 100x bootstrapping on the relative spatial positions to estimate p=0.99 confidence bounds around the histograms of the dominant and subordinate respectively. Significance at a level of p<0.01 highlights multiple relative spatial positions (Fig. 7D), including the one proximal to the snout which has the largest absolute difference (Fig. 7C). Note, that these analyses were conducted on the basis of the non-balanced dataset which contained enough vocalizations to assess the dominant male based on the vocalization rates and thus individual traits of certain animals remain as a possible confound.
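The bootstrap comparison described above can be illustrated with a minimal sketch. It is not the authors' analysis code: it reduces the relative spatial positions to one dimension, uses percentile bounds, and flags a bin as different when the two confidence intervals do not overlap. All of these are assumptions, and the function names are hypothetical. The 100 resamples and the 0.99 level are taken from the text.

```python
import random

def histogram(values, edges):
    """Fraction of values in each bin [edges[i], edges[i+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            if edges[i] <= v < edges[i + 1]:
                counts[i] += 1
                break
    n = len(values)
    return [c / n for c in counts]

def bootstrap_bounds(values, edges, n_boot=100, level=0.99, seed=0):
    """Per-bin percentile confidence bounds from resampling with replacement."""
    rng = random.Random(seed)
    boots = [histogram(rng.choices(values, k=len(values)), edges)
             for _ in range(n_boot)]
    # With only 100 resamples, the 0.5th and 99.5th percentiles
    # coincide with the minimum and maximum of the bootstrap values.
    lo_idx = int((1.0 - level) / 2.0 * (n_boot - 1))
    hi_idx = (n_boot - 1) - lo_idx
    lower, upper = [], []
    for b in range(len(edges) - 1):
        column = sorted(h[b] for h in boots)
        lower.append(column[lo_idx])
        upper.append(column[hi_idx])
    return lower, upper

def significant_bins(values_a, values_b, edges, n_boot=100, level=0.99, seed=0):
    """Flag bins whose confidence intervals for the two groups do not overlap."""
    lo_a, hi_a = bootstrap_bounds(values_a, edges, n_boot, level, seed)
    lo_b, hi_b = bootstrap_bounds(values_b, edges, n_boot, level, seed + 1)
    return [lo_a[i] > hi_b[i] or lo_b[i] > hi_a[i]
            for i in range(len(edges) - 1)]
```

In this sketch `values_a` and `values_b` would hold the relative positions for the dominant and the subordinate male, respectively. A two-dimensional version would bin both coordinates in the same way.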

      A small proportion of USVs was not assigned. The authors did not discuss the potential reason for this failure (Were the USVs too soft? Did they include specific acoustic characteristics that render them difficult to localise?). These points could be of interest when testing other mouse strains or other species.

      Good point, we agree that it is interesting to know the reasons for failure. As so often, there is not a single property that makes localization hard, but multiple factors contribute. In the SLIM paper, we already identified duration and intensity as important contributors (Fig. 3E/F), and in the speaker test (see new Supplementary Fig. 4) we again demonstrated the influence of intensity. In addition, frequency bandwidth and acoustic occlusion are two other main contributors that each influence the availability of the information/signal-to-noise ratio at the microphones:

      • Frequency bandwidth: In signals that are very narrowband, there are more opportunities for phase ambiguity, in particular for very high-frequency signals. These are avoided/reduced for more wideband signals.

      • Acoustic occlusion: As ultrasonic sounds can be quite directional, if an animal is vocalizing away from a microphone, which in addition would put its body in the way of the sounds to the microphone, then this can reduce the intensity at the microphone to a level where the signal from this microphone is insufficient to be used. This mostly influences the 4 microphones surrounding the platform, while the Cam64 overhead will likely not be affected by acoustic occlusion in the plane.

      We have added a brief version of this explanation to the discussion under the heading: "Current limitations and future improvements of the presented system"

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Marmor and colleagues reanalyze a previously published dataset of chronic widefield Ca2+ imaging from the dorsal cortex of mice as they learn a go/no-go somatosensory discrimination task. Comparing hit trials that have a distinct history (i.e. are preceded by distinct trial types), the authors find that hit trials preceded by correct rejections of the nontarget stimulus are associated with larger subsequent neural responses than trials preceded by other hits, across the cortex. The authors analyze the time course over which this effect emerges in the barrel cortex (BC) and the rostrolateral visual area (RL), and find that its magnitude increases as the animals become expert task performers. Although the findings are potentially interesting, I, unfortunately, believe that there are important methodological concerns that could put them into question. I also disagree with the rationale that singles out BC and RL as being especially important for the emergence of trial history effects on neural responses during decision-making. I detail these points below.

      1) The authors did not perform correction for hemodynamic contamination of GCaMP fluorescence. In widefield imaging, blood vessels divisively decrease neural signals because they absorb green-wavelength photons, which could lead to crucial confounds in the interpretation of the main results because of neurovascular coupling, which lags neural activity by seconds. For example, if a reward response from the previous trial is associated with a lagged hemodynamic contamination that artificially decreases the signal in the following trial, one could get artificially higher activity in trials that were not preceded by a reward (i.e. CR), which is what the authors observed. Ideally, the experiments would be repeated with proper hemodynamic correction, but at the very least the authors should try to address this with control analyses.

      Done. We repeated the experiment with proper hemodynamic correction and the trial history results were maintained. Please see point 1 above for more details (Figures S4 and S5). In addition to hemodynamic controls, we also present novel two-photon single-cell data with similar results in Figure S6. We also added a dedicated section for this in the Methods section (pg. 12).

      For example, what is the time course of reward-related responses in BC and elsewhere?

      In general, and specifically in BC, reward-related responses return to baseline within 5 seconds after the start of the reward period and at least 5 seconds before the stimulus presentation of the next trial. In the novel experiments we extended the baseline period by an additional 2 seconds as a further precaution. Trial history information was still present with the extended inter-trial interval.

      The text now reads (pg. 4): "We further report that responses during the reward period in cortex and specifically in BC went back to baseline 4-5 seconds after the start of the reward period and 6-8 seconds before the presentation of the next stimulus (total inter-trial interval ranged between 10-12 seconds)."

      Do hemodynamics artifacts have a trial-by-trial correlation with the subsequent trial history effect?

      We have now done the proper hemodynamic control (Figure 2) and we did not find a strong effect of hemodynamic responses on trial history information.

      What is the learning time course of reward responses?

      Responses during the reward period as a function of learning were not significantly modulated. We further show the whole learning profile for BC response during the reward period in Author response image 1.

      Author response image 1.

      Response in BC averaged during the reward period (2-4 sec after texture stop) as a function of learning for each mouse separately.

      The text now reads (pg. 4): "In addition, responses in BC during the reward period were not consistently modulated as a function of learning (p>0.05; Wilcoxon signed-rank test between naïve and expert, BC response averaged during the reward period, 2-4 seconds after stimulus onset; n=7 mice). Taken together, we find that direct responses from the reward period do not affect history-related responses during the next trial."

      Note that I don't believe the FA-Hit condition analysis that the authors have already presented provides adequate control, as punishment responses are also pervasive in the cortex and therefore suffer from the same interpretational caveat. Unfortunately, I believe this is a serious methodological issue given the above. However, I will proceed to take the reported results at face value.

      We hope that our additional hemodynamic control analyses are satisfactory.

      2) The statistics used to assess the effect of trial history over learning are inadequate (e.g., Fig 2b). The existence of a significant effect in one condition (e.g., CR-Hit vs. Hit-Hit in expert) but not in another (e.g., same comparison in naive) does not imply that these two conditions are different. This needs to be tested directly. Moreover, the present analysis does not account for the fact that measures across learning stages are taken from the same animals. Thus, the appropriate analysis for these cases would be to first use a two-way ANOVA with repeated measures with factors of trial history and learning stage (or equivalent non-parametric test) and then derive conclusions based on post hoc pairwise tests, corrected for multiple comparisons.

      Done. We performed a two-way ANOVA with repeated measures as suggested and found significant history and learning effects along with a significant interaction effect for BC.

      The text now reads (pg. 4): "This difference was significant during the stim period in learning and expert phases across mice (Fig. 2b; 2-way ANOVA with repeated measures; DF (1-6) F=51 p<0.001, DF (2-12) F=18 p<0.001, DF(2-12) F=5 p<0.05 for trial history, learning and the interaction between trial history and learning; Post hoc Tukey analysis p<0.05 for trial history in learning and expert phases; p>0.05 in the naïve phase)."

      3) I am not convinced that BC and RL are especially important for trial-history-dependent effects. Figures 4 and 5 suggest that this modulation is present across the cortex, and in fact, the difference between CR-Hit and Hit-Hit in some learning stages appears stronger in other areas. BC and RL do have the highest absolute activity during the epochs in Figs 4 and 5, but I would argue that this is likely due to other aspects of the task (e.g., touch) and therefore is not necessarily relevant to the issue of trial history.

      Done. First, we would like to point out that RL during the pre period displays the largest difference between the CR-Hit and Hit-Hit conditions (Fig. 5c bottom). Second, we now show difference maps (i.e., activity in CR-Hit minus Hit-Hit) which clearly show a positive activity patch in BC during the stim period for 5 out of the 7 mice (Fig. S10a). Example maps also highlight RL during the pre period (Fig. S10b). We note that activity patches spread somewhat to other areas and also vary slightly across mice. This is why the grand average may partly average out trial history information. Taken together, we strongly feel that during the pre period, trial history information emerges in RL (and adjacent posterior association areas) and then shifts towards BC during the stim period.

      Nevertheless, we agree with the reviewer that other areas (that do not necessarily display high activity) may encode trial history information and we now clearly report this in the text (pg. 5): "We note that other areas, e.g., different association areas, also encoded history-dependent information especially during learning and expert phases. In addition, we present activity difference maps between CR-Hit and Hit-Hit conditions during the stim period (Fig. S10a). These maps clearly show the highest trial history information (i.e., difference in activity) in BC. Taken together, these results indicate that BC encodes history-dependent information that emerges during the stim period and just after learning."

      And also in (pg. 6): "In addition, we present activity difference maps between CR-Hit and Hit-Hit conditions during the pre period (Fig. S10b). These maps localize trial history information to RL which also spreads to other adjacent association areas. Moreover, activity patches slightly vary across the different mice which may affect the grand average (averaged across mice) of each area."

      4) Because of similar arguments to the above, and because this was not directly assessed, I do not believe the conclusion that history information emerges in RL and is transferred to BC is warranted. For instance, there is no direct comparison between areas, but inspection of the ROC plots in Fig 6b suggests that history information emerges concomitantly across cortical areas. I suggest directly comparing the time course between these and other areas.

      Done. We now add example history AUC maps and quantify history AUC for all 25 areas during the pre and stim periods. During the pre period (Fig. 6), AUC values are concentrated around RL (and other PPC areas), whereas during the stim period AUC values shift to BC. Again, due to the inter-mouse variability, these differences are slightly averaged out, which also makes it difficult to obtain a strong statistical test (with only 7 mice).

      The text now reads (pg. 7): "We next calculated the history AUC for each pixel during either the pre or stim period. The history AUC maps during the pre period display AUC values around the RL areas (Fig. 6f). In contrast, the history AUC maps during the stim period display AUC values mostly in BC (Fig. 6g). Quantified across 25 areas and averaged across mice, RL displays the highest history AUC during the pre period, whereas BC displays the highest history AUC values during the stim period (Fig. 6h). We note that other cortical areas such as other association areas also display high history AUC values. Taken together, we find that trial history emerges in RL before the texture arrives and then shifts to BC during stimulus presentation. "
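
      A history AUC of the kind described above can be illustrated with the standard pairwise definition of the ROC area: the probability that a randomly chosen CR-Hit response exceeds a randomly chosen Hit-Hit response, with ties counted as one half. The sketch below is a minimal illustration under that assumption. The function name `history_auc` and the example values are hypothetical, and the response does not specify the actual pipeline used to compute the maps.

```python
def history_auc(cr_hit, hit_hit):
    """ROC AUC separating two sets of single-trial responses.

    Computed as P(x > y) + 0.5 * P(x == y) over all pairs (x from cr_hit,
    y from hit_hit). This equals the Mann-Whitney U statistic divided by
    the number of pairs. 0.5 means no separation between conditions.
    """
    if not cr_hit or not hit_hit:
        raise ValueError("both conditions need at least one trial")
    wins = 0.0
    for x in cr_hit:
        for y in hit_hit:
            if x > y:
                wins += 1.0
            elif x == y:
                wins += 0.5
    return wins / (len(cr_hit) * len(hit_hit))


# Hypothetical single-trial responses (arbitrary units) for one pixel or area.
example_cr_hit = [3.0, 4.0, 5.0]
example_hit_hit = [1.0, 2.0, 3.0]
print(history_auc(example_cr_hit, example_hit_hit))
```

      Applying such a function to every pixel (or to each of the 25 areas) within a time window would give a map of the kind described, with values near 0.5 indicating no trial-history information.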

      5) How much is task performance itself modulated by trial history? How does this change over the course of learning? These behavioral analyses would greatly help interpret the neural findings and how this trial history might be used behaviorally.

      Done. We have now calculated d-prime for Hit-Hit and CR-Hit trials separately. We find no significant differences between conditions both within and across mice (see Fig. S2 below).

      The text now reads (pg. 3): "We note that learning curves that are calculated separately for each pair (i.e., either a preceding Hit or CR trial) were not significantly different (Fig. S2)."
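
      For reference, d-prime in a go/no-go task is conventionally the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below is a minimal illustration of that formula. The function name, the trial counts and the log-linear correction (adding 0.5 to each count so that rates of 0 or 1 stay finite) are assumptions for the example, not details taken from the manuscript.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (assumed here) adds 0.5 to each count and 1 to
    each denominator so that the inverse normal CDF is always defined.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical trial counts for one session, split by the preceding trial.
after_hit = d_prime(hits=45, misses=5, false_alarms=10, correct_rejections=40)
after_cr = d_prime(hits=44, misses=6, false_alarms=9, correct_rejections=41)
print(round(after_hit, 2), round(after_cr, 2))
```

      Computing this separately for trials preceded by a Hit and trials preceded by a CR, session by session, would give the two learning curves that are compared.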

      Reviewer #2 (Public Review):

      Marmor et al. mine a previously published dataset to examine whether recent reward/stimulus history influences responses in sensory (and other) cortices. Bulk L2/3 calcium activity is imaged across all of the dorsal cortex in transgenic mice trained to discriminate between two textures in a go/no-go behavior. The authors primarily focus on comparing responses to a specific stimulus given that the preceding trial was or was not rewarded. There are clear differences in activity during stimulus presentation in the barrel cortex along with other areas, as well as differences even before the second stimulus is presented. These differences only emerge after task learning. The data are of high quality and the paper is clear and easy to follow. My only major criticism is that I am not completely convinced that the observed difference in response is not due to differences in movement by the animal on the two trial types. That said, the demonstration of differences in sensory cortices is relatively novel, as most of the existing literature on trial history effect demonstrates such differences only in higher-order areas.

      Major:

      1a) The claim that body movements do not account for the results is in my view the greatest weakness of the paper - if the difference in response simply reflects a difference in movement, perhaps due to "excitement" in anticipation of reward after not receiving one on CR-H vs. HH trials, then this should show up in movement analysis. The authors do a little bit of this, but to me, more is needed.

      Done. We have now extensively and carefully analyzed body and whisker movements for CR-Hit and Hit-Hit conditions. First, in the figure below we decomposed body movements into 22 different body parts using DeepLabCut. In short, we find no significant difference between CR-Hit and Hit-Hit conditions for any individual body part (Fig. S7 below). This was true for the naïve, learning and expert phases. Please see additional analyses in the points below.

      This is now reported in the text (pg. 4): “In addition, we performed a more detailed body and whisker analysis, e.g., decomposing the movement to different body parts and obtaining single whisker dynamics. These analyses did not find significant differences in movement parameters between CR-Hit and Hit-Hit conditions (Fig. S7 and S8).”

      First, given the small sample size and use of non-parametric tests, you will only get p<.05 if at least 6 of the 7 mice perform in the same way. So getting p>.05 is not surprising even if there is an underlying effect. This makes it especially important to do analyses that are likely to reveal any differences; using whisker angle and overall body movement, which is poorly explained, is in my opinion insufficient. An alternative approach would be to compare movements within animals; small as the dataset is, it is feasible to do an animal-by-animal analysis, and then one could leverage the large trial count to get much greater statistical power, foregoing summary analyses that pool over only n=7.
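
      The "at least 6 of the 7 mice" point can be checked by enumerating the exact null distribution of the Wilcoxon signed-rank statistic for n=7. The sketch below is a minimal illustration that assumes a two-sided test with no ties and no zero differences; the function name is hypothetical.

```python
from itertools import product


def exact_signed_rank_p(w_obs, n):
    """Exact two-sided p-value for a Wilcoxon signed-rank statistic.

    w_obs is W = min(W+, W-), the smaller of the two signed rank sums.
    Under the null hypothesis, each of the 2**n sign assignments to the
    ranks 1..n is equally likely (assumes no ties or zero differences).
    """
    max_sum = n * (n + 1) // 2
    count = 0
    for signs in product((0, 1), repeat=n):
        w_plus = sum(rank for rank, s in zip(range(1, n + 1), signs) if s)
        if min(w_plus, max_sum - w_plus) <= w_obs:
            count += 1
    return count / 2 ** n


# With n = 7 paired differences (one per mouse):
for w in range(4):
    print(w, exact_signed_rank_p(w, 7))
```

      Under these assumptions, p falls below 0.05 only when W is at most 2. Two mice going against the majority give W of at least 1+2=3, so at most one mouse can deviate, and only if its difference is among the two smallest in magnitude.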

      We agree with this point and have now substantially improved our statistical analysis.

      1) We now perform within-mouse statistics for responses in BC during naïve, learning and expert (see Fig. S4 below). In short, we find statistical significance for 7 out of 7 mice during the expert phase, 6 out of 7 mice in the learning phase and 0 out of 7 in the naïve phase. For RL during the pre period we find a significant difference in 5 out of 7 expert mice.

      This is now reported in the text (pg. 4): "In addition, a statistical comparison between CR-Hit and Hit-Hit responses within each mouse separately maintained significance for expert (7/7 mice; Mann-Whitney U-test p<0.05) and learning (6/7 mice) but not for naïve (0/7 mice; Fig. S3)."

      And also in (pg. 5): "In addition, a statistical comparison between CR-Hit and Hit-Hit responses in RL within each mouse separately maintained significance for expert (5/7 mice; Mann-Whitney U-test p<0.05)."

      2) We would like to point out that we have now added 3 additional mice (with hemodynamics control) and performed within mouse statistics in BC and RL (Fig. S5), adding to our initial observations.

      3) In terms of body movements, we have now performed within-mouse statistics and compared body movements between CR-Hit and Hit-Hit conditions. In general, most mice did not show a significant difference in body movements or whisker envelope.

      This is now reported in the text (pg. 4): "A within mouse statistical comparison between body or whisker parameters in CR-Hit and Hit-Hit maintained a non-significant difference in expert (1/7 mice displayed a significant difference; Mann-Whitney U-test p>0.05), learning (2/7 mice) and naïve (0/7 mice)."

      And also in (pg. 4): "Body movements and whisker parameters did not significantly differ between CR-Hit and Hit-Hit conditions during the pre period (similar to the stim period; across and within mice; p>0.05; Mann-Whitney U-test)."

      In summary, we have now substantially improved our statistical analysis and further decomposed the body movements, maintaining the trial history results.

      The authors only consider a simple parametrization of movement (correlation across successive frames), and given the high variability in movement across animals, it is likely that different mice adopt different movements during the task, perhaps altering movement in specific ways. Aggregating movement across different body parts after an analysis where body parts are treated separately seems like an odd choice - perhaps it is fine, but again, supporting evidence for this is needed. As it stands, it is not clear if real differences were averaged out by combining all body parts, or what averaging actually entails.

      Please see the above point where we decomposed body movements (Fig. S7 and Methods section on pg. 14).

      If at all possible, I would recommend examining curvature and not just the whisker angle, since the angle being the same is not too surprising given that the stimulus is in the same place. If the animal is pressing more vigorously on CR-H trials, this should result in larger curvature changes.

      Done. We now decompose whisker dynamics (i.e., curvature) using DeepLabCut (Fig. S8; see below). In general, we find no significant differences in whisker parameters between Hit-Hit and CR-Hit conditions.

      This is now reported in the text (pg. 4): "In addition, we performed a more detailed body and whisker analysis, e.g., decomposing the movement to different body parts. This analysis did not find significant differences between CR-Hit and Hit-Hit conditions (Fig. S7 and S8)."

      Finally, the authors presumably have access to lick data. Are reaction times shorter on CR-H trials? Is lick count or lick frequency shorter?

      Done. We have now calculated lick reaction time and lick rate, and find a significant difference in lick reaction time but not in lick rate. We show a figure below for the reviewer and report this in the text.

      The text now reads (pg. 3): "In addition, the lick reaction time (but not the lick rate) between Hit-Hit and CR-Hit were significantly different (p<0.05; Wilcoxon signed-rank test), maybe indicating a more considered response after a previous stop signal."

      If movement differs across trial types, it is entirely plausible that at least barrel cortex activity differences reflect differences in sensory input due to differences in whisker position/posture/etc. This would mitigate the novelty of the present results.

      As detailed above, we have now meticulously analyzed the whisker parameter differences between both conditions and did not find any significant differences.

      1b) Given the importance of this control to the story, both whisker and body movement tracking frames should be explicitly shown either in the primary paper or as a supplement. Moreover, in the methods, please elaborate on how both whisker and body tracking were performed.

      Done. Please see Figs. S7 and S8 for tracking frames. This is now detailed in the above points and also in the revised relevant methods section.

      2) Did streak length impact the response? For instance, in Fig. 1f "Learning", there is a 6-trial "no-go" streak; if the data are there, it would be useful to plot CR-H responses as a function of preceding unrewarded trials.

      Done. We have now calculated response in CR-Hit as a function of the number of preceding CRs. In general, we obtain inconsistent results across mice that may be due to the small number of trials that have more than one preceding CR. Nevertheless, some mice have a trend, sometimes significant, in which CR-Hit responses are higher for longer CR preceding streaks. This is especially true during the learning phase. We have decided not to include this in the manuscript and present this figure only to the reviewer.

    1. Author Response

      Reviewer #1 (Public Review):

      The central claim that the R400Q mutation causes cardiomyopathy in humans require(s) additional support.

      We regret that the reviewer interpreted our conclusions as described. Because of the extreme rarity of the MFN2 R400Q mutation our clinical data are unavoidably limited and therefore insufficient to support a conclusion that it causes cardiomyopathy “in humans”. Importantly, this is a claim that we did not make and do not believe to be the case. Our data establish that the MFN2 R400Q mutation is sufficient to cause lethal cardiomyopathy in some mice (Q/Q400a; Figure 4) and predisposes to doxorubicin-induced cardiomyopathy in the survivors (Q/Q400n; new data, Figure 7). Based on the clinical association we propose that R400Q may act as a genetic risk modifier in human cardiomyopathy.

      To avoid further confusion we modified the manuscript title to “A human mitofusin 2 mutation can cause mitophagic cardiomyopathy” and provide a more detailed discussion of the implications and limitations of our study (page 11).

      First, the claim of an association between the R400Q variant (identified in three individuals) and cardiomyopathy has some limitations based on the data presented. The initial association is suggested by comparing the frequency of the mutation in three small cohorts to that in a large database gnomAD, which aggregates whole exome and whole genome data from many other studies including those from specific disease populations. Having a matched control population is critical in these association studies.

      We have added genotyping data from the matched non-affected control population (n=861) of the Cincinnati Heart study to our analyses (page 4). The conclusions did not change.

      For instance, according to gnomAD the MFN2 Q400P variant, while not observed in those of European ancestry, has a 10-fold higher frequency in the African/African American and South Asian populations (0.0004004 and 0.0003266, respectively). If the authors' data in Table 1 are compared to the gnomAD African/African American population, the p-value drops to 0.029262, which would not likely survive correction for multiple comparisons (e.g., Bonferroni).

      Thank you for raising the important issue of racial differences in mutant allele prevalence and its association with cardiomyopathy. Sample size for this type of sub-group analysis is limited, but we are able to provide African-derived population allele frequency comparisons for both the gnomAD population and our own non-affected control group.

      As now described on page 4, and just as with the gnomAD population we did not observe MFN2 R400Q in any Caucasian individuals, either cardiomyopathy or control. Its (heterozygous only) prevalence in African American cardiomyopathy is 3/674. Thus, the R400Q minor allele frequency of 3/1,345 in AA cardiomyopathy compares to 10/24,962 in African gnomAD, reflecting a statistically significant increase in this specific population group (p=0.003308; Chi2 statistic 8.6293). Moreover, all African American non-affected controls in the case-control cohort were wild-type for MFN2 (0/452 minor alleles).
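
      The reported statistic can be checked with an uncorrected Pearson chi-square on a 2x2 table. The sketch below is a minimal illustration that assumes the table is arranged as minor versus remaining alleles, [[3, 1345], [10, 24962]], and that no continuity correction was applied. Both are assumptions about how the counts were analyzed, and the function name is hypothetical.

```python
import math


def pearson_chi2_2x2(a, b, c, d):
    """Uncorrected Pearson chi-square for the 2x2 table [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom the chi-square tail probability equals
    # erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed layout: rows = AA cardiomyopathy, African gnomAD;
# columns = R400Q minor alleles, remaining alleles.
chi2, p = pearson_chi2_2x2(3, 1345, 10, 24962)
print(f"chi2 = {chi2:.4f}, p = {p:.6f}")
```

      Under these assumptions the result should come out close to the values quoted above. With an expected count below 1 in one cell, an exact test such as Fisher's would be a reasonable cross-check.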

      (The source and characteristics of the subjects used by the authors in Table 1 is not clear from the methods.)

      The details of our study cohorts were inadvertently omitted during manuscript preparation. As now reported on pages 3 and 4, the Cincinnati Heart Study is a case-control study consisting of 1,745 cardiomyopathy (1,117 Caucasian and 628 African American) subjects and 861 non-affected controls (625 Caucasian and 236 African American) (Liggett et al Nat Med 2008; Matkovich et al JCI 2010; Cappola et al PNAS 2011). The Houston hypertrophic cardiomyopathy cohort (which has been screened by linkage analysis, candidate gene sequencing or clinical genetic testing) included 286 subjects (240 Caucasians and 46 African Americans) (Osio A et al Circ Res 2007; Li L et al Circ Res 2017).

      Relatedly, evaluation in a knock-in mouse model is offered as a way of bolstering the claim for an association with cardiomyopathy. Some caution should be offered here. Certain mutations that cause a cardiomyopathy when knocked in to mice have not been observed to do so in humans with the same mutation. A recent example is the p.S59L variant in the mitochondrial protein CHCHD10, which causes cardiomyopathy in mice but not in humans (PMID: 30874923). While phenocopy is suggestive, there are differences between humans and mice, which makes the correlation imperfect.

      We understand that a mouse is not a man, and as noted above we view the in vitro data in multiple cell systems and the in vivo data in knock-in mice as supportive for, not proof of, the concept that MFN2 R400Q can be a genetic cardiomyopathy risk modifier. As indicated in the following responses, we have further strengthened the case by including results from 2 additional, previously undescribed human MFN2 mutation knock-in mice.

      Additionally, the argument that the Mfn2 R400Q variant causes a dominant cardiomyopathy in humans would be better supported by observing a cardiomyopathy in the heterozygous Mfn2 R400Q mice and not just in the homozygous Mfn2 R400Q mice.

      We are intrigued that in the previous comment the reviewer warns that murine phenocopies are not 100% predictive of human disease, and in the next sentence he/she requests that we show that the gene dose-phenotype response is the same in mice and humans. And, we again wish to note that we never argued that MFN2 R400Q “causes a dominant cardiomyopathy in humans.” Nevertheless, we understand the underlying concerns and in the revised manuscript we present data from new doxorubicin challenge experiments comparing cardiomyopathy development and myocardial mitophagy in WT, heterozygous, and surviving (Q/Q400n) homozygous Mfn2 R400Q KI mice (new Figure 7, panels E-G). Homozygous, but not heterozygous, R400Q mice exhibited an amplified cardiomyopathic response (greater LV dilatation, reduced LV ejection performance, exaggerated LV hypertrophy) and an impaired myocardial mitophagic response to doxorubicin. These in vivo data recapitulate new in vitro results in H9c2 rat cardiomyoblasts expressing MFN2 R400Q, which exhibited enhanced cytotoxicity (cell death and TUNEL labelling) to doxorubicin associated with reduced reactive mitophagy (Parkin aggregation and mitolysosome formation) (new Figure 7, panels A-D). Thus, under the limited conditions we have explored to date we do not observe cardiomyopathy development in heterozygous Mfn2 R400Q KI mice. However, we have expanded the association between R400Q, mitophagy and cardiomyopathy thereby providing the desired additional support for our argument that it can be a cardiomyopathy risk modifier.

      Relatedly, it is not clear what the studies in the KI mouse prove over what was already known. Mfn2 function is known to be essential during the neonatal period and the authors have previously shown that the Mfn2 R400Q disrupts the ability of Mfn2 to mediate mitochondrial fusion, which is its core function. The results in the KI mouse seem consistent with those two observations, but it's not clear how they allow further conclusions to be drawn.

      We strenuously disagree with the underlying proposition of this comment, which is that “mitochondrial fusion (is the) core function” of mitofusins. We also believe that our previous work, alluded to but not specified, is mischaracterized.

      Our seminal study defining an essential role for Mfn2 for perinatal cardiac development (Gong et al Science 2015) reported that an engineered MFN2 mutation that was fully functional for mitochondrial fusion, but incapable of binding Parkin (MFN2 AA), caused perinatal cardiomyopathy when expressed as a transgene. By contrast, another engineered MFN2 mutant transgene that potently suppressed mitochondrial fusion, but constitutively bound Parkin (MFN2 EE) had no adverse effects on the heart.

      Our initial description of MFN2 R400Q and observation that it exhibited impaired fusogenicity (Eschenbacher et al PLoS One 2012) reported results of in vitro studies and transgene overexpression in Drosophila. Importantly, a role for MFN2 in mitophagy was unknown at that time and so was not explored.

      A major point both of this manuscript and our work over the last decade on mitofusin proteins has been that their biological importance extends far beyond mitochondrial fusion. As introduced/discussed throughout our manuscript, MFN2 plays important roles in mitophagy and mitochondrial motility. Because this central point seems to have been overlooked, we have gone to great lengths in the revised manuscript to unambiguously show that impaired mitochondrial fusion is not the critical functional aspect that determines disease phenotypes caused by Mfn2 mutations. To accomplish this we’ve re-structured the experiments so that R400Q is compared at every level to two other natural MFN2 mutations linked to a human disease, the peripheral neuropathy CMT2A. These comparators are MFN2 T105M in the GTPase domain and MFN2 M376A/V in the same HR1 domain as MFN2 R400Q. Each of these human MFN2 mutations is fusion-impaired, but the current studies reveal that their spectrum of dysfunction differs in other ways, as summarized in Author response table 1:

      Author response table 1.

      We understand that it sounds counterintuitive for a mutation in a “mitofusin” protein to evoke cardiac disease independent of its appellative function, mitochondrial fusion. But the KI mouse data clearly relate the occurrence of cardiomyopathy in R400Q mice to the unique mitophagy defect provoked in vitro and in vivo by this mutation. We hope the reviewer will agree that the KI models provide fresh scientific insight.

      Additionally, the authors conclude that the effect of R400Q on the transcriptome and metabolome in a subset of animals cannot be explained by its effect on OXPHOS (based on the findings in Figure 4H). However, an alternative explanation is that R400Q is a loss-of-function variant but does not act in a dominant-negative fashion. According to this view, mice homozygous for R400Q (which have no wild-type copies of Mfn2) lack Mfn2 function and consequently have an OXPHOS defect giving rise to the observed transcriptomic and metabolomic changes. But in the rat heart cell line with endogenous rat Mfn2, exogenous MFN2 R400Q has no effect, as it is loss of function and is not dominant negative.

      Our results in the original submission, which are retained in Figures 1D and 1E and Figure 1 Figure Supplement 1 of the revision, exclude the possibility that R400Q is a functional null mutant for, but not a dominant suppressor of, mitochondrial fusion. We have added additional data for M376A in the revision, but the original results are retained in the main figure panels and a new supplemental figure:

      Figure 1D reports results of mitochondrial elongation studies (the morphological surrogate for mitochondrial fusion) performed in Mfn1/Mfn2 double knock-out (DKO) MEFs. The baseline mitochondrial aspect ratio in DKO cells infected with control (β-gal-containing) virus is ~2 (white bar), and increases to ~6 (i.e. ~normal) by forced expression of WT MFN2 (black bar). By contrast, aspect ratio in DKO MEFs expressing MFN2 mutants T105M (green bar), M376A and R400Q (red bars in main figure), R94Q and K109A (green bars in the supplemental figure) is only 3-4. For these results the reviewer’s and our interpretation agree: all of the MFN2 mutants studied are non-functional as mitochondrial fusion proteins.

      Importantly, Figure 1E (left panel) reports the results of parallel mitochondrial elongation studies performed in WT MEFs, i.e. in the presence of normal endogenous Mfn1 and Mfn2. Here, baseline mitochondrial aspect ratio is already normal (~6, white bar), and increases modestly to ~8 when WT MFN2 is expressed (black bar). By comparison, aspect ratio is reduced below baseline by expression of four of the five MFN2 mutants, including MFN2 R400Q (main figure and accompanying supplemental figure; green and red bars). Only MFN2 M376A failed to suppress mitochondrial fusion promoted by endogenous Mfns 1 and 2. Thus, MFN2 R400Q dominantly suppresses mitochondrial fusion. We have stressed this point in the text on page 5, first complete paragraph.

      Additionally, as the authors have shown MFN2 R400Q loses its ability to promote mitochondrial fusion, and this is the central function of MFN2, it is not clear why this can't be the explanation for the mouse phenotype rather than the mitophagy mechanism the authors propose.

      Please see our response #7 above beginning “We strenuously disagree...”

      Finally, it is asserted that the MFN2 R400Q variant disrupts Parkin activation, by interfering with MFN2 acting as a receptor for Parkin. The support for this in cell culture, however, is limited. Additionally, there is no assessment of mitophagy in the hearts of the KI mouse model.

      The reviewer may have overlooked the studies reported in original Figure 5, in which Parkin localization to cultured cardiomyoblast mitochondria is linked both to mitochondrial autophagy (LC3-mitochondria overlay) and to formation of mito-lysosomes (MitoQC staining). These results have been retained and expanded to include MFN2 M376A in Figure 6 B-E and Figure 6 Figure Supplement 1 of the revised manuscript. Additionally, selective impairment of Parkin recruitment to mitochondria was shown in mitofusin null MEFs in current Figure 3C and Figure 3 Figure Supplement 1, panels B and C.

      The in vitro and in vivo doxorubicin studies performed for the revision further strengthen the mechanistic link between cardiomyocyte toxicity, reduced parkin recruitment and impaired mitophagy in MFN2 R400Q expressing cardiac cells: MFN2 R400Q-amplified doxorubicin-induced H9c2 cell death is associated with reduced Parkin aggregation and mitolysosome formation in vitro, and the exaggerated doxorubicin-induced cardiomyopathic response in MFN2 Q/Q400 mice was associated with reduced cardiomyocyte mitophagy in vivo, measured with adenoviral Mito-QC (new Figure 7).

      Reviewer #2 (Public Review):

      In this manuscript, Franco et al show that the mitofusin 2 mutation MFN2 Q400 impairs mitochondrial fusion with normal GTPase activity. MFN2 Q400 fails to recruit Parkin and further disrupts Parkin-mediated mitophagy in cultured cardiac cells. They also generated MFN2 Q400 knock-in mice to show the development of lethal perinatal cardiomyopathy, which had an impairment in multiple metabolic pathways.

      The major strength of this manuscript is the in vitro study, which provides a thorough understanding of how the MFN2 Q400 mutation alters MFN2 function and of its effect on mitochondrial function. However, the in vivo MFN2 Q/Q400 knock-in mice are more troubling given the split phenotype of MFN2 Q/Q400a vs MFN2 Q/Q400n subtypes. Their main findings towards impaired metabolism in mutant hearts fail to distinguish between the two subtypes.

      Thanks for the comments. We do not fully understand the statement that “impaired metabolism in mutant hearts fails to distinguish between the two (in vivo) subtypes.” The data in current Figure 5 and its accompanying figure supplements show that impaired metabolism measured both as metabolomic and transcriptomic changes in the subtypes (orange Q400n vs red Q400a in Figure 5 panels A and D) are reflected in the histopathological analyses. Moreover, newly presented data on ROS-modifying pathways (Figure 5C) suggest that a central difference between Mfn2 Q/Q400 hearts that can compensate for the underlying impairment in mitophagic quality control (Q400n) vs those that cannot (Q400a) is the capacity to manage downstream ROS effects of metabolic derangements and mitochondrial uncoupling. Additional support for this idea is provided in the newly performed doxorubicin challenge experiments (Figure 7), demonstrating that mitochondrial ROS levels are in fact increased at baseline in adult Q400n mice.

      While the data support the conclusion that MFN2 Q400 causes cardiomyopathy, several experiments are needed to further understand mechanism.

      We thank the reviewer for agreeing with our conclusion that MFN2 Q400 can cause cardiomyopathy, which was the major issue raised by R1. As detailed below we have performed a great deal of additional experimentation, including on two completely novel MFN2 mutant knock-in mouse models, to validate the underlying mechanism.

      This manuscript will likely impact the field of MFN2 mutation-related diseases and show how MFN2 mutation leads to perinatal cardiomyopathy in support of previous literature.

      Thank you again. We think our findings have relevance beyond the field of MFN2 mutant-related disease as they provide the first evidence (to our knowledge) that a naturally occurring primary defect in mitophagy can manifest as myocardial disease.

    1. Author Response

      Reviewer #2 (Public Review):

      This manuscript reports on an important study that aims to identify symptom trajectories for the early detection of pancreatic cancer. The study's findings are based on the analysis of two complementary data sources: structured data obtained from the Danish National Patient Registry and unstructured information extracted from the free-text sections of patient notes. The researchers successfully identified various symptoms and disease trajectories that are strongly associated with pancreatic cancer, with compelling evidence from both data sources. Additionally, the study provides a detailed comparison and contrast of the results obtained from each data source, adding valuable insights into the strengths and limitations of each method.

      Strengths:

      The work is well motivated by the urgent need for early detection of pancreatic cancer, which is often difficult due to the lack of effective (computational) methods. The manuscript is generally well-written and includes relevant studies, providing a comprehensive overview of the current state of the field.

      One of the unique contributions of this work is its use of both structured registry data and unstructured clinical notes to leverage complementary information. This approach enables a more nuanced and comprehensive understanding of the disease symptom trajectories, which is critical for improving early disease diagnosis and prognosis.

      The methodology employed in this study is sound and robust, and the authors have candidly discussed its limitations. The results are significant and highlight previously unknown insights into symptom disease trajectories, which have important implications for the management of pancreatic cancer.

      Overall, this is a well-designed and executed study that makes an important contribution to the field of cancer/informatics research, and it should be of great interest to both researchers and clinicians.

      Weaknesses:

      To complement the results in Figure 1, I'd also suggest that the authors compile a list of the most common (known) symptoms of pancreatic cancer as a reference. In other words, not only can you compare results found from the two sources but also compare them with existing knowledge. This is something you discussed partly in lines 245 but including this early as part of the results in Figure 1 would be more informative.

      We agree that this would be informative to include into the Venn diagram. Hence, we have created a list of the most established and well-known symptoms of pancreatic cancer (Supplementary table S1) and converted these to the comparable ICD-10 level that we also use for the text mining and registry counts in Fig. 1. We have included the Venn diagram as Supplementary Figure S1.

      In terms of the text mining evaluation results, providing information on recall errors would be beneficial to better understand the performance of the method. Additionally, line 144 mentions 53 words, but it is still not clear to me what these words refer to. Could you please clarify this point or provide more context?

      We have added sensitivity/recall measures on the text mining procedure and furthermore added two references in the Discussion of the Tagcorpus program which was used for text mining the clinical notes. These references also mention similar sensitivities for the studies. The 53 words are false positives and we have clarified why these have been captured as false positives by the Tagcorpus (negations).

      The disparities between Figure 2A and 2B are noteworthy, from very different initial symptoms to the proportion of short median survival dates (<=90 days), with much more pronounced differences than those observed in Figure 1 comparing two data sources. The highlighted trajectories are almost completely different. Should this be expected? I was hoping to see at least some overlap between the two results.

      After updating the case population (via the cancer registry) and showing only symptom trajectories in this revised version, we can clearly see that the trajectories are more similar. This gives an indication that the methods pick up on similar pancreatic-cancer symptoms, but there are also differences that show how each data type can complement the other, such as the text-mined trajectories being able to pick up longer symptom trajectories prior to the cancer.

      All trajectories shown in Figure 2 include three symptoms. Is this by design? Could there be meaningful trajectories with different numbers of symptoms (e.g. 4 or more)?

      We agree and have added the significant length 4 trajectories (for the registry data) as supplementary figure S2. No trajectories with length 5 or higher were found in the registry-based analysis. No length 4 (or higher) trajectories were found for the text-mined patients (presumably due to the data set size).

      Considering those patients with both clinical notes and registry data, it may be beneficial to merge their symptoms to generate more informative trajectories.

      This could be interesting but is out of scope for this paper. Here we would like to stress the proof-of-concept that the two data types can complement each other. The next step would be to generate these multimodal trajectories to test, for example, whether they are predictive of pancreatic cancer. Nonetheless, we acknowledge the significance of this perspective and have incorporated it into the Discussion section of the manuscript.

      Given that results from two sources are being compared in Figures 1 and 2, have you considered calculating the top 20 most significant symptoms from the registry data as well?

      We have done this and added them to Supplementary figure S3.

      While there is a discussion related to cardiovascular diseases, I noticed no mention of cataracts or gonarthrosis, which were found to be prevalent among patients with short survival in Figure 2.

      Since we now only include symptom trajectories in the Results, we have chosen not to include these results in the Discussion for the final version of the manuscript. However, the diagnosis-wide trajectories are included in Supplementary figure S2. Cataract and gonarthrosis were still found to be significant in the results, even though they are not shown in the Supplementary figure due to its visualization threshold of min. 400 patients per trajectory.

      Ultimately, the goal of this research is to improve the early detection and prognosis of pancreatic cancer, thus it is important to discuss how the findings of this work could be applied in practice towards this goal (e.g. used by disease prediction algorithms?)

      We agree that this is very important and have added a small section on this in the Discussion. We have also cited a recent publication using deep learning algorithms to predict pancreatic cancer based solely on registry data (Placido et al. 2023).

    1. Author Response

      Reviewer #1 (Public Review):

      In general, in the discussion, I miss two of the main points that led to suspend screening programs in most countries during the pandemic:

      1) protecting women from the risk of infection linked to attending a clinic during pandemic when health facilities were mostly attended by symptomatic people seeking care for Covid-19;

      We agree. We have added this to the background and Discussion section (page 3, lines 76-78 & page 9, lines 296-299).

      2) the shortage of health professionals, because they were mostly involved in Covid-related activities: lack of radiologists (redirected to the emergency department to ensure diagnoses of pneumonia), lack of anesthesiologists (due to the expansion of intensive care), thus risking not having timely surgical treatment; lack of screening organization personnel for invitations and phone calls (working on contact tracing).

      We agree. We have added this to the background and Discussion section (page 3, lines 76-78 & page 9, lines 296-299).

      Lacking the rationale for suspending screening, it is not clear to the reader how the Danish program addressed these issues and was able to keep the program open.

      We have elaborated on this in the Discussion section (page 9, lines 296-299), arguing that Denmark may have partly mitigated the issue of staff shortage due to, e.g., a lower burden of COVID-19, use of laymen and medical students for testing and vaccinations, and a high vaccine coverage.

    1. Author Response

      Reviewer #1 (Public Review):

      Hoang, Tsutsumi and colleagues use 2-photon calcium imaging to study the activity of Purkinje cells during a Go/No-go task and relate this activity to their location in Aldolase-C bands. Tensor component analysis revealed that a substantial part of the calcium responses can be linked to four functional components. The manuscript addresses an important question with an elegant technical approach and careful analysis. There are a few points that I think could be addressed to further improve the quality of the manuscript.

      1) The authors should be careful not to overstate the goal and results. For instance, in the abstract it is stated that dynamical functional organization is necessary for dimension reduction. However, the statement that the 4 TCs together account for about half of the variance (line 220) indicates that dimensionality may not be reduced that much. I would suggest revising the first and last sentence of the abstract accordingly.

      Dynamic functional organization of TC1 and TC2 by synchronization is the major finding of this study and we believe that it is one of the most efficient mechanisms of dimension reduction, given the unique anatomy of the cerebellum. In the revised manuscript, we added a supplemental result showing that the dimensionality of TC1 and TC2 neurons decreased and increased, respectively, in accordance with bi-directional changes in their synchronization (Figure 3 – figure supplement 1DE). Dimension reduction was further confirmed by conventional PCA (Figure 6 – figure supplement 1). However, we agree that the statement that the cerebellum reduces dimensions by self-organization of components is speculative, and we revised the abstract accordingly.

      At the end of the introduction, the authors refer to "the first evidence supporting the two major theories of cerebellar function" but which two theories are referred to and how this manuscript supports them is not very obvious. Similarly, they state that "This study unveiled the secret of cerebellar functional architecture", which I would consider to be an unnecessary overstatement of the impact of the work described.

      In the revised Introduction, we explicitly stated that TC1 and TC2 are related to timing control and cognitive error learning, respectively, with some indirect causal evidence. We also revised the last paragraph of the Introduction to emphasize that this study provides the first evidence to support the view that distinct cerebellar components may serve divergent cerebellar functions in a single task. The statement "This study unveiled the secret of cerebellar functional architecture" was removed.

      In the title, the authors use the word modular. In the consensus paper on cerebellar modules (Apps et al., 2018) an attempt is made to unify the terms used to describe cerebellar anatomical structures. Here "module" is used for the longitudinal zone of interconnected PCs, CN neurons and olivary neurons. As the authors only studied PC activity (and indirectly the IO), I would suggest using band, stripe or subpopulation instead.

      Because we used TCA to identify functional components underlying the Go/No-go data, we changed the word “module” to “component” in the title.

      Finally, the term "CF firing" or "CF activity" is used when referring to the recorded signals. However, the authors measure postsynaptic calcium responses that are indeed likely driven by CF inputs, but could also be influenced by PF inputs. At the very least, because Purkinje cells and not climbing fibers are being imaged, "complex spike" should be used instead. It would be more accurate still to use the more general "calcium response" and make less of an assumption about the origin of the calcium response.

      In this study, CF-dependent dendritic Ca2+ signals in adjacent AldC compartments were recorded by two-photon imaging. The HA_time algorithm (Hoang et al. 2020) was then applied to extract spike timings from the recorded signals. In the revised manuscript, we used the terms “calcium responses” and “complex spikes” when referring to the recorded Ca2+ signals and the estimated spikes, respectively.

      2) For some figure panels and statements in the manuscript error bars or confidence intervals and statistics are missing. This is the case for, for example, the changes in fraction correct, lick latency, fraction incorrect, etc. (Fig 1B, 2E-F, TC levels in 3, 4D-E and 5A-C). Including these is particularly relevant in Fig 4E as this is a key result, mentioned also in the abstract. Please indicate clearly if these plots are cumulative for all mice or per mouse and averaged. I advise the authors to statistically support the claim that the changes are significant and in opposite direction as this element of the study is referred to in the abstract and discussion (summary).

      We added the error bars / confidence intervals to the related figures. Most importantly, we added histograms of synchrony strength for TC1/TC2 neurons (Figure 4E) and conducted statistical tests to strengthen the claim of bi-directional changes in synchronization of TC1/TC2.

      3) Data presentation sometimes does not do the work justice. For example, the data in Figure 6 are very interesting, but hard to read because of the design of the figure. It is clear how the components are mostly confined to Aldolase-C domains, but within the domains the distribution is not clear. I would advise to also more clearly indicate what the locations of the colors within the bands refers to. The spatial distribution of the selected top 300 cells for each TC could be added.

      We added pie-chart plots for the fraction of TC1-4 neurons in each Ald-C zone and learning stage. We also indicated in the figure legend that the location of a single-color bar referred to the geographic distance of the corresponding neuron relative to Ald-C boundaries. We included spatial distribution of the selected neurons in Figure 4 – figure supplement 1D.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors investigate the mechanistic underpinning of paradoxical activation (PA) of RAF by small molecule kinase inhibitors using mathematical modeling. The main novelty of the study is the consideration of RAF conformational autoinhibition by its N-terminal regulatory domains as a new determinant of PA. This mechanism has not been explicitly considered in previous theoretical studies, which are based on two other mechanisms: drug-induced RAF oligomerization into active dimers (dimer potentiation DP) and negative cooperativity (NC) of inhibitor binding by a second monomer in the inhibitor-induced RAF kinase dimerization. An important discovery of this study is that conformational autoinhibition is a critical determinant of PA and that in some cases, it can contribute to PA in the absence of DP and NC. Another novelty is the consideration of RAF interaction with 14-3-3 proteins, as a determinant of PA. The 14-3-3 dimeric scaffolds play an important role in the regulation of both autoinhibited and active states of RAF and thus understanding how their interaction with RAF influences PA by RAF inhibitors is important. Using mathematical modeling the authors show that 14-3-3 binding does indeed enhance PA in response to a spectrum of RAF inhibitors.

      We thank Reviewer #1 for reviewing our manuscript, and we agree with this summary.

      Strengths

      The overall strength of this study is that it increases the mechanistic understanding of how PA of RAF originates in response to its inhibitors. Consideration of the effect that the inhibitors play in breaking the autoinhibited conformation has been overlooked by previous mathematical analyses of PA, and this study bridges this gap. By doing so, the authors discover that breaking that autoinhibited state is in fact the biggest contribution to PA by RAF inhibitors. In my opinion, this is the most impactful finding of this study, which additionally speaks to how important the autoinhibitory mechanisms are for constraining basal RAF signaling in cells. The presented analysis also shows that consideration of conformational autoinhibition can explain PA by all different types of RAF inhibitors (1, 1.5, and 2), which until now has been difficult to reconcile.

      Another important contribution of this study is the investigation of how the 14-3-3 scaffold proteins can further contribute to PA. This is exciting, especially in light of recent elegant structural studies that unveiled complex regulation of RAF by 14-3-3 (which are both important for RAF inhibition and stabilization of the active dimers). The authors dissect these opposing roles of 14-3-3 in their model and show the autoinhibitory interaction with 14-3-3, but not the activating one, significantly increases the PA response. Their findings that an increase in the 14-3-3 levels amplifies PA is very interesting and somewhat provocative as it is unclear how much 14-3-3 levels in cells can oscillate. To this end, the authors show that elevated 14-3-3 levels are observed with increased time of RAF inhibitor treatment, which might point to a new mechanism of resistance to RAF inhibitors.

      We thank reviewer #1 for the enthusiastic review and for highlighting the value of bringing conformational autoinhibition into the study and understanding of paradoxical activation. We also appreciate the positive consideration of the 14-3-3 section of the manuscript and the helpful suggestions later in the review. In this revision, we have taken the offered option of removing all of the 14-3-3 theoretical and experimental work. We plan to expand the 14-3-3 work in our ongoing work, in accordance with the thoughtful input from reviewers #1, #2, and #3 on this topic. Thank you.

      Weaknesses

      The main weakness of the study is the limited experimental analysis conducted to test the predictions that arise from the mathematical models. While some of these predictions might be challenging to test, the one which is tested is not tested rigorously. The experiments focus on 14-3-3-based regulation and are conducted in cells by observing the effect of 14-3-3 overexpression on the inhibition of RAF signaling by its different kinase inhibitors. While the authors acknowledge that too, 14-3-3 overexpression will have a multifaceted effect on signaling as these scaffold proteins participate in the regulation of almost all signaling events. Thus, the proposed experiments are not sufficient to conclude that the observed effects are in fact a result of 14-3-3/RAF interaction.

      The authors consider conformational autoinhibition and 14-3-3 stabilization of autoinhibited RAF as two different mechanisms. While it is not a weakness, I am curious how accurate is the consideration of the autoinhibited state of RAF in the absence of 14-3-3. Is it known how the proportion of RAF in cells in its inactive state exists while not bound to 14-3-3?

      We thank Reviewer #1 for this input on how we can significantly improve the 14-3-3 section of the manuscript. We have removed the 14-3-3 sections due to the consensus input of all three reviewers and the presented option of focusing on the theoretical results of how conformational autoinhibition influences PA. We do plan to continue this research program beyond this manuscript, and we therefore very much appreciate these insights into which aspects should be supported with additional experiments and the challenges that follow from the pleiotropic activities of 14-3-3 proteins. The suggestion of quantifying the ratio of autoinhibited to non-autoinhibited forms of RAF when 14-3-3 proteins are present and absent is an experiment we plan to pursue in our future work. It will require us to learn new methods and/or to form new collaborations, and we therefore appreciate the consensus opinion that this would be outside of our current expertise and outside of the scope of the focused manuscript on modeling the impact of conformational autoinhibition on PA.

      Reviewer #2 (Public Review):

      In this study, the authors set out to investigate factors that have been neglected in existing mathematical models for the paradoxical activation (PA) of RAF by pharmacological inhibitors. The PA phenomenon is well known and is thought to be an important factor in limiting the effectiveness of RAF inhibitors. The authors primarily use mathematical models, first to examine the importance of conformational autoinhibition of RAF monomers, and later to investigate the potential role played by binding of 14-3-3 proteins to either autoinhibited monomers or active dimers. The authors develop several model variants containing different candidate mechanisms and generate analytical solutions that demonstrate under which parameter conditions PA may occur within these models. The use of analytical solutions is a strong point of the paper, as it allows evaluation of the models independently of specific parameter values. This analysis suggests that conformational autoinhibition is a very strong contributor to paradoxical activation, as models that include this mechanism show substantially larger concentration ranges under which RAF is activated by inhibitors. Fitting the parameters of the model to a published dataset on multiple inhibitors further suggests that conformational activation is important, as models containing this mechanism can fit the dataset with significantly lower error. Another interesting observation is that the different types of RAF inhibitors (1, 1.5, 2) fit the data with parameter values that are reasonably similar within each type. A moderate weakness in this analysis is that all of these observations provide indirect evidence for the importance of conformational autoinhibition. A direct test of whether PA is reduced when conformational autoinhibition is removed would be more compelling, but such a test could be difficult to set up experimentally.

      We thank Reviewer #2 for reviewing our manuscript, and we agree with this summary. We agree that an experimental test where conformational autoinhibition is removed from the system would be a very compelling experiment, but that it would be difficult to set up experimentally. We appreciate the option to focus on the theoretical advance in our revision, and we will be working toward such an experiment.

      The authors then perform an analysis of how 14-3-3 binding to either autoinhibited monomers or active dimers might enhance PA. A new model is constructed that contains these binding events in the context of conformational activation, but without negative cooperativity or dimer potentiation included, for the sake of limiting complexity. These models implicate monomer binding, but not dimer binding as a contributor to PA. They follow up on this model result by overexpressing 14-3-3 proteins in two RAS-mutant cell lines, which leads to both higher baseline ERK phosphorylation and to a wider range of inhibitor-induced PA, as predicted by the model. A cell-based RAF dimerization assay also shows higher dimerization effects when 14-3-3 plasmids are transfected as well. This experimental evidence provides strong support for the model, although one drawback, which is noted by the authors in the discussion, is that 14-3-3 overexpression could potentially exert effects on RAF activity through pleiotropic effects other than the binding actions included in the model.

      We thank Reviewer #2 for the input on the 14-3-3 section of the manuscript. Although it has been removed from the revision, all of the comments from the review will be helpful for our ongoing work.

      Overall, this study makes a strong contribution to understanding the paradoxical effects of RAF inhibitors on the RAS/ERK signaling pathway, which remains a significant problem in the use of targeted inhibitors for cancer. Demonstrating that both conformational activation and 14-3-3 binding strongly contribute to the PA effect is an important step forward, as it establishes that these mechanisms should not be overlooked when designing strategies to use Raf inhibitors.

      We appreciate the thoughtful review and helpful comments to improve the manuscript.

      Reviewer #3 (Public Review):

      The authors describe a mathematical and computational modeling study of RAF paradoxical activation (PA), a phenomenon in which RAF inhibitors exhibit a bell-shaped dose-response curve of Erk phosphorylation - activating signaling through wild-type RAF at low drug concentrations before inhibiting it at higher concentrations. They explore three distinct mechanisms that may contribute to PA - conformational autoinhibition, negative cooperativity, and drug-induced dimerization - and conclude that all three are required to best fit published data that show the PA phenomenon. They explore the effect of 14-3-3 binding to RAF both computationally and experimentally and reach the conclusion that 14-3-3 can potentiate the PA phenomenon via stabilization of the autoinhibited conformation.

      We thank Reviewer #3 for reviewing our manuscript, and for the helpful comments in the review.

      Strengths:

      One key finding will be quite valuable to the field - that paradoxical activation can arise in the absence of negative cooperativity and without any effect of the inhibitor on the propensity of RAF to dimerize, provided that there exists a "conformationally autoinhibited" state that cannot dimerize and cannot bind inhibitor. This finding is important because negative cooperativity and dimer-induction have been a major focus - arguably the main focus - of prior studies of the phenomenon and also a source of considerable confusion. Inhibitors with very different chemical structures and binding properties - type 1.5 inhibitors that are dimer-breakers (and may or may not exhibit negative cooperativity) and type I and II inhibitors that can promote dimers (and almost certainly do not exhibit negative cooperativity) can nevertheless both exhibit PA. Thus the authors' modeling provides a unifying explanation - it is not dimer-induction or negative cooperativity that is at the root of PA, rather it is that there exists an autoinhibited state that can neither bind inhibitor nor dimerize. The authors further show that negative cooperativity and dimer-induction can act in concert with "conformational autoinhibition" to modify the PA response in a drug-specific manner.

      We thank Reviewer #3 for highlighting these strengths and their value to the field. In the focused paper, we have updated our discussion of the fits and of the model to highlight these points better.

      Weaknesses:

      Unfortunately, the authors don't really explain in a straightforward manner what is going on with the conformational autoinhibition model (Figure 2A). One has to read carefully and all the way to section 3 of appendix 1 to piece it together. In short, what the math shows is that at least for certain ranges of parameter values, the presence of an inhibitor can increase the concentration of dimers, even when it does not change the equilibrium constant for dimer formation, and some of those dimers will have an active, drug-free protomer. This is because the inhibitor effectively traps open monomers, which can then capture drug-free open monomers to form active dimers (active in one subunit, inactive and drug-bound in the other). As inhibitor concentration increases, the pool of autoinhibited RAF is diminished, and eventually, it is shifted completely to fully inhibited dimers. But at low concentrations of inhibitor, there is a net increase in dimerized (active) but drug-free protomers (see figure on page 27 of the appendix). Voila, paradoxical activation, with no need to invoke negative cooperativity.

      We apologize for the confusion, and agree that the description/walk through in the appendix should be featured more prominently in the manuscript. To this end, we have added a section to the main manuscript (titled “Paradoxical activation reflects a shifting balance of signaling complexes”) that includes the content that was previously in the appendix, and we have added a supplementary figure (Figure 2 – figure supplement 2) which includes the figures from the appendix. Thank you for your thorough review and working through the appendix, and we appreciate this suggestion.
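
      The mechanism the reviewer walks through above can be illustrated with a small equilibrium calculation. The following is a minimal sketch under stated assumptions, not the authors' model: it assumes RAF exists as a closed (autoinhibited) monomer that can neither bind drug nor dimerize, an open monomer, and dimers of open monomers; that drug binds every open protomer with the same affinity (no negative cooperativity); that drug does not change the dimerization constant (no dimer potentiation); and that free drug is in excess, so the dose enters only as the scaled quantity d = [D]/K_D. The function name, parameter names, and parameter values are all hypothetical and chosen only for illustration.

```python
import math


def active_protomers(d, total=1.0, k_closed=100.0, k_dim=100.0):
    """Drug-free protomers located in dimers, at scaled drug dose d = [D]/K_D.

    Illustrative assumptions (not taken from the manuscript):
      k_closed = [closed]/[open]   autoinhibition equilibrium constant
      k_dim    = [O]^2/[OO]        dimer dissociation constant, unchanged by drug
    """
    # Mass balance over all RAF protomers, with O the free open monomer:
    #   total = O*(k_closed + 1 + d) + 2*O^2*(1 + d)^2 / k_dim
    a = 2.0 * (1.0 + d) ** 2 / k_dim
    b = k_closed + 1.0 + d
    # Positive root of a*O^2 + b*O - total = 0, written to avoid cancellation.
    o = 2.0 * total / (b + math.sqrt(b * b + 4.0 * a * total))
    # Drug-free protomers in dimers: 2*[OO] + [O.OD] = 2*O^2*(1 + d)/k_dim
    return 2.0 * o * o * (1.0 + d) / k_dim


if __name__ == "__main__":
    baseline = active_protomers(0.0)
    for dose in (0.0, 1.0, 10.0, 100.0, 1e3, 1e4, 1e5):
        fold = active_protomers(dose) / baseline
        print(f"d = {dose:>9.1f}   fold change in active protomers = {fold:.3g}")
```

      With these illustrative values the drug-free dimerized protomers first rise above baseline and then fall below it as the dose increases, which is the bell-shaped response described in the comment. In the weak-dimerization limit of this sketch the rise requires k_closed > 1, that is, a mostly autoinhibited resting pool.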

      Considering the potential for confusion around what is meant by "drug-induced dimerization" as an effect distinct from the effect of the drug in promoting RAF dimerization in their conformational autoinhibition model, it would have been helpful for the authors to explicitly address the distinction (drug-induced dimerization alters the equilibrium constant for dimerization; this is not a feature of the conformational autoinhibition model).

      Thank you for this suggestion. We have clarified our text by rewriting it to read: … some RAF inhibitors have been shown to result in an increased level of RAF dimerization (Hatzivassiliou et al., 2010; Jin et al., 2017; Karoulia et al., 2016; Lavoie et al., 2013). This drug-induced dimer potentiation is commonly thought of as manifesting in a higher affinity between RAF protomers when one (or both) are bound to a RAF inhibitor (Kholodenko, 2015).

      Also, I am confused by Figure 3C. The figure shows, and the authors state in the text, that for type II inhibitors an f > ~1 indicates a propensity to break dimers. But type 1.5 inhibitors should break dimers, and Type I and II inhibitors should promote dimers (at least some Type I and II drugs have been shown to promote kinase dimers). Seems that the predictions of the model are inconsistent with experimental data, at least for some inhibitors.

      We agree that discussing the fits, relating them to experimental data and current thinking in the field, is important. We have therefore significantly extended our discussion of the fits in Figure 3C in the Discussion of the text. The new text reads:

      It has previously been difficult to reconcile PA for Type I.5 inhibitors, which are sometimes thought of as dimer breakers because they position the alpha-C helix in the “out” position (in contrast to Type I and Type II inhibitors). Studies with recombinant protein and analytic ultracentrifugation clearly found type I.5 inhibitors to predominantly be in the monomeric form (Lavoie et al., 2013). Within-cell assays have similarly found type I.5 inhibitors to promote dimerization less than other Type I and Type II RAF inhibitors (Hatzivassiliou et al., 2010; Peng et al., 2015; Thevakumaran et al., 2015); however, RAF inhibitors still appeared to promote some dimerization in those in-cell assays. 14-3-3 binding proteins, which can help stabilize RAF dimers, may help explain this discrepancy (Kondo et al., 2019; Liau et al., 2020; Park et al., 2019). For example, by promoting the non-autoinhibited form, a type I.5 inhibitor-bound RAF monomer is more dimerization capable than an autoinhibited (and non-inhibitor bound) RAF monomer, and even if the affinity is reduced compared to a non-autoinhibited and non-inhibitor bound RAF monomer, 14-3-3 proteins may be able to bind and overcome the effect. As our model does not explicitly include 14-3-3 proteins, this effect may contribute to our parameter estimation process finding an elevated binding affinity for type I.5 bound RAF monomers.

      Although negative cooperativity has been difficult to precisely measure experimentally, it has widely been assumed to be present to help explain the paradoxical activation caused by Type I.5 inhibitors that do not promote dimerization as strongly as other RAF inhibitors. Our best fit parameters did tend to have g values that were larger than 1, indicating that the model fit best when there was some negative cooperativity. This could suggest that negative cooperativity is more abundant than widely believed. Alternatively, the model without negative cooperativity was able to fit the data nearly as well as the full model that included negative cooperativity (i.e., Figure 3D). This may suggest that other processes not included in the model may be modulating paradoxical activation and that the g parameter, as the only other term in the model, is contributing to the model's ability to account for these otherwise not included effects.

      We found parameter sets that reproduced available, published, data in order to test our model and investigate the potential for it to help illuminate aspects of PA. The best fit parameter sets further support a role for conformational autoinhibition and its modulation by RAF inhibitors in PA. However, it is also important not to read too deeply into the fits. For example, the data for the type II inhibitors AZ-628, LY3009120, and TAK-632 had small total fold-change PA magnitudes, and our fits for them have even less PA. We anticipate that the model-fitting approach would converge to increasingly accurate estimates for the parameters as the set of data being fit to expands. Additionally, quantitative experimental measurements of the parameters being fit should also cascade to impact other parameters and result in better estimates (Gutenkunst et al, 2007).

      A large part of the paper deals with the effect of 14-3-3 binding. In my view, this part of the manuscript is not particularly helpful. There is no evidence (that I am aware of) that 14-3-3 concentrations vary significantly, or that their variation affects RAF activity/signaling. Considering their abundance relative to RAF, and relatively high affinity for RAF, it is likely that both autoinhibited and active RAF are saturated with 14-3-3. (RAF that is not 14-3-3-bound is likely mostly bound to chaperones and not active). That said, the authors' conclusion (based on modeling) that 14-3-3 can increase the extent of paradoxical activation by stabilizing the autoinhibited state seems sensible, but hard to reconcile with their experimental result where they find increased basal signaling with 14-3-3 over-expression. It is also difficult to understand how increased 14-3-3 binding to RAF could lead to active RAF dimers that are not inhibited at 10-100 uM concentrations of potent RAF dimer inhibitors like LY3009120 (Fig. 5C). It seems more likely that 14-3-3 overexpression is promoting Erk phosphorylation in a manner that is (at least partially) Raf-independent. To their credit, the authors acknowledge this concern.

      We thank Reviewer #3 for the helpful critique of the section on 14-3-3. Although we have cut this section as part of the consensus review and suggestions for how to proceed with the revision, these points are very helpful for us as we consider how to interpret the modeling and experimental results of this section, how it fits into what is known, and what we should investigate next. Thank you.

      Finally, one comment regarding the presentation. The authors discuss conformational inhibition and 14-3-3 binding as if they are promoting and/or inducing paradoxical activation. This is pervasive in the paper, including in the title, and is distracting and potentially will mislead some readers. Obviously, it is RAF inhibitor that induces or promotes paradoxical activation. Conformational autoinhibition - mediated by 14-3-3 - is a feature of the system that makes paradoxical activation possible.

      We completely agree. We have rephrased to avoid this interpretation and we apologize for not recognizing it previously. Thank you for catching this and noting it for us to fix. As examples of the revisions to address this point, the last sentence of our abstract now reads:

      Overall, this work establishes conformational autoinhibition as a robust mechanism for RAF inhibitor-driven PA based solely on equilibrium dynamics of canonical interactions that comprise RAF signaling and inhibition.

      And as another example, the third to last sentence in our Introduction now reads:

      Our modeling reveals that, under certain conditions, RAF autoinhibitory conformational changes and their modulation by RAF inhibitor binding can be sufficient to drive PA.

      Lastly, we have a final paragraph in the Discussion that summarizes the work and hypothesizes how it may generalize:

      Our analysis was motivated by RAF inhibitors and PA in RAS mutant cells treated with a RAF inhibitor. Our model, however, is generalizable to other systems that share the modeled features. We anticipate that PA will be observed for other proteins (a) that have a dynamic-equilibrium of conformations, (b) where not all conformations can dimerize, and (c) where drug binding the protein stabilizes one or more of the conformations that can dimerize. As dimerization and conformational autoinhibition are both common features for kinase regulation (Huse & Kuriyan, 2002; Lavoie et al., 2014), it seems reasonable to hypothesize PA will be observed for more kinases through modulation of the conformation and dimerization dynamic-equilibrium. Thank you for suggesting these changes.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports a study to investigate the reporting practices in three top cardiovascular research journals for articles published in 2019. The study was preregistered, which makes the intent and methodology transparent, and the authors also make their materials, data, and code open. While the preregistration and sample strategy is a strength, it suffers from a higher than expected number of non-empirical articles decreasing the sample size and thus inference that can be drawn. The authors' focus was mainly on transparency of reporting and not on the actual reproducibility or replicability of the articles; however, the accessibility of data, code, materials, and methods is a prerequisite. While the authors were still able to draw inferences to their main objectives, they could not perform some of their proposed analyses because of a small sample size (due partly to the less than half empirical articles in their sample as well as the low number of papers with accessible information to code). One of the descriptive analyses they performed, the country level scores (Figure 6), in particular suffers from the small sample size, and while the authors indicate this in their manuscript, I do not think it would be reasonable to include, as it has the potential to be misinterpreted since so many are based on an n=1. Overall, I found the authors' presentation and discussion clear and concise; however, a lack of a more in-depth discussion is an area to improve the current manuscript. The manuscript outlines opportunities for researchers, journals, funders, and institutions to improve the way cardiovascular research is reported to enable discovery, reuse, and reproducibility.

      We appreciate the reviewer’s recognition of our pre-registration, methodology, and resource sharing and also their feedback regarding the small sample size of empirical research articles and need for a more in-depth discussion of the impacts of our study. We have now increased the number of empirical studies to a total of 393 out of 639 articles screened. We also agree that our study focuses more on transparency than reproducibility and replicability, and we have changed our title to reflect this. While the sample size of empirical papers has increased, a comparison of accessibility scores across countries continued to suffer from small sample size and we have removed it based on the recommendation of the reviewers. We have updated the Materials and Methods section to reflect our updated analyses, as well as included additional paragraphs on Limitations and Future Work in our Discussion to acknowledge future improvements that could be made to the accessibility score used in our study.

      Reviewer #2 (Public Review):

      This is a descriptive paper in the field of metascience, which documents levels of accessibility and reproducible research practices in the field of cardiovascular science. As such, it does not make a theoretical contribution, but it argues, first, that there is a problem for this field, and second, it provides a baseline against which the impact of future initiatives to improve reproducibility can be assessed. The study was pre-registered and the methods and data are clearly documented. This kind of study is extremely labour-intensive and represents a great deal of work.

      I have a major concern about the analysis. It is stated that to be fully reproducible, publications must include sufficient resources (materials, methods, data and analysis scripts). But how about cases where materials are not required to reproduce the work? In lines 128-129 it is noted that the materials criterion was omitted for meta-analyses, but what about other types of study where materials may be either described adequately in the text, readily available (e.g. published questionnaires), or impossible to share (e.g. experimental animals)?

      To see how valid these concerns might be, I looked at the first 4 papers in the deposited 'EmpricalResearchOnly.csv' file. Two had been coded as 'No Materials availability statement' and for two the value was blank.

      Study 1 used registry data and was coded as missing a Materials statement. The only materials that I could think might be useful to have might be 'standardized case report forms' that were referred to. But the authors did note that the Registry methods were fully documented elsewhere (I am not sure if that is the case).

      Study 2 was a short surgical case report - for this one the Materials field was left blank by the coder.

      Study 3 was a meta-analysis; the Materials field was left blank by the coder.

      Study 4 was again coded as lacking a Materials statement. It presented a model predicting outcome for cardiac arrhythmias. The definitions of the predictor variables were provided in supplementary materials. I am not clear what other materials might be needed.

      These four cases suggest to me that it is rather misleading to treat lack of a Materials statement as contributing to an index of irreproducibility. Certainly, there are many studies where this is the case, but it will vary from study to study depending on the nature of the research. Indeed, this may also be true for other components of the irreproducibility index: for instance, in a case study, there may be no analysis script because no statistical analysis was done. And in some papers, the raw data may all be present in the text already - that may be less common, but it is likely to be so for case studies, for instance.

      A related point concerns the criteria for selecting papers for screening: it was surprising that the requirement for studies to have empirical data was not imposed at the outset: it should be possible to screen these out early on by specifying 'publication type'; instead, they were included and that means that the numbers used for the actual analysis are well below 400. The large number of non-empirical papers is not of particular relevance for the research questions considered here. In the Discussion, the authors expressed surprise at the large number of non-empirical papers they found; I felt it would have been reasonable for them to depart from their pre-registered plan on discovering this, and to review further papers to bring the number up to 400, restricting consideration to empirical papers only - also excluding case reports, which pose their own problems in this kind of analysis.

      A more minor point is that some of the analyses could be dropped. The analysis of authorship by country had too few cases for many countries to allow for sensible analysis.

      Overall, my concern is that the analysis presented here may create a backlash against metascientific analyses like this because it appears unfair on authors to use a metric based on criteria that may not apply to their study. I am strongly in favour of open, reproducible science, and agree it is important to document the state of the science for different disciplines. But what this study demonstrates to me is that if you are going to evaluate papers as to whether they include things like materials/data availability statements, then you need to have an N/A option. Unfortunately, I suspect it may not be possible to rely on authors' self-evaluation of N/A and that means that metascientists doing an evaluation would need to read enough of the paper to judge whether such a statement should apply.

      We thank the reviewer for the time taken to review our paper, the appreciation of the work we conducted, and for the suggestions for improving our research methods. To address the initial concern about our analytical approach, the definition for fully reproducible publications that we used was only applicable to research that utilized empirical research methods. We recognize that publications such as editorials and reviews are not inherently reproducible experimental studies; thus, such papers were not provided with an accessibility score, were only screened for the components such as funding and conflict of interest information, and were only compared amongst each other. Additionally, articles such as meta-analyses and systematic reviews that do not include materials had adjusted accessibility scores. We expanded our Methods and Discussion section to further explain our screening process and our assumption that all empirical research articles contain methods, data, and analysis scripts and to acknowledge the limitations of our approach. We also agree that screening more empirical research articles is more in line with the intent of our pre-registration and we expanded the number of empirical research articles screened to 393. We also agree with the reviewer that the analysis by country should be excluded because of the small sample size for most countries, and we have adjusted the manuscript accordingly.
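      As a rough illustration of the N/A option the reviewer asks for, the sketch below computes an accessibility score over only those components that apply to a given article. It is a hypothetical scheme, not the scoring actually used in the study: the component names, the use of `None` to mean "not applicable", and the equal weighting are all assumptions made for the example.

```python
# Hypothetical component list; the study's actual coding scheme is not given here.
COMPONENTS = ("materials", "methods", "data", "analysis_scripts")


def accessibility_score(availability):
    """Fraction of applicable components that are accessible.

    `availability` maps a component name to True (accessible),
    False (not accessible) or None (not applicable to this article).
    A component missing from the mapping is treated as not applicable.
    Returns None when no component applies at all.
    """
    applicable = [
        availability[name]
        for name in COMPONENTS
        if availability.get(name) is not None
    ]
    if not applicable:
        return None
    # True counts as 1 and False as 0, so the sum is the number accessible.
    return sum(applicable) / len(applicable)


# Illustrative article: a meta-analysis, for which materials do not apply.
meta_analysis = {
    "materials": None,
    "methods": True,
    "data": True,
    "analysis_scripts": False,
}
print(accessibility_score(meta_analysis))
```

      Under this assumed scheme the meta-analysis is scored out of three components rather than four, so the missing Materials statement does not lower its score.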

    1. Author Response

      We thank the reviewers for their insightful comments, which raise several important points regarding our study. As the reviewers have recognised, we introduced a number of simplifications in order to perform this complex optimisation problem, such as by restricting the analysis to a single intervention (insecticide-treated nets) and modelling countries at a national level. Despite their clear relevance to the study, computationally it would not have been feasible to run the multitude of scenarios suggested by reviewer 1, which we recognise as a limitation. As such we agree with the assessment that this study primarily represents a thought experiment to assess whether current policies are aligned with an optimal allocation strategy or whether there might be a need to consider alternative strategies. The findings are relevant primarily to global funders and should not be used to inform individual country allocation decisions. This perspective also underlies our decision to start the analysis from a baseline of year 2000 as opposed to modelling the current 2023 malaria situation: the largest international donor (the Global Fund) also uses baseline malaria levels in the period 2000-2004 as the basis of their allocation calculations (The Global Fund, Description of the 2020-2022 Allocation Methodology, December 2019). A simplified version of this method is represented by our “proportional allocation” strategy. We will further address these points in a revised manuscript and detailed responses to the reviewer comments.

    1. Author Response

      Reviewer #2 (Public Review):

      Machold and colleagues develop and describe an intersectional genetic mouse (Id2Cre:Dlx5/6FlpE) that allows for the targeting of a cortical interneuron subpopulation predominantly consisting of the neurogliaform cell subtype (NGFCs). The strategy is a modification of that previously published by the authors (Id2cre:Nkx2-1Flpo; Valero et al., 2021) in which a subset of deep layer 6 NGFCs with distinct embryonic origins were targeted. Conversely, using the NDNF transgenic mouse lines, previous studies, including those from the Rudy laboratory, have clearly shown the prevalence of NGFCs in the outermost cortical Layer 1 region. Thus, the Id2Cre:Dlx5/6FlpE mouse poses an advantage over these previous approaches permitting the targeting of NGFCs in Layers 2-5. NGFCs in these regions have been hitherto difficult to study in an expedited manner.

      The manuscript is of the resource/toolbox type and the authors are thorough in their description of the distribution and molecular characteristics of the ID2 neurons labelled by this intersectional approach. Furthermore, the authors perform a series of in vivo experiments. These entail the identification of NGFCs, the assessment of their influence on other neuronal populations, and the ability to delineate their activity during various network and behavioral states. Indeed, the authors reveal an activity pattern that is unique to NGFCs across epochs of specific network states. Therefore, this clearly demonstrates the applicability of the ID2Cre:Dlx5/6Flpe mouse to study the role of L2-5 NGFCs in a whole brain setting and these in vivo experiments constitute a major strength of the current study.

      However, as with many transgenic mice, they are not always perfect, and the authors are very transparent regarding the additional, albeit a relatively smaller number of reported non-NGFCs particularly those of the CCK IN subtype. Indeed, clear morpho-functional divergence is revealed by the authors between these ID2 IN subpopulations. Furthermore, it is possible that this variability may differ across varying cortical regions. Thus, careful consideration of this caveat is necessary when using this mouse for future in vitro and in vivo studies. Related to this matter is a concern regarding the framing of the manuscript. The authors term the ID2 mixed population as the "4th group" since they do not express PV, SST, and VIP. One could argue this is a matter of semantics but to combine IN types that display distinct morphological and physiological properties into a single "group" based on one molecular feature is not consistent with that proposed by the widely accepted Petilla terminology (Ascoli et al., 2008).

      We agree that the definition of “group” here for INs delineated by the molecular markers PV, SST, VIP and Id2 is oversimplified, but in practice, the use of the corresponding genetic tools (e.g., Pvalb-Cre, Sst-Cre etc.) has resulted in widespread adoption of this marker-based organization of IN diversity. For example, PV+ INs targeted with PV-Cre encompass both basket cells and chandelier cells that while sharing some electrophysiological properties (e.g., fast-spiking behavior) are completely distinct morphologically, and innervate different subcellular compartments (soma vs. axon initial segment). The same is true for SST INs, in that there appear to be at least three main subtypes – Martinotti, non-Martinotti, and long range projecting – each with distinct axonal projections and electrophysiology. Thus, while the molecular targeting approaches developed to date have greatly facilitated functional studies of IN subtypes, they have prioritized marker expression over the other aspects of IN diversity outlined in the Petilla framework.

      Of interest to many who investigate cortical INs is the ability to genetically target specific subtypes during development. To this end, a potential and welcome addition to the manuscript would be an analysis (perhaps restricted to distribution/molecular characterization) highlighting whether the Id2cre:Dlx5/6Flpe strategy allows genetic access to layer 2-5 NGFCs during postnatal development following maternal tamoxifen administration.

      We agree that a method to target NGFC at early postnatal ages would be useful; however, the expression of Id2 is dynamic during development, and is robust in ventricular zone progenitors at embryonic stages (Neuman et al., 1993 Dev. Biol. PMID 8224536) so maternal tamoxifen administration is likely to result in nonspecific labeling. Furthermore, we found that multiple doses of tamoxifen were necessary to achieve decent labeling of the Id2 IN population in adult animals, a protocol that would be difficult to perform in pregnant dams or early postnatal animals due to pup lethality.

      Regardless, the experiments in the current study are, in general, well performed and clearly presented with the authors' conclusions supported by the results. Thus, it is clear that further refinements to genetic strategies are obviously required to exclusively target NGFCs throughout the cortical depth. Nevertheless, in the interim, the approach described in this current manuscript will be of use to the neuroscience community and help to further unravel the physiological role of this relatively understudied neuronal subtype.

    1. Author Response

      Reviewer #3 (Public Review):

      Because of the position of pigeon embryos in eggs, light exposure will only stimulate the right eye, leading to lateralisation of brain responses and behaviour. Lorenzi and colleagues injected manganese chloride into pigeon eggs, to assess neuronal activation in the embryonic brain. While the eggs were placed in the light or dark, manganese ions accumulated in neurons that were activated (in cell bodies and axons), which was then visualized with MRI of the embryos before hatching. The authors report lateralisation of neuronal activity in three brain regions, which could potentially be important for our understanding of experience-dependent development of lateralised neural activation.

      The tectofugal pathway in pigeons projects from the retina to the optical tectum, then to the nucleus rotundus in the thalamus, and then to the entopallium. The thalamofugal pathway projects from the retina to the GLd in the thalamus, and then to the wulst in the hyperpallium. The two pathways involve different thalamic nuclei (e.g., Deng 2006). In the methods and throughout the manuscript it should be specified which thalamic region is used as ROI.

      Here we refer to the GLd in the thalamofugal visual pathway; we did not estimate activity in the n. rotundus. We have now clarified this point in the revised MS (ll. 54, 80, 86).

      This manuscript only describes neural activity, but the MEMRI technique should also be used to assess the effect of experimental manipulations on axonal connectivity. It is important to learn about the asymmetry of contralateral projections in the light vs dark groups for answering the research question.

      Here we used systemic administration of Mn through the CAM. The blood-brain barrier at this embryonic stage is not completely developed, and its permeability to ions and small molecules is much higher in the embryo than at later stages of development (Engelhardt, B. (2003). Development of the blood-brain barrier. Cell and tissue research, 314(1), 119-129.). Other studies involving direct, local injection in selected brain regions are more apt to investigate connectivity, but this is not the protocol used here. We appreciate the reviewer’s suggestion, and this will be the object of future experiments. However, we would like to disseminate the current protocol and the results it led to at an early stage to enable and encourage its use by other researchers in the field.

      There is an overinterpretation of post-hoc statistics that are reported without correction for multiple testing. The wulst light group lateralization is probably not actually different from zero (uncorrected p=0.04).

      We considered the reviewer's observation regarding the need for improvements in the statistical methods. In response, we have made amendments to the relevant section of the manuscript, explicitly stating that significant findings were obtained using a two-way ANOVA. For comparisons between conditions within specific brain regions, we conducted two-sample t-tests, and the results were corrected for Type I errors using the false discovery rate (FDR) method. Post-hoc one-sample t-tests were employed to assess lateralization across brain regions and conditions, and the corresponding p-values were reported without correction for multiple comparisons (as explicitly reported in the text, to avoid any confusion).
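      To make the correction step concrete, the sketch below applies a Benjamini-Hochberg adjustment, one common FDR procedure, to a small set of p-values. The response does not say which FDR variant was used, so the choice of Benjamini-Hochberg is an assumption, and the region labels and p-values are hypothetical numbers chosen for illustration, not results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so that adjusted values
    # never increase as the raw p-value decreases.
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * m / rank)
        adjusted[index] = running_min
    return adjusted


# Hypothetical uncorrected p-values for three between-condition comparisons.
regions = ["region_a", "region_b", "region_c"]
raw = [0.012, 0.030, 0.200]
for region, p, q in zip(regions, raw, benjamini_hochberg(raw)):
    print(f"{region}: raw p = {p:.3f}, adjusted p = {q:.3f}")
```

      With several tests in a family, a raw p-value just under 0.05 can rise above that threshold after adjustment, which is the concern the reviewer raises about uncorrected post-hoc tests.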

      The first line in the discussion states that there is thalamofugal lateralization, but no lateralization in the tectofugal pathway. To my understanding, previous literature reported it the other way around: in altricial pigeons, light exposure in the egg mainly affected the tectofugal pathway (Deng & Rogers 2002), while the thalamofugal pathway in pigeons was not lateralized (Strockens et al., 2013). The manuscript should compare the current findings with the literature and discuss differences.

      We are aware of the substantial differences in brain lateralization of the two visual pathways between pigeons and chicks after embryonic light exposure. However, in the present work we employed chick embryos (Gallus gallus domesticus), and the space limitations of a Brief Communication do not allow for an in-depth discussion of these differences between avian species.

      Moreover, the tectum is the only region shown here from the tectofugal pathway. However, lateralization of contralateral connections is expected from tectum to the nucleus rotundus in the thalamus, and thus lateralization of activation may only arise in downstream brain regions from the optical tectum. Therefore, the conclusion that there is no lateralization in the tectofugal pathway is not supported by the data.

      In conclusion, I think it is interesting and worthwhile that the authors assessed neural activity in response to visual stimulation in the embryo prior to hatching, but multiple methodological weaknesses and unclarities should be addressed.

      The ROI that we here named Thalamus does not include the nucleus rotundus, but refers to the nucleus geniculatus lateralis (GLd). We have now clarified this point in the revised MS (ll. 54, 80, 86), and we now refer only to the tectum, without generalizing to the entire tectofugal pathway, which will be the subject of future investigations.

    1. Author Response

      Reviewer #3 (Public Review):

      This manuscript proposes to tackle a very interesting and methodologically challenging topic: the mechanistic underpinnings of neural specialization in the infant brain. The authors presented 4- to 7-month-old infants with social and non-social stimuli while their neural, hemodynamic, and metabolic activity was monitored, and they report a complex pattern of relationships between neural and metabolic or hemodynamic responses during social processing on the one hand, and during non-social processing on the other hand.

      The approach described in this manuscript is very interesting and the combined use of EEG and bNIRS data appears very promising. However, there is some confusion between the initial aims of the study, and the analyses performed, which jeopardizes the clarity and the impact of this manuscript. Besides, the predictions of the authors are often underspecified which complexifies the interpretation of the results.

      Based on its abstract, the goal of this work is to "combine simultaneous measures of coordinated neural activity metabolic rate and oxygenated blood supply to measure emerging specialization in the infant brain". The introduction nicely elaborates on the "interactive specialization theory" and the potential role of the interplay between brain energy consumption and neural activity in the emergence of functionally specialized brain regions during development. The authors present a novel multimodal approach, with potentially important implications for the study of brain specialization as a function of experience or maturation. Yet the experimental procedure presented in this manuscript only assesses specialized brain activity in response to social processing in 4- to 7-month-old infants, using multimodal neuroimaging.

      Indeed, the authors presented 4- to 7-month-old infants with social and non-social stimuli while their neural, hemodynamic, and metabolic activity was monitored. The authors report significant differences between the two conditions in terms of neural activity in the delta, alpha, beta, and gamma bands; as well as in the pattern of hemodynamic to metabolic coupling. Using a GLM approach, the authors report on fNIRS channels and EEG sensors showing significant relationships between the evoked neural activity in the beta and gamma frequency bands, and each of the bNIRS signals (HbO, HbR & CCO), in the social and in the non-social conditions. The authors identify a particular fNIRS channel overlaying posterior STS, showing a positive relationship between Pz EEG beta activity and HbO, as well as CCO, together with a negative relationship between that same neural activity and HbR, in the social condition. This pattern of activity was not observed in the non-social condition.

      Overall, these results indicate differential neural responses to social and non-social stimuli, coupled metabolic and hemodynamic activity in response to social as well as non-social stimuli.

      These results additionally indicate coordinated metabolic, hemodynamic, and neural responses in brain regions selective for social processing, but it does not allow us to conclude that this coordinated activity is actually related to the functional specialization process (e.g. last sentence of the abstract).

      We would like to thank the reviewer for their detailed comments. Based on their suggestions, we have made several changes to the manuscript. This study was the first to combine EEG and broadband NIRS and therefore served as a proof of principle study. At the onset of this work, there were many elements to develop such as the technical aspect of simultaneous bNIRS – EEG measurements as well as the methodology to combine the signals from both techniques with such different time resolutions. Therefore, we focused on one age group of infants rather than performing a study involving multiple age groups. The 4-to-7-month-old age group has been studied extensively using fNIRS, particularly to look at social brain development using similar stimuli as those used in the present study. Previous studies have demonstrated that social selectivity can be detected at 4 – 8 months of age (Grossmann et al., 2010; Lloyd-Fox et al., 2012, 2013, 2017). As this was a proof of principle study, we wanted to ensure that we were able to replicate results from previous studies with this new methodology. We therefore used one age group of 4-to-7-months. This has also been added to the introduction of the manuscript to provide clearer reasoning for using this age group.

      The reviewer is correct that the current study does not provide direct evidence of developmental change in functional specialisation or the hypothesised interactive process through which functional specialisation may occur. Rather, we are measuring the status of functional specialisation (the idea that different areas in the brain are specialised for different functions) at the age we study, by testing whether the signals we observe are selective to social but not non-social stimuli. We have reframed the abstract and introduction of the manuscript to ensure this is clear, and we additionally now focus more on the methodology developed to answer such questions. Future studies can leverage our methodology to study different age groups to establish how the relationships between neural and vascular/metabolic signals change over developmental time, which may provide greater insight into the specialisation process.

      Grossmann, T., Oberecker, R., Koch, S. P., & Friederici, A. D. (2010). The Developmental Origins of Voice Processing in the Human Brain. Neuron, 65(6), 852–858. https://doi.org/10.1016/j.neuron.2010.03.001

      Lloyd-Fox, S., Begus, K., Halliday, D., Pirazzoli, L., Blasi, A., Papademetriou, M., Darboe, M. K., Prentice, A. M., Johnson, M. H., Moore, S. E., & Elwell, C. E. (2017). Cortical specialisation to social stimuli from the first days to the second year of life: A rural Gambian cohort. Developmental Cognitive Neuroscience, 25, 92–104. https://doi.org/10.1016/j.dcn.2016.11.005

      Lloyd-Fox, S., Blasi, A., Elwell, C. E., Charman, T., Murphy, D., & Johnson, M. H. (2013). Reduced neural sensitivity to social stimuli in infants at risk for autism. Proceedings of the Royal Society B: Biological Sciences, 280(1758), 20123026. https://doi.org/10.1098/rspb.2012.3026

      Lloyd-Fox, S., Blasi, A., Mercure, E., Elwell, C. E., & Johnson, M. H. (2012). The emergence of cerebral specialization for the human voice over the first months of life. Social Neuroscience, 7(3), 317–330. https://doi.org/10.1080/17470919.2011.614696

      Another weakness of this manuscript relates to the unclear or underspecified motivations behind some of the performed analyses. For example, the authors contrast brain responses to social vs. baseline, non-social vs. baseline, and social vs. non-social. For clarity in the manuscript, the authors should specify the motivation behind each of these contrasts and their predictions.

      We thank the reviewer for their suggestion. We have added the predictions for each of the analyses in the introduction section, lines 436 – 527. We have removed the “social minus non-social” comparison for the EEG topographical maps from Figure 2 as there was no value added by including this comparison.

      Another example is in the analysis of the hemodynamic and metabolic coupling analysis, here the authors analyze only the social vs. baseline and non-social vs. baseline contrast, and they do not analyze the social vs non-social contrast. It would be useful for the reader to understand why only these two contrasts are performed and not the social vs. non-social, and what are the predictions of the authors.

      We have now added this into the manuscript and the results can be seen in Figure 3c. We have clarified our predictions both at the end of the introduction (lines 436 - 527) and at the beginning of the discussion (lines 685 – 755).

      The following has been added to the introduction:

      For EEG, we expected an increase in neural activity in response to the social condition and a decrease in neural activity in response to the non-social condition. Based on previous work, this was expected to be strongest in the theta frequency band [3]. Moreover, for the combined bNIRS-EEG analyses, we hypothesised differentiated haemodynamic/metabolic coupling with neural activity for the social and non-social stimulus conditions. We performed two types of statistical tests: a) individual comparisons of the social and non-social conditions and b) comparison of the social condition versus the non-social condition. The individual condition tests were performed to show the scale and spatial location/sensitivity of the coupling between haemodynamics/metabolism and neural activity for each condition. Meanwhile, the social versus non-social comparison was performed to show where there was a significant difference in the coupling between the two conditions. With comparison (a) we aimed to identify regions involved in the processing of social and non-social stimuli by identifying the regions where the coupling was significant. With comparison (b) we aimed to identify regions where coupling was significantly different between conditions. We predicted that for the individual comparison of the social condition, we would observe positive associations between bNIRS and EEG measures, i.e. coordinated increases in haemodynamics/metabolism and neural oscillatory activity in the beta and gamma frequency bands (based on previous combined EEG – fMRI studies [16], [18]–[21], [23], [30]) which would be localised to core social brain regions. We hypothesised that for the non-social condition, over the same brain regions, positive associations would be observed between bNIRS and EEG measures, but they would be coordinated decreases in haemodynamics/metabolism and oscillatory activity. 
      We also expected coordinated increases in haemodynamics/metabolism and oscillatory activity localised to the parietal brain region. These predictions are based on our previous work [29] where we demonstrated that stronger coupling between haemodynamics and metabolism was observed in the temporo-parietal regions for the social condition and in parietal region for the non-social condition which is known to play an important role in object processing [31], [32]. For the social versus the non-social contrast, we predicted that haemodynamic activity and metabolism would be coupled with neuronal oscillatory activity more strongly for the social stimuli in comparison to the non-social stimuli, with significant differences being observed in the temporo-parietal regions.

      The following has been added to the discussion:

      As a proof of principle, we examined the relationship between these measures to identify regional selectivity to social versus non-social stimuli. To first demonstrate the scale and spatial sensitivity of the coupling between haemodynamic/metabolic activity and neuronal oscillatory activity, comparisons were performed individually for the social and non-social conditions. For this, we predicted coordinated increases in haemodynamics/metabolism and neural activity in the beta and gamma frequency bands. We predicted that for the social condition this would be localised to the core social brain regions (temporo-parietal region) while for the non-social condition, we expected the coupling to be localised to parietal regions, known to be involved in object processing [31], [32]. We additionally expected coordinated decreases in haemodynamic/metabolic activity and neural activity over the temporo-parietal region for the non-social condition, in accordance with our previous work [29]. Next, to demonstrate differential coupling for social and non-social stimuli, we performed a comparison of the social condition versus the non-social condition. For this, we hypothesised that in the beta and gamma frequency bands, there would be stronger coupling between haemodynamics/metabolism and neural activity for the social condition over the temporo-parietal region.

      Finally, the core result of this work derives from the final GLM analysis which relates EEG activity to hemodynamic or metabolic responses. This analysis implies the inspection of interactions between 3 neuroimaging modalities, with 4 EEG measures, 2 hemodynamic measures, and 1 metabolic measure, which represents a very rich and relatively complex analytic approach. Unfortunately, the predictions are not clearly specified, which makes results interpretation difficult.

      We appreciate that the methods are complex, and the hypotheses should be stated more clearly. The hypotheses have now been explicitly stated both at the end of the introduction (lines 436 - 527) and at the beginning of the discussion (lines 685 – 755).

      Based on the results (L160-162) and discussion (L233-235) sections, it appears that the authors aim at identifying brain regions showing a precise pattern of activity, with a positive relationship between EEG activity and HbO/CCO responses together with a concurrent negative relationship between EEG and HbR responses in response to social events, but not in response to non-social events. Importantly, the social vs. non-social contrast seems crucial to assess the selectivity of the response. Yet, the authors analyze the 3 chromophores separately, and they do not contrast the two conditions (figure 3). As a result, the authors are limited to reporting a descriptive pattern of relationships between EEG and HbO/HbR/CCO activations for the social condition. And another one for the non-social condition. Overall, the authors conclude that channel 14, overlaying the right TPJ, shows the expected pattern of activity, specifically in response to social stimuli. Yet, this statement is only supported by visual inspection/comparison of the results between the social vs baseline and non-social vs baseline conditions. The authors do not assess analytically the differential patterns of activations between the two conditions. Instead, a GLM including all 3 chromophores and contrasting the two experimental conditions would allow us to directly test the predicted pattern of activity, and the selectivity of the activity for social stimuli.

      As per the reviewer’s comment, we have now included the comparison of the social and non-social conditions, shown in Figure 3c. This comparison showed that haemodynamic and metabolic activity at channels 11 and 14 (located spatially close to one another) had a significantly greater association with EEG electrode “Pz” for the social condition than for the non-social condition in the beta and gamma bands. These results provide analytical support for the selectivity of the response to the social condition.

      We have kept the results showing the individual comparison of the social and non-social conditions. The individual condition tests were performed to show the scale and spatial location/sensitivity of the coupling between haemodynamics/metabolism and neural activity for each condition. Meanwhile, the social versus non-social comparison was performed to show where there was a significant difference in the coupling between the two conditions. With comparison (a) we aimed to identify regions involved in the processing of social and non-social stimuli by identifying the regions where the coupling was significant. With comparison (b) we aimed to identify regions where coupling was significantly different between conditions. The following has been added on lines 533 – 541 to explain the reasoning behind the comparisons performed:

      We performed two types of statistical tests: a) individual comparisons of the social and non-social conditions and b) comparison of the social condition versus the non-social condition. The individual condition tests were performed to show the scale and spatial location/sensitivity of the coupling between haemodynamics/metabolism and neural activity for each condition. Meanwhile, the social versus non-social comparison was performed to show where there was a significant difference in the coupling between the two conditions. With comparison (a) we aimed to identify regions involved in the processing of social and non-social stimuli by identifying the regions where the coupling was significant. With comparison (b) we aimed to identify regions where coupling was significantly different between conditions.
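
The two kinds of comparison described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that each infant contributes one coupling coefficient per condition at a given channel and frequency band; the values, the function names `one_sample_t` and `paired_t`, and the use of plain t statistics are hypothetical and are not taken from the authors' GLM pipeline.

```python
import math
import statistics


def one_sample_t(values, mu=0.0):
    """t statistic testing whether the mean of `values` differs from `mu`."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return (mean - mu) / (sd / math.sqrt(n))


def paired_t(a, b):
    """Paired t statistic for matched observations a[i] and b[i]."""
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    diffs = [x - y for x, y in zip(a, b)]
    return one_sample_t(diffs)


# Hypothetical per-infant coupling coefficients at one channel and band.
social = [0.8, 1.1, 0.6, 0.9, 1.2]
non_social = [0.2, 0.5, 0.1, 0.4, 0.3]

# Comparison (a): is coupling different from zero within each condition?
t_social = one_sample_t(social)
t_non_social = one_sample_t(non_social)

# Comparison (b): does coupling differ between the two conditions?
t_contrast = paired_t(social, non_social)
```

In this framing, comparison (a) asks where coupling is present at all, while comparison (b) asks where it is selective; a channel can pass (a) in both conditions and still show no difference in (b).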

      As our interest was in looking at the selectivity of the response and not comparing the chromophores, we did not perform a comparison between chromophores.

    1. Author Response

      Reviewer 2 (Public Review):

      1) My major criticism of the study is that the authors argue for CD8+ Trm activity as a key mechanism for OLP pathogenesis but have presented mostly descriptive datasets. The data strongly argue for CD8+ Trm cells as a defining feature of erosive OLP, but there is no data to support their involvement in disease pathogenesis. The authors note the lack of a mouse model for OLP which represents a significant technical barrier to interrogating the role of CD8+ Trm cells in OLP pathogenesis.

      Thank you for bringing this to our attention, and please accept our apologies for any confusion caused by the previous version of our article. OLP is an immune disease with a multifactorial pathogenesis, but no corresponding animal model currently exists, which clearly limits research. We therefore focus on the reasons for changes in the clinical state of the disease. Our study found that CD8+ TRM cells play an important role in the changes observed in the local presentation of OLP, specifically erosions. However, it is important to note that they are not the primary driver of the disease. In addition, we use cohort studies combined with transcriptome data to increase the strength of evidence for causal effects. We have revised and emphasized this point in the updated text.

      The modified description in introduction is as follows:

      Notably, EOLP has a significantly higher risk of malignant transformation than non-erosive oral lichen planus (NEOLP) (Danielsson et al., 2013). To reduce the psychological and economic burden of OLP patients, improve their quality of life, and decrease the risk of cancer, it is crucial to maintain the disease in a relatively stable non-erosive stage for as long as possible. However, clinical experience suggests that OLP often exhibits a prolonged and recurrent disease course, with alternating periods of non-erosive and erosive lesions. Despite this, the underlying causes and mechanisms of lesion type switching remain unclear (Husein-ElAhmed and Steinhoff, 2022). (Page 4, lines 13-21)

      2) Another criticism is the lack of strong findings in the analysis of CD8+ Trm cells isolated from non-erosive and erosive OLP tissues. The authors note increases in CD8+ Trm cell recovery, however, they only observe minor changes in CD8+ Trm activity upon restimulation. Analyzing the activation status or proliferative capacity of CD8+ Trm cells from non-erosive and erosive OLP could be informative and more robust measures of functional changes.

      We appreciate your suggestion to test the activation status and proliferation of sorted CD8+ Trm cells to further investigate the differences between the two groups. However, due to the limited amount of tissue available for our study, it was difficult to obtain sufficient numbers of CD8+ Trm cells for these experiments. Additionally, there is a lack of established methods for in vitro culture of CD8+ Trm cells, which further limited our options for functional studies.

      To investigate the function of CD8+ Trm cells in the two tissue groups, we instead measured inflammatory factors in the supernatant of CD8+ Trm cells after in vitro stimulation. This allowed us to indirectly assess the activity of CD8+ Trm cells in non-erosive and erosive OLP. We used ELISA to measure the levels of several inflammatory cytokines that are known to be produced by activated T cells, including CD8+ Trm cells.

      We acknowledge that this method has limitations and is an indirect measure of CD8+ Trm cell function. However, we believe that our approach provides useful information on the potential role of CD8+ Trm cells in oral lichen planus and represents a valuable contribution to the field.

      3) A minor criticism is the formatting of the data presented in Figure 4. The authors should clearly label each marker used in the flow cytometry experiments as well as clearly labeling y-axes for graphs 4H and 4I.

      Thank you for your valuable comments. We have modified the flow cytometry diagram accordingly, labeling each step of the gating strategy, and have also modified the other two graphs. Panels 4H and 4I have been renumbered as 4G and 4H.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper investigates whether bistable rhodopsins can be used to manipulate GPCR signalling in zebrafish. As a first step, the authors compared the performance of bistable rhodopsins fused with a flag tag or with a fluorescent protein tag (TagCFP). Constructs were compared by expressing in HEK cells followed by calcium imaging with aequorin or cAMP monitoring with GloSensor. This showed that the protein with a smaller flag tag performed better. Then, a series of transgenic zebrafish lines were made, in which tagged rhodopsins were expressed in reticulospinal neurons or cardiomyocytes.

      The data indicate that bistable rhodopsin can be used to manipulate Gq and Gi/o signalling in zebrafish. The Gq-coupled SpiRh1 was effective in manipulating reticulospinal neurons, as indicated by analysis of tail movements and calcium imaging of the neurons. Gi/o signalling could be manipulated by Opn3 from mosquitoes, TMT from pufferfish, and parapinopsin from lamprey, as shown by their effects on the heartbeat. Lamprey parapinopsin has the interesting property that it can be turned on and off by different wavelengths of light, and this was used to stop and restart the heart. Finally, the authors show that the cardiac effects are mediated by an inward-rectifier K+ channel, through the use of pharmacological inhibitors.

      A strength of this paper is the testing of a range of bistable rhodopsins, with a total of 10 proteins tested. This provides a good resource for future experiments. A weakness is the failure to show that some experiments involved repeated sampling of the same animal. Figure 3 gives the impression that there are 48 independent datapoints. However, there are 8 animals, with 6 datapoints coming from each. Similarly, Figure 4 shows the data from 6 trials of 4 animals, not 24 independent animals. Repeated sampling should be reflected in the data presentation, and in the statistical analysis. Was there an effect of trial number, which is suggested in Figure 6?

      In response to the reviewer’s comments, we modified the graph to show the average data for individual animals in Figure 3A-E, Figure 3-supplement 2, Figure 4D-F, H, and Figure 4-supplement 2B. We also showed the effect of trial number (difference between trials 1 and 6) in Figure 3-supplement 1 and Figure 4-supplement 1. In addition, we also showed all data as source data. We believe that more accurate statistical analyses were conducted using data from each individual animal.
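
As a rough illustration of the approach described in this response, the sketch below collapses repeated trials to one value per animal and computes a per-animal trial effect (trial 6 minus trial 1). The record layout `(animal_id, trial, value)` and the function names are assumptions made for the example, not the authors' analysis code.

```python
from collections import defaultdict
from statistics import mean


def per_animal_means(records):
    """Collapse (animal_id, trial, value) records to one mean per animal."""
    by_animal = defaultdict(list)
    for animal_id, _trial, value in records:
        by_animal[animal_id].append(value)
    return {animal: mean(vals) for animal, vals in by_animal.items()}


def trial_effect(records, first=1, last=6):
    """Per-animal difference between the `last` and `first` trial values."""
    first_vals = {a: v for a, t, v in records if t == first}
    last_vals = {a: v for a, t, v in records if t == last}
    return {a: last_vals[a] - first_vals[a] for a in first_vals if a in last_vals}


# Hypothetical data: two animals, trials 1 and 6 only.
records = [
    ("fish_a", 1, 1.0), ("fish_a", 6, 3.0),
    ("fish_b", 1, 2.0), ("fish_b", 6, 2.0),
]
animal_means = per_animal_means(records)   # one datapoint per animal
effects = trial_effect(records)            # trial 6 minus trial 1, per animal
```

Statistical tests are then run on the per-animal values, so the sample size is the number of animals rather than the number of trials.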

      Delta F/F refers to relative change, which should be (F-F0)/F0. This should be zero when t = 0. The values in Figure 3E, and 3F are ~ 1 when t = 0, however. Are these figures showing F/F0?

      The reviewer is correct. It is indeed (F-F0)/F0 (ΔF/F0). In Figure 3F (3E in the original manuscript), t=0 was the time when 470-495 nm light (for both stimulation of SpiRh1 and detection of GCaMP6s fluorescence) started to be applied. In the experiment in Figure 3G (3F in the original manuscript), 405 nm light was applied to activate SpiRh1[S186F] for 2 s and then 470-495 nm light was applied to detect GCaMP6s fluorescence. In other words, t=0 is the time when 405 nm light started to be applied.
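
The distinction the reviewer raises can be made concrete with a short sketch. It assumes, for illustration only, that F0 is the mean of the first few frames of a trace; the function name and the example trace are hypothetical.

```python
def delta_f_over_f0(trace, baseline_frames):
    """Return (F - F0) / F0 for each frame, with F0 the mean of the baseline frames."""
    if baseline_frames <= 0 or baseline_frames > len(trace):
        raise ValueError("baseline_frames must be between 1 and len(trace)")
    f0 = sum(trace[:baseline_frames]) / baseline_frames
    if f0 == 0:
        raise ValueError("baseline fluorescence F0 must be non-zero")
    return [(f - f0) / f0 for f in trace]


# Hypothetical fluorescence trace; the first two frames serve as baseline.
trace = [100.0, 100.0, 150.0, 200.0]
dff = delta_f_over_f0(trace, baseline_frames=2)

# The plain ratio F/F0 is always ΔF/F0 + 1, so it sits at 1 (not 0) at baseline.
ratio = [x + 1.0 for x in dff]
```

A trace that starts near 1 is therefore consistent with F/F0, whereas ΔF/F0 starts near 0 at baseline.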

      The authors' conclusions that the bistable rhodopsins are useful tools in the zebrafish system appear largely justified. This is consistent with findings from other organisms, including mouse (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8097317/, https://www.sciencedirect.com/science/article/pii/S0896627321001616). The tools here are likely to find broad use by scientists who use the zebrafish as the experimental system for a variety of different areas.

      For the studies on LamPP and MosOpn3, we cited the references mentioned by the reviewer. We believe that our study substantiates that LamPP and MosOpn3, as well as other bistable rhodopsins, are valuable tools for zebrafish research, as pointed out by the reviewer.

      Reviewer #2 (Public Review):

      The presented study aims at deciphering the physiological function of GPCR signaling in excitable cells. To this end, the authors developed transgenic zebrafish models expressing a selection of Gq- and Gi/o-coupled bistable rhodopsins in either reticulospinal neurons or cardiomyocytes and elucidated behavioral responses (tail movements) or physiological responses (heartbeat) as well as intracellular Ca2+ dynamics following optical stimulation of rhodopsins.

      One of the major strengths of the presented study is the functional comparison of five Gq- and five Gi/o-coupled rhodopsins in two major classes of excitable cells; however, the rationale for the selection of rhodopsins tested remains elusive. More importantly, it is not obvious why some of the effects of rhodopsin activation were assessed in both neurons and cardiomyocytes, while others were only tested in one of the two systems without further explanation. The main chosen experimental readouts (swimming/tail bending or cardiac contractions) have limited informative value regarding GPCR signaling, as they will only report the tip of the iceberg, namely whether movements are elicited or heartbeats inhibited. No analysis of subtle changes in heart rate and contraction force was included, but such modulation of cardiac activity (e.g. positive or negative chronotropic, inotropic, dromotropic, bathmotropic, and/or lusitropic responses) would better represent the physiological modulation of the heart via GPCR and down-stream signaling events. In line with this, the presented data only represent behavior at one light intensity tested, whereas a light titration of observed effects could provide more meaningful insight into both rhodopsin responses and signaling mechanisms. Also, the potential promiscuity of G protein activation of selected receptors has not been addressed, either experimentally or in the discussion part. As a result of the above-mentioned limitations, it is difficult to follow the logic of the study and especially to interconnect the data obtained in reticulospinal neurons (where activation of jumping spider rhodopsin elicited tail bending) to myocyte data (where three Gi-coupled rhodopsins suppressed cardiac activity). Moreover, as such, the study does not provide explanations on why a certain tool might evoke an effect in one system or the other, or not, which could be the main deliverable of such a comparative analysis.

      We are grateful for the helpful and insightful comments from the reviewer. We believe that the presentation of experimental findings in the original manuscript may have led to a misunderstanding. We examined the effects of Gq and Gi/o-coupled bistable rhodopsins on both reticulospinal V2a neurons and cardiomyocytes. We observed noticeable effects of Gq rhodopsins on reticulospinal V2a neurons, but no significant effects on cardiomyocytes. Similarly, we found effects of Gi/o-coupled rhodopsins on cardiomyocytes, but no significant effects on reticulospinal V2a neurons. These discrepancies could be attributed to differences in the target cells and experimental conditions, suggesting the need for further optimization. We described the data on page 13, lines 16-22 and page 16, lines 9-10 in the Results section and Table 1, and discussed the relationship between the activity of bistable rhodopsins and their effects on target cells on page 21, lines 6-15 and page 24, line 19-page 25, line 2 in the Discussion section of the revised manuscript.

      In order to clarify the function of Gi/o-coupled rhodopsins on the heart in more detail, we conducted experiments in which we activated cardiomyocytes expressing bistable rhodopsins at various light intensities to observe the effects on heartbeats. We analyzed cardiac arrest rate, latency to cardiac arrest, and time to resumption of heartbeat. The results of these experiments are shown in Figure 4 and Figure 4-supplement 2, 3 in the revised manuscript. We described the data on page 15, line 16-page 16, line 1 in the revised manuscript, as follows.

      To analyze the photosensitivity of Gi/o-coupled rhodopsins, we applied light of various intensities for 1 s and examined its effect on HBs (Figure 4-supplement 2). Cardiac arrest was induced and sustained for over 20 s after stimulation of MosOpn3 with 0.05 mW/mm2 light for 1 s. Photoactivation of PufTMT and LamPP at lower light intensities (0.2 or 0.05 mW/mm2) resulted in cardiac arrest, but faster HB recovery than stimulation with 0.5 mW/mm2 light (Figure 4-supplement 2). The data indicate that the ability of MosOpn3 to suppress HBs is more photosensitive than that of PufTMT and LamPP in the zebrafish heart. We further examined atrial-ventricular (AV) conductivity by measuring the time difference between atrial and ventricular contractions before and after light stimulation when HBs had slightly recovered. There was no significant difference in AV conductivity before and after light stimulation (Figure 4-supplement 3).

      We performed experiments to the best of our ability with current technology regarding cardiac function. However, we hope that the reviewer is willing to acknowledge that there are certain limitations in conducting a detailed analysis of the zebrafish larval heart, since many experimental techniques, such as electrophysiological analysis, have not yet been fully or effectively established for this animal model.

      While the presented data is interesting, the graphical presentation and description of the data are insufficient. Most importantly, the current version of the text does not include a quantitative description of effects and statistical analyses (which are found in the figures and legends!). The lack of quantitative description also extends to both the introduction and discussion, which remain general without a specific dissection of observed effects.

      We have now described the quantitative data in the Results section.

      One major concern is the selective citation of own work. While single statements in both the introduction and discussion are supported by up to ten own papers, recent studies using rhodopsins for dissecting GPCR signaling in neurons are not sufficiently discussed and new data is not compared to published results by other teams. Moreover, relevant papers on cardiomyocytes (e.g. PMID: 35579776, 35365606, 34987414, 30894542) are not cited at all, despite the use of similar rhodopsins and/or optogenetic activation of the same signaling pathways. Taking into account these published studies may help to better understand the observed responses.

      We apologize for not citing important relevant papers in the original manuscript. We have now cited all four papers (Dai et al., 2022; Wagdi et al., 2022; Cokic et al., 2021; Makowka et al., 2019) mentioned by the reviewer, as well as a new paper describing the use of MosOpn3 and LamPP in C. elegans neurons (Koyanagi et al., 2022) in the Introduction section. We also discussed the differences between our findings and previously published data in the Discussion section.

      Additional comment: Data were obtained from larval zebrafish. It would be useful to include a discussion on how GPCR signaling might be different in adult fish compared to larvae, and how to test whether the observed effects are more generally applicable.

      We discussed the differences between the hearts of zebrafish larvae and adults, and the differences in GPCR signaling, on page 27, lines 10-16, as follows. In this study, we used zebrafish larvae to study the role of GPCR signaling in cardiac function, and there are differences in heart structure and function between larvae and adult zebrafish. As a zebrafish grows, blood pressure increases and the heart becomes more complex with the development of valves and ventricular trabeculae. Therefore, GPCR signaling, which regulates heart structure and function, may differ between juvenile and adult fish. Optogenetic manipulation of the heart’s function in adult zebrafish using bistable opsins should clarify this issue.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper aims to test whether a series of light-activated ion channels (GtCCR4, KnChR) and enzymes that regulate second messengers (BeGC1, bPAC, OaPAC) can be used to manipulate cells in the zebrafish.

      Among the strengths of the paper are the use of several independent methods to test whether the tools are functional - e.g. electrophysiology of mammalian cells for GtCCR4, calcium and cAMP imaging in zebrafish cells in vivo, behaviour tests (tail movement) and monitoring of heart beat. Multiple transgenic lines were established, to select for lines with optimal expression levels. Experiments are carried out in two cell types - reticulospinal neurons in the hindbrain and cardiomyocytes.

      The authors have largely achieved their aim of determining whether the rhodopsins can be used in zebrafish. They demonstrate that the cation channel KnChR is particularly sensitive in triggering depolarization of the reticulospinal neurons, as indicated by tail movement. They show that the photoactivatable adenylyl cyclase bPAC and cation channels have an effect on heartbeat. Two other photoactivatable enzymes OaPAC and BeGC1 have no effect on heartbeat, although it is not evident whether this is due to lack of effect on cAMP and cGMP levels.

      The abstract sets out to investigate the role of second messengers, emphasizing the need for specificity. However, KnChR is not specific for Na+. As noted by Tashiro et al, the channel can also conduct H+, Ca2+ and Mg2+. The knowledge gap that is being addressed by the manuscript thus needs to be reframed. The concluding statement of the abstract, that the tools tested here can be used to investigate second messengers, is not accurate given the broad conductance of KnChR.

      We agree with the reviewer. We changed the title to “Optogenetic manipulation of neuronal and cardiomyocyte functions in zebrafish using microbial rhodopsins and adenylyl cyclases” and revised the abstract and introduction, accordingly. The last sentence of the abstract was modified to “These data suggest that these optogenetic tools can be used to reveal the function and regulation of zebrafish neurons and cardiomyocytes.”

      The tools described here have been tested previously in other species, either in cultured mammalian cells (GtCCR4, KnChR, OaPAC) or in vivo (bPAC and BeGC1). The current work thus does not introduce novel tools, but provides evidence that some of these tools can be used in zebrafish. Overall, the lines characterized here will be of use to scientists using zebrafish as the experimental system in a variety of areas.

      We appreciate the positive comments from the reviewer. It was worthwhile generating and analyzing so many transgenic zebrafish.

      Reviewer #2 (Public Review):

      Optogenetic proteins are important tools for circuit neuroscience. The authors characterize five proteins, GtCCR4, KnChR, BeGC1, bPAC, and OaPAC with respect to their ability to suppress normal cell excitability and compare the results to those for the more established GtACR1 and CrChR2[T159]. The study makes use of expression in the zebrafish heart and hindbrain, as well as in a cell line. Electrophysiology in the cell line demonstrates that GtCCR4 photo-activation induces similar currents as CrChR2 activation and shows fewer signs of desensitization. Using a transgenic vsx2:Gal4 zebrafish line, immunohistochemistry shows that the tools are expressed. When activated, they triggered the expected behavioral responses (swimming) at short latency (<4 s). This was true even for the three tools that are guanylyl or adenylyl cyclases (BeGC1, bPAC, OaPAC) and thus affect cell excitability only indirectly. At the tested light intensity, the Klebsormidium nitens channelrhodopsin (KnChR) had the shortest latency (<0.5 s) and highest (100%) probability of inducing locomotion. When expressing the tools in the zebrafish heart, brief illumination (100 ms) induces brief (100 ms - 1500 ms) suppression of the heartbeat. Notably, tools that evoke depolarization also induce heartbeat suppression. Heartbeat movies and calcium imaging demonstrate that this is caused by prolonged cardiomyocyte contraction. The optogenetic guanylyl and adenylyl cyclases were not effective in perturbing zebrafish heartbeat (except for bPAC over longer time scales).

      Given the large number of optogenetic proteins available to date and the challenge of employing them in well-controlled neuroscience experiments, this study presents an important contribution for neuroscientists performing optogenetic research in animal models. Two light-gated cation channels, GtCCR4 and KnChR, are tested for the first time in vivo. The evidence supporting the claims regarding heartbeat and induced swimming behavior is solid. Since GtCCR4 is more Na+-selective than other channelrhodopsins, it should allow better control of experimental variables and is a valuable addition to the optogenetic tool box. The created transgenic zebrafish lines will be useful for the zebrafish neuroscience community.

      The expression in zebrafish was compared using immunohistochemical staining (of a single Gal4 driver line). From this experiment alone, it is difficult to judge the expression level, the in vivo visibility of the fluorescence under the microscope, and the proportion of target cells that do express the optogenetic gene of interest.

      The evidence for optogenetically induced alteration of swimming behavior is compelling. However, the associated neuronal responses and their dependence on different light intensity levels remain uncharacterized. Therefore, if anyone plans to use these tools to investigate a neural circuit in the future, the needed light levels and the specificity of the manipulation would still need to be determined.

      We stimulated neuronal ND7/23 cells, reticulospinal V2a neurons or cardiomyocytes expressing microbial optogenetic tools at various light intensities and examined their effects on neuronal activities and behaviors (tail movements and cardiac arrest). These data are shown in revised Figure 1, Figure 1-supplement 1, Figure 3, Figure 3-supplements 2, 3, Figure 5, and Figure 5-supplements 1, 2. We described the data on page 12, line-page 13, line 1 and page 14, lines 10-13 in the revised manuscript.

      For the optogenetic guanylyl and adenylyl cyclases, which clearly were able to alter behavioral responses, the signaling and circuit mechanisms that lead to neuronal depolarization remain unknown, but possible activation pathways are discussed.

      Reviewer #3 (Public Review):

      In this study, the authors set out to test several new optogenetic tools in zebrafish. They motivate the study by citing differences in ion selectivity of channelrhodopsins and the potential utility of photoactivatable adenylyl and guanylyl cyclases to control cell functions. Although the study provides some useful new information about the utility of these tools in zebrafish, the characterization is limited and there are serious caveats around interpretation of behavioral responses.

      The latency of behavioral responses is often extremely long and there is a lack of control data from opsin negative animals, raising serious doubts as to whether these responses are optogenetically mediated.

      In other words, many of these responses may not result from optogenetic activation of V2a cells, but instead arise from indirect effects such as visual stimulation of the animal. Previous zebrafish studies have shown swimming responses in opsin-negative control animals at latencies above ~100 ms and used a 50 ms cut-off for optogenetically evoked swims. One can see evidence suggestive of this issue in the authors' data: latency data for GtCCR4 appear bimodal, with a cluster of short-latency swims and a second spread at latencies >2 s; this could be a mix of fast optogenetic and slow artifactual responses. As the authors have already tested opsin-negative control animals, they should examine the latency distribution of these responses. The long latency is even more striking in the case of BeGC1, bPAC and OaPAC, where in all cases mean latency exceeds 2 seconds. No short-latency responses are apparent and the delay is too long to be solely a result of second messenger action (e.g. activation of cyclic nucleotide gated ion channels). In any case, no explanation is provided.

      We understand the reviewer’s concern that the responses were too slow. However, the neurons responded after accumulation of cAMP or cGMP, which bind and activate CNG channels in the neurons. Similar delayed responses were observed when G protein-coupled bistable rhodopsins were activated in reticulospinal V2a neurons (please see the accompanying manuscript).

      We compared the latency of zebrafish larvae expressing each tool with those not expressing the tool. The data are shown in Figure 3, Figure 3-supplement 1, Figure 5, Figure 6, Figure 7, and Figure 7-supplement 1. Statistically, we considered responses within 8 s after the start of light stimulation as positive, and significant differences in responses were observed depending on the presence or absence of tool expression, suggesting that tail movements were induced by tool activation. In the absence of tool expression, spontaneous movements were occasionally observed, but they did not often occur within 8 s. We have described the data on page 15, line 20-page 16, line 4 in the revised manuscript.
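      The 8 s scoring rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code used for the manuscript: the function name `summarize_trials`, the convention of recording `None` for trials without tail movement, and all example latencies are hypothetical.

```python
from statistics import mean

RESPONSE_WINDOW_S = 8.0  # scoring window stated in the response above


def summarize_trials(latencies_s, window_s=RESPONSE_WINDOW_S):
    """Score trials as positive when the first tail movement falls within the window.

    latencies_s: one entry per trial; latency in seconds from light onset to the
    first tail movement, or None when no movement was detected (assumed convention).
    """
    positive = [t for t in latencies_s if t is not None and t <= window_s]
    n_trials = len(latencies_s)
    return {
        "n_trials": n_trials,
        "n_positive": len(positive),
        "response_rate": len(positive) / n_trials if n_trials else 0.0,
        "mean_latency_s": mean(positive) if positive else None,
    }


# Hypothetical example data, not measurements from the study.
tool_expressing = [0.4, 0.6, 2.5, 0.3, None, 1.1]
non_expressing = [None, 9.5, None, None, 6.0, None]

expressing_summary = summarize_trials(tool_expressing)
control_summary = summarize_trials(non_expressing)
```

      Comparing `response_rate` between the two groups, and inspecting the latency distribution of the positive control trials, is one way to address the reviewer's question about whether slow responses differ from spontaneous movements.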

      Although this study is motivated by the need to precisely control the flux of specific ions and modulate specific second messenger pathways, there is almost no characterisation of these processes in zebrafish cells. As such, the degree to which these tools are useful to "precisely control second messengers in vivo" is unclear and the lack of mechanistic data also leaves open questions about unexpected aspects of behavioral results (e.g. the long latency of presumed cyclic-nucleotide induced behavior, above).

      We believe that the description "controlling second messengers" was misleading. Since Reviewer #3 has taken issue with this aspect, we note that this paper does not provide a detailed analysis of second(ary) messengers. We have restructured the entire manuscript to focus on optogenetic regulation of zebrafish neurons and cardiomyocytes rather than on "control messenger regulation".

      Finally, there is little comparison with other commonly used optogenetic actuators. CrChR2[T159C] is used as the only control but more recent tools (e.g. CoChR, Chrmine, ChroME) are not considered. Thus, beyond showing that the new tools have behavioral effects in zebrafish, the usefulness of this report for researchers wanting to compare and select between tools is limited.

      We examined the activity of CoChR and ChrimsonR in neuronal ND7/23 cells. In addition, we generated transgenic zebrafish expressing CoChR or ChrimsonR, and examined their activity in V2a neurons and cardiomyocytes. We thereby compared the activity of GtCCR4, KnChR, and CrChR2[T159C] with that of CoChR and ChrimsonR. The data are shown in Figure 1, Figure 1-supplement 1, Figure 2, Figure 3, Figure 3-supplement 3, and Figure 5-supplements 1, 2. We described the data for CoChR and ChrimsonR in the relevant part of the Results section (pages 8-14) and discussed a comparison on page 18, lines 2-16 in the revised manuscript.

      We found that KnChR was a more potent optogenetic tool than GtCCR4, CrChR2, and ChrimsonR in zebrafish reticulospinal V2a neurons. Optogenetic activity of KnChR was comparable to that of CoChR in both reticulospinal V2a neurons and cardiomyocytes (Figures 1, 3, 5). Truncation of KnChR prolonged the channel open lifetime by more than 10-fold (Tashiro et al., 2021) (Figure 1). KnChR conducts various monovalent and bivalent cations, including H+, Na+, and Ca2+, and has a higher permeability to Na+ and Ca2+ and a higher permeability ratio of Ca2+ to Na+ than CrChR2 (Tashiro et al., 2021). These properties may contribute to the high photo-inducible activity of KnChR. Activation of KnChR may induce influx of more cations with a longer channel open time than CrChR2 and ChrimsonR, leading to stronger cell depolarization. Optogenetic activity of KnChR was comparable to that of GtCCR4 in cultured cells, but higher than that of GtCCR4 in zebrafish reticulospinal V2a neurons and cardiomyocytes. While the exact reason is unclear, it is possible that the expression of functional KnChR protein is high in zebrafish cells.

    1. Author Response

      Reviewer #2 (Public Review):

      Dipeptide repeat (DPR) proteins produced from both sense GGGGCC (poly-GA, poly-GP and poly-GR) and antisense CCCCGG (poly-PR, poly-PG, poly-PA) repeat RNAs are found in C9ORF72-linked ALS/FTD and contribute to neurodegeneration. The translation of the repeat RNA can initiate without the AUG start codon, a process known as repeat associated non-AUG (RAN) translation. In this manuscript, the authors used a luciferase reporter construct to show that the translation of PR and PG from the CCCCGG repeats initiated from in-frame AUGs in the C9 sequences before the repeats. After mutating candidate AUG codons, the translation can initiate from other AUGs, so there is redundancy. But when all the in-frame AUG codons were mutated, the luciferase signal was dramatically reduced, supporting translation initiation at the AUG start codon. The translation initiation factor eIF2D has been shown to be important for CUG start codon-dependent poly-GA translation from GGGGCC repeats. Here it is shown that eIF2D is not required for poly-PG and poly-PR translation from CCCCGG repeats using both reporter and patient iPS-neurons. The data using the luciferase reporter to study antisense repeat translation are solid; the translation initiates from an AUG start codon, as there are AUGs in frame with PG and PR in the constructs containing the antisense sequences.

      We thank the reviewer for the constructive feedback.

      On the other hand, as the reporter construct includes the sequences containing the AUG codon, it is not surprising that AUG was used. This is canonical translation.

      We completely agree. In the revised Introduction, we now point out that, before our study, it was not clear which mode of translation (RAN vs AUG canonical) is employed for DPR synthesis.

      Also, in the revised Discussion (lines 251-257), we mention the following: “Hence, our findings together with these previous studies suggest that DPR synthesis may involve at least three different modes of translation: (a) near-cognate start codon (e.g., CUG, AGG) dependent-translation for poly-GA and poly-GR from sense GGGGCC transcripts, (b) canonical AUG-dependent translation for poly-PR and poly-PG synthesis from antisense CCCCGG transcripts, and (c) DPR synthesis may also occur through RAN translation mechanisms that solely utilize the repeat. It is conceivable that all three modes of translation may occur simultaneously in disease, and that the use of non-canonical and canonical initiation codons may be the primary contributors of DPR production”.

      The 1,000 bp intronic sequence included in our antisense 35xCCCCGG constructs (Figure 1A) is the authentic human intronic sequence. We agree that it does contain multiple putative initiation codons, and this was our motivation for conducting systematic mutagenesis of all these codons. To narrow down the list of putative initiation codons, we used our recently developed machine-learning algorithm for initiation codon prediction (PMID: 35648796). We found a CUG and an AUG in the poly-PR frame, and a CUG and three AUGs in the poly-PG frame, all of which had a good Kozak sequence (as mentioned in Results). Systematic mutagenesis of these codons (single and multiple codon mutations were generated) revealed that an AUG at -273 bp is necessary for poly-PR synthesis (Figure 2). Of note, poly-PR is one of the most toxic DPRs, for which an initiation codon had not been previously identified in the literature.
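      The idea of listing candidate initiation codons that lie in frame with the repeat can be sketched in a few lines. This is only a toy illustration, not the machine-learning predictor cited above: the function `inframe_starts`, the frame convention, and the example sequences are hypothetical, Kozak context is not scored, and the sequences are not taken from the C9ORF72 intron.

```python
START_CODONS = {"AUG", "CUG"}          # canonical and one near-cognate start codon
STOP_CODONS = {"UAA", "UAG", "UGA"}


def inframe_starts(upstream, frame=0):
    """List candidate start codons upstream of a repeat that read through to it.

    upstream: sequence immediately 5' of the repeat (DNA or RNA letters).
    frame: 0, 1 or 2; a codon starting k nt before the repeat is kept
           when k % 3 == frame (assumed convention for this sketch).
    Returns (position, codon) pairs, where position is the negative distance
    in nt from the first base of the codon to the first base of the repeat.
    """
    seq = upstream.upper().replace("T", "U")
    n = len(seq)
    hits = []
    for i in range(n - 2):
        if (n - i) % 3 != frame:
            continue
        codon = seq[i:i + 3]
        if codon not in START_CODONS:
            continue
        # Discard candidates separated from the repeat by an in-frame stop codon.
        downstream = [seq[j:j + 3] for j in range(i + 3, n - 2, 3)]
        if any(c in STOP_CODONS for c in downstream):
            continue
        hits.append((i - n, codon))
    return hits


# Toy sequences for illustration only.
open_frame = inframe_starts("ATGGCCCTGGCA")
with_stop = inframe_starts("AUGUAAGCCAUGGCA")
```

      Each candidate returned by such a scan would still need to be tested experimentally, for example by the single and combined codon mutations described above.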

      Additionally, the AUG-initiated translation of antisense repeats has been reported previously. Therefore, the novelty is limited.

      We agree that an AUG initiation codon was previously described for poly-PG (Boivin et al., EMBO J, 2020, PMID: 31930538). However, our findings significantly extend this observation because redundancy at the level of AUG initiation codon usage was not reported in that study.

      We believe our study significantly contributes to the field of C9ORF72 ALS/FTD in the following way:

      (i) We identified for the first time an AUG (at -273nt) necessary for synthesis of poly-PR, one of the most toxic DPRs.

      (ii) We propose the concept of initiation codon redundancy for poly-PG, which may apply to other DPRs in C9ORF72 ALS/FTD, as well as in other neurological disorders caused by nucleotide repeat expansion mutations.

      (iii) Our findings merged with those of previous studies suggest that DPR synthesis may involve at least three different modes of translation: (a) near-cognate start codon (e.g., CUG, AGG) dependent-translation for poly-GA and poly-GR from sense GGGGCC transcripts, (b) canonical AUG-dependent translation for poly-PR and poly-PG synthesis from antisense CCCCGG transcripts, and (c) DPR synthesis may also occur through RAN translation mechanisms that solely utilize the repeat. It is conceivable that all three modes of translation may occur simultaneously in disease, and the use of non-canonical and canonical initiation codons may be the primary contributor of DPR production.

      (iv) We found that the non-canonical translation initiation factor eIF2D is mainly responsible for poly-GA (sense DPR) production without affecting anti-sense DPRs. Hence, we propose a model where DPR translation occurs in a “piecemeal manner”, i.e., a distinct machinery of translation initiation factors may be needed for the synthesis of each DPR.

      In the revised manuscript, we now better highlight these key contributions.

      How the antisense DPRs are translated endogenously, AUG-canonical translation or RAN translation, depends on whether the AUG is included in the antisense RNA in patients and where the transcription of the antisense starts, upstream or downstream of the AUG start codons. However, this is not considered in the manuscript.

      Thank you for this important point. Zu et al. (PNAS, 2013) observed antisense DPR aggregation in brain samples of C9ORF72 ALS/FTD patients. In the same study, the authors conducted 5’ Rapid Amplification of cDNA Ends (RACE). Although this analysis did not identify the exact transcription start site for the antisense CCCCGG RNA, it did show that the region that includes the AUG codons, which we found to be important for poly-PR or poly-PG, is included in the antisense RNA from human C9ORF72 ALS/FTD samples. On page E4969, Zu et al. write: “RACE analysis of FCX samples showed intron 1b antisense transcripts begin at varying sites 251–455 bp upstream of the G2C4 repeat”. The same study also detected antisense RNA foci in brain samples of C9ORF72 ALS/FTD patients.

      The exact transcription start site for the antisense (and sense) transcript remains unknown. In the near future, we plan RACE experiments to identify it and share these findings with the community in a separate manuscript.

      We have modified the Results (lines 133-136) to: “These results strongly suggest that AUG at -273 bp is the start codon for translation of poly-PR, one of the most toxic DPRs in C9ORF72 ALS/FTD. This AUG is predicted to be included in the endogenous antisense CCCCGG transcript based on 5’ Rapid Amplification of cDNA Ends (RACE) analysis on brain samples of C9ORF72 ALS/FTD patients14.”

    1. Author Response

      Reviewer #1 (Public Review):

      1) While the current dataset aims to demonstrate a "correlation" between grid cell encoding and task performance, the other variables that could confound this correlation should be carefully examined.

      (1) The exact breakdown of the fraction of beaconed/non-beaconed/probe trials is never shown. If the session makeup has a significant effect on the coding scheme or other results, this variable should be accounted for.

      (2) The manuscript did not provide information about whether individual mice experienced sessions with different combinations of the three trial types, and whether they show different preferences in position or distance encoding even in comparable sessions. This leads to the question of whether different behaviour and activity encoding were dominated by experimental or natural differences between individual mice. Presenting the data per mouse will be helpful.

      (3) Related to the above point, in Figure 5, the mice appeared to behave worse in probe trials than non-beaconed trials. If the mouse did not know if a trial is a probe or a non-beacon trial, they should behave equivalently until the reward location and thus should stop an equal amount. If this difference is because multiple probe trials are placed consecutively, did the mouse learn that it will not get a reward and then stop trying to get rewards? Did this affect switching between position and distance coding?

      (4) It is not shown how the behaviours (e.g., running speed away from the reward zone, licking for reward) in beaconed/non-beaconed/probe trials were different and whether the difference in behaviours led to the different encoding schemes.

      We appreciate these suggestions and will add all of the requested analyses in a revised manuscript. We note here that while the proportion of trial types differed between sessions, in all sessions trial types were varied in a repeating sequence, so blocks of behaviour where grid firing is anchored (or not anchored) to the track coordinates can not be explained as a consequence of a particular trial type. We will make this clearer in a revised manuscript.

      2) Regarding the behaviour and activity encoding on a trial-by-trial basis, did the behavioural change occur first, or did the encoding switch occur first, or did they happen within the same trial? This analysis will potentially determine whether the encoding is causal for the behaviour, or the other way around.

      We agree this is an important point and the corresponding analyses will be reported in a revised manuscript.

      3) The authors determined that the grid cell coding schemes were limited to distance encoding and position encoding. However, there could be other schemes, such as switching between different position encodings (with clear spatial fields but at different locations), as indicated by Low et al., 2021, and switching between different distance encodings (with different distance periods). If these other schemes indeed existed in the data, they might contribute to the variation of the behaviours.

      We did not observe switching between coding schemes of the same type within our dataset and so did not document this. We agree it is important to do so and will provide additional analyses in the revised manuscript.

      4) The percentage of neurons categorised in each coding scheme was similar between non-grid and grid cells. This implies that non-grid cells might switch coding schemes in sync with grid cells, which would mean the whole MEC network was switching between distance and position coding. This raises the question of whether the grid cell coding scheme was important per se, or just the MEC network coding scheme.

      We appreciate the suggestion and very much agree that looking at cells outside of just grid cells is important in determining which cells are functionally relevant in spatial behaviours. We will provide additional analyses in a revised manuscript.

      5) In Figure 2 there are several cell examples that are categorised as distance or position coding but have a high fraction of the other coding scheme on a per-trial basis. Given this variation, the full session data in F should be interpreted carefully, since this included all cells and not just "stable" coding cells. It will be cleaner to show the activity comparison only between the stable cells.

      We agree that showing stable examples before introducing examples that switch on a per-trial basis will be helpful. We will amend this in a revised manuscript.

      6) The manuscript is not well written. Throughout the manuscript, there are many unexplained concepts (especially in the introduction) and methods, mis-referenced figures, and unclear labels.

      We appreciate the feedback and will work to address the concerns in a revised manuscript.

      Reviewer #2 (Public Review):

      This study is very timely as there is a pressing need to identify/delimitate the contribution of grid cells to spatial behaviors. More studies in which grid cell activity can be associated with navigational abilities are needed. The link proposed by Clark and Nolan between "virtual position" coding by grid cells and navigational performance is a significant step toward better understanding how grid cell activity might support behavior. It should be noted that the study by Clark and Nolan is correlative. Therefore, the effect of selective manipulations of grid cell activity on the virtual task will be needed to evaluate whether the activity of grid cells is causally linked to the behavioral performance on this task. In a previous study by the same research group, it was shown that inactivating the synaptic output of stellate cells of the medial entorhinal cortex affected mice's performance of the same virtual task (Tennant et al., 2018). Although this manipulation likely affects non-grid cells, it is still one of the most selective manipulations of grid cells that are currently available.

      We appreciate the additional context provided here. In our view, it is critical to narrow down the space of possible behaviours that grid cells might contribute to. As the reviewer notes, our previous work provided evidence that speaks to this question by targeting genetic manipulations (Tennant et al., 2018), but while this approach was specific to stellate cells it does not discriminate grid from non-grid cells and so does not tell us specifically about roles for grid cells. As far as we are aware there is currently no manipulation that will do this. In the experiments here, we take a complementary approach, leveraging the variability inherent in behaviour and the fact that in our location memory task animals will perform many trials in a session. By showing that spatially anchored grid firing does not predict behavioural success on cued trials, but does predict success on trials that are solved by path integration, we substantially narrow the space of behaviours that grid cells could contribute to. Importantly, stellate cells appear necessary for both cued and uncued behaviour in the task (Tennant et al., 2018), suggesting that their roles are more general than those of the grid cell population, which is likely to be only a subset of stellate cells. We will more carefully address this point in a revised manuscript.

      When interpreting the "position" and "distance" firing mode of grid cells, it is important to appreciate that the "position" code likely involves estimating distance. The visual cues on the virtual track appear to provide mainly optic flow to the animal. Thus, the animal has to estimate its position on the virtual track by estimating the distance run from the beginning of the track (or any other point in the virtual world).

      We agree this terminology has the potential for causing confusion. A simpler descriptive definition would be track-anchored and track-independent rather than position and distance coding. We will consider this and other alternatives for a revised manuscript.

      Reviewer #3 (Public Review):

      This study addresses the major question of 'whether and when grid cells contribute to behaviour'. There is no doubt that this is a very important question. My major concern is that I'm not convinced that this study gives a significant contribution to this question, although this study is well-performed and potentially interesting. This is mainly due to the fact that the relation between grid cell properties and behaviour is exclusively correlative and entirely based on single cell activity, although the introduction mentions quite often the grid cell network properties and dynamics. In general, this study gives the impression that grid cells exclusively support the cognitive processes involved in this task. This problem is in part related to the text. However, it would be interesting to look at the population level (even beyond grid cells) to test whether at the network level, the link between behavioural performance and neural activity is more straightforward compared to the single-cell level.

      We appreciate the feedback and suggestions. As we note in our response to Reviewer #2, there is currently no method for selective manipulation of grid cells, while testing correlation is a critical step on the path to establishing causation. Our study contributes by reducing the space of possible functions of grid cells to exclude behaviours in which local cues are available, while providing evidence for a clear relationship between anchoring of grid cells and successful outcomes when path integration is used for localisation. We’re unclear here about what the reviewer means by ‘more straightforward’ as the relationships we establish do not appear overly complicated, and as strong relationships between activity of single grid cells and populations of grid cells are already well established (Gardner et al., 2021; Waaga et al., 2021; Yoon et al., 2013).

      The authors used a statistical method based on the computation of the frequency spectrum of the spatial periodicity of the neural firing to classify grid cells as 'position-coding' (with fields anchored to the virtual track) and 'distance-coding' (with fields repeating at regular intervals across trials). This is an interesting approach that nonetheless has the drawback of being based exclusively on autocorrelograms. It would be interesting to compare with a different method based on the similarities between raw maps.

      We’re not sure we understand the point here. The manuscript provides analyses comparing rate maps for activity periods in which grid cells are / are not anchored to the task environment (e.g. Figure 2A-C, Figure 3B-E); when grid cells are anchored the rate maps are clearly spatial, when they are not anchored we show that spatial information (in the track reference frame) is very substantially reduced.

      Beyond this minor point, cell categorization is performed using all trial types. Each trial type (i.e. beacon or non-beacon) is supposed to force mice to use different strategies and should induce different spatial representations within the entorhinal-hippocampal circuit (and not only in the grid cell system). In that context, since all trials are mixed, it is difficult to extrapolate general information.

      Again, we’re not sure we understand the point. We appreciate this likely reflects a lack of clarity on our part in the writing of the manuscript. As noted in our response to Reviewer #1, we will include additional details about the organisation of trials and relationships between trials, behavioural outcomes and neural codes observed. We should note here that mice are not ‘forced’ to adopt any particular strategy. Rather, on uncued trials a path integration strategy is the most efficient way to solve the task. Mice could instead use a less efficient strategy of stopping at short intervals and still obtain rewards, although the behavioural evidence suggests they do not choose to do this after learning the task.

      On page 5 the authors state that 'Since only position representations should reliably predict the reward location, ..., we reasoned that the presence of positional coding could be used to assess whether grid firing contributes to the ongoing behaviour'. I do not agree with this statement. First of all, position coding should be more informative only in a cue-guided trial. Second, distance coding could be as informative as position coding since at the network level may provide information relevant to the task (such as distance from the reward).

      Again, this point perhaps reflects a lack of clarity on our part in writing the manuscript. When grid cells are anchored to the track reference frame (position encoding in the manuscript), then the location of the rate peaks in grid firing is reliable from trial to trial. This is the case whether or not the trial is cued. When grid cells are independent of the track reference frame (distance encoding in the manuscript, but we now appreciate this is a poor choice of words), then the location of the firing rate peaks varies from trial to trial; thus position cannot be read out directly from trial to trial. In principle, when grid cells are not anchored to the track the mouse could read out track position by storing the grid network configuration at the start of each trial and then subtracting this from readouts of distance as mice move along the track. If mice do use this computation we would expect them to do so equally well on cued and uncued trials, whereas our results clearly show a dissociation between trial types in the relationship between grid firing and behavioural outcome. We will highlight this possibility in a revised manuscript.
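      The distinction between track-anchored and track-independent firing can be illustrated with a simplified spectral test. This sketch is a stand-in under stated assumptions and is not the classification procedure used in the manuscript: it assumes a firing-rate vector binned along the concatenated trials, and the function names, the tolerance `tol`, and the synthetic signals are all hypothetical. The logic is that firing anchored to the track repeats an integer number of times per track length, whereas firing with an unrelated spatial period does not.

```python
import numpy as np


def peak_spatial_frequency(rate, bins_per_trial):
    """Dominant spatial frequency of the firing rate, in cycles per track length."""
    rate = np.asarray(rate, dtype=float)
    rate = rate - rate.mean()                      # remove the DC component
    power = np.abs(np.fft.rfft(rate)) ** 2
    # Sample spacing is 1/bins_per_trial track lengths, so frequencies
    # come out in cycles per track length.
    freqs = np.fft.rfftfreq(rate.size, d=1.0 / bins_per_trial)
    k = 1 + int(np.argmax(power[1:]))              # skip the zero-frequency bin
    return float(freqs[k])


def classify(rate, bins_per_trial, tol=0.05):
    """Label firing 'position' when the peak sits at an integer frequency."""
    f = peak_spatial_frequency(rate, bins_per_trial)
    nearest = float(np.rint(f))
    anchored = nearest >= 1 and abs(f - nearest) <= tol
    return "position" if anchored else "distance"


# Synthetic examples: 20 trials of 100 bins each.
n_trials, bins_per_trial = 20, 100
x = np.arange(n_trials * bins_per_trial) / bins_per_trial   # position in track lengths
anchored_rate = 1.0 + np.cos(2 * np.pi * 3.0 * x)     # 3 fields per track, same place each trial
drifting_rate = 1.0 + np.cos(2 * np.pi * 2.35 * x)    # period unrelated to track length

anchored_label = classify(anchored_rate, bins_per_trial)
drifting_label = classify(drifting_rate, bins_per_trial)
```

      Applying the same test within a sliding window of trials, rather than to a whole session, would be one way to follow switches between the two modes; the window length would be a further assumption.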

      Third, position-coding is interpreted as more relevant because it predominates in correct trials. However, this does not imply that this coding scheme is indeed used to perform correct trials.

      As we note above, our analyses reduce the space of behaviours to which grid cells might contribute, by providing evidence that anchoring of grid firing is associated with successful outcomes specifically when mice adopt a path integration strategy. We agree that alternative models remain plausible, for example perhaps the behaviourally relevant computations are implemented elsewhere in the brain with grid anchoring to the track as an indirect consequence. Nevertheless, the space of alternative models is substantially reduced given our experiments and analyses, while our approach complements tests of grid-behaviour functions that rely on manipulations which leave open alternative explanations based on off target effects. We expect that inclusion in a revised manuscript of the further analyses suggested above should provide further tests of the grid-behaviour relationship.

      It could be more informative to push forward the correlative analysis by looking at whether behavioural performance can be predicted by the coding scheme on a trial-by-trial basis.

      Figure 5E shows the recommended analysis.

    1. Author Response

      eLife assessment

      This useful study emphasizes some previously ignored aspects of synaptic communication between Purkinje neurons and their targets in the cerebellar nuclei. Reviewers felt that some aspects of the evidence were solid but that others were incomplete.

      We think this is an extensive and complete study. The major issue that the reviewers raised is about the usage of high chloride internals in our recordings. We feel that this single issue does not really match the statement “others were incomplete”, which suggests that this study is incomplete in some way. Please note that in our complete revision we will respond to the issue of chloride by pointing out: (1) the advantages of using high chloride internals to determine the distribution of input sizes, (2) the challenges of estimating the relationship between input sizes for different chloride internals, (3) the previous studies that have established the relationship between input sizes and chloride levels at other synapses, and (4) additional simulations will be provided indicating that subtle changes in the input sizes would have minor quantitative effects on the influences of individual inputs, but would not affect the main conclusions of the paper.

      Reviewer #1 (Public Review):

      This manuscript explores physiological properties of Purkinje-to-nuclear synapses. The report provides largely incremental advances over what has already been discovered about this synaptic relationship. The main findings, as articulated by the authors, are that Purkinje-to-nuclear synaptic strength is variable, with a few very strong inputs to the cerebellar nuclei. They show that single inputs effectively inhibit nuclear firing and that the diversity of synaptic strength influences nuclear neuron responsivity to inputs by enhancing synaptic variance. In addition, while not necessarily surprising, it's nice to see that stronger inputs would have a stronger influence on a postsynaptic cell, both in terms of rates and temporal coding transfer. Overall, as it stands, the manuscript is not very scholarly, overstates the novelty of findings, and frames a straw-man. That said, buried in here are some potentially interesting observations.

      This review provides us with an opportunity to summarize more clearly what is new in our findings. Our study builds upon Person and Raman (2012) and other studies, and makes a number of important advances. (1) We provide a much more extensive characterization of input sizes (n=157) than previous studies, and show that the distribution of input sizes is skewed, with the largest inputs almost 100 times larger than the smallest inputs. This distribution is clearly different from that of Person and Raman (2012), where the estimation of unitary PC input sizes was based on a small sample from animals spanning a broad range of ages (n=30, P13-29 animals). The high Cl- concentration internal we used in our recordings provides us with superior stability and sensitivity in detecting such variability in input size. (2) We show for the first time that the distribution of input sizes is more skewed in juvenile animals than in young animals, suggesting that PC-CbN synapses are modified by plasticity mechanisms during development. (3) Our dynamic clamp approach is based on the skewed distribution of input sizes we observed, and the Purkinje cell firing patterns we recorded in vivo, whereas Person and Raman (2012) primarily focused their dynamic clamp studies on 40 uniformly sized inputs (even though they recognized that there are also somewhat larger inputs), with their firing interspike intervals drawn from Gaussian distributions (which lack refractory periods and do not represent realistic PC firing patterns). We also complement our dynamic clamp studies with simulations using an integrate-and-fire model that does a good job of replicating our dynamic clamp studies. This allowed us to explore the effects of different-sized inputs more thoroughly than would be practical with dynamic clamp studies. (4) We show that individual PC inputs powerfully regulate the rate and timing of CbN neuron firing, without requiring a high degree of PC synchrony. (5) We further show that timing control by PCs leads to strong inhibition of CbN firing and, surprisingly, a brief elevation prior to the inhibition. This results from the refractory period of PCs, which generates a disinhibition period prior to the inhibition, and is shaped by the firing statistics of PC inputs. If such an elevation prior to inhibition were observed in vivo, it could be misinterpreted as excitation of CbN neurons by other inputs (e.g., mossy fiber collaterals) preceding the PC inputs. (6) We show that the total inhibitory conductance and the coefficient of variation (CV) of this conductance are both important factors in controlling the firing rate of CbN neurons. Variable input sizes and synchronized inputs both lead to a higher CV of the inhibitory conductance and therefore higher firing rates. (7) We show that PC inputs of all sizes transmit a robust rate code that simply depends on their sizes. (8) Our study helps to resolve a long-standing controversy in the field. Some thought that PC synchrony is an effective way of controlling CbN neuron firing, while others doubted the physiological relevance of PC synchrony. Here we show that a single large input is functionally equivalent to many small, perfectly synchronized inputs, which can influence the rate and timing of CbN firing as previously proposed (Person and Raman, 2012a), but without requiring a high degree of PC synchrony. We also suggest that a high degree of synchrony is not a prerequisite for an appreciable influence, because synchronizing a few large inputs can have large effects on CbN neuron firing. We strived to be fair and thorough, and we think that the study is scholarly. Prior to the initial submission, we sought advice from experts in the field, Indira Raman and Nicolas Brunel, and their input was very helpful in this regard. We will revise the manuscript to more clearly articulate what has been done previously, and what aspects of our study are new.
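
      The relationship in points (6) and (8) between input-size variability, synchrony, and the CV of the inhibitory conductance can be illustrated with a minimal sketch. It is not the authors' dynamic clamp protocol or integrate-and-fire model. It assumes independent Poisson inputs (which, unlike real PCs, have no refractory period) and an exponentially decaying conductance, so that Campbell's theorem gives the mean and variance in closed form. The firing rate, decay time, and input sizes are hypothetical values chosen only for illustration.

```python
import math


def conductance_cv(weights, rate_hz, tau_s):
    """CV of the summed conductance for independent Poisson inputs.

    Assumption: each spike of input i adds w_i * exp(-t / tau_s).
    Campbell's theorem then gives
        mean     = rate * tau * sum(w_i)
        variance = rate * tau / 2 * sum(w_i ** 2)
    """
    mean = rate_hz * tau_s * sum(weights)
    variance = rate_hz * tau_s / 2.0 * sum(w * w for w in weights)
    return math.sqrt(variance) / mean


# Hypothetical parameters, not taken from the manuscript.
RATE_HZ = 80.0   # firing rate of each input
TAU_S = 0.0025   # decay time constant of the inhibitory conductance

# Three input sets with the same total weight (40 arbitrary units).
uniform = [1.0] * 40                 # 40 equal, independent inputs
skewed = [10.0, 10.0] + [1.0] * 20   # two large inputs plus 20 small ones
synchronized = [4.0] * 10            # 40 unit inputs in 10 perfectly synchronized
                                     # groups of 4, each acting as one input of size 4

cv_uniform = conductance_cv(uniform, RATE_HZ, TAU_S)
cv_skewed = conductance_cv(skewed, RATE_HZ, TAU_S)
cv_synchronized = conductance_cv(synchronized, RATE_HZ, TAU_S)

for label, cv in [("uniform", cv_uniform),
                  ("skewed", cv_skewed),
                  ("synchronized", cv_synchronized)]:
    print(f"{label}: CV = {cv:.3f}")
```

      Under these assumptions the mean conductance is identical in all three cases, while the CV grows with the sum of squared input sizes. In this simplified picture, that is why a perfectly synchronized group of small inputs is indistinguishable from a single large input.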

      Reviewer #2 (Public Review):

      In this manuscript, the authors address how cerebellar Purkinje cells (PCs) control the firing of nuclear cells (CbN), the output stage of the cerebellum. They used patch-clamp recordings in acute cerebellar slices, and combined dynamic clamp with simulations of nuclear cell firing rate.

      This article addresses one of the most fundamental unresolved questions in cerebellar physiology: how do inhibitory PCs control the output stage of the cerebellum?

      They first described a developmental evolution of the PC-CbN synapses. Inhibitory synaptic weights become highly variable after three weeks of age, with a group of very large PC inputs. They used dynamic clamp to examine the influence of these variable inputs on CbN firing rate. They demonstrate that while inputs of all sizes affect CbN discharge, larger ones can stop it for a few milliseconds. Using a distribution of variable input sizes, they showed that increasing the variability of PC inputs favors CbN discharge, while increasing the magnitude of a constant inhibitory conductance decreases the firing rate. By varying the frequency of PC inputs, they suggest that CbNs faithfully transmit a rate code, but that larger inputs are more effective at decreasing their firing rate. Finally, addressing how synchrony of variable PC inputs influences CbN discharge, dynamic clamp studies and simulations showed that input synchronization enhances firing, but is driven by the total charge of the inhibitory input.

      The keystone observation that PC inputs are highly variable is very interesting and convincing, and opens new questions about PC-CbN plasticity. More importantly, the combination of dynamic clamp and simulations is a real strength of the study, allowing the authors to test many combinations of inputs in real cells and to extrapolate their hypotheses in silico. Weaknesses result from the assumptions made in constructing the distribution of inputs and from the many different conditions explored. The organization of the article could make it difficult to read for a non-specialist in cerebellar physiology.

      We thank the reviewer for their kind comments. We will revise the manuscript to clarify the assumptions made to construct the distribution of input sizes. We will do our best to revise the manuscript to make it easier for a non-specialist to read.

    1. Author Response

      We thank the editors and the reviewers for their comments. In response, we plan to revise the manuscript in order to provide the details requested and include additional bioinformatic analysis of the data, along the lines suggested by the reviewers. We will also take into account individual variations among the subjects investigated in this study, and discuss the extent to which factors other than age might contribute to the results. And we will expand the discussion to consider how our results may apply to other cells/tissues and how they relate to other findings in the field.

    1. Author Response

      The following is the authors’ response to the current reviews.

      We will make some minor changes to address the issues in the revised manuscript during preparation of the Version of Record.

      1) Acknowledge the previous discovery that COUPTFII expression is confined to the ventral hippocampus in early human fetal forebrain (doi: 10.1093/cercor/bhx185).

      We agree. We will incorporate the previous discovery that COUPTFII expression is confined to the ventral hippocampus in early human fetal forebrain (doi: 10.1093/cercor/bhx185) in the discussion section of "COUP-TFII governs the distinct characteristics of the ventral hippocampus".

      2) Give some consideration to this observation from my original review "Abnormalities in the trisynaptic circuit. No studies of actual synapses, either physiological or morphological, were carried out. I wonder to what extent these immunohistochemical studies just further reflect the abnormalities in hippocampal morphology presented earlier in the manuscript without specifically telling us about synaptic circuits? Although the immunohistochemical preparations are beautiful, they are inadequate on their own in telling us much about what sort of synaptic circuitry exists in the transgenic animals".

      Our data in Figure 4 show clearly that at the neural circuit level, compared with the corresponding control, the trisynaptic circuit is abnormal in all three models; therefore, in the discussion section of "COUP-TF genes are imperative for the formation of the trisynaptic circuit", we will add the following sentence, "We would like to investigate what sort of synaptic circuitry is compromised either physiologically or morphologically in the trisynaptic circuit of individual animal model in detail in the future studies."

      In addition, we will correct a reference related to the COUP-TFII gene and congenital heart defects.

      The reference of "High, F. A., Bhayani, P., Wilson, J. M., Bult, C. J., Donahoe, P. K., & Longoni, M. (2016). De novo frameshift mutation in COUP-TFII (NR2F2) in human congenital diaphragmatic hernia. Am J Med Genet A, 170(9), 2457-2461. doi:10.1002/ajmg.a.37830" was replaced with "Al Turki, S., Manickaraj, A. K., Mercer, C. L., Gerety, S. S., Hitz, M. P., Lindsay, S., . . . Hurles, M. E. (2014). Rare variants in NR2F2 cause congenital heart defects in humans. Am J Hum Genet, 94(4), 574-585. doi:10.1016/j.ajhg.2014.03.007".

      —————

      The following is the authors’ response to the original reviews.

      Reviewer #1(Recommendations For The Authors):

      1) Better presentation of the western blot results

      We agree with the reviewer. Based on the suggestion, new information about the western blot results has been added in the revised Figure 1Ap. We added a dash to each western blot image to indicate the target band of COUP-TFI (46 kDa), COUP-TFII (45 kDa), and GAPDH (37 kDa), respectively. There were two bands in the blot of COUP-TFII, with the upper band corresponding to mouse IgG at 50 kDa, and the lower band corresponding to COUP-TFII protein at 45 kDa. Therefore, only the lower bands of COUP-TFII were used for the quantitative analysis. The expression of COUP-TFII in the ventral hippocampus is clearly higher than that in the dorsal hippocampus.

      2) Full presentation of the Immunohistochemistry and qPCR results at E11.5 and E14.5 in double knockdown mice.

      Thanks for the suggestion. Based on the suggestion, we added immunofluorescence data from the double knockout mice at E11.5 in Figure 5Ba-h. Meanwhile, given that it takes time to prepare animal samples at E14.5 for RT-qPCR assays, we performed immunofluorescence assays at both E13.5 and E14.5 to make sure that the changes in Lhx5 and Lhx2 expression in the hippocampal regions between the control and mutant mice were consistent. As shown in the new Figure 5B, consistent with the downregulated expression of Lhx5 transcripts in the double mutant, the expression of the Lhx5 protein was reduced in the CH in the double mutants at E11.5; moreover, the numbers of Lhx5-positive Cajal-Retzius cells decreased in the double mutant embryos at E11.5, E13.5 and E14.5 (Figure 5Ba-d, a’-d’, a’’-d’’, i-l, i’-l’, q-t, q’-t’). Consistent with RT-qPCR data, the expression of Lhx2 was comparable between the control and double-mutant mice at E11.5 (Figure 5Be-h, e’-h’). Interestingly, the expression of the Lhx2 protein was increased in the hippocampal primordium in the COUP-TF double-mutant mice at E13.5 and E14.5 (Figure 5Bm-p, m’-p’, u-x, u’-x’). Please find the altered descriptions on Page 15, lines 347-351, 353-358 and Page 21, lines 500-503 in the revised manuscript.

      3) Minor corrections. Lines 159-162, prospected is not quite the right word. I would suggest "an ectopic CA-like region was observed medially in the temporal hippocampus in the COUP-TFII mutant, where the prospective posterior part of the medial amygdaloid nucleus was situated, (MeP), indicated by the star (Figure 1Ba-f). The presence of the ectopic CA-like region in the ventral but not dorsal hippocampus of the mutant was further confirmed by the presence of the prospective MeP and amygdalohippocampal area (AHi) in sagittal sections, as indicated by the star." See also line 251. Lines 437/438: I would suggest "... most important breakthroughs in understanding the role of the hippocampus in memory."

      Thanks for the suggestion. We made the changes based on the suggestion. Please find the amendments on Page 8, lines 178-181; Page 12, lines 270, 276; Page 14, line 318; Page 19, line 451; Page 20, lines 461-462 in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      1) It is also important to point out that the immunofluorescence data in Figure 5B is contrary to what is known for Lhx5 (it's not expressed in the neocortical and hippocampal vz) and Lhx2 (it's not expressed in the choroid plexus). Authors should explain how their conclusions could align more clearly, and consider the possibility that their results are due to a possible artifact of image setting issues or worse, antibody specificity issues.

      Very good point. Based on the comments and suggestions, we first tested another Lhx5 antibody, R&D, Cat # AF6290, in the immunofluorescence assays. Indeed, there was something wrong with the previous Lhx5 antibody, Millipore, Cat # AB5762. With the new Lhx5 antibody, consistent with the reported in situ data, the expression of Lhx5 was detected specifically in the CH at E11.5, and in the Cajal-Retzius cells in the marginal zone of the telencephalon. The same Lhx2 antibody, Santa Cruz, Cat # sc-19344, which has been used successfully in one of our previous studies (Tang et al., Development, 2012) (PMID: 22492355), was used in the present study. We believe that the observations at the MP and DP of the samples are really associated with the expression of Lhx2 protein. We performed new immunofluorescence assays with the new Lhx5 antibody and confirmed with the Lhx2 antibody. As shown in new Figure 5B, consistent with the downregulated expression of Lhx5 transcripts in the double mutant, the expression of the Lhx5 protein was reduced in the CH in the double mutants at E11.5; moreover, the numbers of Lhx5-positive Cajal-Retzius cells decreased in the double mutant embryos at E11.5, E13.5 and E14.5 (Figure 5Ba-d, a’-d’, a’’-d’’, i-l, i’-l’, q-t, q’-t’). Consistent with RT-qPCR data, the expression of Lhx2 was comparable between the control and double-mutant mice at E11.5 (Figure 5Be-h, e’-h’). Interestingly, the expression of the Lhx2 protein was increased in the hippocampal primordium in the COUP-TF double-mutant mice at E13.5 and E14.5 (Figure 5Bm-p, m’-p’, u-x, u’-x’). Please find the changed descriptions in Page 15, lines 347-351, 353-358 and Page 21, lines 500-503 in the revised manuscript.

      The reference:

      Tang, K., Rubenstein, J. L., Tsai, S. Y., & Tsai, M. J. (2012). COUP-TFII controls amygdala patterning by regulating neuropilin expression. Development, 139(9), 1630-1639. doi:10.1242/dev.075564

      2) The expression domain of RxCre remains poorly explained, and the early expression of COUPTFI and II (E10.5-E12.5) could be considered major weaknesses of the paper.

      Thanks for the suggestion. The generation of RXCre was reported by Swindell et al., Genesis, 2006 (PMID: 16850473). Given that the activation of LacZ expression serves as an indicator for the deletion of the COUP-TFII gene (Tang et al., Development, 2012) (PMID: 22492355), we performed immunofluorescence assays with antibodies against COUP-TFII and LacZ on the sagittal sections of RXCre/+; COUP-TFIIF/+ heterozygous mutant and RXCre/+; COUP-TFIIF/F homozygous mice at E11.5. As shown in the new Figure 1—figure supplement 1Da-f, COUP-TFII was readily detected at the hippocampal primordium of the heterozygous mutant embryo at E11.5 (Figure 1—figure supplement 1Da, c, g); in contrast, the expression of COUP-TFII significantly decreased in the homozygous mutant (Figure 1—figure supplement 1Dd, f, j). In addition, compared with the heterozygous mutant embryo, the LacZ signals increased distinctly in the hippocampal primordium of the homozygous mutant embryo at E11.5 (Figure 1—figure supplement 1Db-c, e-f, h, k), suggesting that RXCre recombinase can efficiently excise the COUP-TFII gene in the hippocampal primordium as early as E11.5. Please find the corresponding changes on Page 7, lines 149-159 and Page 8, lines 160-164 in the revised manuscript.

      Meanwhile, we also added the early expression of COUP-TFI and -TFII at E10.5 and E11.5 in new Figure 1—figure supplement 1Aa-d. At embryonic day 10.5 (E10.5), COUP-TFI was detected in the dorsal pallium (DP) laterally and COUP-TFII was expressed in the MP and CH medially (Figure 1—figure supplement 1Aa, b). At E11.5, the expression of COUP-TFII remained in the hippocampal primordium, including MP and CH (Figure 1—figure supplement 1Ac, d). Please find the corresponding changes on Page 6, lines 129-132 and Page 9, lines 202-203 in the revised manuscript.

      The references:

      Swindell, E. C., Bailey, T. J., Loosli, F., Liu, C., Amaya-Manzanares, F., Mahon, K. A., . . . Jamrich, M. (2006). Rx-Cre, a tool for inactivation of gene expression in the developing retina. Genesis, 44(8), 361-363. doi:10.1002/dvg.20225

      Tang, K., Rubenstein, J. L., Tsai, S. Y., & Tsai, M. J. (2012). COUP-TFII controls amygdala patterning by regulating neuropilin expression. Development, 139(9), 1630-1639. doi:10.1242/dev.075564

      Reviewer #3 (Recommendations For The Authors):

      1) Regarding the RxCre line, I was also confused about its spatiotemporal expression, as this line is not a commonly used Cre line and no detailed description is provided in the manuscript. Searching for this line turns up a previous paper by the authors (PMID: 22492355) in which they tested the RxCre recombinase activity. At E12.5, RxCre induced high LacZ expression in the ventral telencephalon but much less in the dorsal telencephalon. But they did not check later stages. Therefore, it's hard to explain the defective dorsal hippocampus in RxCre, CFI CKO. They should check later stages.

      The generation of RXCre was reported by Swindell et al., Genesis, 2006 (PMID: 16850473), which reveals high Cre recombinase activity of RXCre in the eye and ventral telencephalon. Given that the activation of LacZ expression serves as an indicator for the deletion of the COUP-TFII gene (Tang et al., Development, 2012) (PMID: 22492355), we performed immunofluorescence assays with antibodies against COUP-TFII and LacZ on the sagittal sections of RXCre/+; COUP-TFIIF/+ heterozygous mutant and RXCre/+; COUP-TFIIF/F homozygous mice at E11.5. As shown in new Figure 1—figure supplement 1D, compared with the heterozygous mutant embryo, the expression of COUP-TFII was significantly decreased in the homozygous mutant; in addition, the LacZ signals evidently increased in the hippocampal primordium of the homozygous mutant embryo at E11.5, suggesting that RXCre recombinase can efficiently excise the target gene in the hippocampal primordium as early as E11.5. The expression of COUP-TFI is barely detectable in the early developing hippocampal primordium including MP at E10.5, E11.5 and E12.5. The expression of COUP-TFI is high in the MP of the control (Figure 1Cj, l); in contrast, the COUP-TFI expression is barely detectable in the MP of the homozygous double mutant at E14.5, indicating that RXCre can efficiently delete the COUP-TFI gene in the hippocampal primordium at E14.5. The loss of the COUP-TFI gene in the MP as early as E14.5 by RXCre initiates the defective dorsal hippocampus in RXCre/+; COUP-TFIF/F knockout mice.

      2) Authors should check and review extensively for improvements to the use of English.

      We carefully checked and made changes throughout the manuscript accordingly. For example, “imperative” was used 6 times in the previous manuscript, lines 20, 255, 486, 499, 522, 553; “imperative” was used only once in Page 22, line 522 in the revised manuscript.

      3) Please correct the manuscript; 1-month-old mice are not adult mice.

      Thanks for the suggestion. Based on the suggestion, we have corrected related words and sentences in the manuscript. Please find the amendments in the revised manuscript (Page 7, line 146; Page 9, lines 203-204; Page 10, line 213; Page 13, lines 299-300; Page 17, line 406; Page 20, line 476).

      4) Additional ref should be added at line 93 on page 5.

      Based on the suggestion, we added some new references (Bertacchi et al., EMBO J, 2020) (PMID: 32572460); (Del Pino et al., Cereb Cortex, 2020) (PMID: 32484994); (J. Feng et al., Sci Adv, 2021) (PMID: 34215582) at line 96 on page 5.

      The references:

      Bertacchi, M., Romano, A. L., Loubat, A., Tran Mau-Them, F., Willems, M., Faivre, L., . . . Studer, M. (2020). NR2F1 regulates regional progenitor dynamics in the mouse neocortex and cortical gyrification in BBSOAS patients. Embo j, 39(13), e104163. doi:10.15252/embj.2019104163

      Del Pino, I., Tocco, C., Magrinelli, E., Marcantoni, A., Ferraguto, C., Tomagra, G., . . . Studer, M. (2020). COUP-TFI/Nr2f1 Orchestrates Intrinsic Neuronal Activity during Development of the Somatosensory Cortex. Cereb Cortex, 30(11), 5667-5685. doi:10.1093/cercor/bhaa137

      Feng, J., Hsu, W. H., Patterson, D., Tseng, C. S., Hsing, H. W., Zhuang, Z. H., . . . Chou, S. J. (2021). COUP-TFI specifies the medial entorhinal cortex identity and induces differential cell adhesion to determine the integrity of its boundary with neocortex. Sci Adv, 7(27). doi:10.1126/sciadv.abf6808

      5) I am confused why the authors analyzed 1-month-old mice in some instances but 3-month-old mice in others.

      The RXCre/+; COUP-TFIF/F; COUP-TFIIF/F double mutant mice barely survived beyond 3 postnatal weeks. To make our findings consistent and comparable, we mainly prepared figures with observations on about 1-month-old mice in the RXCre-related single and/or double gene mutant mouse models. In the study of the Emx1Cre-related COUP-TFI mouse model, because behavioral tests such as the Morris water maze test were required, experiments were performed with adult animals at about 3 postnatal months. To be consistent with the stage of the mice used for the behavioral tests, we only displayed morphological data from the control and Emx1Cre/+; COUP-TFIF/F mutant mice at about 3 postnatal months.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their comments. We have now addressed all the comments in a revised version of the manuscript, which we believe has strengthened our paper.

      1) Introduction LINE 60: the authors cite Funato et al 2016 as the paper first describing a role for Sik3 in sleep regulation. In fact, the role for this kinase was first identified nearly a decade earlier in C. elegans (Van der Linden et al, Genetics 2008 PMID 18832350).

      Thank you for pointing us to this reference. Van der Linden et al. demonstrated that the C. elegans homolog of Sik3 (KIN-29) regulates satiety quiescence, in which worms stop moving following feeding on high quality food. However, as pointed out in Trojanowski and Raizen “Call it Worm Sleep” (2016), not all of the behavioral criteria for sleep have been applied to C. elegans satiety quiescence, and we cannot find any references that unequivocally demonstrate that satiety quiescence is a sleep state. As McClanahan et al. (2020) show, quiescent states following mild sensory arousal do not fulfill the sleep criteria of changes in arousal threshold and homeostatic regulation, so not all quiescent states in C. elegans are sleep. Then again, Grubbs et al. (2020) do demonstrate that KIN-29 regulates both developmentally timed and stress-induced sleep states in worms, suggesting that the observations in Van der Linden et al. were ahead of their time and that these behavioral states are possibly inter-related. We believe, though, that our line “the roles of… SIK3 kinase in modulating sleep homeostasis in mice (Funato et al. 2016) were identified in genetic screens” remains accurate.

      2) Introduction LINE 71: remove the word "known" from "...while some known human sleep/wake regulators, such as the..."

      Good idea. Done.

      3) I was confused regarding Supplemental data 1 describing the genes they targeted with their forward genetic screen. Am I understanding correctly from the "Summary stats" tab that 702 fish lines with virus insertions were screened behaviorally? In Figure S1, it looks like about 60 are shown in the histograms but in the text (in the Discussion) they say 25 were screened. Were all the genes listed under the Excel tabs (GPCRs, channels, etc) tested? Or was just a subset tested? Where are the sleep data for these lines? Negative results may be relevant to their manuscript since they listed (tested??) a number of ion channel genes under tab "channels" which appear to NOT have a sleep phenotype.

      We apologize for the confusion on these points. As highlighted in the legend to Supplementary Figure S1, we had planned a screening strategy with the following pipeline: Candidate mammalian gene → Zebrafish ortholog → ID viral insertion from “Zenemark” library → grow viral insertion lines from frozen sperm → phenotype F3 heterozygous and homozygous mutant generation. Unfortunately, the company, Znomics, which held the Zenemark library, could not reliably reconstitute the correct live fish from the sperm library, and of the 702 lines we planned to screen, we could only screen 26 lines (25 was a typo). We treated heterozygous and homozygous animals for each line independently, for a total of 52 screened lines in the histograms.

      To make this clearer, we have edited the main text as follows (lines 104-105): “For screening, we identified zebrafish sperm samples from the Zenemark collection (Varshney et al., 2013) that harboured viral insertions in genes of interest and used these samples for in vitro fertilization and the establishment of F2 families, which we were able to obtain for 26 lines.” And lines 111-112: “While most screened heterozygous and homozygous lines had minimal effects on sleep-wake behavioural parameters (Figure S1B-S1C),”

      We believe it is important to include the full set of Supplementary Data 1, even though the vast majority of these candidate lines were not tested.

      4) Results LINE 117: remove the word "prominent", which is subjective, from the sentence "...showed a prominent decrease in sleep during the..."

      Good point. Done.

      5) LINES 185-186: did you see any circadian variation in your dmist:GFP protein abundance or localization? Protein trafficking has been described as a mechanism of circadian regulation of excitability.

      For practical reasons, we imaged the membrane localization of Dmist:GFP in plasmid-injected embryos at 90% epiboly, which is about 9 hours after fertilization and when the cells remain large and in a relatively flat epithelium. Thus, we could not follow circadian fluctuations in abundance or localization. For circadian studies, we believe the best method will be to raise an antibody that recognizes Dmist.

      6) LINE 203: does the GFP-tagged Dmist rescue the loss-of-function phenotype? This is relevant to Figure 2E. it is also relevant to the issue of structure-function. If it rescues, then the C-terminus may not be essential to protein function.

      As noted, for practical reasons, we observed Dmist-GFP only transiently at early stages of development, expressed using a strong, ubiquitous promoter. A rescue experiment is a good idea for future experiments, where we carefully control the expression of Dmist in neurons.

      7) LINE 220: explain what you mean by "...consistent with nonsense-mediated decay." and/or give a reference.

      In zebrafish and other species including humans, mutant transcripts that have premature stop codons often undergo “nonsense-mediated decay”, whereby the expression levels are greatly reduced (Wittkopp et al., 2009). In the zebrafish community, this is often used as secondary evidence of a loss-of-function mutation, as relatively few antibodies are available to directly observe zebrafish proteins. We have added a reference that describes this phenomenon (Wittkopp et al., 2009).

      8) LINE 225: define "LME model"

      Now reads: “Linear mixed effects (LME).”

      9) LINES 227-229: could the vir/vir phenotype be explained by specific effects on protein structure? could vir/vir be a gain-of-function allele?

      We can’t rule this out formally, and vir/+ animals do show some sleep phenotypes, albeit weaker than those of vir/vir animals (Figure 1G). However, it is not uncommon for heterozygous mutants to show significant phenotypes that are weaker than those of their homozygous mutant siblings, and the strong suppression of dmist expression by the viral insertion (which is located in the dmist intron) is more consistent with a hypomorphic loss-of-function phenotype for the vir allele.

      10) LINES 229-230: I don't quite follow the argument for pursuing further studies only of i8/i8. i8/i8 seems to also be a hypomorphic allele based on your qPCR data.

      First, the dmist viral line was generated by an insertional mutagenesis method followed by sequencing, and each line has multiple other inserts in a background that does not match the background of the other animals reported in this paper. Second, the dmist vir allele is an insertion in the intron, leading to reduced, but not complete loss of expression. In contrast, the i8 allele was generated on the same background strain as our other existing and newly reported lines. Moreover, our i8 line is likely a loss-of-function allele and not a hypomorph. Yes, dmist expression is reduced in the i8 allele; however, this is likely due to nonsense mediated decay of dmist mRNA. The mutation introduces a frameshift in the dmist coding sequence, and as a result the amino acid sequence of the protein is altered after the N-terminal signal sequence.

      11) LINES 241-243: grammar.

      Fixed

      12) LINE 245: define "JackHMMR iterative search"

      We’ve added the phrase: “and seeding a hidden Markov model iterative search (JackHMMR)”

      13) LINE 246 is missing the word "we" prior to "...found distant homology between..."

      Added

      14) LINE 301: show data demonstrating deviation from Mendelian ratios. Also, comment on meaning of such data (embryonic lethality??).

      We have added these data at line 301:

      “atp1a3b mutant larvae were not obtained at Mendelian ratios (55 wild type [52.5 expected], 142 [105] atp1a3b+/-, 13 [52.5] atp1a3b-/-; p<0.0001, Chi-squared) suggesting some impact on early stages of development leading to lethality.”
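
      The reported counts can be checked against the 1:2:1 ratio expected from a heterozygous incross. The sketch below assumes a standard chi-squared goodness-of-fit test with two degrees of freedom; the response names only "Chi-squared", so the exact procedure the authors used is an assumption. For two degrees of freedom the chi-squared survival function reduces to exp(-x/2), which keeps the example within the Python standard library.

```python
import math

# Observed genotype counts as reported in the response.
observed = {"wild type": 55, "atp1a3b+/-": 142, "atp1a3b-/-": 13}
# Mendelian 1:2:1 expectation for a heterozygous incross (assumed cross design).
ratios = {"wild type": 1, "atp1a3b+/-": 2, "atp1a3b-/-": 1}


def chi_squared_gof(observed, ratios):
    """Return the chi-squared statistic and expected counts for given ratios."""
    total = sum(observed.values())
    ratio_sum = sum(ratios.values())
    expected = {k: total * ratios[k] / ratio_sum for k in observed}
    stat = sum((observed[k] - expected[k]) ** 2 / expected[k] for k in observed)
    return stat, expected


def chi2_sf_two_df(stat):
    """Survival function of the chi-squared distribution with 2 degrees of freedom."""
    return math.exp(-stat / 2.0)


stat, expected = chi_squared_gof(observed, ratios)
p_value = chi2_sf_two_df(stat)

print("expected counts:", expected)
print(f"chi-squared = {stat:.2f}, p = {p_value:.2e}")
```

      Under these assumptions the expected counts match the bracketed values in the quoted sentence (52.5, 105, 52.5), and the resulting p-value falls well below the reported threshold of 0.0001.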

      15) Discussion LINES 362-372: This paragraph seems to be of only tangential relevance to the paper. Consider removing.

      Our screening strategy was a large-scale reverse genetic screen, but the number of lines was limited by the technical issues described above. We think it is important to mention that the strategy, if employed today, could benefit from newer technologies.

      16) Discussion. Another model is that Dmist and the NaK pump have a developmental effect. Arguing against this developmental model is the ouabain experiment.

      This is an important point. We’ve added the following text (lines 454-457): “We also cannot exclude a role for Dmist and the Na+/K+ pump in developmental events that impact sleep, although our observation that ouabain treatment, which inhibits the pump acutely after early development is complete, also impacts sleep, argues against a developmental role.”

      17) FIGURE 1G: Are these significance cut offs corrected for multiple comparisons?

      Yes, all the data are corrected for multiple comparisons.

      18) performing neuronal activity measures, either via neural activity imaging or phospho-ERK labeling in different mutants at day or night conditions, to determine whether baseline neuronal activity brain-wide or in specific brain regions are altered.

      These are excellent experiments that we plan to perform in the future.

      19) Please check all Figure numbers for accuracy.

      We have double checked these.

      20) The authors emphasize the role of increased cellular sodium, but equally plausibly, the phenotypes could be due to decreased cellular potassium. The potassium channel shaker has been previously identified as a critical sleep regulator in Drosophila.

      We completely agree. We would like to highlight that we did devote an entire paragraph to the possibility of changes in extracellular potassium in the discussion: “A third possibility is that Dmist and the Na+,K+-ATPase regulate sleep not by modulation of neuronal activity per se but rather via modulation of extracellular ion concentrations. Recent work has demonstrated that interstitial ions fluctuate across the sleep/wake cycle in mice. For example, extracellular K+ is high during wakefulness, and cerebrospinal fluid containing the ion concentrations found during wakefulness directly applied to the brain can locally shift neuronal activity into wake-like states (Ding et al., 2016). Given that the Na+,K+-ATPase actively exchanges Na+ ions for K+ , the high intracellular Na+ levels we observe in atp1a3a and dmist mutants is likely accompanied by high extracellular K+. Although we can only speculate at this time, a model in which extracellular ions that accumulate during wakefulness and then directly signal onto sleep-regulatory neurons could provide a direct link between Na+,K+ ATPase activity, neuronal firing, and sleep homeostasis. Such a model could also explain why disruption of fxyd1 in non-neuronal cells also leads to a reduction in night-time sleep.”

      We also agree that Shaker may be an important component of this sleep regulatory mechanism. Indeed, we previously showed that another potassium channel in zebrafish regulates sleep (Rihel et al., 2010).

      We have emphasized sodium homeostasis in our title and paper only because we were able to directly observe intracellular sodium levels, so we are confident that these have been altered in our mutants. We can only presume that potassium levels have also been altered, but we could not directly observe this.

      21) The similar phenotype between dmist and Fxyd1 in sleep reduction yet very different expression patterns, with dmist being mostly neuronal while fxyd1 being mostly non-neuronal, raise many possible questions: 1) are the sleep phenotypes due to neuronal Na/K imbalance? Or 2) Are the sleep phenotypes due to extracellular Na/K imbalance? Or 3) both? Some feasible experiments may help achieve a better mechanistic understanding of the observed sleep defects.

      Yes, we think these are excellent studies for future work. As noted in the previous point (20), we did discuss the possibility that changes to extracellular potassium might be a parsimonious explanation for the similar phenotypes of fxyd1 and dmist mutants.

      Future experiment suggestions (not required)

      1) Perform a double mutant analysis of fxyd1 and atp1a3a, to determine whether an epistatic relationship similar to that of dmist and atp1a3a is observed in the case of fxyd1 and atp1a3a.

      This is a great experiment that we will do in the future. Unfortunately, the fxyd1 mutant had been sperm frozen during the COVID-19 pandemic, so we cannot do this experiment at this time.

      2) Given the differences in the sleep phenotypes between vir/vir and i8/i8 mutants, would be informative to see the phenotype of the vir/i8 trans-heterozygote.

      This is also a good experiment to perform in the future. Since obtaining the cleaner i8 allele, the dmistvir/vir lines were sperm frozen.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      In this manuscript, the authors investigated the role of Elg1 in the regulation of telomere length. The main role of the Elg1/RLC complex is to unload the processivity factor PCNA, mainly after completion of synthesis of the Okazaki fragment in the lagging strand. They found that Elg1 physically interacts with the CST (Cdc13-Stn1-Ten1) complex and propose that Elg1 negatively regulates telomere length by mediating the interaction between Cdc13 and Stn1 in a pathway involving SUMOylation of both PCNA and Cdc13. Accumulation of SUMOylated PCNA upon deletion of ELG1 or overexpression of RAD30 leads to elongated telomeres. On the other hand, the interaction of Elg1 with Stn1 is SIM-dependent and occurs concurrently with telomere replication in late S phase. In contrast, the Elg1-Cdc13 interaction is mediated by PCNA-SUMO and is independent of the SIM of Elg1 but still dependent on Cdc13 SUMOylation. The authors present a model containing two main messages: 1) PCNA-SUMO acts as a positive signal for telomerase activation; 2) Elg1 promotes Cdc13/Stn1 interaction at the expense of Cdc13/Est1 interaction, thus terminating telomerase action.

      The manuscript contains a large amount of data that make a major inroad on a new type of link between telomere replication and regulation of the telomerase. Nevertheless, the detailed choreography of the events as well as the role of PCNA-SUMO remain elusive and the data do not fully explain the role of the Stn1/Elg1 interaction. The data presented do not sufficiently support the claim that SUMO-PCNA is a positive signal for telomerase activation.

      We thank the reviewer for her/his review efforts and opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented. In a point-by-point letter we respond to all the specific queries.

      Reviewer #2 (Public Review):

      This paper purports to unveil a mechanism controlling telomere length through SUMO modifications controlling interactions between PCNA unloader Elg1 and the CST complex that functions at telomeres. This is an extremely interesting mechanism to understand, and this paper indeed reveals some interesting genetic results, leading to a compelling model, with potential impact on the field. The conclusions are largely supported by experiments examining protein-protein interactions at low resolution and ambiguous regarding directness of interactions like co-IP and yeast two-hybrid (Y2H) combined with genetics. However, some results appear contradictory and there's a lack of rigor in the experimental data needed to support claims. There is significant room for improvement and this work could certainly attain the quality needed to support the claims. The current version needs substantial revision and lacks the necessary experimental detail. Stronger support for the claims would add detail to help distinguish competing models.

      We thank the reviewer for her/his positive opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented by the referees, and have added all the missing experimental details. In a point-by-point letter we respond to all the specific queries.

      Reviewer #3 (Public Review):

      This paper reveals interesting physical connections between Elg1 and CST proteins that suggest a model where Elg1-mediated PCNA unloading is linked to regulation of telomere length extension via Stn1, Cdc13, and presumably Ten1 proteins. Some of these interactions appear to be modulated by sumoylation and connected with Elg1's PCNA unloading activity. The strength of the paper is in the observations of new interactions between CST, Elg1, and PCNA. These interactions should be of interest to a broad audience interested in telomeres and DNA replication.

      We thank the reviewer for her/his positive opinion. We have re-submitted a new version of the manuscript in which we clarify some of the criticisms presented. In a point-by-point letter we respond to all the specific queries.

      What is not well demonstrated from the paper is the functional significance of the interactions described. The model presented by the authors is one interpretation of the data shown, and proposes that the role of sumoylation is to temporally regulate the Elg1, PCNA and CST interactions at telomeres. This model makes some assumptions that are not demonstrated by this work (such as Stn1 sumoylation, as noted) and are left for future testing. Alternative models that envision sumoylation as a key in promoting spatial localization could also be proposed based on the data here (as mentioned in the discussion), in addition to or instead of a role for sumoylation in enforcing a series of switches governing a tightly sequenced series of interactions and events at telomeres. Critically, the telomere length data from the paper indicate that the proposed model depicts interactions that are not necessary for telomerase activation or inhibition, as telomeres in pol30-RR strains are normal length and telomeres in elg1∆ strains are not nearly as elongated as in stn1 strains. One possibility mentioned in the paper is that the PCNAS and Elg1 interactions are contributing to the negative regulation of telomerase under certain conditions that are not defined in this work. Could it also be possible that the role of these interactions is not primarily directed toward modulating telomerase activity? It will be of interest to learn more about how these interactions and regulation by SUMO function intersect with regulation of telomere extension.

      We present compelling evidence for a role of SUMOylated PCNA in telomere length regulation. Figure 1 shows that this modification is both necessary and sufficient to elongate the telomeres, indicating that PCNA SUMOylation plays a positive role in telomere elongation. The model we present is consistent with all our results. There are, of course, possible alternative models, but they usually fail to explain some of the results. We agree that the fact that pol30-RR presents normal-sized telomeres implies that SUMO-PCNA is not required for telomerase to solve the "end replication problem", but rather is needed for "sustained" activity of telomerase. Since elongated telomeres (by absence of Elg1 or by over-expression of SUMO-PCNA) was the phenotype monitored, this may require sustained telomerase activity. Similar results were seen in the past for Rnr1 (Maicher et al., 2017), and this mode depends on Mec1, rather than Tel1 (Harari and Kupiec, 2018). Telomere length regulation is complex, and we may not yet understand the whole picture. It appears that for normal “end replication problem” solution, very little telomerase activity may be needed, and spontaneous interactions at a low level may suffice. Future work may find the conditions at which telomerase switches from "end replication problem" to "sustained" activity. We have added further explanations on this subject to the Discussion section.

      We suspect, but could not prove, a role for Stn1 SUMOylation in the interactions. SUMOylation is usually transient, and notoriously hard to detect, and despite the fact that many telomeric proteins are SUMOylated, Stn1 SUMOylation could not be shown directly by us and others (Hang et al, 2011).

      Reviewer #1 (Recommendations For The Authors):

      Suggestions for improved or additional experiments, data or analyses.

      • My main concern is the claim that SUMOylated PCNA acts as a positive signal for telomerase activation. Yet the pol30-RR mutant has no impact on telomere length. The explanation of the authors is not entirely convincing.

      We are aware that the regulation of telomere length is complex, and we may not fully understand it yet. Just consider the fact that ~500 genes participate in determining the final telomere length of a yeast (Askree et al., 2004). Since mutation in EACH of these genes has a phenotype, the implication is that the joint action of 500 players determines the outcome (a dialogue of 500 participants). Having said this, we clearly show in figure 1 that mutations that prevent PCNA SUMOylation prevent telomere length elongation in cells lacking Elg1, and overexpressing SUMOylated PCNA is enough to elongate the telomeres. Thus, SUMOylation of PCNA does act as a positive signal for elongation.

      However, it appears that to fulfill the minimal requirement of dealing with the "end-replication problem", PCNA SUMOylation is not required, and only a "sustained activity" mode requires the S-PCNA signal (as we have also shown, surprisingly, for RNR1, Maicher et al. 2017). This sustained activity mode depends on Mec1, rather than Tel1 (Harari and Kupiec, 2018). Since elongated telomeres (by absence of Elg1 or by over-expression of SUMO-PCNA) was the phenotype monitored, this may require sustained telomerase activity. Telomere length regulation is complex, and we may not yet understand the whole picture. It appears that for normal “end replication problem” solution, very little telomerase activity may be needed, and spontaneous interactions at a low level may suffice (for example, unmodified PCNA may promote telomerase activity at a lower level than that of SUMO-PCNA). Future work may find the conditions at which telomerase switches from "end replication problem" to "sustained" activity.

      We have added further explanations on this subject to the Discussion section.

      • The model is entitled « Elg1 negatively regulates the telomere length by forming an interaction with the CST complex ». Nevertheless, expression of PCNA-RR completely reversed the long telomere phenotype of elg1∆ cells. Thus it appears that although the interaction between Stn1 and Cdc13 is reduced in the absence of Elg1, Elg1/Stn1 interaction is not instrumental in the formation of the CST complex and thus in the termination of telomerase activity. Does the elg1∆SIM mutant that does not interact with Stn1 impact telomere length?

      • In the model part (line 318), it is argued that the complex Elg1-Stn1 unloads SUMOylated PCNA. Elg1-Stn1 interaction depends on the SIM of Elg1. This SIM is however not required for Elg1's function in genome-wide SUMO-PCNA unloading; is it required specifically at telomeres?

      The interactions between Elg1 and SUMOylated PCNA are carried out through both the SIM and the Threonines 386 and 387 (Shemesh et al, 2017). Consistently, the single elg1-SIM mutant has telomeres of normal length, and its effects on telomere length can only be seen when combined with mutations in the Threonines (elg1-TT386/7AA or elg1-TT386/7DD). Although the unloading of SUMOylated PCNA by Elg1 is important, the gene is not essential, and PCNA is either eventually unloaded by RFC, or spontaneously dis-assembles. This explains why the telomere length does not reach the same length in the absence of Elg1 as in the absence of, say, Stn1.

      • The model suggests that Elg1 promotes the interaction between Cdc13 and Stn1. This is based on the data presented in Figure 5 E and F. This is an important result. Because the experiment has been done on cells synchronized in S phase and the Elg1/Stn1 interaction occurs specifically at the end of S-phase, the FACS profile should be shown or a control provided to show that the two conditions are comparable.

      The FACS profile for this experiment is shown in Figure 5C.

      • Does the interaction between Cdc13 and Pol30 depend on the SUMOylation of POL30?

      Yes. We have added this as new Figure S2, and presented the results together with Figure 3 (Figure 3 is already too crowded).

      Others points :

      • Fig 1 : it should be mentioned in the Materials and Methods or in the figure legend how the average telomere lengths (horizontal bar) were calculated from the teloblot, as the position of the bar is not always intuitive

      We estimate telomere length by using TelQuant (Rubinstein et al., 2014). We have added this to the Methods section.

      -Fig 2 : Owing to the large span of telomere length in the stn1 mutants, the epistatic relationship between elg1∆ and stn1 mutants is poorly illustrated by the teloblot.

      We repeated this experiment several times, and stn1 mutants consistently gave a very spread telomere length. In ALL the blots, however, the double mutants elg1 stn1 showed a telomere length similar to that of the single stn1 mutant, and never longer.

      • It is mentioned that other mutants in the collection showed epistasis. Are any of these mutants related to telomere replication or the proposed model?

      Since we used the collection of non-essential mutants (so far), it was quite devoid of genes involved in DNA replication, which are mostly essential. An exception was siz1, which showed epistasis with elg1Δ.

      • The section entitled « Elg1's functional activity is essential for its interaction with Cdc13 » (line 205) is difficult to follow. The hierarchy between the different mutants of Elg1 on their capacity to unload PCNA is not totally in agreement with the data published in Itzkovich et al 2023 and Shemesh et al. 2017. In particular it appears to me from these papers that elg1-WalkerA 238 (KK343/4AA) mutant did not show a defect in contrast to elg1-WalkerA 238(KK343/4DD).

      We are sorry for the typo in the results. We used the elg1-WalkerA (KK343/4DD) allele, which has a normal SIM but no activity. In a nutshell, we used mutants that either did or did not show unloading activity and/or SIM. The results clearly show that you need to unload PCNA in order for the N-ter of Elg1 to interact with Cdc13.

      • Are the synchronizations done at 30°C?

      Yes. We have added the information to the Methods section.

      • ChIP experiments are not described in the Materials and Methods

      We apologize for this. They are now described.

      • In Figure 6, the PCNA rings are curiously placed at the beginning of the Okazaki fragments.

      We thank the referee for noticing, we have corrected the figure.

      Reviewer #2 (Recommendations For The Authors):

      This paper purports to unveil a mechanism controlling telomere length through SUMO modifications controlling interactions between PCNA unloader Elg1 and the CST complex that functions at telomeres. This is an extremely interesting mechanism to understand, and this paper indeed reveals some interesting genetic results, leading to a compelling model, with potential impact on the field. The conclusions are largely supported by experiments examining protein-protein interactions at low resolution and ambiguous regarding directness of interactions like co-IP and yeast two-hybrid (Y2H) combined with genetics. However, some results appear contradictory and there's a lack of rigor in the experimental data needed to support claims. There is significant room for improvement and this work could certainly attain the quality needed to support the claims. The current version needs substantial revision and lacks necessary experimental detail. Stronger support for the claims would add detail to help distinguish competing models.

      Specific comments:

      Insufficient technical detail: I could find no explanation of how overexpression was achieved. No description of how teloChIP is performed, either for the PCNA IP or how the sequence analysis is performed. Too limited details on growth like exact temperatures for the cell cycle time course.

      We have significantly expanded the Methods section to include all the technical information.

      Please do not bold and underline text for emphasis-EVER

      We have removed those from the text.

      Lines 130-132: they have not shown "accumulation of SUMOylated PCNA" anywhere; this is an inference.

      We have modified the text; it now says: “show that SUMOylated PCNA, and not unmodified or ubiquitinated PCNA, is both necessary and sufficient for telomere elongation in the presence or in the absence of Elg1.”

      Fig 2A Can authors show any other very long-telomere mutant like stn1 that does show enhancement in combination with elg1∆ to show feasibility of such phenotype?

      We don't think it is appropriate for the paper, but we have systematically created double mutants with elg1Δ and found many additive and even synergistic interactions. An example is shown in Author response image 1, taken from the PhD thesis of Taly Ben-Shitrit, a PhD student in the lab.

      Author response image 1.

      What about cdc13 or ten1? Epistatic?

      We did not test telomere length in combination with Ten1. Combining elg1 with cdc13-50 resulted in synergistic elongation. Given the complex genetic relationship between Stn1/Ten1 and Cdc13, it is hard to interpret this result.

      Seems tenuous to use Y2H to decipher protein-protein interactions occurring out of context (i.e., not at telomere but at reporter gene promoter)

      Y2H is a great method to detect interactions, even if they are transient. Whenever possible, we confirm our findings using co-IP or telo-ChIP.

      Lines 268-270: It would be more accurate to state "can be" instead of "becomes" or "is" as they have not shown that SUMOylation or PCNA unloading have occurred.

      We agree, and have changed the text.

      Cdc13snm protein level?

      Unfortunately our Western blot is not presentable, but the level of Cdc13snm was similar to that of the wt Cdc13, and this result has been already published by Hang et al., 2011.

      Fig S3A: If SUMOylated Cdc13 mediates the Stn1-Elg1 interaction, why is Stn1-Elg1 interaction maintained in cdc13snm strain? This result seems to directly contradict the premise and overall conclusion of this section that Cdc13-SUMO mediates the (Y2H) interaction of Elg1 and Stn1.

      According to our model, the interaction between Stn1 and Elg1 takes place upstream, and only then this complex interacts with SUMOylated Cdc13. Hence, if Cdc13 cannot be SUMOylated, the interaction Elg1-Stn1 is not lost, although Stn1 fails to interact with Cdc13, leading to a telomeric phenotype.

      Line 279: which data establish the Stn1-Elg1 interaction as direct? The Fig 2B co-IP indicates physical but not necessarily direct interaction, but later the authors suggest that the interaction requires a SUMOylated intermediary, and Y2H in Fig. S3B doesn't demonstrate direct interaction.

      We have changed the text, taking out the word "direct".

      Co-IP shows that interaction of Elg1 with Stn1 occurs mainly during later S-phase and with an overall delay compared to initial Elg1-Pol3 interaction. Co-IP interaction between Cdc13 and Stn1 is reduced in the absence of Elg1.

      The subsection title: "The interaction of Elg1 with Stn1 takes place at telomeres only at late S-phase" is not well supported by the data. I agree the data are consistent with the idea of the interactions occurring at telomeres but there's no direct evidence of this.

      We have changed the subsection title. It now reads: " The interaction of Elg1 with Stn1 takes place only at late S-phase"

      Model: Is unloading happening at the fork? Doesn't PCNA unloading have to follow its loading which occurred behind the fork particularly on the lagging strand? Model now suggest that Stn1 itself is SUMOylated.

      Yes, according to the model Elg1 moves with the fork, unloading PCNA from the lagging strand. Once Elg1 reaches the telomeres, it interacts with Stn1 (Figure 5). This interaction requires SUMOylation of a protein that is neither PCNA (Figure 3D) nor Cdc13 (Figure S3A); it could be Stn1 itself or another telomeric protein (Hang et al., 2011).

      Title is rather vague.

      We think it summarizes what we present in the paper.

      Abstract:

      "We report that SUMOylated PCNA acts as a signal that positively regulates telomerase activity."

      I don't think this is supported or a good description of what they find

      Figure 1B clearly shows that SUMO-PCNA is both necessary and sufficient for telomere elongation.

      "and dissected the mechanism by which Elg1 and Stn1 negatively regulates telomere elongation, coordinated by SUMO."

      Again, I don't think this is sufficiently supported and the model invokes SUMOylation events not demonstrated like Stn1, which might be a significant step forward.

      On the positive side, their model makes several predictions that they could test much more directly and rigorously: for example, examining the impact of the relevant mutations in the recruitment of proteins to the telomere.

      We have dissected the mechanism, and future work will be devoted to examining the impact of the relevant mutations in the recruitment of proteins to the telomere.

      Reviewer #3 (Recommendations For The Authors):

      Comments:

      1) The telomere length analysis data presented here is consistent with an interpretation that Stn1 and Elg1 play roles in a similar telomere maintenance pathway because the telomere restriction fragment pattern in the double mutants are not longer than the stn1 single mutants. No comment is made with respect to the yellow bars in Figure 2 that presumably measure telomere length appearing to be slightly shorter than in the stn1 single mutants. It may be interesting and informative if the double mutants do in fact have some phenotype distinct from the single stn1 mutants. Is there an impact on viability in the double mutant?

      Given the variable telomeric phenotype of the single stn1 mutants, slight variations in the measurement of the median telomere size are expected. The difference observed is not likely to be significant. What is important is that the double mutants with elg1 do not show longer telomeres. In terms of fitness, the stn1 mutants grow slightly slowly, but the elg1 mutation does not slow them down further.

      2) It is somewhat surprising that no additional telomere length analysis is included that actually tests the proposed model, including whether this path could be operational only under certain conditions. Maybe this is a topic of the next paper?

      Indeed, future work will explore the conditions under which PCNA SUMOylation is essential, and those under which it is only needed.

      3) Were the error bars in Figure 5F determined only from the experiment in E? Does this represent error in measuring the data from one biological replicate? The type of error should be made clear to avoid readers assuming the data represents measurements from more than one sample in more than one experiment. The data would be stronger if it represented measurements from multiple experiments.

      The graph was made with data from three biological replicates. We show the best blot in Figure 5E. We have now stressed this in the Figure Legend.

      4) Why was only one two hybrid reporter shown? Having the multiple reporters can give confidence in interactions. (Not a big deal here given the nice co-IP data.)

      We thought that it was enough to show one reporter, as the results with a different reporter (B-gal assay) led to the same conclusions. Since these results did not add information and made the paper too lengthy (and boring), we took them out. In any case, all data were verified by co-IP.

      5) Line 414 - what are the 32P-radio labeled PCR fragments? Are these solely comprised of TG1-3 repeats of some length? A bit more detail in this aspect of the method could be helpful.

      We have added an explanation on the probe in the Methods section.

      6) Line 432-433 - which anti-HA or anti-Myc antibodies are these? (very minor detail)

      We have added the details.

    1. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions and the fourth recommending rejection based upon similar arguments to those reflected by some of the reviewers in this current round of reviews in eLife. Ultimately, the managing editor of this first journal took the decision that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflected our consideration of these reviewers' suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved its own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will occasionally be necessary to refer across all three papers in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for the formation of the bony accumulations that contradict the evidence presented here and lead to alternative hypotheses of origin.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption was then filled by a fleshed body (and perhaps parts of other bodies), which was then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript, several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper, we will take the reviewers’ critiques to heart and add additional figures that better illustrate the disruption of the LORM and clarify the sedimentological data showing that the material covering the skeletal remains in the hole is the disrupted sediment excavated from the same hole. We are proposing to isolate this most critical evidence for burial into a separate section in the revised submission based on the reviewers’ comments. The fact that the LORM layer is disrupted, a fleshed body was placed in the hole created by this disruption, and the body (and perhaps parts of other bodies) was/were then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing either a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation only covered less than a square meter at very limited depth, and this was the limit to our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions. By not including a section explaining how these previous hypotheses were now nullified by new evidence, we confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martínez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high resolution in the laboratory. Such was the case of the Shanidar IV burial, removed from the site and transported in a plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation, as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion), complete excavation, even under controlled laboratory conditions, may destroy bone and severely limit skeletal identification (Henderson, 1987; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016).

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried – the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision), and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science, which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connolly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1999; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicate that the platonic ideal is rare and, in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of the body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints etc), displacement, and final disposition of elements within the burial space. As such we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive archaeothanatological narrative-driven approach (e.g. Duday, 2005 & 2009) and robust decomposition data derived from human forensic taphonomic experimentation, recently articulated by Schotsmans and colleagues (2022) - noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches, which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the issue of lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks et al., 2016). The absence of any carnivore-induced bone surface modifications, the patterns of skeletal part representation, and the total absence of any carnivore remains within the Dinaledi Chamber (following Kuhn and colleagues, 2010) lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into such extreme spaces like those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do based on their physique. We show that the engravings overlay each other, suggesting multiple engraving events. These engravings took time and effort, and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archaeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (e.g., non-naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 – 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”.  We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origins. If excavations continue with this mindset we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach is typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity, we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available. These would include examples such as the A.L. 333 Au. afarensis site (the so-called First Family site in Hadar, Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy), as such unusual taphonomic situations where skeletons are preserved cannot be simply explained away as “natural” in origin, based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins, as emphasized by us in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all of the 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions whose opinions and comments will have made the final iterations of these manuscripts better for their efforts. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We appreciate also the efforts of members of the public who have engaged with this relatively new process where preprints are posted prior to the reviews, allowing comments and interactions from colleagues and the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews, we see many areas where our clarity, context, and analyses were less strong than they can be. With the clarifications and additions taken on board through these review processes, the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation and that further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books.  pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach:  3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. (2011). Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld: Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL: CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers, H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96:  149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist.  World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.J. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 7857.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions, and the fourth recommending rejection based upon arguments similar to those made by some of the reviewers in this current round of reviews in eLife. Ultimately, the managing editor of this first journal decided that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflects our consideration of these reviewers’ suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved its own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will occasionally be necessary to refer across all three papers in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for formation of the bony accumulations that contradict the present evidence presented here and result in alternative origins hypotheses.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption then received a fleshed body (and perhaps parts of other bodies), which was then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript, several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper, we will take the reviewers’ critiques to heart and add figures that better illustrate the disruption of the LORM and clarify the sedimentological data showing that the material covering the skeletal remains in the hole is the disrupted sediment excavated from the same hole. Based on the reviewers’ comments, we propose to isolate this most critical evidence for burial in a separate section of the revised submission. The fact that the LORM layer is disrupted, that a fleshed body was placed in the hole created by this disruption, and that the body (and perhaps parts of other bodies) was then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing either a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation covered less than a square meter at very limited depth, and this was the limit of our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions. By not clearly creating a section where we explained how these previous hypotheses were now nullified by new evidence, we clearly confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martinez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high resolution in the laboratory. Such was the case of the Shanidar IV burial, removed from the site and transported in a plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion) then complete excavation—even under controlled laboratory conditions—may destroy bone and severely limit skeletal identification (Henderson, 1997; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016). 

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried – the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision), and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science, which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connolly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1995; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicate that the platonic ideal is rare and, in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of the body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints, etc.), displacement, and final disposition of elements within the burial space. As such we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive archaeothanatological narrative-driven approach (e.g. Duday, 2005 & 2009) and the robust decomposition data derived from human forensic taphonomic experimentation, as recently articulated by Schotsmans and colleagues (2022), noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches, which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks et al., 2016). The absence of any carnivore-induced bone surface modifications or carnivore-typical patterns of skeletal part representation, and the total absence of any carnivore remains within the Dinaledi Chamber (following Kuhn and colleagues, 2010), lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into extreme spaces such as those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do, based on their physique. We show that the engravings overlie each other, suggesting multiple engraving events. These engravings took time and effort, and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archaeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (e.g., non-naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 – 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”.  We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origins. If excavations continue with this mindset we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach are typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity, we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available. These would include examples such as the A.L. 333 Au. afarensis site (the so-called First Family site in Hadar, Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy), as such unusual taphonomic situations where skeletons are preserved cannot be simply explained away as “natural” in origin based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins, as emphasized by us in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all of the 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions, whose opinions and comments will have made the final iterations of these manuscripts better. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We also appreciate the efforts of members of the public who have engaged with this relatively new process, in which preprints are posted prior to the reviews, allowing comments and interactions from colleagues and the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews, we see many areas where our clarity, context, and analyses were less strong than they can be. With the clarifications and additions taken on board through these review processes, the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation, and further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books.  pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach:  3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. (2011). Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld : Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting  the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers. H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96:  149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist.  World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.K. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 7857.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Author Response:

      We would like to thank the eLife reviewers for the considerable time and effort they have invested to review these manuscripts. We have also benefited from a previous round of review of the manuscript describing the proposed burial features, which underwent two rounds of revisions in a high-impact journal over a period of approximately 8 months during 2022 and early 2023. Both sets of reviews have reflected mixed responses to the evidence we have presented, with one reviewer recommending acceptance with minor editorial revisions, two recommending acceptance with minor revisions, and the fourth recommending rejection based upon similar arguments to those reflected by some of the reviewers in this current round of reviews in eLife. Ultimately the managing editor of this first journal took the decision that the review process could not be completed in a timely manner and rejected the manuscript, although the submission here reflected our consideration of these reviewers’ suggestions.

      We have chosen in this initial response to the eLife reviews to include some references to the previous anonymous reviews in order to illustrate differences of opinion and differences in revision suggestions within the review process. Our goal is to offer maximal insight into our decision-making process and to acknowledge the considerable time and effort put into the assessment of these manuscripts by reviewers (for eLife and in the case of the earlier review process). We hope that this approach will assist the readers, and reviewers, of our manuscripts in understanding why we are proceeding with certain decisions during the revision process.

      This is a new process for us and the reviewers, and one way in which it significantly differs from more traditional review is that both the reviews and our reply will be public well in advance of our revisions to the manuscript. Indeed, considering the scope of the reviews, some of those revisions may take considerable time, although many can be accomplished fairly easily. Thus, we are not in a position to say that we have solved every issue raised by the reviewers. Instead, we will examine what appear to be the key critical issues raised regarding the data and the analyses and how we propose to address these as we revise the papers. We will also address several philosophical and ethical issues raised by the reviews and our proposal for dealing with these. More specific editorial and citational recommendations will be dealt with on a case-by-case basis, and we do not address these point-by-point in this reply. Please note, this response to the reviewers is not the revision of the manuscript and is only the initial opinion of the corresponding authors with some guidance from the larger group of authors of all three papers. Our final submitted revision will reflect the input of all authors included on those submissions.

      We took the decision to submit three separate papers consciously. The two different categories of evidence, burials and engravings, involve different kinds of analysis and different (although overlapping) teams of researchers, and we recognized that each deserved its own presentation and assessment. Meanwhile, together they inform the context of H. naledi in a way that requires some synthetic discussion, in which both kinds of evidence are relevant, leading to a third paper. But the mutual relevance of these different kinds of evidence and their review by a common set of reviewers naturally raises cross-cutting issues, and the reviewers have cross-referenced the three articles. This has sometimes led to suggestions about one manuscript based on the contents of another. Considering the situation, we accepted the recommendation that it would be clearer to consider all three articles in a single reply. Thus, while each of the three papers will proceed separately during the revision process, it will occasionally be necessary to refer across all three papers in our responses.

      Scientific Issues:

      In reading the reviews, we feel there are 9 critical points/assertions raised by one or more of the reviewers that present a problem for, or challenge to, our hypothesis that the observed evidence (bone accumulations and engravings) described in the Dinaledi subsystem are of intentional naledigenic origin. These are:

      1. The evidence presented does not demonstrate a clear interruption of the floor sediments, thus failing to demonstrate excavated holes.

      2. The sediments infilling the holes where the skeletal remains are found have not been demonstrated to originate from the disruption of the floor sediments and thus could be part of a natural geological process (e.g. water movement, slumping) or carnivore accumulations.

      3. Previous geological interpretations by our research group have given alternative geological explanations for formation of the bony accumulations that contradict the present evidence presented here and result in alternative origins hypotheses.

      4. Burial cannot be effectively assessed without complete excavation of the features and site.

      5. The skeletal remains as presented do not conform clearly to typical body arrangement/positions associated with human (Homo sapiens) burials.

      6. There is no evidence of grave goods or lithic scatters that are typically associated with human burials.

      7. Humans may have been involved with the creation of either the Homo naledi bone accumulations, the engravings, or both.

      8. Without a date of the engravings, the null hypothesis should be the engravings were created by Homo sapiens.

      9. The null hypothesis for explanation of the skeletal remains in this situation should be “natural accumulation”.

      Our analysis of the Dinaledi Feature 1 leads us to accept that the laminated orange-red mudstone (LORM) sedimentary layer is interrupted, indicating a non-natural intervention, and that the hole created by the interruption was then filled by a fleshed body (and perhaps parts of other bodies), which was then covered by sediment that originated from the hole that was dug. We recognize that the four eLife reviewers are not convinced that our presentation is sufficient to establish this. Interestingly, this was not the universal opinion of earlier reviewers of the initial manuscript, several of whom felt we had adequately supported this hypothesis. The lack of clarity in this current version of the burial manuscript is our responsibility. In the upcoming revision of this paper, we will take the reviewers’ critiques to heart and add figures that better illustrate the disruption of the LORM and clarify the sedimentological data showing that the material covering the skeletal remains in the hole is the disrupted sediment excavated from the same hole. Based on the reviewers’ comments, we propose to isolate this most critical evidence for burial in a separate section of the revised submission. The fact that the LORM layer is disrupted, a fleshed body was placed in the hole created by this disruption, and the body (and perhaps parts of other bodies) was/were then covered by the same sediments from the hole is the central feature of our hypothesis that the bone accumulations observed reflect a burial and not a natural process.

      The possibility of fluvial transport or involvement in the subsystem is a topic that we have addressed extensively in past work, and it is clear from these reviews that we must enhance our current manuscript to discuss this issue at greater length. Our previous work (Dirks et al. 2015; Dirks et al. 2017) emphasized that fluvial transport of whole bodies into the subsystem was precluded by several lines of sedimentological evidence. We excavated a rich accumulation of skeletal remains, including articulated limbs and other elements in subvertical orientations inconsistent with slow sedimentary infill, which were difficult to explain without positing either a large and dense pile of bodies and/or sediment movement. We encountered fractured chunks of laminated orange-red mudstone (LORM) in random orientations within our excavation area, within and among skeletal remains, which directly refuted that the remains were inundated with water at the time of burial, and this limited the possibility of fluvial transport. Water flow sufficient to displace bodies or complete skeletal evidence would also transport large and coarse sediment, which is absent from the subsystem, and would sort the commingled skeletal material that we found by size, which we do not observe. But our excavation covered less than a square meter at very limited depth, and this was the limit to our knowledge of subsurface sediment. We thus were left with uncertainty that led us to suggest the possibility of sediment slumping or movement into subsurface drains, although these were not observed near our excavation. Our current work expands our knowledge of the subsurface and presents an alternative explanation for the disposition of skeletal remains from our earlier excavation. But we acknowledge that this new explanation is vulnerable to our own previous published proposals, and we must do a better job of explaining how the new information addresses our previous suggestions. By not clearly creating a section where we explained how these previous hypotheses were now nullified by new evidence, we clearly confused the reviewers with our own previous work. We will revise the manuscript by enhancing the review of the significant geological evidence demonstrating that there is no significant fluvial action in the system and making it clear how the burial hypothesis provides a clearer explanation for the situation of skeletal remains from our previous excavation work.

      One of the central issues raised by reviewers has been a perceived need to excavate these features completely, totally exhuming all skeletal remains from them. Reviewers have written that it is necessary to identify every skeletal element that is present and account for any missing elements. On this point, we have both ethical and scientific differences from these reviewers. We express our ethical concerns first. Many of the best-preserved possible burials ever discovered by archaeologists were subjected to total excavation and exhumation. Cases like La Chapelle-aux-Saints, La Ferrassie, and Skhūl were fully excavated at a time when data recording and excavation methods did not include the range of spatial and geomorphological approaches that later became routine. The judgment of early investigators that these situations were intentional burials was challenged by later workers, and the kind of information that might enable better tests had been irrevocably lost (Gargett 1999; Dibble et al. 2015; Rendu et al. 2014).

      Later, improved excavation standards have not sufficed to remove uncertainty or debate about possible burials. For example, it was long presumed that well-preserved remains of young children were by themselves diagnostic of intentional burial, such as those from Dederiyeh, Border Cave, or Roc de Marsal. Such cases were also fully excavated, with adequate documentation of the positioning of skeletal remains and their surrounding stratigraphic situation, but such cases were later challenged on several bases and the complete exhumation of material has confused or precluded testing of new hypotheses (e.g. Gargett 1999). The case of Roc de Marsal is one in which data from the initial excavation combined with re-excavation and geoarchaeological analysis led to a naturalistic interpretation of the skeletal material (Sandgathe et al. 2011; Goldberg et al. 2017). But even in this case, the researchers erred in their interpretation of the skeleton’s situation due to a lack of identification of parts of the infant’s skeleton (Gómez-Olivencia and García-Martínez 2019). That is to say, it is not only the burial hypothesis but other hypotheses that suffer from complete excavation. Researchers concerned with preserving all possible information have sometimes taken extraordinary measures to remove and study possible burials at high resolution in the laboratory. Such was the case of the Shanidar IV burial, removed from the site and transported in a plaster jacket by Solecki, which led to the disruption and loss of internal stratigraphic information (Pomeroy et al. 2020). Arguably, the current state of the art is full excavation with partial preparation, such as that undertaken at Panga ya Saidi (Martinón-Torres et al. 2021). But again, any future attempt to reinterpret or test the hypothesis of burial must rely on the adequacy of documentation, as the original context has been removed.

      In our decision to leave material in place as much as possible, we are expanding upon standard practice to leave witness sections and unexcavated areas for future research. The situation is novel, representing possible burials by a nonhuman species, and that makes it doubly important in our opinion to be conservative in not fully exhuming the skeletal material from its context. We anticipate that many other researchers, including future investigators, will suggest additional methods to further test the hypothesis of burial, something that would be impossible if we had excavated the features in their entirety prior to publishing a description of our work. We believe strongly that our ethical responsibility is to publish the work and the most likely interpretation while leaving as much evidence in place as possible to enable further testing and replication. We welcome the suggestions of additional methods/analyses to test the H. naledi burial hypothesis.

      This being said, we also observe that total exhumation would not resolve the concerns raised by the reviewers. The recommendation of total exhumation is in pursuit of a full account of all skeletal material present and its preservation and spatial situation, in order to demonstrate that they conform to body positions comparable to human burials. As has been highlighted in forensic casework, the excavation of an inhumation feature does not necessarily provide an accurate spatial or anatomical manifest of the stratigraphical relationships between the body, encapsulating matrix, and any cut present, due to preservational, taphonomic and operational factors (Dirkmaat and Cabo, 2016; Hunter, 2014). In particular, in cases where skeletal elements are highly fragmented, friable, or degraded (such as through bioerosion), complete excavation, even under controlled laboratory conditions, may destroy bone and severely limit skeletal identification (Henderson, 1987; Hochrein, 2002; Owsley and Compton, 1997), particularly in elements where the ratio of trabecular to cortical bone is high (Darwent and Lyman, 2002; Lyman, 1994). As such, non-invasive methods of 3D and 4D modelling (preservation in situ) are often considered preferable to complete necropsy or excavation (preservation by record) where appropriate (Bolliger and Thali, 2009; Dell’Unto and Landeschi, 2022; Randolph-Quinney et al., 2018; Silver, 2016).

      The test of burial is not primarily positional, but taphonomic and geological. The position and number of bones can elaborate on process-driven questions of decay and destruction in the burial environment, or post-mortem modification, but are not singularly indicative of whether the remains were intentionally buried; the post-mortem narrative of all the processes affecting the cadaveric island is required (Knüsel and Robb, 2016). In previous cases, researchers have disputed or accepted the hypothesis of intentional hominin burial based upon assumptions about how modern humans or Neandertals would have positioned bodies, with the idea that some positions reflect ritual intent while others do not. But applying such assumptions is unjustifiable, particularly for a species like H. naledi, whose culture may have differed fundamentally from our own. Our work acknowledges that the present evidence does not enable a full reconstruction of the burial positions, but it does show that fleshed remains were encased in sediment prior to decomposition of soft tissue, and that subsequent spatial changes can be most parsimoniously explained by natural decomposition within sedimentary matrix contained within a burial feature (after Green, 2022; Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022). If the argument is that extraordinary claims require extraordinary evidence, we feel that the evidence documents excavation and interment (and will do so more clearly in the revision), and the fact that the remains do not match a “typical” human burial in body positioning is not in itself evidence that these are not H. naledi burials.

      We feel that the reviewers (in keeping with many palaeoanthropologists) have a clear idea of what they “think” a burial should look like in an idealised sense, but this platonic ideal of burial form is not matched by the extensive literature in archaeothanatology, funerary archaeology and forensic science, which indicates enormous variability in the activity, morphology and post-mortem system experienced by the human body in cases of interment and body disposal (e.g. Aspöck, 2008; Boulestin and Duday, 2005 and 2006; Connolly et al., 2005; Channing and Randolph-Quinney, 2006; Cherryson, 2008; Donnelly et al., 1999; Finley, 2000; Hunter, 2014; Parker Pearson, 1999; Randolph-Quinney, 2013). Decades of experience in the identification, recovery and interpretation of clandestine, deviant, and non-formal burials indicate that the platonic ideal is rare and, in many contexts, the exception (Cherryson, 2008; Parker Pearson, 1999). This variability is particularly relevant to morphological traits in burial context, such as the informal nature of the grave cut in plan and section, shallow burial depth, and initial disposition of the body (placement) during the early post-mortem period. These might run counter to the expectations of reviewers or others referencing the fossil hominin record, but are well accepted within the communities of researchers investigating Holocene archaeological sites and forensic contexts.

      It is encouraging to see reviewers beginning to incorporate the extensive (often experimentally derived) literature from archaeothanatology and forensic taphonomy in their deliberations, and we will be taking these comments on board going forward. In particular, we acknowledge reviewers’ comments and the need to construct a more detailed post-mortem narrative, accounting for joint disarticulation (labile versus persistent joints etc.), displacement, and final disposition of elements within the burial space. As such we will incorporate the hierarchy of decomposition (rank order disarticulation), associations between regions of anatomical association, areas of disassociation, and the voids produced during decomposition (after Mickleburgh and Wescott, 2018; Mickleburgh et al., 2022) into our narrative. In doing so we acknowledge the tensions between the inductive archaeothanatological narrative-driven approach (e.g. Duday, 2005 & 2009) and robust decomposition data derived from human forensic taphonomic experimentation, recently articulated by Schotsmans and colleagues (2022), noting that we will highlight comparative data based on forensic experimental casework and actualistic modelling over inductive intuitive approaches, which come with significant evidential shortcomings (Bristow et al. 2011).

      Finally, from a taphonomic perspective it is worth pointing out to reviewers that we have already addressed the lack of taphonomic evidence for carnivore involvement in the formation of the Dinaledi assemblage (Dirks et al., 2016). The absence of any carnivore-induced bone surface modifications or carnivore-typical patterns of skeletal part representation, and a total absence of any carnivore remains within the Dinaledi Chamber (following Kuhn and colleagues, 2010), lead us to reject carnivores as possible vectors of body accumulation within the Dinaledi Chamber and Hill Antechamber.

      Reviewers suggest that without a date derived from geochronological methods, the engravings cannot be associated with H. naledi, and that it is possible (or probable) that the engravings were done in the recent past by H. sapiens. This suggestion neglects the context of the site. We have previously documented the structure and extremely limited accessibility of the Dinaledi subsystem. This subsystem was not recorded on maps of the documented Rising Star Cave system prior to our work and its discovery by our teams. Furthermore, there is no evidence of prehistoric human activity in the areas of the cave related to possible subterranean entrances. There is no evidence that humans in the past typically ventured into extreme spaces like those of Rising Star. It is clear from the presence of the remains of many individuals that H. naledi ventured into these spaces again and again. It is likely that H. naledi moved through these spaces more easily than humans do, based on their physique. We show that the engravings overlie each other, suggesting multiple engraving events. These engravings took time and effort, and the only evidence for use of the Dinaledi subsystem by any hominin is by H. naledi. The context leads to the null hypothesis that H. naledi made the marks. In our revision, we will elaborate on this argument to clarify the evidence for our stance on this hypothesis. Several reviewers took issue with the title of the engraving paper, as we did not insert a qualifier in front of the suggested date range for the engravings. We deliberately left out qualifying language so that the title took the form of a testable hypothesis rather than a weak assertion. Should future work find the engravings were not produced within this time range, then we will restate this hypothesis.

      Finally, with regard to the engravings, we have chosen to report them because they exist. Not reporting the presence of engraved marks on the walls of a cave above hypothesized burials would be tantamount to leaving relevant evidence out of the description of an archaeological context. We recognize and state in our manuscript that these markings require substantial further study, including attempts at geochronological dating. But the current evidence is clearly relevant to the archaeological context of the subsystem. We take a similar stance with reporting the presence of the tool-shaped artefact near the hand of the H. naledi skeleton in the Hill Antechamber. It is evident that this object requires further study, as we stated in our manuscript, but again omitting it from our study would be leaving out relevant evidence.

      Some have suggested that the null hypothesis should be that all of these observed circumstances are of natural origin. Our team took this approach in our early investigation of the Dinaledi subsystem (Dirks et al. 2015). We adopted the null hypothesis that the geological processes involved in the accumulation of H. naledi skeletal remains were “natural” (e.g., non-naledigenic involvement), and we were able to reject many alternative explanations for the assemblage, including carnivore accumulation, “death trap” accumulation, and fluvial transport of bodies or bones (Dirks et al. 2015). This led us to the hypothesis that H. naledi were involved in bringing the bodies into the spaces where they were found. But we did not hypothesize their involvement in the formation of the deposit itself beyond bringing the bodies to the location.

      This approach seems conservative. It followed the traditional view that small-brained hominins do not engage in cultural practices. But we recognize in hindsight that this null hypothesis approach did harm to our analyses. It impeded us from recognizing within our initial excavations of the puzzle box area and other excavations between 2014 and 2017 that we might be encountering remains that were intrusive in the sedimentary floor of the chamber. If we had approached the accumulation of a large number of hominins from the perspective of the null hypothesis being that the situation was likely cultural, we perhaps would have collected evidence in a slightly different manner. We certainly note that if the Dinaledi system had been full of the remains of modern humans, there would have been little doubt that the null hypothesis would have been that this was a cultural space and not a “natural space”. We therefore respectfully disagree with the reviewers who continue to support the idea that we should approach hominin excavations with the null hypothesis that they will be natural (specifically non-cultural) in origin. If excavations continue with this mindset, we believe that potential cultural evidence is almost certain to be lost.

      There has been a gradient across paleoanthropological excavations, archaeological work, and forensic investigation, with increasing precision of context. The reality is that the recording precision and frame of approach are typically different in most paleontological excavations than in those related to contemporary human remains. If anything comes from the present discussion of whether the Dinaledi system is a burial site for H. naledi or not, we hope that by taking seriously the possibility of deep cultural dynamics of hominins, we will encourage other teams to meet the highest standards of excavation in order to preserve potential cultural evidence. Given H. naledi’s cranial capacity, we suggest that even very early hominin skeletal assemblages should be re-examined, if there is sufficient evidence or records available. These would include examples such as the A.L. 333 Au. afarensis site (the so-called First Family site in Hadar, Ethiopia), the Dikika infant skeleton, WT 15000 (Turkana Boy) and even A.L. 288 (Lucy). Such unusual taphonomic situations, where skeletons are preserved, cannot simply be explained away as “natural” in origin based solely on the cranial capacity and assumed lack of cognitive and cultural complexity of the hominins, as we emphasized in Fuentes et al. (2023). We are not the first to observe that some very early hominin situations may represent early mortuary activity (Pettitt 2013), but we would advocate a step further. We suggest it may be damaging to take “natural accumulation” as the standard null hypothesis for hominin paleoanthropology, and that it is more conservative in practice to engage remains with the null hypothesis of possible cultural formation.

      We are deeply grateful for the time and effort all 8 reviewers (across three reviews) have taken with this work. We also acknowledge the anonymous reviewers from previous submissions whose opinions and comments will have made the final iterations of these manuscripts better. As this process is rather public and includes commentary outside of the eLife forum, we ask that the efforts of all 37 authors and 8 reviewers involved be respected and that the discourse remain professional in all venues as we study this fascinating and quite complex occurrence. We also appreciate the efforts of members of the public who have engaged with this relatively new process, where preprints are posted prior to the reviews, allowing comments and interactions from colleagues and the public who are normally not part of the internal peer review process. We believe these interactions will make for better final papers. We feel we have met the standards of demonstrating burials in H. naledi and that the engravings are most likely associated with H. naledi. However, given the reviews, we see many areas where our clarity, context, and analyses were less strong than they can be. With the clarifications and additions taken on board through these review processes, the final papers will be stronger and clearer. We recognize that this is an ongoing process of scientific investigation and that further work will allow continued, and possibly better, evaluation of these hypotheses and others.

      Lee R Berger, Agustín Fuentes, John Hawks, Tebogo Makhubela

      Works cited:

      • Aspöck, E. (2008). What Actually is a ‘Deviant Burial’?: Comparing German-Language and Anglophone Research on ‘Deviant Burials.’ In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books.  pp 17–34.

      • Bolliger, S.A. & Thali, M.J. (2009). Thanatology. In S.A. Bolliger and M.J. Thali (eds) Virtopsy Approach:  3D Optical and Radiological Scanning and Reconstruction in Forensic Medicine. Boca Raton: CRC Press. pp 187-218.

      • Boulestin, B. & Duday, H. (2005). Ethnologie et archéologie de la mort: de l’illusion des références à l’emploi d’un vocabulaire. In: C. Mordant and G. Depierre (eds) Les Pratiques Funéraires à l’Âge du Bronze en France. Actes de la table ronde de Sens-en-Bourgogne. Paris: Éditions du Comité des Travaux Historiques et Scientifiques. pp. 17–30.

      • Boulestin, B. & Duday, H. (2006). Ethnology and archaeology of death: from the illusion of references to the use of a terminology. Archaeologia Polona 44: 149–169.

      • Bristow, J., Simms, Z. & Randolph-Quinney, P.S. Taphonomy. In S. Black and E. Ferguson (eds.) Forensic Anthropology 2000-2010. Boca Raton, FL: CRC Press. pp 279-318.

      • Channing, J. & Randolph-Quinney, P.S. (2006). Death, decay and reconstruction: the archaeology of Ballykilmore Cemetery, County Westmeath. In J. O’Sullivan and M. Stanley (eds.) Settlement, Industry and Ritual: Archaeology. National Roads Authority Monograph Series No. 3. Dublin: NRA/Four Courts Press. pp 113-126.

      • Cherryson, A. K. (2008). Normal, Deviant and Atypical: Burial Variation in Late Saxon Wessex, c. AD 700–1100. In E. M. Murphy (Ed.). Deviant Burial in the Archaeological Record. Oxford: Oxbow Books. pp 115–130.

      • Connolly, M., F. Coyne & L. G. Lynch (2005). Underworld: Death and Burial in Cloghermore Cave, Co. Kerry. Bray, Co. Wicklow: Wordwell.

      • Darwent, C. M. & R. L. Lyman (2002). Detecting the postburial fragmentation of carpals, tarsals and phalanges. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press. pp 355-378.

      • d’Errico, F., & Backwell, L. (2016). Earliest evidence of personal ornaments associated with burial: The Conus shells from Border Cave. Journal of Human Evolution, 93, 91–108.

      • De Villiers, H. (1973). Human skeletal remains from Border Cave, Ingwavuma District, KwaZulu, South Africa. Annals of the Transvaal Museum, 28(13), 229–246.

      • Dell’Unto, N. and Landeschi, G. (2022). Archaeological 3D GIS. London: Routledge.

      • Dibble, H. L., Aldeias, V., Goldberg, P., McPherron, S. P., Sandgathe, D., & Steele, T. E. (2015). A critical look at evidence from La Chapelle-aux-Saints supporting an intentional Neandertal burial. Journal of Archaeological Science, 53, 649–657.

      • Dirkmaat, D. C., & Cabo, L. L. (2016). Forensic archaeology and forensic taphonomy: basic considerations on how to properly process and interpret the outdoor forensic scene. Academic Forensic Pathology 6, 439–454.

      • Dirks, P. H., Berger, L. R., Roberts, E. M., Kramers, J. D., Hawks, J., Randolph-Quinney, P. S., Elliott, M., Musiba, C. M., Churchill, S. E., de Ruiter, D. J., Schmid, P., Backwell, L. R., Belyanin, G. A., Boshoff, P., Hunter, K. L., Feuerriegel, E. M., Gurtov, A., Harrison, J. du G., Hunter, R., … Tucker, S. (2015). Geological and taphonomic context for the new hominin species Homo naledi from the Dinaledi Chamber, South Africa. ELife, 4, e09561.

      • Dirks, P.H.G.M., Berger, L.R., Hawks, J., Randolph-Quinney, P.S., Backwell, L.R., and Roberts, E.M. (2016). Comment on “Deliberate body disposal by hominins in the Dinaledi Chamber, Cradle of Humankind, South Africa?” [J. Hum. Evol. 96 (2016) 145-148]. Journal of Human Evolution 96: 149-153.

      • Dirks, P. H., Roberts, E. M., Hilbert-Wolf, H., Kramers, J. D., Hawks, J., Dosseto, A., Duval, M., Elliott, M., Evans, M., Grün, R., Hellstrom, J., Herries, A. I., Joannes-Boyau, R., Makhubela, T. V., Placzek, C. J., Robbins, J., Spandler, C., Wiersma, J., Woodhead, J., & Berger, L. R. (2017). The age of Homo naledi and associated sediments in the Rising Star Cave, South Africa. ELife, 6, e24231.

      • Donnelly, S., C. Donnelly & E. Murphy (1999). The forgotten dead: The cíllíní and disused burial grounds of Ballintoy, County Antrim. Ulster Journal of Archaeology 58, 109-113.

      • Duday, H. (2005). L’archéothanatologie ou l’archéologie de la mort. In: O. Dutour, J.-J. Hublin and B. Vandermeersch (eds) Objets et Méthodes en Paléoanthropologie. Paris: Comité des Travaux Historiques et Scientifiques. pp. 153–215.

      • Duday, H. (2009). Archaeology of the Dead: Lectures in Archaeothanatology. Oxford: Oxbow Books.

      • Finley, N. (2000). Outside of life: Traditions of infant burial in Ireland from cillin to cist. World Archaeology 31, 407-422.

      • Gargett, R. H. (1999). Middle Palaeolithic burial is not a dead issue: The view from Qafzeh, Saint-Césaire, Kebara, Amud, and Dederiyeh. Journal of Human Evolution, 37(1), 27–90.

      • Goldberg, P., Aldeias, V., Dibble, H., McPherron, S., Sandgathe, D., & Turq, A. (2017). Testing the Roc de Marsal Neandertal “Burial” with Geoarchaeology. Archaeological and Anthropological Sciences, 9(6), 1005–1015.

      • Gómez-Olivencia, A., & García-Martínez, D. (2019). New postcranial remains from the Roc de Marsal Neandertal child. PALEO. Revue d’archéologie Préhistorique, 30–1, 30–1.

      • Green, E.C. (2022). An archaeothanatological approach to the identification of late Anglo-Saxon burials in wooden containers. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 436-455.

      • Henderson, J. (1987). Factors determining the state of preservation of human remains. In A. Boddington, A. Garland and R. Janaway (eds). Death, Decay and Reconstruction: Approaches to Archaeology and Forensic Science. Manchester: Manchester University Press. pp 43-54.

      • Hunter, J. R. (2014). Human remains recovery: archaeological and forensic perspectives. In C. Smith (ed). Encyclopedia of Global Archaeology. New York: Springer New York. pp 3549-3556.

      • Hochrein, M. (2002). An Autopsy of the Grave: Recognizing, Collecting and Preserving Forensic Geotaphonomic Evidence. In M. H. Sorg and W. D. Haglund (eds). Advances in Forensic Taphonomy: Method, Theory and Archeological Perspectives. Boca Raton, FL, CRC Press: 45-70.

      • Knüsel, C.K. & Robb, J. (2016). Funerary taphonomy: An overview of goals and methods. Journal of Archaeological Science: Reports 10, 655-673.

      • Kuhn, B.F., Berger, L.R. & Skinner, J.D. (2010). Examining criteria for identifying and differentiating fossil faunal assemblages accumulated by hyenas and hominins using extant hyenid accumulations. International Journal of Osteoarchaeology 20, 15-35.

      • Lyman, R. (1994). Vertebrate Taphonomy. Cambridge, Cambridge University Press.

      • Martinón-Torres, M., d’Errico, F., Santos, E., Álvaro Gallo, A., Amano, N., Archer, W., Armitage, S. J., Arsuaga, J. L., Bermúdez de Castro, J. M., Blinkhorn, J., Crowther, A., Douka, K., Dubernet, S., Faulkner, P., Fernández-Colón, P., Kourampas, N., González García, J., Larreina, D., Le Bourdonnec, F.-X., … Petraglia, M. D. (2021). Earliest known human burial in Africa. Nature, 593(7857), 7857.

      • Mickleburgh, H.L & Wescott, D.J. (2018). Controlled experimental observations on joint disarticulation and bone displacement of a human body in an open pit: implications for funerary archaeology. Journal of Archaeological Science: Reports 20: 158-167.

      • Mickleburgh, H.L., Wescott, D.J., Gluschitz, S. & Klinkenberg, V.M. (2022). Exploring the use of actualistic forensic taphonomy in the study of (forensic) archaeological human burials: An actualistic experimental research programme at the Forensic Anthropology Center at Texas State University (FACTS), San Marcos, Texas. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 542-562.

      • Owsley, D. & B. Compton (1997). Preservation in late 19th Century iron coffin burials. In W. Haglund and M. Sorg (eds). Forensic Taphonomy: The Postmortem Fate of Human Remains. Boca Raton, FL, CRC Press: 511-526.

      • Parker Pearson, M. (1999). The Archaeology of Death and Burial. College Station: Texas A&M University Press.

      • Pettitt, P. (2013). The Palaeolithic Origins of Human Burial. Routledge.

      • Pomeroy, E., Bennett, P., Hunt, C. O., Reynolds, T., Farr, L., Frouin, M., Holman, J., Lane, R., French, C., & Barker, G. (2020). New Neanderthal remains associated with the ‘flower burial’ at Shanidar Cave. Antiquity, 94(373), 11–26.

      • Randolph-Quinney, P.S. (2013). From the cradle to the grave: the bioarchaeology of Clonfad 3 and Ballykilmore 6. In N. Brady, P. Stevens and J. Channing (eds.). Settlement and Community in the Fir Tulach Kingdom. Dublin: National Roads Authority Press. pp A2.1-48.

      • Randolph-Quinney, P.S., Haines, S. and Kruger, A. (2018). The use of three-dimensional scanning and surface capture methods in recording forensic taphonomic traces: issues of technology, visualisation, and validation. In: W.J. M. Groen and P. M. Barone (eds). Multidisciplinary Approaches to Forensic Archaeology. Berlin: Springer International Publishing, pp. 115-130.

      • Rendu, W., Beauval, C., Crevecoeur, I., Bayle, P., Balzeau, A., Bismuth, T., Bourguignon, L., Delfour, G., Faivre, J.-P., Lacrampe-Cuyaubère, F., Tavormina, C., Todisco, D., Turq, A., & Maureille, B. (2014). Evidence supporting an intentional Neandertal burial at La Chapelle-aux-Saints. Proceedings of the National Academy of Sciences, 111(1), 81–86.

      • Sandgathe, D. M., Dibble, H. L., Goldberg, P., & McPherron, S. P. (2011). The Roc de Marsal Neandertal child: A reassessment of its status as a deliberate burial. Journal of Human Evolution, 61(3), 243–253.

      • Silver, M. (2016). Conservation Techniques in Cultural Heritage. In E. Stylianidis and F. Remondino (eds) 3D Recording, Documentation and Management of Cultural Heritage. Dunbeath: Whittles Publishing. pp 15-106.

      • Schotsmans, E.M.J., Georges-Zimmermann, P., Ueland, M. and Dent, B.B. (2022). From flesh to bone: Building bridges between taphonomy, archaeothanatology and forensic science for a better understanding of mortuary practices. In C.J. Knüsel and E.M.J. Schotsmans (eds.) The Routledge Handbook of Archaeothanatology. London: Routledge. pp 501-541.

    1. Author Response:

      We thank eLife and the reviewer for the nice summary of our manuscript. We largely agree with the summary and review, and just add a few small points.

      First, the review asks about the reproducibility of our findings, and suggests that they are only from a single experiment. In fact, our manuscript reports data from two independent single-cell experiments: one performed at low multiplicity of infection (MOI), and another at higher MOI. The broad trends, including the lack of strong correlations between viral mRNA transcription and progeny production, are consistent across both experiments.

      Second, the reviewer asks about what happens when two different virions bearing the same viral barcode infect two different cells, given that we estimate 4-8% of barcodes to be shared between multiple infecting virions. When two cells are infected by different virions with the same barcode, this breaks the one-to-one link between transcription in that cell and progeny in the supernatant, since it is not possible to determine which cell contributed the progeny with that barcode. This means that between 4-8% of the points on our correlation plots could be affected by this factor, meaning that a few outliers should be expected. Another scenario, where a single cell is infected by two barcodes, is not problematic for our method because we can simply sum the progeny output for both barcodes from that cell.
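
      The bookkeeping described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the analysis code used for the manuscript: the function name, the input dictionaries, and the example barcodes are all assumptions. It sums progeny over every barcode found in a cell and flags cells whose barcodes also appear in another cell, since those cells no longer have a one-to-one link to their progeny.

```python
from collections import defaultdict

def progeny_per_cell(cell_barcodes, barcode_progeny):
    """Sum progeny per cell and flag cells that share a barcode with another cell.

    cell_barcodes: dict mapping a cell ID to the list of viral barcodes seen in it
    barcode_progeny: dict mapping a barcode to its progeny count in the supernatant
    (both structures are hypothetical, for illustration only)
    """
    # Record which cells carry each barcode.
    cells_with_barcode = defaultdict(set)
    for cell, barcodes in cell_barcodes.items():
        for barcode in barcodes:
            cells_with_barcode[barcode].add(cell)

    totals = {}
    ambiguous = set()
    for cell, barcodes in cell_barcodes.items():
        # A cell infected by two barcodes: add up the progeny of both.
        totals[cell] = sum(barcode_progeny.get(b, 0) for b in barcodes)
        # A barcode seen in more than one cell breaks the one-to-one link.
        if any(len(cells_with_barcode[b]) > 1 for b in barcodes):
            ambiguous.add(cell)
    return totals, ambiguous

# Made-up example values, not data from the study.
example_cells = {"cell_A": ["AAAC", "GGTT"], "cell_B": ["CCGA"], "cell_C": ["CCGA"]}
example_progeny = {"AAAC": 10, "GGTT": 5, "CCGA": 7}
totals, ambiguous = progeny_per_cell(example_cells, example_progeny)
```

      In this toy input, cell_A carries two barcodes and its progeny are simply summed, whereas cell_B and cell_C share a barcode and are flagged as ambiguous rather than assigned progeny with confidence.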

      Finally, the reviewer notes that some cells appear to produce progeny virions despite failing to express one or more viral genes. Such cells can be explained in one of two ways. First, as noted immediately above, we expect a small fraction (4-8%) of the points to be erroneous due to a lack of a guaranteed one-to-one link between cell and progeny for non-unique barcodes. Second, in some cases the missing viral gene could be a technical artifact caused by a stochastic failure to capture modestly expressed transcripts from the gene; this phenomenon, known as gene dropout, occurs at a fairly high rate in single-cell experiments (see Qiu Nature Communications 2020 for a detailed discussion). Genes that are expressed at lower levels, like the Influenza virus polymerase genes, are more likely to be missed during single-cell RNA sequencing. The absent viral genes in each infected cell can be explored in detail using the interactive plots at https://jbloomlab.github.io/barcoded_flu_pdmH1N1/

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Major Revisions:

      1) Although we appreciate this work was carried out independently, it would improve this paper if the structure presented here was compared to the recently published structure of Cx43 (Nat Commun 14, 931 (2023)), with the conclusions added to the discussion.

      We encourage the readers to read both our study on Cx43 and the one mentioned by the reviewer. However, we believe the optimal format for such a comparison is going to be a more comprehensive review article, which is outside the scope of our study.

      2) Please elaborate on the lipid-binding pockets observed for lipid 1, lipid 2, and the N-lipid/PGL. For example, what are the residues involved in these lipid-protein interactions? Are these residues conserved in other connexin isoforms? Do these lipid-binding pockets match with previous structures, including the recent Cx43 structure? Please clarify what lipid sites are ambiguous due to insufficient resolution.

      Within the scope of our study, we have shown that some of the disease-linked residues are located in close proximity to the lipid sites (Fig. 4b). This suggests a possible role of the lipid sites in diseases associated with Cx43 mutations (and possibly with the mutations in other connexins, as the structures of other connexin channels also feature bound lipids inside the pore region). We feel that a more in-depth comparison will require a careful study, beyond the analysis that we have performed here, and for this reason we would like to reserve such a detailed comparison for our future work (possibly a comprehensive review article on connexin structure and function).

      3) The NT domain and TM2 segments are referred to as the gate region. If there is no strong evidence to support this claim then please use "putative" gate region.

      We have updated the text accordingly, referring to this region as a putative gate region where appropriate.

      4) It is mentioned that there is a reorientation of extracellular loops 1 and 2 after Gap junction formation. Based on their structures, I wonder how this rearrangement alters the channel conduction pathway. For example, Do the electrostatic surface and hydrophobic properties change? Please consider adding further details as this information could be useful to understand why some properties of hemichannels differ from intercellular GJ channels.

      We have updated Fig. 5 with an illustration of the Cx43 HC surface coloured according to electrostatic potential (to match the same representation of the Cx43 GJC). The comparison shows that the rearrangement of extracellular loops 1 and 2 does not dramatically alter the electrostatic properties of the HC relative to the GJC. A more obvious difference is in the local environment of the ECLs: it is radically different in a “free” HC (exposed to the solvent or to the extracellular space of a cell), compared to the ECL environment in a connexon within a GJC (which is sealed by a docked connexon from the opposite membrane).

      5) Related to the previous point, the pore profile shown in Figure 5C shows that there is a constriction site in the extracellular part with the same diameter as the observed constriction caused by the NT domain. This constriction point seems to be associated with the high energies calculated for Cl-. Please clarify if this constriction is produced by the formation of the GJC or is also present in HC?

      This is the same constriction zone, and the Cl- barriers are further down the channel axis where the electrostatic potential of the protein is negative. We have included a similar calculation for the HC simulation in Fig. 5 (revised Fig. 5f).

      6) Related to the MD simulations shown in Figure 5d: if the voltage is applied across the whole GJC, the free energy under voltage should not be symmetric. Please clarify.

      The symmetry observed in the free energies is due to the fact that the ions enter and exit from the same hemichannel. Only at very high voltages do we observe some rare full GJC permeation events, slightly unbalancing the free energy at 500 mV.

      7) The scheme in Figure 6 may need further editing. The authors propose a putative closed state in which lipids are bound next to the NT, but we suggest it should be made clearer in the figure that this is a putative model, since there is no functional evidence supporting the role of these lipids in the gating/permeation properties of Cx43. Also, please clarify what is meant by a "semi-permeable gate" - a channel that only permeates ions but not molecules?

      We have updated the legend of Figure 6 to clearly reflect that this is a putative model. The “semi-permeable” state of the channel is something that was suggested previously by the authors of the Cx31.3 study, and we refer to that structure in the figure.

      Minor comments:

      1) In the results section there are some statements that currently lack solid experimental support. Please consider editing or moving this text to the discussion section only. A good example of this is the Disease-linked mutation section, specifically lines 199-206. As another example, in lines 237-238 the authors state that the NT can move laterally and vertically, but this idea still requires experimental validation.

      We feel that the original formulations of these portions of the text are appropriate. Disrupting them would interrupt the flow of the manuscript, and we prefer to stay with the original text in this case.

      2) Line 283. "With these structures in mind, we can now establish the existence of several structurally defined gating substates of the connexin channels". Please, tone down this statement. Replace "establish" with "propose" or another more appropriate word.

      We have updated the text as suggested (“propose” instead of “establish”).

      3) Line 313-314. " The presence of such molecules could have important implications for HC or GJC assembly, substrate permeation, and molecular gating". Currently, this entire statement does not have any support. Is there any paper that authors can discuss to suggest with some basis that lipids might have a role in assembly, permeation or gating?

      We feel that this statement is sufficiently careful, conveying a thought that the presence of such molecules could have important implications for various HC- or GJC-related processes. It is not a particularly strong claim and seems to be appropriate in this context.

      4) It seems that the structure shown in panels A and C in Figure 2 are shown in opposite directions, which makes the figure confusing. If needed, please rotate the structure in panel A to show the cytosolic part of the protein as panel C. Also, in the same figure, panels G and F are wrongly labeled. Please correct.

      For Fig. 2a, the angle is very different from anything else we show in the figure, so we would rather keep this as it is now. We have corrected the labelling for Fig. 2g-h.

      5) Check spelling mistakes in the legend of Extended data Fig.2, Extended data Fig.9, and line 243.

      We are grateful to the reviewers for pointing out the typos, which have now been corrected.

      6) The colors for G-L isoforms are not specified in Extended Data Fig.10. Please correct this.

      We updated the figure, removing the PGL label (the correct label is “lipid-N”).

      7) It is not clear what the difference is between PGL and the N-lipid density. Does PGL refer to the lipid-like density observed in nanodiscs, as indicated in Extended Fig. 4 and 10? Please clarify this issue in the manuscript.

      The labeling has been corrected in line with the revised version of the manuscript (this density element is now referred to as the “lipid-N”).

      8) Page 7 line 234-235 "The pore opening has a solvent-accessible radius of ~6Å (Figure 5c) very close to the effective hydrated radius of K+ (~6.6 Å) and Cl- (~7.2 Å). This makes it the most narrow pore opening...", it should be diameter, not radius.

      We have added a calculation for the HC (new Fig. 5f) and corrected the text as follows (line 234):

      “The pore opening observed in our cryo-EM structures has a solvent-accessible radius of ~3 Å (Figure 2b). This makes it the most narrow pore opening observed for a connexin channel to date (a comparison of the pore openings in the cryo-EM structures of connexin channels is shown in Extended Data Fig. 12). However, the average solvent-accessible radius of the pore during molecular dynamics was ~6 Å (Figure 5c); note that the effective hydrated radius of K+ and Cl- is ~3.3 Å and ~3.6 Å, respectively.”

      And line 277:

      “The average pore radius during the simulations was consistent with that observed in the cryo-EM structure (Fig. 5f).”
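
      The radius-versus-diameter correction above comes down to simple arithmetic, which can be checked with a minimal Python sketch. It uses only the values quoted in this response; the variable and function names are illustrative assumptions. A purely steric comparison like this ignores electrostatics and ion dehydration, so it is no substitute for the free-energy calculations.

```python
# Values quoted in the response, all in angstroms.
# The ion values were first reported as radii; the reviewer pointed out
# that they are diameters.
HYDRATED_DIAMETER = {"K+": 6.6, "Cl-": 7.2}
PORE_RADIUS = {"cryo-EM": 3.0, "MD average": 6.0}

def hydrated_radius(ion):
    """Convert the quoted hydrated diameter of an ion to a radius."""
    return HYDRATED_DIAMETER[ion] / 2.0

def fits_sterically(pore_radius, ion):
    """Return True if the pore radius is at least the ion's hydrated radius."""
    return pore_radius >= hydrated_radius(ion)

for state, radius in PORE_RADIUS.items():
    for ion in HYDRATED_DIAMETER:
        print(state, ion, round(hydrated_radius(ion), 1), fits_sterically(radius, ion))
```

      Halving the quoted values gives the ~3.3 Å and ~3.6 Å radii stated in the corrected text, which are slightly larger than the ~3 Å static cryo-EM opening and smaller than the ~6 Å average radius seen during the simulations.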

    1. Author Response

      Reviewer #2 (Public Review):

      The manuscript by Ma et al, "Two RNA-binding proteins mediate the sorting of miR223 from mitochondria into exosomes" examines the contribution of two RNA-binding proteins on the exosomal loading of miR223. The authors conclude that YBX1 and YBAP1 work in tandem to traffic and load miR223 into the exosome. The manuscript is interesting and potentially impactful. It proposes the following scenario regarding the exosomal loading of miR223: (1) YBAP1 sequesters miR223 in the mitochondria, (2) YBAP1 then transfers miR223 to YBX1, and (3) YBX1 then delivers miR223 into the early endosome for eventual secretion within an exosome. While the authors propose plausible explanations for this phenomenon, they do not specifically test them and no mechanism by which miR223 is shuttled between YBAP1 and YBX1, and the exosome is shown. Thus, the paper is missing critical mechanistic experiments that could have readily tested the speculative conclusions that it makes.

      Comments:

      1) The major limitation of this paper is that it fails to explore the mechanism of any of the major changes it describes. For example, the authors propose that miR223 shuttles from mitochondrially localized YBAP1 to P-body-associated YBX1 to the exosome. This needs to be tested directly and could be easily addressed by showing a transfer of miR223 from YBAP1 to YBX1 to the exosome.

      Testing this idea using fluorescently labeled miR223 would indeed be an ideal experiment. However, miRNA imaging presents challenges. As reviewer 1 pointed out, and we have now confirmed, the atto-647 dye itself localizes to mitochondria. We will continue our efforts to identify a suitable fluorescent label for miR223 in order to be in a position to evaluate the temporal relationship between mitochondrial and endosomal miR223.

      2) If YBAP1 retains miR223 in mitochondria, what is the trigger for YBAP1 to release it and pass it off to YBX1? The authors speculate in their discussion that sequestration of mito-miR223 plays a "role in some structural or regulatory process, perhaps essential for mitochondrial homeostasis, controlled by the selective extraction of unwanted miRNA into RNA granules and further by secretion in exosomes...". This is readily testable by altering mitochondria dynamics and/or integrity.

      A previous study has reported that YBAP1 can be released from mitochondria to the cytosol during HSV-1 infection (Song et al., 2021). However, due to restrictions, we are unable to conduct experiments using HSV to verify this condition. We attempted to induce mitochondrial stress by using different concentrations of CCCP, but we did not observe the release of YBAP1 from mitochondria after CCCP treatment. We speculate that not all mitochondrial stress conditions can trigger YBAP1 release. Investigating the mechanism of mito-miR223 release from mitochondria is one of our interests that we aim to explore in future studies.

      3) Much of the miRNA RT-PCR analysis is presented as a ratio of exosomal/cellular. This particular analysis assumes that cellular miRNA is unaffected by treatments. For example, Figure 1a shows that the presence of exosomal miR223 is significantly reduced when YBX1 is knocked out. This analysis does not consider the possibility that YBX1-KO alters (up or down-regulates) intracellular miR223 levels. Should that be the case, the ratiometric analysis is greatly skewed by intracellular miRNA changes. It would be better to not only show the intracellular levels of the miRs but also normalize the miRNA levels to the total amount of RNA isolated or an irrelevant/unchanged miRNA.

      Our previous publications demonstrated that miR223 levels are increased in YBX1-KO cells and decreased in exosomes derived from YBX1 KO cells. However, no significant changes were observed in miR190 levels (Liu et al., 2021; Shurtleff et al., 2016). The repeated data has been included in Figure 1a.

      For the analysis of other miRNAs by RT-PCR, we assessed changes in intracellular and exosomal miRNA levels in the corresponding figures. In the qPCR analysis, miRNA levels were normalized to the total amount of RNA.
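
      The reviewer's concern about ratiometric analysis can be illustrated with a short Python sketch. The function names and all numbers are arbitrary placeholders chosen for illustration; they are not measurements from this study and do not represent our qPCR pipeline.

```python
def exosome_to_cell_ratio(exosomal, cellular):
    """Ratio of exosomal to cellular miRNA signal (same arbitrary units)."""
    return exosomal / cellular

def normalize_to_total_rna(mirna_signal, total_rna_ng):
    """Express a miRNA signal per nanogram of total RNA isolated."""
    return mirna_signal / total_rna_ng

# Arbitrary illustrative values: exosomal signal is unchanged,
# while the cellular signal doubles in the hypothetical knockout.
wild_type = {"exosomal": 100.0, "cellular": 100.0}
knockout = {"exosomal": 100.0, "cellular": 200.0}

wt_ratio = exosome_to_cell_ratio(wild_type["exosomal"], wild_type["cellular"])
ko_ratio = exosome_to_cell_ratio(knockout["exosomal"], knockout["cellular"])
print(wt_ratio, ko_ratio)
```

      In this toy case the ratio halves even though the exosomal signal did not change, which is why intracellular and exosomal levels, each normalized to total RNA, are reported separately.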

      4) In figure 1, the authors show that in YBX1-KO cells, miR223 levels are decreased in the exosome. They further suggest this is because YBX1 binds with high affinity to miR223. This binding is compared to miR190 which the authors state is not enriched in the exosome. However, no data showing that miR190 is not present in the exosome is shown. A figure showing the amount of cellular and exosomal miR223 and 190 should be shown together on the same graph.

      In previous publications we demonstrated that miR190 is not localized in exosomes and not significantly changed in YBX1 knockout (KO) cells and exosomes derived from YBX1 KO cells (Liu et al., 2021; Shurtleff et al., 2016). The repeated data has been included in Figure 1a.

      5) Figure 2 Supplement 1 - To determine the nucleotides responsible for interacting with YBX1, the authors made several mutations within the miR223 sequence. However, no explanation is given regarding the mutant sequences used or what the ratios mean. Mutant sequences need to be included. How do the authors conclude that UCAGU is important when the locations of the mutations are unclear? Also, the interpretation of this data would benefit from a binding affinity curve as shown in Fig 2C.

      The ratio is of labeled miR223/unlabeled miR223 (wt and mutant). All mutant sequences of miR223 have been included in Figure 2 supplement 1.

      6) While the binding of miR223mut to YBX1 is reduced, there is still significant binding. Does this mean that the 5nt binding motif is not exact? Do the authors know if there are multiple nucleotide possibilities at these positions that could facilitate binding? Perhaps confirming binding "in vivo" via RIP assay would further solidify the UCAGU motif as critical for binding to YBX1.

      The binding affinity of miR223mut with YBX1 is reduced approximately 27-fold compared to miR223. We speculate that the secondary structure of miR223 may contribute to the interaction with YBX1.

      Our EMSA data, in vitro packaging data, and exosome analysis reinforce the conclusion that UCAGU is critical for YBX1 binding. These findings suggest that the presence of the UCAGU motif in miR223 is crucial for its interaction with YBX1 and subsequent sorting into exosomes.

      7) Figures 2g, h - It would be nice to show that miR190mut also packages in the cell-free system. This would confirm that the sequence is responsible. Also, to confirm that the sorting of miR223 is YBX1-dependent, a cell-free reaction using cytosol and membranes from YBX1 KO cells is needed.

      Although we have not performed the suggested experiment, we purified exosomes from cells overexpressing miR190sort and observed an increase in the enrichment of miR190sort in exosomes compared to miR190. This finding confirmed that the UCAGU motif facilitates miRNA sorting into exosomes.

      Regarding the in vitro packaging assay, our previously published paper demonstrated that cytosol from YBX1 knockout (KO) cells significantly reduces the protection of miR223 from RNase digestion. We concluded that the sorting of miR223 into exosomes is dependent on YBX1 (Shurtleff et al., 2016).

      8) In Figure 3a, the authors show that miR223 is mitochondrially localized. Does the sequence of miR223 (WT or Mut) matter for localization? Does it matter for shuttling between YBAP1 and YBX1?

      The localization of miR223mut has not been tested in our current study. We plan to conduct these experiments in the future.

      9) Supplement 3c - Is it strange that miR190 is not localized to any particular compartment? Is miR190 present ubiquitously and equally among all intracellular compartments?

      Most mature miRNAs are predominantly localized in the cytoplasm. Although there is no specific subcellular localization reported for miR190 in the literature, our experimental findings indicate a relatively high expression of miR190 in 293T cells. It is likely that most of miR190 is localized in the cytosol. However, it is also possible that a small fraction of miR190 may associate with a membrane, which could explain its distribution in various subcellular structures. Importantly, we did not observe enrichment of miR190 in the mitochondria or exosomes.

      10) Figure 3h - Why would the miR223 levels increase if you remove mitochondria? Does CCCP also cause miR223 upregulation? I would have thought miR223 would just be mis-localized to the cytosol.

      We report that the levels of cytoplasmic miR223 increase following the removal of mitochondria using CCCP treatment. While we cannot rule out the possibility that upregulation of miR223 is directly caused by CCCP treatment, we suggest that miR223 becomes mis-localized to the cytosol upon mitochondrial removal. Our data suggests that mitochondria contribute to the secretion of miR223 into exosomes. When mitochondria are removed by mitophagy, cytosolic miR223 is not efficiently secreted, which provides an alternative explanation for the observed increase in miR223 level after mitochondrial removal.

      11) Figure 3i - What is the meaning of "Urd" in the figure label? This isn't mentioned anywhere.

      “Urd” represents Uridine. Uridine is now spelled out in figure 3i. The absence of mitochondria can impact the function of the mitochondrial enzyme dihydroorotate dehydrogenase, which plays a role in pyrimidine synthesis. To address this issue, one approach is to supplement the cell culture medium with Urd. A previous study demonstrated that primary fibroblasts showed positive responses when Urd was added to the cell culture medium, resulting in improved cell viability for extended periods of time (Correia-Melo et al., 2017).

      12) Figure 3j - The data is presented as a ratio of EV/cell. Again, this inaccurately represents the amount of miR223 in the EV. This issue is apparent when looking at Figures 3h and 3j. In 3h, CCCP causes an upregulation of intracellular miR223. As such, the presumed decrease in EV miR223 after CCCP (3j) could be an artifact due to increased levels of intracellular miR223. Both intracellular and EV levels of miRs need to be shown.

      Both the intracellular and exosomal levels of miR223 have been included in Figure 3j.

      13) In Figure 4, the authors show that when overexpressed, YBX1 will pulldown YBAP1. Can the authors comment as to why none of the earlier purifications show this finding (Figure 1 for example)? Even more curious is that when YBAP1 is purified, YBX1 does not co-purify (Figure 4 supplement 1a, b).

      In Figure 4a-b, human YBX1 fused with a Strep II tag was purified from 293T cells using Strep-Tactin® Sepharose® resin in a one-step purification process. Our data has shown that YBAP1 is expressed in 293T cells.

      In Figure 1 and Figure 4 Supplement 1a, human YBX1 or YBAP1 fused with His and MBP tags were purified from insect cells using a three-step purification process involving Ni-NTA His-Pur resin, amylose resin, and Superdex-200 gel filtration chromatography.

      One possibility is that human YBX1 or YBAP1 may not interact well with insect YBAP1 or YBX1, which could result in separate tagged forms of YBX1 or YBAP1 isolated from insect cells.

      Another possibility is that the expression levels of insect YBAP1 and YBX1 may be too low. Consequently, tagged forms of YBX1 or YBAP1 expressed in insect cells may copurify with partners not readily detected by Coomassie blue stain. However, in Figure 4 Supplement 1b, human YBX1 fused with His and MBP tags was co-expressed with non-tagged human YBAP1, and both bands of YBX1 and YBAP1 were visible on the Coomassie blue gel after purification using Ni-NTA His-Pur resin, amylose resin, and Superdex-200 gel filtration chromatography.

      14) Figure 4f, g - The text associated with these figures is very confusing, as is the labeling for the input. Also, what is "miR223 Fold change" in this regard? Seeing as your IgG should not have IP'd anything, normalizing to IgG can amplify noise. As such, RIP assays are typically presented as % input or fold enrichment.

      The RIP assay results have been calculated and presented as a % input in Figure 4g.
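
      For readers unfamiliar with the % input convention, a minimal sketch of the common calculation follows. It assumes roughly 100% amplification efficiency (a doubling per cycle) and that a known fraction of the lysate was saved as input; the Ct values and the 10% input fraction are hypothetical, and this is not necessarily the exact procedure used in the manuscript.

```python
import math

def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input recovered in an IP, from qPCR Ct values.

    ct_input is first adjusted to what it would be for 100% of the
    starting material, assuming one doubling per PCR cycle.
    """
    adjusted_input_ct = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (adjusted_input_ct - ct_ip)

# Hypothetical Ct values for a specific antibody and an IgG control,
# with 10% of the lysate saved as input.
specific = percent_input(ct_ip=24.0, ct_input=22.0, input_fraction=0.10)
igg = percent_input(ct_ip=29.0, ct_input=22.0, input_fraction=0.10)

# Fold enrichment over IgG can be reported alongside % input.
fold_enrichment = specific / igg
print(round(specific, 3), round(igg, 3), round(fold_enrichment, 1))
```

      Reporting % input keeps the IgG signal visible as its own value instead of using it as a denominator, which addresses the noise-amplification concern raised above.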

      15) Figure 4h - The authors show binding between miR223 and YBAP1 however it is not clear how significant this binding is. There is more than a 30-fold difference in binding affinity between miR223 and YBX1 than between miR223 and YBAP1. Even more, when comparing the EMSAs and fraction bound from figures 1 and 2 to those of Figure 4h, the binding between miR223 and YBAP1 more closely resembles that of miR190 and YBX1, which the authors state is a non-binder of YBX1. The authors will need to reconcile these discrepancies.

      We agree that YBAP1 and YBX1 differ quite significantly in the affinity of their interaction with miR223. It is difficult to draw conclusions from a comparison of the affinities of YBX1 for miR190 and YBAP1 for miR223. Nonetheless, a quantitative difference in the interaction of YBAP1 with miR223 and miR190 is apparent (Fig. 4h, i, j), and we observed no enrichment of miR190 in isolated mitochondria (Fig. 3 supplement 1a), whereas YBAP1 selectively immunoprecipitated miR223 from isolated mitochondria (Fig. 4f and g).

      16) Can the authors present the Kd values for EMSA data?

      The Kd values for the EMSA data have been added to the respective figures.
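
      As an illustration of how a Kd can be estimated from EMSA fraction-bound data, the sketch below fits a single-site binding model by a simple grid search. It assumes the protein is in large excess over the labeled RNA, so that fraction bound = [P] / (Kd + [P]); the concentrations and the 5 nM Kd used to generate the data are hypothetical, and the fitting method is a stand-in, not the procedure reported in the manuscript.

```python
def fraction_bound(protein_nM, kd_nM):
    """Single-site binding with protein in excess over labeled RNA."""
    return protein_nM / (kd_nM + protein_nM)

def fit_kd(protein_nM, observed, kd_min=0.01, kd_max=10000.0, steps=6001):
    """Least-squares Kd estimate from a log-spaced grid search."""
    best_kd, best_sse = None, float("inf")
    for i in range(steps):
        kd = kd_min * (kd_max / kd_min) ** (i / (steps - 1))
        sse = sum(
            (fraction_bound(p, kd) - y) ** 2
            for p, y in zip(protein_nM, observed)
        )
        if sse < best_sse:
            best_kd, best_sse = kd, sse
    return best_kd

# Hypothetical titration: noise-free data generated with Kd = 5 nM.
protein = [0.5, 1.0, 2.0, 5.0, 10.0, 20.0, 50.0, 100.0]
observed = [fraction_bound(p, 5.0) for p in protein]

kd_est = fit_kd(protein, observed)
print(f"estimated Kd ~ {kd_est:.2f} nM")
```

      A fold difference in affinity between two RNAs, such as the one quoted for miR223mut versus miR223, is then the ratio of the two fitted Kd values.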

      17) Figure 5 - Does YBAP1-KO affect mitochondrial protein integrity or numbers?

      We generated stable cell lines expressing 3xHA-GFP-OMP25 in both 293T WT and YBAP1-KO cells, but we did not observe any alterations in mitochondrial morphology (Author response image 1).

      Author response image 1.

      Additionally, we compared different mitochondrial markers by immunoblot in 293T WT cells and YBAP1-KO cells and did not observe any changes in these markers (data have been included in Figure 5b).

      18) Figure 6a - Are the authors using YBAP1 as their mitochondrial marker? Please include TOM20 and/or 22.

      In Figure 4c and 4e, our data clearly demonstrate that the majority of YBAP1 is localized in the mitochondria.

      To further validate this localization, we performed immunofluorescence staining using antibodies against endogenous Tom20 and YBX1. The immunofluorescence images document YBX1 associated with mitochondria (Author response image 2 and new Fig. 6a).

      Author response image 2.

      19) Figure 6b - Rab5 is an early endosome marker and may not fully represent the organelles that become MVBs. Co-localization at this point does not suggest that associating proteins will be present in the exosome, and it is possible that the authors are looking at the precursor of a recycling endosome. Even more, exosome loading does not occur at the early endosome, but instead at the MVB. Perhaps looking at markers of the late endosome such as Rab7 or ideally markers of the MVB such as M6P or CD63 would help draw an association between YBX1, YBAP1, and the exosome. Also, if the authors want to make the claim that interactions at the early endosome lead to secretion as an exosome, the authors should show that isolated EVs from Rab5Q79L-expressing cells contain miR223.

      We have previously used overexpressed Rab5(Q79L) to monitor the localization of exosomal content, specifically CD63 and YBX1, in enlarged endosomes (Liu et al., 2021, Fig. 4A, B). These endosomes exhibit a mixture of early and late endocytic markers, including CD63 (Wegner et al., 2010). Hence, the presence of Rab5(Q79L)-positive enlarged endosomes does not solely indicate early endosomes.

      20) The mentioning of P-bodies is interesting but at no time is an association addressed. This is therefore an overly speculative conclusion. Either show an association or leave this out of the manuscript.

      In a previous paper we demonstrated that YBX1 puncta colocalize with P-body markers EDC4, Dcp1 and DDX6 (Liu et al., 2021).

      21) In lines 55-58, the authors make the comment "However, many of these studies used sedimentation at ~100,000 g to collect EVs, which may also collect RNP particles not enclosed within membranes which complicates the interpretation of these data." Do RNPs not dissolve when secreted? Can the authors give a reference for this statement?

      In a previous paper, we demonstrated that the RNP Ago2 does not dissolve in the conditioned medium and is not in vesicles but sediments to the bottom of a density gradient (Temoche-Diaz et al., 2019).

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, Shin and colleagues investigate the role of the posttranslational modification of the DNA methyltransferase by covalent linkage of the N-Acetylglucosamine (O-GlcNAc).

      The authors present compelling evidence showing that a prolonged high fat/sucrose diet causes global protein O-GlcNAcylation in the liver and DNMT1 is among the proteins that increase their O-GlcNAc level. This result is significant because of the paucity of in vivo data addressing the interplay between metabolism and protein O-GlcNAcylation. The paper also shows that DNMT1's O-GlcNAcylation level correlated to the extracellular glucose levels in other cell types.

      Using mass spectrometry, the authors identify S878 as the main site for O-GlcNAcylation. It is noteworthy that the mapping was performed with hyper-O-GlcNAcylated cells and may be different in a physiological situation. To investigate how O-GlcNAcylation of S878 of DNMT1 impacts its activity and ultimately DNA methylation patterns, Shin and colleagues mostly use a cellular model of hyper O-GlcNAcylation induced by the combination of high glucose and a chemical inhibitor of OGA (the only enzyme responsible for O-GlcNAc removal). The data shows that increased O-GlcNAcylation resulting from the combination of high glucose and OGA inhibition causes a reduction of DNMT1 activity and local loss of DNA methylation specifically at partially methylated domains.

      This study brings completely new knowledge on the regulatory function of glycosylation of DNMT1 and its impact on its methyl-transferase activity and downstream genomic methylation. Furthermore, the manuscript introduces new data on the interplay between cellular metabolism and O-GlcNAcylation on DNMT1 and other proteins. The experiments are well-controlled, and their interpretation is sound. This study should be of special interest to the fields of fundamental and environmental epigenetics, as well as metabolism.

      The main limitation of the study is the convolution of the functional experiments where the perturbation is a combination of high glucose and chemical inhibition of OGA. The relative contribution of the two variables is partially addressed in Figure 3-figure supplement 1B which shows that high glucose increases DNMT1 activity (Hep3B cells) while Figure 3D shows that high glucose when combined with OGA inhibitor decreases DNMT1 activity (Hep3B cells). As discussed, the data suggest that high-glucose and OGA inhibition may have an antagonistic effect on DNMT1 activity. An experiment of treatment of the cells with the OGA inhibitor in physiological glucose conditions would address this gap of knowledge.

      We thank the reviewer for the suggestion. Physiological glucose levels are between 5 and 7 mM, and 25 mM is in the hyperglycemic range, which corresponds to severe diabetes. The new Figure 1A shows TMG treatment under physiological glucose conditions. We have included new WB data for 5 mM glucose, 5 mM glucose + TMG, 25 mM glucose, and 25 mM glucose + TMG (Figure 1A).

      To understand the impact of the environment (in this study: extracellular glucose level) on the epigenome, one should keep in mind the variation of cytosine methylation patterns between individuals and over time. A recent large-scale profiling of DNA methylation of 137 individuals shows a near absence of individual variation between replicates of the same cell type, suggesting that genomic methylation patterns are largely insensitive to the environment (https://doi.org/10.1038/s41586-022-05580-6).

      Comparative methylomes of healthy and diabetic individuals are needed to examine the medical significance of the findings presented here. It is possible that the modulation of DNMT1 activity by O-GlcNAc modification is relevant for a specific cell type or developmental stage that remains to be discovered.

      We thank the reviewer for the suggestion. While the present study is focused on the functional impact of glucose concentrations on O-GlcNAcylation of DNMT1, the extension of this work to diabetic individuals is a goal for a follow up project.

      Reviewer #2 (Public Review):

      I've read the manuscript by Shin et al with great interest. The authors describe the identification of O-GlcNAcylation of DNMT1 and the impact this modification has on the maintenance activity of DNMT1 genome-wide and that modification of S878 leads to enzyme inhibition. The manuscript is written in a clear and understandable way making it easy for the reader to understand the logic as well as the steps of the experimental approach.

      The authors identify O-GlcNAcylation of DNMT1 in a number of different cell lines by combining inhibition studies and WB and further on they identify the modification sites with LC/MS, predictions, and mutational studies. I really like the experimental approach, which while being straightforward (albeit technically challenging), is powerful and well-controlled in this case to unequivocally prove the modification of DNMT1 and identify the site. However, mutation of the two identified modification sites does not remove all the O-GlcNAcylation signal associated with DNMT1, thus possibly not all the possible sites were identified. While this is not a criticism of this manuscript, it would be interesting to know what other sites are modified and the enzymatic/biological effects associated.

      We completely agree with the reviewer. As the O-GlcNAc band was also detected in double mutated DNMT1 (Figure 2D), we expect that undetected O-GlcNAcylated sites exist. This is a limitation of current MS analysis: modified sites located at either end of the protein, or near sites cut by an endoprotease such as trypsin, are known to be difficult to detect. In follow-up work we plan to detect more diverse O-GlcNAc modified sites using additional types of endoproteases and to observe the corresponding changes in the phenotype of various cells.

      Also, the authors isolate the modified DNMT1 from cells using immunoprecipitation, which is indeed useful to study the changes in catalytic activity but does not provide any information if the cellular localisation of modified DNMT1 changes.

      We apologize for this oversight. We have added a DNMT1 localization assay via immunofluorescence (IF) in the revised manuscript (Figure 3—figure supplement 3). We found no difference in DNMT1 localization between wild type and S878A mutants.

      Subsequently, the authors checked the impact of high glucose diet on the genome-wide DNA methylation patterns. The observed effects (Fig 4A) are very strong, almost as strong as observed with Aza treatment and therefore I wonder if LINE/IAP or other elements are getting activated (as observed with genome-wide demethylation with Aza).

      We thank the reviewer for the suggestion. Changes in methylation of LINE-1 under hyperglycemic conditions are displayed in Figure 4—figure supplement 4. In the case of LINE-1, DNA methylation is lost globally under hyperglycemic conditions. While beyond the scope of this study, a more thorough examination of the impact of the observed loss of methylation under high glucose conditions is of interest.

      Do the authors see any changes in cell phenotype, slower/faster proliferation, or increased apoptosis due to the activation of mobile elements (not only ROS)?

      This is also a very interesting idea. We plan on further investigating this as part of a follow up study.

      Another point is that the S878A mutant seems not to be able to fully maintain the DNA methylation (Fig 4A). Does O-GlcNAcylation recruit any additional interactors? Given that the authors immunoprecipitated DNMT1 and use it for activity assay, it is possible, that the modification attracts an additional protein factor that could in turn inhibit DNMT1 activity (as observed). Therefore, the observed kinetic effect could be indirect, while still interesting and important, the mechanism of inhibition would be different.

      We thank the reviewer for the great suggestions. According to Figure 4A, in the case of mutated DNMT1, a slight methylation loss appears to occur in both conditions. There could be a number of reasons for this. It may be due to interacting proteins, or it may be caused by some damage to DNMT1 itself. Further investigation of this is planned as a follow-up project.

      DNA methylation clock can be used to estimate the biological age of a tissue/cells. While not directly in the line of the manuscript, I was wondering if the DNA methylation changes in the high glucose diet would affect the methylation sites used for the DNAme clock. Meaning, would the cells/tissue epigenetically age faster when in high glucose media, and if the Ala mutant could provide resistance to that?

      We thank the reviewer for the interesting suggestion. We believe this is beyond the scope of this manuscript, but we'll consider this with interest in the future.

      In discussion, the authors write that this is the first investigation of O-GlcNAcylation in relation to DNA methylation, while this is true for DNMTs, TET enzymes, that oxidise 5mC and trigger active DNA demethylation have been shown before to also be modified.

      We have toned down the language throughout the revised manuscript. This is the first investigation into the maintenance of DNA methylation. Although there is a great deal of evidence regarding the important regulatory role of O-GlcNAcylation in gene regulation, a direct link with maintenance of DNA methylation has not previously been established.

      A nice and rigorous study, with important observations and connections to biological effects. It would be nice to prove that the effects are direct and not associated with other factors that could be recruited by the modification and impact the activity of DNMT1. I find it a bit surprising that phosphorylation of the target serine does not impact DNMT1 activity as well.

      We thank the reviewer for the positive comments and agree that there are many interesting avenues to follow up on this.

      Reviewer #3 (Public Review):

      The authors investigate the potential effect of OGlcNacylation on the activity of the DNA methyltransferase DNMT1.

      Some results that are convincingly obtained include:

      • There is more overall OGlcNacylation when Glucose concentration in the culture medium or the feed is high;

      • DNMT1 is OGlcNacylated, and more so in high glucose or on rich chow;

      • The position S878 can be OGlcNacylated;

      • The activity of transfected DNMT1 is decreased in high glucose conditions. This effect is lessened when S878 is mutated to A or D.

      Some results that are suggested but not fully backed by experimental data include:

      • This process happens to the endogenous protein under physiologically relevant conditions;

      We agree that we could not completely rule out endogenous DNMT1 in our experiments. We have adjusted the language in the revised manuscript to acknowledge this. However, we confirmed the change in activity of recombinant DNMT1 (Figure 3D), and also demonstrated the change in activity under physiological conditions (normal physiological glucose level vs hyperglycemic range) in Figure 3—figure supplement 1B. This result directly shows that the activity of DNMT1 changes under physiological conditions. In addition, DNA hypomethylation due to high glucose has previously been reported (Kandilya et al., 2020; Lan et al., 2016). Our results suggest a possible mechanism for this.

      Kandilya, D., Shyamasundar, S., Singh, D.K., Banik, A., Hande, M.P., Stunkel, W., Chong, Y.S., and Dheen, S.T. (2020). High glucose alters the DNA methylation pattern of neurodevelopment associated genes in human neural progenitor cells in vitro. Sci Rep 10, 15676.

      Lan, C.C., Huang, S.M., Wu, C.S., Wu, C.H., and Chen, G.S. (2016). High-glucose environment increased thrombospondin-1 expression in keratinocytes via DNA hypomethylation. Transl Res 169, 91-101 e101-103.

      • This process is responsible for changes in DNA methylation, leading to changes in gene expression, leading to increased ROS and increased apoptosis.

      We confirmed that ROS levels increased under high glucose conditions through DCFH-DA fluorescence experiments (Figure 5A). In addition, γH2A.X fluorescence experiments showed that DNA damage was increased under high glucose conditions (Fig. 5B). On the other hand, in the case of the S878A mutant, DNA damage was reduced under hyperglycemic conditions compared to wild type DNMT1 despite an increase in ROS levels (Fig. 5B). Moreover, we verified that the DNA damage did not come from oxidative stress through 8-OHdG analysis (Figure 5—figure supplement 4). Therefore, DNA oxidative stress is suppressed by DNMT1 due to the increase of ROS under high glucose conditions. However, the reduction of DNA methylation by O-GlcNAcylation of DNMT1 induces apoptosis due to oxidative stress.

      Studying the connection between cellular metabolism and epigenetic phenomena is interesting. However, I feel that the article falls short of its aims because of the limits of the experimental system, some missing controls, and some data overinterpretation.

      We hope the reviewer finds our revised manuscript more suitable.

    1. Author Response

      Reviewer #1 (Public Review):

      Overall, this manuscript exposes key gaps in patient care resulting from the pandemic, as well as the challenges and unmet needs felt by healthcare workers in cervical cancer screening. The authors’ findings on the struggles while regaining screening volume across the nation in a sustainable way, demonstrate that pre-existing weaknesses in the cancer control system were exacerbated by the pandemic and are integral to amend. The authors were able to identify these gaps in care and work environments through their synthesis of qualitative interviews. I applaud the use of such mixed methods, which emphasizes the complementary need for both quantitative and qualitative data. What could be better strengthened in the manuscript is the authors’ justification for statistical analyses within the context of the research question, and reporting of survey administration and management.

      The authors thank the reviewer for a thorough assessment of the manuscript. We have addressed the reviewer’s concerns regarding justification of statistical analyses in the Data Analysis, Quantitative survey data section, and reporting of survey administration and management in the Results, Quantitative survey data section.

      Reviewer #2 (Public Review):

      Fuzzell et al. conducted a mixed-method study looking into the possible impact of COVID-19 on clinician perceptions of cervical cancer screening. The authors examined how the pandemic-related staffing changes might have affected the screening and abnormal results follow-up during the period October 2021 through July 2022.

      They found that 80% of the clinicians experienced decreased screening during the start of the pandemic and that ≈67% reported a return to pre-pandemic levels. The general barriers for not returning to pre-pandemic levels were staffing shortages and problems with structural systems for tracking overdue patients and those in need of follow-up after abnormal screening tests.

      Strengths:

      There is a high focus on the consequences and the need for action to prevent the ongoing impact of COVID-19 on cervical cancer screening. Some of the actions mentioned by the authors could be the use of HPV self-sampling kits, and it is interesting to be provided knowledge on the clinicians' views on HPV self-sampling. Both are of high interest to the general population in the US. Throughout the discussion, the authors and their claims are supported by other studies.

      Weaknesses:

      The lack of a National representative sample, where 63% of the responding clinicians were practicing in the Northeast, affects the possibility of generalization of the results found in the study. The overrepresentation of white females is not addressed in the discussion. This composition could have affected the results, especially when the authors report a need to look at higher salaries and better childcare to maintain adequate staffing.

      The conclusions are mostly supported by the data, however, some aspects of the data analysis need to be clarified.

      We thank the reviewer for their constructive feedback. Despite our best efforts, we were unable to recruit a sample more representative of all US regions. We note this limitation in the discussion: “Notwithstanding efforts to achieve a regionally diverse sample, 63% of responding clinicians were practicing in the Northeast at the time of their participation. Given that COVID-19 policies varied widely by state, this regional imbalance may limit the generalizability of our results. Despite the oversample of clinicians in the Northeast, region was not a significant predictor of either outcome.” Also, we acknowledge the high enrollment of White women in our provider sample and now address this point in the discussion: “Similarly, our sample was 85% female and 70% White. Although ideally we would have included a sample that was more diverse with respect to race and gender, these characteristics are not disparate from the majority of clinicians who perform cervical cancer screening (e.g., race: Women’s Health NPs [77% White], active Ob/Gyns [67% White], all active physicians [64% White]; gender: all NPs [92% female], Ob/Gyns [64% female], all active physicians [37% female]).” Data describing these characteristics are reported in the Association of American Medical Colleges (AAMC) 2022 Physician Specialty Data Report and Executive Summary, the 2018 NPWH Women’s Health Nurse Practitioner Workforce Demographics and Compensation Survey: Highlights Report, and a published paper describing the characteristics of nurse practitioners in the US, which are cited in text.

      Reviewer #3 (Public Review):

      This US study presents findings from an online survey and in-person interviews of healthcare providers regarding themes associated with cervical screening in federally qualified health centres (FQHCs). The study provides insights during the post-acute phase of the pandemic into a range of areas, including perceived changes in the provision of cervical cancer screening services and the impact of the pandemic, staffing and systems barriers to cervical cancer screening, strategies for tracking missed screens and catch-ups, follow-up of abnormal screening results, as well as attitudes towards HPV self-sampling. Results indicate persisting pandemic-related impacts on patient engagement and staffing, as well as system barriers to effective screening, catch-up of missed screens and follow-ups. Taken together, these issues may lead to increases in cervical cancer in the long-term in populations serviced by these centres, if measures are not taken to adequately support them. Participants were recruited from various regions in the US, however, the study was not conducted using a nationally-representative sample. Although highlighted issues are informative, findings cannot be generalised and larger studies are warranted in the future to monitor cervical screening provision and outcomes in FQHCs.

      We thank the reviewer for their thorough assessment of the manuscript. In the discussion, we have made sure to note the non-nationally representative sample and need for continued monitoring of cervical cancer screening and related outcomes in underserved settings and communities.

    1. Author Response

      Reviewer #2 (Public review):

      1) The systematic review includes data from some studies where PCOS is self-reported. While self-reported PCOS information has been found to be largely sensitive and specific, it would be of interest to know if prevalence ratios of mental health-related outcomes were impacted by self-reporting.

      Thank you for your insightful comment regarding the potential impact of self-reporting on the prevalence ratios of mental health-related outcomes in women with PCOS. We agree that this is an important factor to consider.

      In response, we have revisited all the studies included in our review. We have updated Supplemental Tables 2-4 to provide greater transparency and understanding. These revised tables now include a new column specifying the mental health assessment method used in each study. This update should allow for a more nuanced interpretation of the results, taking into account the potential impact of self-reporting.

      Furthermore, we conducted a sensitivity analysis by rerunning the meta-analysis to discern the potential influence of self-reported PCOS on our results, excluding the studies that relied solely on self-reported PCOS diagnosis. After we excluded studies where PCOS was self-reported, the point estimate for anxiety was similar whereas point estimates for depression and eating disorder were slightly higher but none of the estimates were different beyond chance compared to the original analysis. We believe these steps significantly strengthen the clarity and robustness of our findings (Line 314; Supplemental Tables 7 and 8).

      2) Likewise, the screening vs self-reported nature of the mental health disorders is not clear from the information included in the characteristics table.

      We have modified our Supplemental Tables 2-5 to include a column detailing the method of ‘Mental Health Assessment’. We should note that the majority of the studies directly assessed mental health using a variety of validated questionnaires. We have also included in the Discussion a section emphasizing that some of the studies included in the review relied on self-reported PCOS diagnosis and its potential impact. We also highlighted that while self-reported information is generally reliable, it is subject to potential bias that could impact the prevalence ratios of mental health-related conditions (Line 460).

      3) Calculated prevalence ratios were compared with prevalence values for the general population to determine the excess prevalence. However, the source of these general population statistics (i.e., whether these figures come from the control data in the included studies or other sources) is not clear.

      Thank you for raising this important point. We have now clarified in our Methods section that the general population statistics used for determining excess prevalence were derived from the control data in the included studies. We hope this provides the necessary transparency for our approach in calculating and interpreting the prevalence ratios (Line 210).

      4) The estimated costs for anxiety-, depression- and eating disorder-related care are accessed in published papers and used to calculate the excess costs. Conclusions would be strengthened by a defence of these figures, particularly for anxiety where the source paper is from 1999.

      Thank you for your insightful comment. We agree that providing a justification for our choice of cost estimates, especially for the anxiety care cost from a 1999 study, would strengthen our conclusions. The 1999 source was selected because it is a seminal study that offers a comprehensive breakdown of anxiety-related care costs. Despite its age, this paper is often cited in contemporary research due to its rigorous methodology and the granularity of its cost analysis. Adjusted for inflation, its findings still provide an insightful comparison point for current data. To ensure that these figures accurately represent present-day costs, we have adjusted them for inflation using the medical care inflation calculator. Our choice of these specific studies was based on their rigorous methodology, the detailed breakdown of costs, and their relevance to our targeted age groups. The aforementioned adjustments and justifications ensure that these figures aptly represent the present-day costs of treating these conditions.

      Similarly, the 2021 papers on depression and eating disorders present comprehensive and up-to-date analyses of the economic burdens associated with these conditions. These papers were selected for their rigorous methodologies, comprehensive cost breakdowns, and alignment with our age-specific focus. The Greenberg et al. (2021) paper, for example, is an authoritative source that provides detailed analysis on the economic burden of adults with major depressive disorder. Likewise, the paper by Streatfeild et al. (2021) offers a meticulous investigation into the socio-economic cost of eating disorders in the U.S., making it an apt choice for our study. We recognize the necessity of providing a robust justification for our choice of these particular papers, and we have added this justification to our Methods section to make our approach more transparent to readers (Line 225).

      5) An inflation tool is used to adjust the figure, but this does not take into account changes in treatment or practice since this estimate was made. The accuracy of these estimated figures is central to the final conclusions.

      Thank you for your valuable comment. We do note that the inflation figures used are a healthcare-specific inflation factor, as healthcare inflation differs from general consumer inflation. However, we agree that the inflation-adjusted figures do not necessarily account for changes in treatment practices since the original estimate was made, assuming these changes would alter the cost of care. We have added a discussion of this limitation in our manuscript and proposed future studies to validate these estimates using more recent data (Line 473).
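
The inflation adjustment discussed above amounts to rescaling a historical cost by the ratio of a price index in the target year to the same index in the source year. The following is a minimal sketch of that arithmetic only; the function name, the cost, and both index values are hypothetical placeholders and are not taken from the manuscript or from any published index.

```python
def adjust_for_inflation(cost, index_source_year, index_target_year):
    """Rescale a historical cost by the ratio of two price-index values."""
    if index_source_year <= 0:
        raise ValueError("index_source_year must be positive")
    return cost * (index_target_year / index_source_year)


# Hypothetical placeholders, for illustration only.
cost_1999 = 1000.0   # per-patient cost reported in the source year
index_1999 = 250.0   # medical-care price index in the source year
index_2021 = 500.0   # medical-care price index in the target year

cost_2021 = adjust_for_inflation(cost_1999, index_1999, index_2021)
print(cost_2021)  # 2000.0
```

This rescaling changes only the price level. It holds the mix and intensity of treatments fixed, which is the limitation the reviewer raises.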

    1. Author Response

      Reviewer #1 (Public Review):

      GSK3 is a multi-tasking kinase that recognises primed (i.e. phosphorylated) substrates. One of the mechanisms by which the activity of GSK3 can be regulated is through N-terminal (pSer9) phosphorylation. In this case, the phosphorylated N-terminus turns into a pseudo-substrate that occupies the substrate binding pocket and thus inhibits the activity of GSK3 towards its real substrates.

      One outstanding question is how this autoinhibitory mechanism can affect some, but not all signaling pathways that GSK3 is involved in. One example is WNT/CTNNB1 signaling. Here, GSK3 plays a central role in the turnover of CTNNB1 in the absence of WNT, but this pool of GSK3 is not affected by pSer9 phosphorylation.

      Gavagan et al. address this question using an in vitro approach with purified proteins. They identify a role for AXIN1 in protecting the "WNT signaling pool" of GSK3 from the auto-inhibition that occurs upon pSer9 phosphorylation.

      Specifically, they show that i) GSK3-pSer9 is less capable of binding and phosphorylating primed CTNNB1 - thus suggesting that GSK3-pSer9 does not contribute to WNT signaling, ii) in the presence of AXIN1, GSK3-pSer9 becomes more capable of binding and phosphorylating CTNNB1 - suggesting that Axin can promote binding of GSK3 and CTNNB1 even when the primed binding pocket on GSK3 is blocked initially, iii) AXIN1 specifically prevents the PKA mediated phosphorylation of GSK3B on pSer9 - while leaving the phosphorylation of other PKA substrates unaffected.

      Strengths:

      • The authors use an in vitro system in which they can reconstitute different interactions and reactions using purified proteins, thus allowing them to zoom in on specific biochemical events in isolation.

      • The authors measure the phosphorylation of primed substrates (pSer45-CTNNB1 or WNT-independent substrates) and quantify specific kinetic parameters (kcat, KM, and kcat/KM) - of wildtype non-phosphorylated GSK3B, pSer9GSK3B, or the non-phosphorylatable S9A-GSK3B, either in the presence or absence of AXIN1 (or an AXIN1 fragment).

      • The experiments appear to be well-controlled and the results appear to be interpreted correctly.

      Weaknesses:

      • Key experiments (e.g. Figures 2 and 3) are described as being performed as n=3 technical replicates rather than independent/biological replicates.

      We suggest that the replicates described in our work can properly be described as biological replicates, and we have updated the manuscript accordingly. We apologize for the confusion and elaborate on our reasoning below.

      Each replicate reported for our in vitro kinetic assays is an independent reaction prepared in a separate reaction vessel, and replicates were analyzed on separate gels. Thus, each reaction is a distinct biological sample and should have been described as a biological replicate. A technical replicate would have been repeat measurements of the same timepoint from a single reaction.

      Our original description as technical replicates was based on the notion that each replicate came from the same protein purification (biological sample). However, an analogy to cell culture experiments can illustrate why our initial reasoning was incorrect. In a cell culture experiment, cells from the same initial source are typically split into independent wells for biological replicates. Similarly, our proteins come from the same initial source but are split into independent reaction vessels for biological replicates.

      The critical point is that, regardless of the precise terminology, our replicates capture the variability between independent experiments.

      • The validation in a biologically relevant setting (i.e. a cellular context) is limited to Figure 4C, which shows that over-expression of AXIN1 reduces the total levels of pSer9-GSK3.

      The biochemical experiments presented in our work address a critical gap in the signaling field and, together with the in vivo validation in Figure 4C, establish a model that was previously speculative. We suggest that further in vivo experiments are beyond the scope of the current manuscript.

      The authors convincingly show that AXIN1 can play a role in shielding GSK3 from auto-inhibition. As it stands, the impact of this work on the field of WNT/CTNNB1 signaling is likely to remain limited, mainly because the mechanism by which AXIN1 shields the WNT/CTNNB1 signaling pool of GSK3 from pSer9 inhibition remains unresolved. Based on the fact that a mini AXIN1 (i.e. an AXIN1 fragment) behaves the same as WT AXIN1, the authors conclude that AXIN1 likely causes allosteric changes on GSK3 but is less likely to block PKA from binding. They cannot conclusively show this, however, as they do not have evidence in favour of one or the other explanation.

      We thank the reviewer for this important comment which details the central concern raised in the review process. To address this point, we have collected additional biochemical data that conclusively shows that the Axin shielding effect is allosteric and not a steric block. We demonstrated that a minimal, 27 amino acid Axin peptide produces the same GSK3β shielding behavior as full length Axin and miniAxin. The minimal Axin peptide does not sterically occlude the GSK3β phosphorylation site. This data is included in a revised Fig 4A and described on lines 115-120 of the revised manuscript.

      However, this study does offer more insight into the compartmentalisation of GSK3 and the quantitative parameters may be used in computational models describing the different cellular activities of GSK3.

      This work also has conceptual significance: Scaffold proteins are known to promote signal transduction by bringing proteins together (often: kinases and substrates). Here, Gavagan et al. show that AXIN1 also plays a second role, namely in protecting one of its binding kinases (GSK3) from inhibitory signals. This could potentially hold for other scaffolding proteins as well.

      Reviewer #2 (Public Review):

      Gavagan et al. investigated the role of the scaffolding protein, Axin, in the cross-pathway inhibition of GSK3b. The authors utilize reconstituted Axin, b-catenin, GSK3b, and protein kinase A to test 2 models. In the first model, the formation of the complex consisting of Axin, b-catenin, and GSK3b overcomes inhibitory phosphorylation of serine 9 of GSK3b. In the second model, the binding of Axin to GSK3b inhibits serine 9 phosphorylation through allosteric effects. Previous literature has established that the phosphorylation of serine 9 of GSK3b inhibits its kinase activity. To provide a quantitative measure of inhibition, the authors determine the binding affinity and catalytic efficiency of GSK3b in comparison to GSK3b phosphoS9 towards b-catenin. Interestingly, the data demonstrate a 200-fold decrease in kcat/KM and a 7-fold increase in KM. It is unclear why serine 9 mutation to alanine increases the rate of b-catenin phosphorylation more than the unphosphorylated GSK3b protein in figure S10.

      We thank the reviewer for catching this inconsistency. In the Michaelis-Menten plots presented in the main text (Figure 2 & Figure 3D), rates for unphosphorylated GSK3β and GSK3β_S9A are indistinguishable. These plots were used to determine the kinetic parameters reported in Table S1 (now Supplementary file 1a). The purpose of Figure S10 (now Figure 2-figure supplement 8) was to confirm that these reactions were first order (linear) in enzyme concentration, but the reviewer is correct to flag the inconsistency in absolute rates. In Figure S10A (now Figure 2-figure supplement 8A), the rates for unphosphorylated GSK3β were ~2-3-fold lower than expected.

      We have reanalyzed the original frozen reaction timepoints on new western blots. The results were identical for unphosphorylated GSK3β and GSK3β_S9A, resolving the apparent discrepancy. Upon review of the original western blot images, we noted that they were relatively noisy, potentially indicating incomplete blot transfer or an antibody going bad. Because we were able to reanalyze the original samples and obtained internally consistent results, we suggest that the updated data should replace the original data. The updated data are included in a revised Figure S10A (now Figure 2-figure supplement 8A).
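
For readers less familiar with the kinetic analysis referred to above, the Michaelis-Menten rate law is v = kcat[E][S] / (KM + [S]), which is linear in enzyme concentration at a fixed substrate concentration. The sketch below illustrates that property with hypothetical parameter values; it is not the authors' fitting code, and none of the numbers are those reported in Supplementary file 1a.

```python
def mm_rate(kcat, km, enzyme, substrate):
    """Michaelis-Menten initial rate: v = kcat * [E] * [S] / (KM + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


# Hypothetical parameters, for illustration only.
kcat = 2.0        # turnover number, 1/s
km = 5.0          # Michaelis constant, uM
substrate = 5.0   # substrate concentration, uM (set equal to KM here)

v1 = mm_rate(kcat, km, enzyme=0.25, substrate=substrate)
v2 = mm_rate(kcat, km, enzyme=0.5, substrate=substrate)

print(v1)       # 0.25
print(v2 / v1)  # 2.0, doubling [E] doubles the rate (first order in enzyme)
```

The same definitions link the fold changes quoted in the review: because catalytic efficiency is kcat/KM, a 200-fold drop in kcat/KM together with a 7-fold rise in KM would correspond to a roughly 29-fold drop in kcat (200 / 7).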

      Next, the authors tested if the addition of Axin could overcome this inhibition. Although the addition of Axin decreases the Km, thereby producing a 20-fold increase in catalytic efficiency, the addition of Axin does not rescue the catalytic turnover of the phosphorylated GSK3b. Hence, the authors propose that Axin does not rescue the kinase activity of GSK3b from the inhibitory effects of serine 9 phosphorylation.

      Next, the authors test if Axin protects GSK3b from phosphorylation by the upstream kinase PKA. Excitingly, the data show a decrease in binding affinity and catalytic efficiency of PKA with GSK3b phosphoS9 in comparison to GSK3b. The binding of Axin inhibits GSK3b serine 9 phosphorylation by PKA but does not inhibit the phosphorylation of other PKA substrates such as Creb. The authors demonstrate that a fragment of Axin, residues 384-518, behaves similarly to the full-length Axin to shield GSK3b from phosphorylation. However, it is unclear how this fragment may bind in the destruction complex and if Axin has allosteric effects on GSK3b.

    1. Author Response

      Reviewer #1 (Public Review):

      Various parts of the premotor cortex have been implicated in choices underlying decision-making tasks. Further, norepinephrine has been implicated in modulating behavior during various decision-making tasks. Less work has been done on how noradrenergic modulation would affect M2 activity to alter decision-making, nor is it clear whether noradrenergic modulation effects on activity would differ between the male and female sexes.

      This manuscript addresses some of these questions.

      • In particular, clear sex differences in task engagement are seen.

      • May also show some interesting differences and distributions of β2 adrenergic receptors in M2 between males and females.

      We thank the reviewer for their summary of our findings and thoughtful critique of our manuscript. In our revised manuscript we have taken measures to address the reviewer’s comments in line (blue edits in text and revised figures) with direct responses outlined below. We believe these revisions improve the scientific rigor of our findings and provide relevant context for our studies. We hope that they have sufficiently addressed the reviewer’s concerns.

      Less clear is the specificity of systemic antagonism of β adrenergic receptors on the changes in M2 activity reported. As propranolol was given systemically, changes in M2 firing rates could also be due to broader circuit (indirect) activity changes. As it was not given locally, nor were local receptor populations manipulated, one is unable to make the conclusion that changes in neural activity are due to the direct effects of adrenergic receptors within M2 populations.

      We agree that propranolol-driven changes in anterior M2 activity may arise via multiple mechanisms, including direct action on the adrenoreceptors within M2, and indirect action via other regions that project to M2. Although locally activating inhibitory interneurons within M2 is sufficient to disrupt cue-guided action plans and behavior in a 2AFC task (Inagaki et al., 2018), our noradrenergic manipulation was not restricted to M2. We have clarified our conclusions and provided additional discussion to highlight that propranolol actions were multifaceted and that direct actions in M2 are likely working in concert with propranolol-mediated actions in other regions.

      Also not clear, is the contribution of M2 to this task, and whether the changes in M2 activity patterns observed are directly responsible for the behavioral disruptions measured.

      We have revised our introduction and discussion to more clearly outline the critical role of cue-guided action plans in M2 for successful behavior in 2AFC tasks. Suppression of cue-guided activity in M2 results in behavioral performance at near chance levels, similar to what we saw in females after propranolol (Guo et al., 2017; Inagaki et al., 2018; Li et al., 2016). Furthermore, targeted photostimulation of action plan encoding neurons in M2 is sufficient to drive behavioral responses (Daie et al., 2021). In our investigations it is plausible to expect propranolol related disruptions in other cognitive, sensory or motor regions. Based on the strong foundational evidence for M2 activity in 2AFC, the propranolol driven changes in anterior M2 in females, whether direct or indirectly mediated, are likely sufficient to drive behavioral disruptions in accuracy and/or trial completion.

      Reviewer #2 (Public Review):

      This paper by Rodbarg et al describes an interesting study on the role of beta noradrenergic receptors in action-related activity in the premotor cortex of behaving rats. This work is precious because even if the action of neuromodulatory systems in the cortex is thought to be critical for cognition, there is very little data to actually substantiate the theories. The study is well conducted and the paper is well written. I think, however, that the paper could benefit from several modifications since I can see 3 major issues:

      We thank the reviewer for their generous comments on the potential impact of our manuscript as well as their suggestions to improve this work. Below we outline responses to specific comments raised by the reviewer in addition to addressing them in the revised manuscript. We hope these responses sufficiently address the reviewer’s concerns.

      Both from a theoretical and from a practical point of view, the emphasis on 'cue-related' activity and the potential influence of NA on sensory processing is problematic. First, recent studies in rodents and primates have clearly demonstrated that LC activation is more closely related to actions than to stimulus processing (see Poe et al, 2020 for review).

      Indeed, during optimal performance the peaks of LC activity are larger when PETHs are aligned to action initiation rather than to the cue itself (Clayton et al., 2004). This alignment resolves variability in decision processing times and omitted cues. Although LC responses align with action, they are evoked by, and occur after, cue presentation, with LC responses to visual cues occurring ~60 ms after presentation (Aston-Jones & Bloom, 1981). The same behavioral action without preceding task-relevant cues does not evoke an LC response (Rajkowski et al., 2004).

      In our current study, cues initiate activity in anterior M2, which is our primary interest and where our electrodes are placed. The window between cue delivery and action completion homes in on our goal of investigating the role of β noradrenergic signaling in target cortical processing, rather than LC explicitly. In both NHPs and rodents, NE signaling (and evoked LC) promotes sustained cortical representations between cue onset and actions across cortical regions (dlPFC, S1) (Ramos & Arnsten, 2007; Vazey et al., 2018; Wang et al., 2007). In the current study we aligned neural data to either cue presentation (Figure 3) or action (lever press; Figure 4). Both presentations support a critical role for β adrenoreceptor signaling in suppressing irrelevant information and in resolving and maintaining action plans. A unique feature of aligning the data to cue onset is that it allows us to see how the neural activity changes not only on completed trials (that end with a lever press) but also on omitted trials (which strongly increase after propranolol). We propose that the large increase in omitted trials arises because β adrenoreceptor blockade either directly or indirectly prevents anterior M2 from resolving an action plan.

      Second, the analysis of neural activity around cue onset should be examined with spikes aligned on the action, since M2 is a motor region and raster plots suggest that activity is strongly related to action (I'll be more specific below).

      We agree that M2 shows important action plan activity which we highlight throughout the manuscript. In cued tasks, M2 neurons have been shown to represent action plans starting at cue onset that continues up to behavioral execution. Neural data was examined and results presented aligned to cue onset (illustrated in Figure 3) and aligned to action - lever press (illustrated in Figure 4). The impact of propranolol in diminishing action plan selection was similar in both action, and cue-aligned analyses.
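
The two alignments described above can be illustrated with a peri-event time histogram (PETH), in which spike times are re-expressed relative to each event before binning. The sketch below is a minimal illustration under stated assumptions: `peth` is a hypothetical helper, and the spike, cue, and lever-press times are made-up values rather than recorded data or the authors' analysis code.

```python
def peth(spike_times, event_times, window=(-0.25, 0.25), bin_width=0.25):
    """Peri-event time histogram: mean firing rate (spikes/s) in each bin,
    averaged over events, with spikes timed relative to each event."""
    start, stop = window
    n_bins = int(round((stop - start) / bin_width))
    counts = [0] * n_bins
    for event in event_times:
        for spike in spike_times:
            rel = spike - event
            if start <= rel < stop:
                idx = min(int((rel - start) / bin_width), n_bins - 1)
                counts[idx] += 1
    n_events = len(event_times)
    if n_events == 0:
        return [0.0] * n_bins
    return [c / (n_events * bin_width) for c in counts]


# Hypothetical single-neuron data (seconds), for illustration only.
spikes = [1.05, 1.3, 3.05, 3.5, 5.05]
cue_times = [1.0, 3.0, 5.0]   # three trials; the third is omitted (no press)
press_times = [1.4, 3.6]      # lever presses exist only on completed trials

print(peth(spikes, cue_times))    # [0.0, 4.0]
print(peth(spikes, press_times))  # [4.0, 0.0]
```

In this toy example the omitted trial contributes to the cue-aligned histogram but cannot contribute to the press-aligned one, which is why cue alignment is needed to examine omitted trials.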

      The distinction between neural activity and behavior or cognition is not always clear. I understand that spike count can be related to motor preparation or decision, but it should not be taken for granted that neuronal activity is action planning. The analysis should be clarified and the relation between neural activity, behavior, and potential hidden cognitive operations should be explicated more clearly.

      We have worked to clarify in our revised introduction, results and discussion the specifics of the known roles of neural activity in M2 in both action planning and decision making. We further expand that the neuronal activity in our study may reflect potential changes in cognitive processing and thus alter resultant behavioral outcomes.

      The sex difference is interesting, but at the moment it seems anecdotal. From a theoretical point of view, is there any ecological/ biological reason for a sex dependency of noradrenergic modulation of the cortex? Is there any background literature on sex differences in motor functions in rats, or in terms of NA action? If not, why does it matter (how does it change the way we should interpret the data?) From a practical point of view, is there a functional sex difference in absence of treatment, or is it that the drug has a distinct effect on males vs females? This has very distinct consequences, I think.

      We did not find overt differences in behavior in the absence of treatment. Only when noradrenergic function was challenged using propranolol did we identify functional sex differences. We agree that this has very distinct consequences – specifically it supports sex differences that can be revealed by perturbations of normal function. These functional sex differences may be a result of differences in the anatomy of central noradrenergic systems, a hypothesis further supported by our mRNA expression findings and existing literature on LC anatomy across species (Bangasser et al., 2011, 2016; Luque et al., 1992; Mulvey et al., 2018; Ohm et al., 1997; Pinos et al., 2001). Collectively these results have potential ramifications for understanding sex differences in disease prevalence and targeted treatments.

      Background literature supports some innate sex differences in motor function and executive function in rodents and humans. Of particular relevance to our investigation is an established difference in behavioral strategy, with females being more risk-averse than males (Grissom & Reyes, 2019). Ethologically, risk-averse strategies may support parental care roles, and increased inhibitory mechanisms may be selected for in females. Although this strategy was not directly tested in our study, the large increase in omissions after propranolol seen in females is in line with avoiding risk (incorrect choices) during uncertainty (disrupted neural signaling). As with other executive functions, the utilization of norepinephrine within the cortex, along with other neuromodulators and local microcircuit interactions, would all contribute to promoting risk-averse behavior.

      These issues could be clarified both in the introduction and in the discussion, but the authors might have a different view on what is theoretically relevant here. In the result section, however, I think that both the lack of specificity in the description of behavior and cognitive operation and the confusion between 'sensory' and 'motor' functions make it very difficult to figure out what is going on in these experiments, both at a behavioral and at a neurophysiological level. First, the description of the behavior in the task is clearly not sufficient, which makes the interpretation of the measures very difficult.

      We have made an effort to better specify the task and relevant behavioral operations in both the methods and results and have included a clearer task schematic (Figure 1A). We agree that the confusion between ‘sensory’ and ‘motor’ functions may make it more difficult to understand the findings in this study. Anterior M2 plays a unique role in representing motor/action plans that can be informed by sensory information. This integrative function creates difficulty in parsing the neural activity of anterior M2 as strictly motor, sensory or cognitive. In attempts to improve clarity we have expanded and highlighted relevant information on the known roles of M2 in the introduction and discussion.

      One possible interpretation of the effects of the drug is a decrease in motivation, for instance, due to a decrease in reward sensitivity or an increase in sensitivity to effort. But there are others. More importantly, none of these measures can be used to tease apart action preparation from action execution, even though the study is supposed to be about the former.

      Neural activity during action planning, prior to action execution is known to be an essential function of M2 (Barthas & Kwan, 2017; Gremel & Costa, 2013; Guo et al., 2017; Inagaki et al., 2018, 2022; Li et al., 2016; Siniscalchi et al., 2016; Sul et al., 2011; Wei et al., 2019) for optimal performance in 2AFC tasks. In all, we found that the representation/separation of opposing action plans (a well validated function of M2) prior to responses (lever press) is degraded after propranolol, especially in females. We have provided additional emphasis on these foundational studies throughout our revised manuscript.

      To minimize the impact of motivational factors, effort and reward size remain consistent within our task, and all trials require a random initiation hold prior to cue delivery. As described in our general response to the editor above (Figure 1, above), we investigated whether motivational changes may be reflected in our M2 recordings. PETHs from the first and last 10 trials within saline sessions did not identify potential motivation-related differences in anterior M2 activity. Similarly, across propranolol sessions the neural activity was consistent between early and late trials. We used early and late trials as there was a mild decrease in trial rate during saline sessions in both males and females, potentially indicative of motivation/reward sensitivity changes during these sessions. M2 neural responses consistently separated action plans (after saline) or failed to separate them (propranolol sessions).

      Also, but this is less critical: In Figures 2C and D, it looks like there is a bimodal distribution for the effect of propranolol in females. Is there something similar in the neuronal effects of the drug? And in the distribution of receptors? Can it be accounted for by hormonal cycles/ anything else?

      Although there is some clustering in behavioral outcomes all data passed normality assumption as appropriate. Propranolol treatments were not synchronized to hormonal cycles, and the data likely include animals at various hormonal stages. Similar clustering was not apparent in neuronal effects of propranolol, although propranolol increased variability in many measures.

      In a pilot experiment we did not see any difference in baseline performance on our 2AFC task across the hormonal cycle (diestrous, proestrous, estrous or metestrous) of females in any measure including accuracy (F(3,33)=0.59, p=0.63, one-way ANOVA) and omissions (F(3,33)=0.51, p=0.68).
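
For reference, the degrees of freedom in a one-way ANOVA are k - 1 between groups and N - k within groups, so F(3,33) is consistent with four cycle stages and 37 observations in total. The sketch below computes the F statistic from scratch on made-up values; `one_way_anova` is a hypothetical helper, the data are not from the pilot experiment, and no p-value is computed because that requires the F distribution.

```python
def one_way_anova(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: group size times squared mean deviation.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares: squared deviation from each group's own mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical scores for four groups, for illustration only.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [2, 3, 4]]
print(one_way_anova(groups))  # (2.0, 3, 8)
```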

      The description of neural activity is also very superficial. In general, it is not clear how spike count measures have been extracted. For example, legend and figure C are not clear, is the (long) period of cue presentation included in the 'decision time'?? "Cues were presented at a variable interval 200-700ms after initiation and until animals left the well, 'Well Exit'. The time from cue onset to well exit was identified as the decision time (yellow)." Yet on the figure only the period after cue presentation is in yellow. This is critical because, given the duration of the cue, the animals are probably capable of deciding (to exit the well) before the cue turns off. Indeed, as shown in fig 2D, the animals can decide within about 500 ms. So to what extent is the 'cue response' actually a 'decision response'?

      We have clarified the task and spike count measurements in methods and added a revised task schematic. It is correct that the cues are available throughout the decision time (for up to 5 seconds or until well exit), and an action plan is generated before well exit/cues turn off as reflected by the separation of neural action plans (Fig 3, saline). Anterior M2 neurons maintain action plan representation from cue onset until the lever press under normal conditions (Fig 4, saline). These action plans encapsulate “cue responses” and “decision responses”. We have aligned neural data to discrete timestamps at either end of the window in which M2 processing is known to be critical, specifically between cues and actions (lever press) and focus on neural activity relative to those points. We refer to this activity throughout the manuscript as an ‘action plan’ as action planning functions of M2 activity have been well established in prior studies.

      When looking at figure 3A, there is clearly a pattern on the raster, a line going from top left to bottom right. If the trials are sorted chronologically, something is happening over time. If, as I suspect, trials are sorted by ascending response time, this raster is showing that what authors are calling a 'response to cues' is actually a response around action. Basically, if propranolol slows down reaction time, the spikes will be delayed from cue onset only because they remain locked to the action. Then the whole analysis and interpretation need to be reconsidered. But it might be for the best: as I mentioned earlier, recent work on LC activity has clearly emphasized its influence on motor rather than sensory processing (Poe et al, 2020).

      Figure 3A is a single neuron example, and data analyses focus on population-wide activity. Neural data is presented both aligned to cues, for all trials in which a cue was received, and aligned to lever press (action), for all trials on which a lever press occurred. In both cases, aligned to cue or aligned to action, the impact of propranolol is the same: β adrenoreceptor blockade reduces the separation of action plans in M2, severely so in females. However, a major finding is that females receive a cue but omit a large number of trials after propranolol; for this outcome, the action does not occur. We propose this is due to the lack of action plan separation in anterior M2 (either directly or indirectly). When no behavioral response occurs, these trials cannot be aligned to action, yet we are still interested in the neural activity during the critical window between cue delivery and actions. We are not assigning this neural activity to sensory processing but using this discrete sensory event within our trials (cue) to align the data, as there is substantial evidence that action plans in M2 arise after cue presentation in tasks such as ours where performance is guided by external cues.

      Fig 2D-F: it is hard to believe that the increase in firing rate induced by propranolol in females is not significant. Presumably, because the range of the median firing rate is so high in the first place, distribution (2E) really indicates an increase in firing. Maybe some other test? E.g., a paired t-test, or standardized values (z-scores) to get rid of variability in firing across neurons?

      We agree that the session-wide firing rate appears rightward shifted in females after propranolol. As our recordings were taken on different days, several days apart, we cannot assume they are the same neurons for paired analyses. In our revised manuscript we evaluated these distributions using a Mann-Whitney test to increase power and decrease the impact of variability within the population. Previously we had used a Kolmogorov-Smirnov test. Using our new analysis, we can confirm that propranolol significantly increases session-wide firing rates in anterior M2 of females (p=0.027) but not males. This finding increases evidence for direct actions of propranolol within M2 and supports our hypothesis that propranolol leads to local disinhibition by reducing β noradrenergic signaling in interneurons and that without this noradrenergic tone anterior M2 is less efficient at suppressing irrelevant action plans.
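
      For readers less familiar with the test, the following is a minimal sketch of an unpaired two-sided Mann-Whitney U comparison, which suits recordings where the same neurons cannot be tracked across sessions. It is not the analysis code used for the manuscript: the function names and the firing rates are hypothetical, and the p-value uses a plain normal approximation without tie or continuity corrections, which is an assumption made here for brevity.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(sample_a, sample_b):
    """Return (U for sample_a, two-sided p) via the normal approximation."""
    n_a, n_b = len(sample_a), len(sample_b)
    ranks = average_ranks(list(sample_a) + list(sample_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    return u_a, math.erfc(abs(z) / math.sqrt(2))


# Hypothetical session-wide firing rates (Hz) from two independent sessions.
saline = [2.1, 3.4, 1.8, 4.0, 2.9, 3.1]
propranolol = [3.9, 5.2, 4.4, 6.1, 3.6, 5.0]
u_stat, p_value = mann_whitney_u(saline, propranolol)
print(u_stat, p_value)
```

      Because the test works on ranks pooled across both groups, a few neurons with very high firing rates have limited influence on the result, which is the property referred to above.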

      Along those lines, would it be worth looking for effects on specific populations (interneurons) which are sometimes characterized by thinner spikes and higher mean firing rates? Given the distribution of beta receptors RNA on interneurons, one would actually expect an effect of propranolol on the firing rate irrespective of task events. Or what is it that prevents the influence of propranolol on interneurons from changing the firing rate? In any case, one of the strengths of this study is the localization of beta receptors on specific neuronal populations in the cortex, so I think that the authors should really try to build on it and find something related to the neurophysiological effects. Otherwise, one cannot exclude the possibility that the behavioral effects are not related to the influence of the drug on these receptors in that region.

      Data were collected using stainless steel electrode arrays and our sample population of task related neurons is likely biased to pyramidal neurons, with a small number of fast spiking interneurons. We used validated spike waveform parameters of interneurons in premotor cortex (peak-to-trough ratio and duration; Giordano et al., 2023) in an attempt to isolate putative interneurons and found only a very small number of these cells in our recordings (n=5-7 per group). This population is too small to make any inferences about specific impacts. We have focused on the collective population activity of M2 as this is most strongly related to optimal action planning.

      You are correct that from the given findings we cannot conclusively show that the results found here are a result of propranolol acting solely within anterior M2. We have made sure to clarify throughout our revised manuscript that the behavioral and physiological changes we identified are a result of collective direct and indirect actions of propranolol.

      The conclusion that neuronal discrimination decreases because the proportion of neurons showing no effect increases is confusing (negative results, basically). It would be clearer if they were reporting the number of neurons that do show an effect, and presumably that this number shows a significant decrease.

      The reviewer is correct that the number of neurons that do show an effect (task related activity) does significantly decrease with propranolol (from n=70 to 27 in females and n=71 to 48 in males). These n are now given adjacent to the proportions rather than at the end of the paragraph. Proportions were used for statistical analysis due to an overall decrease in the total number of units after propranolol. All PETH presented are from neurons that show some task related activity, and these PETH confirm that neural activity no longer effectively discriminates/separates action plans in M2.

      Figs 3F-I: a good proportion of neurons (at least 20%) show a significant encoding before cue onset. How is it possible? This raises the issue of noise level/ null hypothesis for this kind of repeated analysis. How did the author correct for multiple comparison issues?

      In response to reviews, we have altered the manner in which we identify the significantly modulated neurons to increase rigor and no longer include these figures or analyses. The proportion of neurons showing action plan encoding prior to cue onset was likely an artifact of how the data was analyzed and an insufficient correction for multiple comparisons, allowing inclusion of internally generated action plans in some neurons.

      The description of the action-related activity is globally confusing. Again, how can the authors discriminate between activity related to planning vs action itself? What is significant and what is not, in males vs females? What is being measured here? For example, a very unclear statement on line 238: "Propranolol primarily disrupted active inhibition of irrelevant action selection in M2 activity, reducing the ability to maintain action plan representation in M2, delaying lever press responses (Figure 4L, 4M)." What is 'active inhibition'? What is an irrelevant action plan? What is selection? All of that should be defined using objective behavioral criteria and tested formally.

      We have changed our wording to clarify what we are describing and why we have chosen the words we have, and to ensure consistency and objectivity throughout the manuscript. Much of the wording we have used – for example action planning or action plan selection, are the words used in the literature to describe M2 neural activity. We call the activity in M2 action planning (either externally/cue guided or internally guided) because that is what has been previously demonstrated. In our task design and analysis we are tracking cue guided actions, as opposed to internally guided.

      We also separate the electrophysiology data as preferred and nonpreferred because the literature has shown individual M2 neurons show specific directional tuning as noted in our results, using the term ‘preferred’ encapsulates that tuning regardless of left/right direction. An example M2 neuron that increases activity for left cues and responses (preferred direction), will show active inhibition (low/negative z scores) on trials with right cues and responses (nonpreferred), other neurons would show the inverse relationship with direction.

      A primary impact of propranolol was the loss of negative z-scores for nonpreferred trials, i.e., neurons with a left preference that are usually inhibited on right trials were still firing, and vice versa. After propranolol, neurons continue to fire for an irrelevant action plan (for the opposite direction), and the resulting population activity is not significantly different for opposing cues/responses. Behavioral responses normally occur after opposing action plans have significantly separated in M2; collapsing action plans by preventing relevant signaling (Guo et al., 2017; Inagaki et al., 2018; Li et al., 2016) or facilitating irrelevant signaling, as we see here with propranolol, leads to impairments in 2AFC performance.

      Also, the description of the classifier analysis should be more thorough. Referencing the toolbox is not sufficient to understand what has been done.

      We have added additional explanation in both the methods and description of the results to clarify the functions of the neural decoding toolbox and how we are using it to evaluate information encoding within M2. We have provided detail on how the algorithm was trained, how shuffled data was generated and how we determined significance of decoding accuracy.

      Measuring Beta adrenoceptors is a great idea, and the results are interesting, especially the difference between neuron types. But again, how does that fit with neurophysiological results? Note, that since this is RNA measures, it should not be phrased as 'receptors' but 'receptors RNA' throughout. One possible interpretation of these anatomical results that cannot be reconciled with physiology is that protein expression at the membrane shows a distinct pattern.

      We have changed the references to β receptor expression to β receptor mRNA expression throughout the manuscript. Although mRNA provides a valuable proxy for adrenoreceptor production, as noted by the reviewer protein expression at the membrane may differ. Reliable antibodies that allow quantitative analysis of membrane bound adrenoreceptors in situ with co-labeling of specific cell types are limited. The goal of assessing mRNA expression within M2 was to determine if the functional sex differences we identified in M2 neurophysiology when manipulating β adrenoreceptor function could be mediated by basal differences in adrenoreceptors. The causal impact of differential mRNA expression in anterior M2 was not directly tested, but our findings provide preliminary evidence that adrenoreceptor regulation may differ across sexes. Our results provide a plausible avenue for differential sensitivity to β adrenoreceptor manipulation across sexes that may also be found in other brain regions.

      In conclusion, I think that this is a very interesting study and that the results are potentially relevant for a wide audience. But the paper would clearly benefit from revisions. If the authors could clearly identify a significant relationship between the action of NA on beta receptors on specific cortical neurons, at a physiological and behavioral level, that would be a seminal study. At the moment, the evidence is not convincing enough but the data suggest that it is the case.

      We thank the reviewer for the kind remarks. We have undertaken a number of new analyses, refined existing analysis and clarified our claims in the manuscript to improve rigor. Collectively our data reflect that the behavioral and neural deficits after systemic propranolol are likely due to both direct and indirect actions on M2. We believe this work is compelling and that it will inform future work investigating potential sex differences in central noradrenergic anatomy and functional sex differences after perturbations of noradrenergic signaling.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      (1) What's the rationale of trypsinizing the tissue prior to mitochondrial isolation? This is not standard for subsequent proteomics analysis. This step will inevitably cause protein loss, especially for the post mitochondrial fractions (PMF). Treating samples with 0.01 µg/µL trypsin at 37 °C for 30 min is sufficient to partially digest a substantial portion of the proteome. If samples from different subjects were not of the same weight, then this partial digestion step may introduce artificial variability as variable proportions of proteins from different subjects would be lost during this step. In addition, the mitochondrial protein enrichment in the mito fraction, despite being statistically significant, does not look striking (Figure 1E, ~30% mitochondrial proteins in the mito fraction). As a comparison, Williams et al., MCP 2018 seem to have obtained high mitochondrial protein content in the mito fraction without trypsinizing the frozen quadriceps using a similar SWATH-MS-based approach.

      Trypsinisation of the tissue prior to mitochondrial isolation is based on previous work and a Nature Protocol (1, 2) which isolated mitochondria from skeletal muscle. The rationale is that it aids in mechanical homogenisation of highly fibrous tissues such as quadriceps muscle by digesting extracellular matrix proteins. The trypsin/protein ratio used to aid in this process is at least 400 times lower than the amount of trypsin used for formal proteomic tryptic digestion. Three pieces of evidence suggest this step has negligible effect on downstream proteomic analysis. First, because the trypsinisation buffer is detergent free, trypsin will only affect extracellular or exposed membrane proteins. Filtering our PMF dataset for proteins with ‘extracellular matrix’ gene ontology identifies at least 90 unique extracellular matrix proteins, indicating good retention of proteins susceptible to partial digestion. Second, the trypsin dose used is 50 times lower than the concentration used for passaging cultured cells, which retain viability after trypsinisation. Third, and contrary to the point raised by the reviewer, we observe less missingness in PMF samples compared to mitochondrial samples. We thank the reviewer for bringing the Williams et al. 2018 MCP paper to our attention. We note that mitochondrial enrichment between the two papers is comparable (~2-fold). To improve clarity line 408 now reads: “Whole quadriceps muscle samples were prepared as previously described with modification (99, 100). First, tissue was snap frozen with liquid nitrogen…” and line 95 reads: “Mitochondrial proteins were defined based on their presence in MitoCarta 3.0 (24) and consistent with previous work (25) were approximately two-fold enriched in the mitochondrial fraction relative to the PMF (Fig 1E).”

      (2) The authors mentioned that the proteomics data were Log2 transformed and median-normalized. Would it be possible to provide a bit more details on this? Were the subjects randomized?

      Samples were randomised prior to sample processing and mass spectrometry analysis. Because of possible variation in total protein content, it is critical to normalise protein intensities between samples. Median normalisation adjusts the samples so that they have the same median, thereby accounting for technical variation. Log2 transformation helps to achieve normal distributions, which is critical for many downstream statistical tests. Line 471 now reads: “…to achieve normal distributions and account for technical variation in total protein.”
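
      The two steps described above can be sketched as follows. This is an illustration under stated assumptions rather than the pipeline used for the manuscript: the function name and the intensities are hypothetical, and centring each sample on a median of zero is only one common way to give all samples the same median.

```python
import math
import statistics


def log2_median_normalise(intensities_by_sample):
    """Log2-transform intensities, then centre each sample on its own median.

    Missing (None) or non-positive intensities are dropped because log2 is
    undefined for them. After centring, every sample has a median of 0.
    """
    normalised = {}
    for sample, proteins in intensities_by_sample.items():
        logged = {
            protein: math.log2(value)
            for protein, value in proteins.items()
            if value is not None and value > 0
        }
        centre = statistics.median(logged.values())
        normalised[sample] = {
            protein: value - centre for protein, value in logged.items()
        }
    return normalised


# Hypothetical raw intensities: mouse_2 was loaded with twice as much protein.
raw = {
    "mouse_1": {"A": 100.0, "B": 200.0, "C": 400.0},
    "mouse_2": {"A": 200.0, "B": 400.0, "C": 800.0},
}
normalised = log2_median_normalise(raw)
for sample, proteins in normalised.items():
    print(sample, proteins)
```

      In this toy input the second sample differs from the first only by a uniform two-fold loading difference, so both samples end up with matching values after normalisation.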

      (3) In Figure 1D, what were the numbers of mice the authors used for the CV comparisons in each group? Were they of similar age and sex? Were the differences in CV values statistically significant?

      The mitochondrial and PMF proteomes originated from the same quadriceps sample from the same mouse, and thus the age and sex are the same across both proteomes. After quality control, we had mitochondrial proteomes for 194 mice and PMF proteomes for 215 mice. The overall CV in the mitochondrial fraction was significantly greater than in the PMF; however, whether the source of this variation is biological or the result of mitochondrial isolation is unclear, and as such we have avoided making a statement within the body of the manuscript. We have now more clearly described the nature of the samples in the revised manuscript and added sample sizes to figure 1F.

      (4) The authors stated in lines 155-157 that proteins negatively associated with the Matsuda index were further filtered by presence of their cis-pQTLs. Perhaps more explanations would be needed to justify this filtering criterion? Having a cis-pQTL would mean the protein abundance variation is explained by the variation in its coding gene, this however conceptually would not be relevant to its association with the Matsuda index. With the data that the authors have in hand, would it not be natural to align the Matsuda index QTL with the pQTLs (cis and trans if available), and/or to perform mediation analysis to examine causal relationships with statistical significance?

      The rationale for filtering by cis-pQTL was not to study the genetics of either Matsuda or associated proteins but rather to identify proteins that were more likely to be causally associated with Matsuda Index as opposed to adaptively associated. To clarify this line 165 now reads: “Filtering based on cis-pQTL presence was based on the rationale that if genetic variation can explain protein abundance differences between mice, then we can be confident that phenotype (Matsuda Index) is not driving the observed differences and therefore the protein-phenotype associations are likely causal. Importantly, this assumption can only be made for cis-acting pQTLs.” Previous work by Matthew et al. (see https://qtlviewer.jax.org/) has demonstrated that cis-pQTL have markedly higher LOD scores than trans-pQTLs, and our own unpublished work suggests that trans-pQTLs do not reproduce well between datasets. The reviewer rightfully suggests aligning protein QTL with those for Matsuda. This is our long-term goal but to identify genome wide significant peaks associated with altered Matsuda will require many more mice than studied here.

      (5) It seems a bit odd that the first half of the paper focused extensively on the authors' discoveries in the mitochondrial proteome, and how proteins involved in mitochondrial processes (such as complex I) were associated with Matsuda Index, but the final fingerprint list of insulin resistance, which contained 76 proteins, only had 7 mitochondrial proteins. Was this because many mitochondrial proteins were filtered out due to no cis-pQTL presenting?

      There are three reasons our fingerprint is lacking mitochondrial proteins: 1) there are more non-mitochondrial than mitochondrial proteins in the muscle proteome; 2) we focussed on negatively associated proteins, and as demonstrated in figure 2c, the mitochondrial proteome is enriched for positively associated proteins; 3) as implied by the reviewer, we filtered for pQTL presence, further reducing the number of mitochondrial proteins in our fingerprint. To improve clarity, line 170 now reads: “Low mitochondrial representation in the fingerprint is the result of selecting negatively associating proteins, and as seen (Figure 2C) previously, the mitochondrial proteome is enriched for positive contributors to insulin resistance.”

      (6) The authors found that thiostrepton-induced insulin resistance reversal effects were not through insulin signalling. It activated glycolysis but the mechanism of action was not clear. What are the proteins in the fingerprint list that led to identification of thiostrepton on CMAP?

      Is thiostrepton able to bind or change the expression of these proteins? Since thiostrepton was identified by searching the insulin resistance fingerprint protein list against CMAP, it would be rational to think that it exerts the biological effects by directly or indirectly acting on these protein targets.

      This is indeed the implication of our data. Because of the timescales involved it is unlikely that thiostrepton is changing fingerprint protein levels but could be binding to and inhibiting them. Searching the CMAP thiostrepton signature reveals ARHGDIB and NAGK as the fingerprint proteins with the most positive and negative fold-changes respectively perhaps suggesting they play a role in thiostrepton’s mechanism of action. Experiments are underway to test this hypothesis however these are beyond the scope of the current paper.

      Reviewer #2 (Public Review):

      Line 105: The observation that variance in respiratory proteins is stable while lipid pathways is variable is quite interesting. Is this due to lower overall levels of lipid metabolism enzymes (ex. do these differ substantially from similar pathways ranked from high-low abundance?).

      The relationship between coefficient of variation (CV) and relative abundance of proteins is important to consider. To address this, we have now also performed GSEA on proteins ranked from high to low relative abundance. These comparisons have been added to supplementary figure 1 and line 110 now reads: “As a control experiment, we also performed enrichment analysis on proteins ranked by LFQ relative abundance. High CV pathways (enriched for high CV proteins) tended to be lower in relative abundance (enriched for low relative abundance proteins) (Supplementary Fig 1a, b). However, many high variability pathways, lipid metabolism for example, were not enriched in either direction based on relative abundance suggesting differences in relative abundance do not fully explain pathway variability differences.”

      Line 154: the 664 associations are impressive and potentially informative. It would be valuable to know which of these co-map to the same locus, either to distinguish linkage in a 2 Mb window or to identify any cis-proteins which directly exert effects in trans.

      To assess this, we have analysed pQTL position relative to gene position to generate a ‘hotspot’ plot. We have also generated a histogram of this pQTL density (in a 2 Mbp window) and added these figures to figure 3. We did not detect any obvious pQTL hotspots, and the distribution of pQTLs across the genome appears fairly uniform. Line 159 now reads: “These were distributed across the genome and were predominately cis acting (Figure 3A)...”

      Line 194: Cross-platform validation of the CMAP fingerprint results is an admirable set of validations. It might be good to know general parameters like how many compounds were shared/unique for each platform. Also the concordance between ranking scores for significant and shared compounds.

      The Connectivity Map (CMap) query included 5163 compounds, the Prestwick library included 1120, and the overlap was 420. We have added these comparisons to supplementary figure 2. Supplementary figure 2 now also contains a comparison of CMap scores between overlapping compounds (found in CMap and the Prestwick library) against all significant compounds identified by CMap (supplementary figure 2b). Interestingly, compounds present in both platforms scored higher on average, suggesting the Prestwick library captures a significant proportion of highly scoring CMap candidates. Line 206 now reads: “In total, 420 compounds were found across both platforms, and these consensus compounds captured a significant proportion of highly scoring CMap compounds (Supplementary Figure 2A, B).”

      Line 319: Another consideration in the molecular fingerprint is how unique these are for muscle. While studies evaluating gene expression have shown that many cis-eQTLs are shared across tissues, to my knowledge, this hasn't been performed systematically for pQTLs. Therefore, consider adding a point to the discussion pointing out that some of the proteins might be conserved pQTLs whereas others which would be more relevant here present unique druggable targets in muscle.

      To examine tissue specificity, we determined whether our skeletal muscle fingerprint proteins were detected and contained a pQTL in two metabolically important tissues, liver and adipose. Despite detecting almost all the fingerprint proteins in both adipose and liver tissue, they were depleted for pQTL compared to skeletal muscle. These data have now been added to figure 3c. Line 172 now reads: “To assess the tissue specificity of our fingerprint we searched for the same proteins in metabolically important adipose and liver tissues. Despite detecting 94% and 82% of muscle fingerprint proteins across each tissue respectively, both adipose and liver were depleted for pQTL presence (Figure 3C) suggesting that regulation of our fingerprint protein abundance is specific to skeletal muscle.”

      Line 332: These are fascinating observations. 1, that in general insulin signaling and AMPK were not themselves shown as top-ranked enrichments with Matsuda and that this was sufficient to alter glucose metabolism without changes in these pathways. While further characterization of this signaling mechanism is beyond the scope of this study, it would be good to speculate as to additional signaling pathways that are relevant beyond ROS (ex. CNYP2 and others)

      We have now added further discussion to the manuscript to address this point. Line 347 now reads: “Aside from glycolysis, other pathways may be involved in enhancing insulin sensitivity. For example, the negatively associated protein ARHGDIA (Figure 2F) is a potent negative regulator of insulin sensitivity, and our fingerprint of insulin resistance contained its homologue ARHGDIB. Both ARHGDIA and ARHGDIB have been reported to inhibit the insulin action regulator RAC1 thus lowering GLUT4 translocation and glucose uptake. Further investigations may uncover a role for thiostrepton in modulating the RAC1 signalling pathway via ARHGDIB.”

      Line 314: Remove the statement: "While this approach is less powerful than QTL co-localisation for identifying causal drivers,", as I don't believe that this has been demonstrated. Clearly, the authors provide a sufficient framework to pinpoint causality and produce an actionable set of proteins.

      We have edited line 314, which now reads: “Moreover, our approach has the major advantage that it requires far fewer mice to obtain meaningful outcomes (222 mice in this study) compared to that required for genetic mapping of complex traits like Matsuda Index.”

      Line 346: I would highlight one more appeal of the approach adopted by the authors. Given that these compound libraries were prioritized from patterns of diverse genetics, these observations are inherently more-likely to operate robustly across target backgrounds.

      This point is further supported by our thiostrepton results in both C57BL/6J and BXH9 mice. Line 317 now reads: “Furthermore, because we have used genetically diverse datasets (DOz mice and multiple cell lines in Connectivity Map) our findings are likely robust across diverse target backgrounds.”

      Line 434: I might have missed but can't seem to find where the muscle data are available to researchers. Given the importance and novelty of these studies, it will be important to provide some way to access the proteomic data.

      These data are now available via the ProteomeXchange Consortium. Line 465 now reads: “The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE (104) partner repository with the dataset identifier PXD042277.”

      1. Frezza C, Cipolat S, Scorrano L. Organelle isolation: functional mitochondria from mouse liver, muscle and cultured fibroblasts. Nat Protoc. 2007;2(2):287-95.

      2. Acin-Perez R, Benador IY, Petcherski A, Veliova M, Benavides GA, Lagarrigue S, et al. A novel approach to measure mitochondrial respiration in frozen biological samples. The EMBO Journal. 2020;39(13):e104073.

      3. Chick JM, Munger SC, Simecek P, Huttlin EL, Choi K, Gatti DM, et al. Defining the consequences of genetic variation on a proteome-wide scale. Nature. 2016;534(7608):500-5.

      4. Gatti DM, Svenson KL, Shabalin A, Wu L-Y, Valdar W, Simecek P, et al. Quantitative Trait Locus Mapping Methods for Diversity Outbred Mice. G3 Genes|Genomes|Genetics. 2014;4(9):1623-33.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors set out to investigate spatial RNA processing events, specifically alternative splicing and 3' UTR usage, in mouse brain and kidney tissues using ReadZS and SpliZ methodologies on spatial transcriptomics data. The research contributes to understanding tissue-specific gene expression regulation from a spatial perspective. The study introduces a novel approach for analyzing spatial transcriptomics data, allowing for the identification of RNA processing and regulation patterns directly from 10X Visium data. The authors present convincing evidence supporting the identification of novel RNA processing patterns using their methodology, which holds significant implications for researchers in the field of spatial transcriptomics and the study of alternative splicing and 3' UTR usage.

      Thank you for this thorough overview of our work.

      The conclusions of the study are mostly well-supported by the data; however, certain aspects could be improved to strengthen the findings.

      1) The conclusions of this study would be strengthened by conducting a more extensive tissue sample analysis and including biological replicates. Additionally, appropriate batch effect corrections should be applied when dealing with biological replicates.

      We agree that including biological replicates would strengthen our findings. We will include biological replicates of the mouse brain tissues in the revision.

      2) The 3' UTR usage and alternative splicing should be compared among clearly labeled clusters for a more comprehensive analysis.

      We understand that it can be difficult to see how the SpliZ quantiles map spatially onto the tissue images. For the splicing of Gng13, Myl6, and Rps24, we will include box plots broken down by spatial quadrant in the revision. However, this does result in an oversimplification of the spatial patterns found in the tissue slices, which makes the plots less informative than the quantile plots, in our view.

      3) The authors should clarify their rationale for choosing ReadZS and SpliZ approaches and provide comparisons with other methods to demonstrate the advantages and potential limitations of their chosen methodologies.

      Thank you for pointing out the lack of sufficient discussion of ReadZS and SpliZ in the manuscript. The ReadZS and SpliZ were chosen for this analysis because both of these methods provide an individual score for each cell-gene pair, which is easily adapted to providing a score for each spot-gene pair. Due to the sparsity and 3’ bias of Visium data, approaches designed to analyze RNA processing in full-length sequencing analysis are not applicable. The SpliZ and ReadZS are two of the limited number of tools available that are designed for the analysis of RNA processing in droplet-based data. Other available tools tend to rely on aggregating data across multiple cells using a method called pseudo-bulking (Li et al., 2021; Patrick et al., 2020). It is not clear how this could be used for spatial transcriptomics data without potentially obscuring subtle spatial patterns in the data. Others are based on PSI measurements, which are vulnerable to artifacts due to sparsity (Buen Abad Najar et al., 2020; Olivieri et al., 2022; Wen et al., 2022). The tradeoff between pseudo-bulking and a single score per spot-gene pair means that the ReadZS and SpliZ do not have the power to detect changes for genes with very low read counts. We will add text in the revision to clarify this point.

      Reviewer #2 (Public Review):

      The authors applied the existing ReadZS and SpliZ methods, previously developed to analyze RNA processing in scRNA-seq data, to Visium data to study spatial splicing and RNA processing events in tissues by Moran's I. The authors showed several example genes in mouse brain and kidney whose processing is spatially regulated, such as Rps24, Myl6, Gng13.

      Thank you for this thorough overview of our work.

      The paper touches on an important question in RNA biology about how RNA processing is regulated spatially. Both experimental and computational challenges remain to address it. Despite some potentially interesting findings, most claims remain to be validated by orthogonal methods such as RNA FISH and simulations.

      We appreciate that the reviewer finds the question important, and that the findings are potentially interesting. In the revision we will include biological replicates for our findings in the mouse brain. Unfortunately, experimental validation is outside of our budget for this project. It is unclear what further simulations could validate the biological discoveries in this manuscript: permutations were used to calculate the p value of each discovery, and the false positive and negative rates of the SpliZ have been assessed through simulation (Olivieri et al., 2022).

      In addition, the percentage of spatial processing events (splicing in 0.8-2.2% of detected genes, i.e. 8-17 genes and RNA processing in 1.1-5.5% of detected genomic windows, i.e. 57-161 windows) discovered is low. Does it suggest that most of RNA processing events were not spatially regulated across the tissue? Or does it question the assumption of treating spatial transcriptomics data similar to scRNA-seq data?

      We agree that the question of the prevalence of spatial RNA processing regulation is critical. Rather than the two options proposed here, we believe that the sparsity of the data limits our ability to call more of these events. In the revision, we will provide a supplemental figure showing the relationship between read depth and p value for each gene to quantify how the fraction of observed regulation changes with sequencing depth. It is worth noting that as these technologies improve, we expect the sequencing depth of spatial technologies to increase which would likely result in more discoveries.

      The unique features for ST data, such as mixture of neighboring cells, different capture biases and much smaller number of spots (pseudo cells here), may have significant effects on the power of scRNA-seq based methods, but it is not discussed in the manuscript. The lack of careful evaluation and low discovery rates could limit application of the approach to other tissues and subcellular data.

      We appreciate the concern that technical differences between scRNA-seq data and spatial transcriptomics data could affect our results. We agree that this point could be addressed more thoroughly in the text. None of the specificities of spatial transcriptomics data invalidate the assumptions of the SpliZ or ReadZS. The method we use to identify genes with significant spatial regulation of RNA processing was specifically created to be used for Visium data. It takes into account mixture of RNAs in neighboring cells by randomly sampling scores of neighboring cells, rather than randomization of the location of the spots themselves, which does indeed result in a high false positive rate (see “Permutations for Moran’s I” in the Methods). We do note that there is a limit to the power of this kind of analysis based on the number of spots and the read depth, which we will quantify in a plot in the revision.
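
For readers unfamiliar with the statistic named in this response, the following is a minimal, generic sketch of Moran's I with a permutation p value. It is not the authors' procedure: the shuffle below randomizes which spot carries which score, which is the simpler scheme the response says gives a high false positive rate on Visium data, and the neighbor-sampling scheme from their Methods is not reproduced here. The function names and the toy chain-shaped neighborhood are assumptions made for illustration.

```python
import random


def morans_i(values, weights):
    """Moran's I for a list of scores and a square spatial weight matrix."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    w_total = sum(sum(row) for row in weights)
    num = sum(
        weights[i][j] * dev[i] * dev[j]
        for i in range(n)
        for j in range(n)
    )
    return (n / w_total) * (num / denom)


def permutation_p_value(values, weights, n_perm=999, seed=0):
    """One-sided p value from shuffling scores across spots (generic scheme)."""
    rng = random.Random(seed)
    observed = morans_i(values, weights)
    shuffled = list(values)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if morans_i(shuffled, weights) >= observed:
            hits += 1
    # Add-one correction so the p value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


def chain_weights(n):
    """Toy neighborhood (assumption): spots on a line, adjacent spots are neighbors."""
    w = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        w[i][i + 1] = 1.0
        w[i + 1][i] = 1.0
    return w


if __name__ == "__main__":
    clustered = [0, 0, 0, 1, 1, 1]
    w = chain_weights(len(clustered))
    print(morans_i(clustered, w))
    print(permutation_p_value(clustered, w))
```

Scores that cluster in space give a positive I and scores that alternate between neighbors give a negative I; on real Visium data the weight matrix would come from the hexagonal spot layout rather than a chain.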

    1. Author Response:

      We thank Reviewer #1 for their positive assessment of our work.

      Reviewer #2 (Public Review):

      […] Although these results confirm what we already know about processes involved in the meninges in MS and its models and gradients of pathology in sub-pial regions, this is the first to use spatial transcriptomics to demonstrate such gradients at a molecular level in an animal model that demonstrates lymphoid like tissue development in the meninges and associated grey matter pathology. The mouse EAE model being used here does reproduce many, although not all, of the pathological features of MS and the ability to look at longer time points has been exploited well. However, this particular spatial transcriptomics technique cannot resolve at a cellular level and therefore there is a lot of overlap between gene expression signatures in the meninges and the underlying grey matter parenchyma.

      We appreciate the reviewer’s concise summary and comments on our manuscript. We agree that the Visium spatial sequencing technology we applied is limited in its resolution and cannot precisely distinguish individual cells or anatomic regions. For that reason, there is undoubtedly some overlap between gene expression signatures in the meninges and underlying parenchyma, particularly in spots on the borders of the meningeal inflammation clusters. However, we believe that the majority of meningeal inflammation (“cluster 11”) spots are indeed in the meninges and represent the spatial transcriptome of that niche. To support this, in the revised manuscript we will provide H&E images with the UMAP clusters overlaid to demonstrate the anatomic borders that correlate with the clusters.

      The short nature of this report means that the results are presented and discussed in a vague way, without enough molecular detail to reveal much information about molecular pathogenetic mechanisms.

      We thank the reviewer for this comment. The goal of this work is to transcriptomically characterize the spatial relationship between areas of meningeal inflammation and the underlying parenchyma. While we agree that mechanistic studies are needed to further evaluate the role of presented signaling pathways, those experiments are beyond the scope of this brief report.

      The trajectory analysis is a good way to explore gradients within the tissues and the authors are to be applauded for using this approach. However, the trajectory analysis does not tell us much if you only choose 2 genes that you think might be involved in the pathogenetic processes going on in the grey matter. It might be more useful to choose some genes involved in pathogenetic processes that we already know are involved in the tissue damage in the underlying grey matter in MS, for which there is already a lot of literature, or genes that respond to molecules we know are increased in MS CSF, although the animal models may be very different. Why were C3 and B2m chosen here?

      We appreciate the reviewer’s points here. C3 and B2m were chosen as examples of genes that have differential fit to the gradient descending pattern to assist the reader in interpreting subsequent gene set trajectory analysis. However, we agree that there are many other genes of interest and will expand the number of genes displayed in our revised manuscript. 

      Strengths:
      - The mouse model does exhibit many of the features of the compartmentalized immune response seen in MS, including the presence of meningeal immune cell infiltrates in the central sulcus and over the surface of the cortex, with the presence of FDCs, HEVs, PNAd+ vessels and CXCL13 expression, indicating the formation of lymphoid like cell aggregates. In addition, disruption of the glia limitans is seen, as in MS. Increased microglial reactivity is also present at the pial surface.
      - Spatial transcriptomics is the best approach to studying gradients in gene expression in both white matter and grey matter and their relationship between compartments.
      - It would be useful to have more discussion of how the upregulated pathways in the two compartments fit with what we know about the cellular changes occurring in both, for which presumably there is prior information from the group's previous publications.

      Limitations:
      - EAE in the mouse is not MS and may be far removed when one considers molecular mechanisms, especially as MS is not a simple anti-myelin protein autoimmune condition. Therefore, this study could be following gene trajectories that do not exist in MS. This needs a significant amount of discussion in the manuscript if the authors suggest that it is mimicking MS.
      - The model does not have the cortical subpial demyelination typical of MS and it is unknown whether neuronal loss occurs in this model, which is the main feature of cytokine-mediated neurodegeneration in MS. If it does not then a whole set of genes will be missing that are involved in the neuronal response to inflammatory stimuli that may be cytotoxic.
      - Visium technology does not get down to single cell level and does not appear to allow resolution of the border between the meninges and the underlying grey matter.
      - Neuronal loss in the MS cortex is independent of demyelination and therefore not related to remyelination failure. There does not appear to be any cortical grey matter demyelination in these animals, so it is difficult to relate any of the gene changes seen here to demyelination.
      - No mention of how the ascending and descending patterns of gene expression may be due to the gradient of microglial activation that underlies meningeal inflammation, which is a big omission.

      We thank the reviewer for their insightful comments on the strengths and limitations of our study. The SJL EAE model we use in this paper is certainly not a perfect model of meningeal inflammation in MS (indeed, we believe that no such animal model exists), but it does recapitulate several key features of human disease as described by the reviewer. Spatial transcriptomics of cortical grey matter lesions and overlying meninges of samples derived from patients with MS would be ideal, though access to this tissue is highly limited. In the revised manuscript we will include more detailed discussion of the limitations in applying these findings to MS. However, in addition to potential implications for MS research, our data contribute more generally to understanding of meningeal inflammation and penetrance of inflammation into brain tissue.

      We acknowledge that sub-pial neuronal loss has not been assessed in SJL EAE, and if present it would increase the relevance of this model to neurodegeneration. We are currently working to assess this.

      We agree with the reviewer that Visium technology is limited in its ability to discriminate individual cells, as discussed above (2.2).

      We agree that gene expression by activated microglia is likely a major driver of the transcriptomic changes observed in the parenchyma, and thank the reviewer for highlighting this. We will add discussion of this to our revised manuscript, and intend to generate additional data regarding the contribution of subpial microglial activation to the measured transcriptomic changes.

      Finally, we thank Reviewer #3 for their assessment of our work.

    1. Author Response

      eLife assessment:

      Trypanosoma brucei evades mammalian humoral immunity through the expression of different variant surface glycoprotein genes. In this fundamental paper, the authors extend previous observations that TbRAP1 both interacts with PIP5pase and binds PI(3,4,5)P3, indicating a role for PI(3,4,5)P3 binding and suggesting that antigen switching is signal dependent. While much of the evidence is compelling, one reviewer suggested that the work would benefit from further controls.

      We appreciate the evaluation of the work and agree that the findings substantially advance our understanding of antigenic variation. A detailed response to the public review is included below, which addresses and clarifies the issues raised by the reviewers, including those concerning controls. We also want to highlight the comment by Reviewer #3 “The methods used in the study are rigorous and well-controlled…. their results support the conclusions made in the manuscript.”. We hope this and our comments will help address the issue of controls in this eLife statement.

      Reviewer #1 (Public Review):

      Trypanosoma brucei undergoes antigenic variation to evade the mammalian host’s immune response. To achieve this, T. brucei regularly expresses different VSGs as its major surface antigen. VSG expression sites are exclusively subtelomeric, and VSG transcription by RNA polymerase I is strictly monoallelic. It has been shown that T. brucei RAP1, a telomeric protein, and the phosphoinositol pathway are essential for VSG monoallelic expression. In previous studies, Cestari et al. (ref. 24) have shown that PIP5pase interacts with RAP1 and that RAP1 binds PI(3,4,5)P3. RNAseq and ChIPseq analyses have been performed previously in PIP5pase conditional knockout cells, too (ref. 24). In the current study, Touray et al. did similar analyses except that catalytic dead PIP5pase mutant was used and the DNA and PI(3,4,5)P3 binding activities of RAP1 fragments were examined. Specifically, the authors examined the transcriptome profile and did RAP1 ChIPseq in PIP5pase catalytic dead mutant. The authors also expressed several C-terminal His6-tagged RAP1 recombinant proteins (full-length, aa1-300, aa301-560, and aa 561-855). These fragments’ DNA binding activities were examined by EMSA analysis and their phosphoinositides binding activities were examined by affinity pulldown of biotin-conjugated phosphoinositides. As a result, the authors confirmed that VSG silencing (both BES-linked and MES-linked VSGs) depends on PIP5pase catalytic activity, but the overall knowledge improvement is incremental. The most convincing data come from the phosphoinositide binding assay as it clearly shows that N-terminus of RAP1 binds PI(3,4,5)P3 but not PI(4,5)P2, although this is only assayed in vitro, while the in vivo binding of full-length RAP1 to PI(3,4,5)P3 has been previously published by Cestari et al (ref. 24) already. 
Considering that many phosphoinositides exert their regulatory role by modulating the subcellular localization of their bound proteins, it is reasonable to hypothesize that binding to PI(3,4,5)P3 can remove RAP1 from the chromatin. However, no convincing data have been shown to support the author’s hypothesis that this regulation is through an “allosteric switch”. Therefore, the title should be revised.

      We appreciate the reviewer’s detailed evaluation of our work. There are a few general comments that we would like to clarify. We will break them into three points. All data included here are new and were not previously published.

      i) “RNAseq and ChIPseq analyses have been performed previously …(ref. 24).” Reference 24 is Cestari et al. 2019, Mol Cell Biol. Neither we nor others have published ChIP-seq of RAP1 in T. brucei. Previous work showed ChIP-qPCR, which analyses specific loci. The ChIP-seq shows genome-wide binding sites of RAP1, and new findings are shown here, including binding sites in the BES, MESs, and other genome loci such as centromeres. We also identified DNA sequence bias defining RAP1 binding sites (Fig 2A). We also show by ChIP-seq how RAP1-binding to these loci changes upon expression of catalytic inactive PIP5Pase. As for the RNA-seq, this is also the first time we show RNA-seq of T. brucei expressing catalytic inactive PIP5Pase, which establishes that the regulation of VSG silencing and switching is dependent on PIP5Pase enzyme catalysis, i.e., PI(3,4,5)P3 dephosphorylation. To improve clarity in the manuscript, we edited page 4, line 122, as follows: “We showed that RAP1 binds telomeric or 70 bp repeats (24), but it is unknown if it binds to other ES sequences or genomic loci.”

      ii) “The in vivo binding of full-length RAP1 to PI(3,4,5)P3 has been previously published by Cestari et al. (ref. 24) already.”. We published in reference 24 that RAP1-HA can bind agarose beads-conjugated synthetic PI(3,4,5)P3. Here, we were able to measure T. brucei endogenous PI(3,4,5)P3 associated with RAP1-HA (Fig 4F). Moreover, we showed that the endogenous RAP1-HA and PI(3,4,5)P3 binding is about 100-fold higher when PIP5Pase is catalytic inactive than WT PIP5Pase. The data establish that in vivo endogenous PI(3,4,5)P3 binds to RAP1-HA and how the binding changes in cells expressing mutant PIP5Pase; this data is new and relevant to our conclusions.

      iii) “no convincing data have been shown to support the author’s hypothesis that this regulation is through an “allosteric switch””. We show here in vitro and in vivo data supporting the conclusion. We show that PI(3,4,5)P3 binds to the N-terminus of rRAP1-His with a calculated Kd of about 20 µM (Fig 4B-E, Table 1). In contrast, we show by EMSA and binding kinetics by microscale thermophoresis that rRAP1-His binds to 70 bp and telomeric repeats via protein regions encompassing the Myb (central) or Myb-L domains (C-terminal) but not the N-terminus containing the VHP domain (Fig 3C-G, and Fig S5). Using microscale thermophoresis, we also show that rRAP1-His binds to 70 bp and telomeric repeats with Kd of 10 and 24 nM, respectively (Fig 3 and Table 1). Notably, we show that 30 µM of PI(3,4,5)P3, but not PI(4,5)P2 – used as a control – disrupts rRAP1-His binding to 70 bp and telomeric repeats, changing Kds to about 188 and 155 nM, respectively (Fig 5A-C). We also show that PI(3,4,5)P3 does not disrupt the binding of rRAP1-His fragments (Myb or MybL) without the N-terminus domain (Fig S5), implying binding of PI(3,4,5)P3 to RAP1 N-terminus is required for displacement of RAP1 DNA binding domains (Myb and MybL) from telomeric and 70 bp repeats, and that PI(3,4,5)P3 is not competing for Myb or Myb-L binding to DNA. Moreover, we show that RAP1-HA binding to 70 bp and telomeric repeats in vivo is displaced in T. brucei cells expressing catalytic inactive PIP5Pase (Fig 5D-G), which we show results in RAP1-HA binding about 100-fold more endogenous PI(3,4,5)P3 than in T. brucei expressing WT PIP5Pase (Fig 4F). The in vivo data agree with the in vitro data. The data show a typical allosteric regulator system, in which binding of a ligand to one site of the protein, here PI(3,4,5)P3 binding to RAP1 N-terminus, affects other domains (RAP1 Myb and Myb-L domains) binding to DNA. 
To improve the clarity of the title, we will change it in the revised version to imply a direct role of PI(3,4,5)P3 regulation of RAP1 in the process. This will provide more specific information to the readers and addresses the concern of the reviewer related to the “allosteric switch”. The new title will be: PI(3,4,5)P3 allosteric regulation of RAP1 controls antigenic switching in trypanosomes
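
To put the Kd values quoted in point iii into perspective, the sketch below computes the fold change in Kd and the expected fraction of DNA bound under a simple one-site hyperbolic model. This is an illustrative assumption, not the fitting model used for the microscale thermophoresis data: it ignores depletion of free protein, which may matter when the Kd is close to the DNA concentration. The function names and the 100 nM example protein concentration are hypothetical.

```python
def kd_fold_shift(kd_without_ligand_nM, kd_with_ligand_nM):
    """Fold increase in Kd (loss of affinity) when the ligand is present."""
    return kd_with_ligand_nM / kd_without_ligand_nM


def fraction_bound(protein_nM, kd_nM):
    """One-site binding: fraction of DNA bound at a given free protein concentration."""
    return protein_nM / (protein_nM + kd_nM)


if __name__ == "__main__":
    # Kd values (nM) as quoted in the response, without and with 30 uM PI(3,4,5)P3.
    kds = {"70 bp repeats": (10.0, 188.0), "telomeric repeats": (24.0, 155.0)}
    protein_nM = 100.0  # hypothetical protein concentration for illustration
    for probe, (kd_free, kd_pip3) in kds.items():
        print(
            probe,
            round(kd_fold_shift(kd_free, kd_pip3), 1),
            round(fraction_bound(protein_nM, kd_free), 2),
            round(fraction_bound(protein_nM, kd_pip3), 2),
        )
```

Under this simplified model, a higher Kd in the presence of PI(3,4,5)P3 means a smaller fraction of DNA is bound at the same protein concentration.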

      There are serious concerns about many conclusions made by Touray et al., according to their experimental approaches:

      1) The authors have been studying RAP1’s chromatin association pattern by ChIPseq in cells expressing a C-terminal HA tagged RAP1. According to data from tryptag.org, RAP1 with an N-terminal or a C-terminal tag does not seem to have identical subcellular localization patterns, suggesting that adding tags at different positions of RAP1 may affect its function. It is therefore essential to validate that the C-terminally HA-tagged RAP1 still has its essential functions. However, this data is not available in the current study. RAP1 is essential. If RAP1-HA still retains its essential functions, cells carrying one RAP1-HA allele and one deleted allele are expected to grow the same as WT cells. In addition, these cells should have the WT VSG expression pattern, and RAP1-HA should still interact with TRF. Without these validations, it is impossible to judge whether the ChIPseq data obtained on RAP1-HA reflect the true chromatin association profile of RAP1.

      Tryptag data show both N- and C-terminus RAP1 with nuclear localization in procyclic forms, although there are differences in signal intensities in the images (http://tryptag.org/?id=Tb927.11.370). It is important to note that Tryptag data is from procyclic forms, and DNA constructs are not validated for their integration in the correct locus. As for the RAP1-HA localization in bloodstream forms, we demonstrated that C-terminally HA-tagged RAP1 co-localizes with telomeres by a combination of immunofluorescence and fluorescence in situ hybridization (Cestari and Stuart, 2015, PNAS), and RAP1-HA co-immunoprecipitates telomeric and 70 bp repeats (Cestari et al. 2019 Mol Cell Biol). We also showed by immunoprecipitation and mass spectrometry that HA-tagged RAP1 interacts with nuclear and telomeric proteins, including PIP5Pase (Cestari et al. 2019). Others have also tagged T. brucei RAP1 in bloodstream forms with HA without disrupting its nuclear localization (Yang et al. 2009, Cell; Afrin et al. 2020, Science Advances). As for the experiment suggested by the reviewer, there is no guarantee that cells lacking one allele of RAP1 will behave as wildtype, i.e., normal growth and repression of VSG genes. Also, less than 90% of T. brucei TRF was reported to interact with RAP1 (Yang et al. 2009, Cell), which might be indirect via their binding to telomeric DNA repeats rather than direct protein-protein interactions.

      2) Touray et al. expressed and purified His6-tagged recombinant RAP1 fragments from E. coli and used these recombinant proteins for EMSA analysis: The His6 tag has been used for purifying various recombinant proteins. It is most likely that the His6 tag itself does not convey any DNA binding activities. However, using His6-tagged RAP1 fragments for EMSA analysis has a serious concern. It has been shown that His6-tagged human RAP1 protein can bind dsDNA, but hRAP1 without the His6 tag does not. It is possible that RAP1 proteins in combination with the His6 tag can exhibit certain unnatural DNA binding activities. To be rigorous, the authors need to remove the His6 tag from their recombinant proteins before the in vitro DNA binding analyses are performed. This is a standard procedure for many in vitro assays using recombinant proteins.

      We show in Fig 3C-G that His-tagged full-length rRAP1 does not bind to scrambled telomeric dsDNA sequences, which indicates that His-tagged rRAP1 does not bind unspecifically to DNA. Moreover, in Fig 3G, we show that His-tagged rRAP1(1-300) also does not bind to 70 bp or telomeric repeats. In contrast, full-length His-tagged rRAP1, rRAP1(301-560), or rRAP1(561-855) bind to 70 bp or telomeric repeats (Fig 3C-G). Since all proteins were His-tagged, the His tag cannot be responsible for the DNA binding.

      As for the statement that human rRAP1-His has unspecific DNA binding properties, we could not find a reference to this statement; we cannot compare it without knowing the details of the experiment. Biochemical assays can result in unspecific binding depending on binding/buffer conditions. Also, human and T. brucei RAP1 share only 15% of amino acid identity; unspecific binding to DNA could be specific to human RAP1.

      3) It is unclear why Nanopore sequencing was used for RNAseq and ChIPseq experiments. The greatest benefit of Nanopore sequencing is that it can sequence long reads, which usually helps with mapping, particularly at genome loci with repetitive sequences. This seems beneficial for RAP1 ChIPseq analysis as RAP1 is expected to bind telomere repeats. However, for ChIPseq, the chromatin needs to be fragmented. Larger DNA fragments from ChIPseq experiments will decrease the accuracy of the final calculated binding sites. Therefore, ChIPseq experiments are not supposed to have long reads to start with, so Nanopore sequencing does not seem to bring any advantage. In addition, compared to Illumina sequencing, Nanopore sequencing usually yields smaller numbers of reads, and the sequencing accuracy rate is lower. The Nanopore sequencing accuracy may be a serious concern in the current study. All telomeres have the perfect TTAGGG repeats, all VSG genes have a very similar 3’ UTR, and all 70 bp repeats have very similar sequences. In fact, the active and silent ESs have 90% sequence identity. Are sequence reads accurately mapped to different ESs? How is the sequencing and mapping quality controlled? Furthermore, it is unclear whether the read depth for RNAseq is deep enough.

      The mean sequence length for the ChIP-seq was about 500 bp (see Table S3), which helps to align reads to ESs and distinguish the different ESs, and it is a reasonable size range to define RAP1 binding sites. Although sequencing depths are usually higher in Illumina than in nanopore (all depending on the amount of sequencing), most Illumina short reads map to multiple genomic sequences, making it difficult to distinguish ESs. This is particularly important for RAP1 because it binds to repeats such as 70 bp and telomeric repeats. Mapping short reads to those regions would be virtually impossible; hence, our choice of nanopore sequencing. For RNA-seq, the ~500 bp read length helps sequence alignment to the subtelomeric regions containing many VSG genes. The nanopore reads obtained here had an average sequencing score of 12 (i.e., base call accuracy of 94%). Filtering reads with MAPQ ≥ 20 (99% probability of correct alignment) helped us to distinguish RAP1 binding to specific ESs, including silent vs active ES (ChIP-seq) or VSG sequences (RNA-seq). The details of the analysis and sequencing metrics (i.e., sequencing depth and read length) were described in the Methods section “Computational analysis of RNA-seq and ChIP-seq” and Table S3, respectively.
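
The two percentages quoted in this response follow from the standard Phred-scale relation between a quality score Q and an error probability, P(error) = 10^(-Q/10), which applies to both base quality and mapping quality (MAPQ). The short sketch below illustrates the conversion; the function name is chosen here for illustration and is not taken from the authors' pipeline.

```python
def phred_to_accuracy(q):
    """Convert a Phred-scaled score to the probability of being correct."""
    error_probability = 10 ** (-q / 10.0)
    return 1.0 - error_probability


if __name__ == "__main__":
    # Average base quality quoted in the response.
    print(round(phred_to_accuracy(12), 3))
    # MAPQ threshold quoted in the response.
    print(round(phred_to_accuracy(20), 3))
```

A score of 12 corresponds to an accuracy just under 94%, and a score of 20 to 99%, in line with the figures given above.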

      4) Many statements in the discussion section are speculations without any solid evidence. For example, lines 218 - 219 “likely due to RAP1 conformational changes”, no data have been shown to support this at all. In lines 224-226, the authors acknowledged that more experiments are necessary to validate their observations, so it is important for the authors to first validate their findings before they draw any solid conclusions. Importantly, RAP1 has been shown to help compact telomeric and subtelomeric chromatin a long time ago by Pandya et al. (2013. NAR 41:7673), who actually examined the chromatin structure by MNase digestion and FAIRE. The authors should acknowledge previous findings. In addition, the authors need to revise the discussion to clearly indicate what they “speculate” rather than make statements as if it is a solid conclusion.

      The statement “likely due to RAP1 conformational changes” in lines 218-219 (page 6) is part of the Discussion. We did not make a strong statement but discussed a possibility. We believe that it is beneficial to the reader to have the data discussed, and we do not feel this point is overly speculative.

      For lines 224-226 (page 6), the statement refers to the finding of RAP1 binding to centromeric regions by ChIP-seq, which is a new finding but not the focus of this work. Hence, future studies are necessary for this finding, and we believe it is appropriate in the Discussion to be upfront and highlight this point to the readers. However, for the RAP1 binding to telomeric ES sites, e.g., 70 bp repeats and telomeric repeats (the focus of this work), we validated the binding by EMSA and by performing binding kinetics using microscale thermophoresis.

      We did not include Pandya et al. 2013 NAR because the authors demonstrated RAP1 compaction of chromatin to occur in procyclic forms only. Pandya et al. stated in their abstract: “no significant chromatin structure changes were detected on depletion of TbRAP1 in BF cells”. Hence, the suggested reference is not relevant to the context of our conclusions in bloodstream forms. Nevertheless, we have reviewed the Discussion to avoid broad speculations in the revised version of the manuscript.

      There are also minor concerns:

      1) In the PIP5Pase conditional knockout system, the WT or mutant PIP5Pase with a V5 tag is constitutively expressed from the tubulin array. What’s the relative expression level of this allele and the endogenous PIP5Pase? Without a clear knowledge of the mutant expression level, it is hard to conclude whether the mutant has any dominant negative effects or whether the mutant phenotype is simply due to a lower than WT PIP5pase expression level.

      The relative mRNA levels of the exclusive expression of PIP5Pase Mut compared to the WT is available in the Data S1, RNA-seq. The Mut allele’s relative expression level is 0.85-fold to the WT allele (both from tubulin loci). We also showed by Western blot the WT and Mut PIP5Pase protein expression (Cestari et al. 2019, Mol Cell Biol). Concerning PIP5Pase endogenous alleles, we compared RNA-seq read counts per million from the conditional null PIP5Pase cells exclusively expressing WT or the Mut PIP5Pase alleles (Data S1, this work) to our previous RNA-seq of single-marker 427 strain (Cestari et al. 2019, Mol Cell Biol). We used the single-marker 427 because the conditional null cells were generated in this strain background. The PIP5Pase WT and Mut mRNAs expressed from tubulin loci are 1.6 and 1.3-fold the endogenous PIP5Pase levels in single-marker 427, respectively. We include a statement in the Methods, page 7, lines 265-268: “The WT or Mut PIP5Pase mRNAs exclusively expressed from tubulin loci are 1.6 and 1.3-fold the WT PIP5Pase mRNA levels expressed from endogenous alleles in the single marker 427 strain. The fold-changes were calculated from RNA-seq read counts per million from this work (WT and Mut PIP5Pase, Data S1) and our previous RNA-seq from single marker 427 strain (24).”
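
The fold changes described in this response are ratios of counts per million (CPM), which normalizes a gene's read count by the library size so that datasets of different depth can be compared. The sketch below shows the arithmetic only; the read counts and library sizes are hypothetical placeholders, not values from Data S1, and the function names are assumptions.

```python
def counts_per_million(gene_reads, total_mapped_reads):
    """Normalize a gene's read count to reads per million mapped reads."""
    return gene_reads / total_mapped_reads * 1_000_000


def fold_change(cpm_sample, cpm_reference):
    """Ratio of normalized expression in a sample relative to a reference."""
    return cpm_sample / cpm_reference


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    cpm_tubulin_locus = counts_per_million(gene_reads=160, total_mapped_reads=2_000_000)
    cpm_endogenous = counts_per_million(gene_reads=250, total_mapped_reads=5_000_000)
    print(fold_change(cpm_tubulin_locus, cpm_endogenous))
```

Because the two datasets come from separate experiments, a comparison of this kind assumes that the library-size normalization is sufficient to make the CPM values comparable.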

      2) In EMSA analysis, what are the concentrations of the protein and the probe used in each reaction? The amount of protein used in the binding assay appears to be very high, and this can contribute to the observation that many complexes are stuck in the well. Better quality EMSA data need to be shown to support the authors’ claims.

      All concentrations were provided in the Methods section. See page 9 Electrophoretic mobility shift assays: “100 nM of annealed DNA were mixed with 1 μg of recombinant protein…”. For microscale thermophoresis, also see page 9, Microscale thermophoresis binding kinetics: “1 μM rRAP1 was diluted in 16 two-fold serial dilutions in 250 mM HEPES pH 7.4, 25 mM MgCl2, 500 mM NaCl, and 0.25% (v/v) NP-40 and incubated with 20 nM telomeric or 70 bp repeats…”. Note that two different biochemical approaches, EMSA and microscale thermophoresis, were used to assess rRAP1-His binding to DNA. Both show similar results (Fig 3 and 5, and Fig S5; microscale thermophoresis shows the binding kinetics, data available in Table 1). The EMSA images clearly show the binding of RAP1 to 70 bp or telomeric repeats but not to scrambled telomeric repeat DNA.
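
The protein concentration range covered by the microscale thermophoresis titration quoted above can be worked out with a short sketch. It assumes that the first of the 16 points is the undiluted 1 μM stock and that each subsequent point is half the previous one; the Methods excerpt does not state which point is the starting one, so the exact endpoints are an assumption.

```python
def two_fold_series(start_concentration, n_points):
    """Concentrations of a two-fold serial dilution, highest first."""
    return [start_concentration / (2 ** i) for i in range(n_points)]


if __name__ == "__main__":
    series_nM = two_fold_series(1000.0, 16)  # 1 uM expressed in nM
    print(series_nM[0], series_nM[-1])
```

Under this assumption the titration spans from 1 μM down to roughly 0.03 nM, which brackets the nanomolar Kd values reported for the DNA probes.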

      Reviewer #2 (Public Review):

      This manuscript by Touray, et al. provides a significant new twist to our understanding of how antigenic variation may be regulated in T. brucei. Key aspects of antigenic variation are the mutually exclusive expression of a single antigen per cell and the periodic switching from expression of one antigen isoform to another. In this manuscript, the authors show, as they have previously shown, that depletion of the nuclear phosphatidylinositol 5-phosphatase (PIP5Pase) results in a loss of mutually exclusive VSG expression. Furthermore, using ChIP-seq, the authors show that the repressor/activator protein 1 (RAP1) binds to regions upstream and downstream of VSG genes located in transcriptionally repressed expression sites and that this binding is lost in the absence of a functional PIP5Pase. Importantly, the authors decided to further investigate this link between PIP5Pase and RAP1, a protein that has previously been implicated in antigenic variation in T. brucei, and found that inactivation of PIP5Pase results in the accumulation of PI(3,4,5)P3 bound to the RAP1 N-terminus and that this binding impairs the ability of RAP1 to bind DNA. Based on these observations, the authors suggest that the levels of PI(3,4,5)P3 may determine the cellular function of RAP1, either by binding upstream of VSG genes and repressing their function, or by not binding DNA and allowing the simultaneous expression of multiple VSG genes in a single parasite.

      While I find most of the data presented in this manuscript compelling, there are aspects of Figure 1 that are not clear to me. Based on Figure 1F, the authors claim that transient inactivation of PIP5Pase results in a switch from the expression of one VSG isoform to another. However, I am not exactly sure what the authors are showing in this panel, nor do the data in Figure 1F seem to be consistent with those shown in Figure 1C. Based on Figure 1F, a transient inactivation of PIP5Pase appears to result in an almost exclusive switch to a VSG located in BES12. However, based on Figure 1E, the VSG transcripts most commonly found after a transient inactivation of PIP5Pase are those from the previously active VSG (BES1) and VSGs located on chr 1 and 6 (I believe). The small font and the low resolution make it impossible to infer the location of the expressed VSG genes, nor to confirm that ALL VSG genes located in expression sites are activated, as the authors claim. Also, I was not able to access the raw ChIP-seq and RNA-seq reads. Thus, could not evaluate the quality of the sequencing data.

      We appreciate the reviewer’s comments and evaluation of our work. Fig 1E shows VSG-seq of a population after transient (24h) exclusive expression of the PIP5Pase mutant, followed by re-expression of the WT PIP5Pase allele for 60 hours (multiple VSGs are detected). As a control, it also shows VSG-seq in cells continuously expressing WT PIP5Pase (mostly VSG2, BES1 is detected). Fig 1F and Fig S1 show the sequencing of VSGs expressed by clones isolated (5-6 days of growth) after a temporary knockdown (24h) of PIP5Pase (tet -), followed by its re-expression. For comparison, no knockdown (tet +) was included. Fig 1E shows potential switchers in the population, and Fig 1F confirms VSG switching in clones.

      To clarify the difference between Fig 1E and 1F, we edited the manuscript on page 3, lines 103-110: “To verify PIP5Pase role in VSG switching, we knocked down PIP5Pase for 24h (Tet -), then restored its expression (Tet +) and isolated clones by limiting dilution and growth for 5-6 days. Analysis of isolated clones after temporary PIP5Pase knockdown (Tet -/+) confirmed VSG switching in 93 out of 94 (99%) of the analyzed clones (Fig 1F, Fig S1). The cells switched to express VSGs from silent ESs or subtelomeric regions, indicating switching by transcription or recombination mechanisms. Moreover, no switching was detected in 118 isolated clones from cells continuously expressing WT PIP5Pase (Tet +, Fig 1F).” We also edited Fig 1F to indicate temporary knockdown (Tet -/+) vs no knockdown (Tet +). The modifications will be available in the resubmitted version of the manuscript.

      We agree that the heat map is difficult to read due to the amount of information. We will include in the revised version of the manuscript a table with the data in the supplementary information; the reader will be able to evaluate the data in detail.

      A preference for switching to specific ESs has been observed in T. brucei (Morrison et al. 2005, Int J Parasitol; Cestari and Stuart, 2015, PNAS), which may explain several clones switching to BES12. Many potential switchers were detected in the VSG-seq (Fig 1E, the whole cell population is over 10^7 parasites), but not all potential switchers were detected in the clonal analysis because we analyzed 212 clones in total, a fraction of the over 10^7 cells analyzed by VSG-seq (Fig 1E). Also, it is possible that not all potential switchers are viable. However, the point of the clonal analysis is to validate the VSG switching after genetic perturbation of PIP5Pase.

      Fig 1C shows examples of ES derepression by RNA-seq after 24h exclusive expression of the mutant compared to WT PIP5Pase. The RNA-seq shows that all ESs are derepressed (Fig 1B). This can be visualized in the volcano plot (Fig 1B, BES and MES VSGs are labelled) and on the spreadsheet Data S1. Although all ESs are derepressed after PIP5Pase mutant expression, not all ESs are selected during switching, as observed in Fig 1E-F. This agrees with our previous observations in switching assays with proteins that control VSG switching (Cestari and Stuart, 2015, PNAS).

      As for metrics of sequencing and raw sequencing data, see the Methods section, page 13, lines 483-485: “Sequencing information is available in Table S3 and fastq data is available in the Sequence Read Archive (SRA) with the BioProject identification PRJNA934938.” Table S3 has a summary of sequencing data. Metrics information such as sequencing quality and analysis can be found in the Methods section “Computational analysis of RNA-seq and ChIP-seq”. The latter includes information about nanopore reads, i.e., mean Q-score of 12.

      Reviewer #3 (Public Review):

      In this manuscript, Touray et al investigate the mechanisms by which PIP5Pase and RAP1 control VSG expression in T. brucei and demonstrate an important role for this enzyme in a signalling pathway that likely plays a role in antigenic variation in T. brucei.

      The methods used in the study are rigorous and well-controlled. The authors convincingly demonstrate that RAP1 binds to PI(3,4,5)P3 through its N-terminus and that this binding regulates RAP1 binding to VSG expression sites, which in turn regulates VSG silencing. Overall their results support the conclusions made in the manuscript.

      There are a few small caveats that are worth noting. First, the analysis of VSG derepression and switching in Figure 1 relies on a genome that does not contain minichromosomal (MC) VSG sequences. This means that MC VSGs could theoretically be misassigned as coming from another genomic location in the absence of an MC reference. As the origin of the VSGs in these clones isn’t a major point in the paper, I do not think this is a major concern, but I would not over-interpret the particular details of switching outcomes in these experiments.

      The authors state that “our data imply that antigenic variation is not exclusively stochastic.” I am not sure this is true. While I also favor the idea that switching is not exclusively stochastic, evidence for a signaling pathway does not necessarily imply that antigenic variation is not stochastic. This pathway could be important solely for lifecycle-related control of VSG expression, rather than antigenic variation during infection. Nevertheless, these data are critical for establishing a potential pathway that could control antigenic variation and thus represent a fundamental discovery.

      Another aspect of this work that is perhaps important, but not discussed much by the authors, is the fact that signalling is extremely poorly understood in T. brucei. In Figure 1B, the RNA-seq data show many genes upregulated after expression of the Mut PIP5Pase (not just VSGs). The authors rightly avoid claiming that this pathway is exclusive to VSGs, but I wonder if these data could provide insight into the other biological processes that might be controlled by this signaling pathway in T. brucei.

      Overall, this is an excellent study that represents an important step forward in understanding how antigenic variation is controlled in T. brucei. The possibility that this process could be controlled via a signalling pathway has been speculated for a long time, and this study provides the first mechanistic evidence for that possibility.

      We thank the reviewer for the evaluation of our work. We agree that it is difficult to be certain of the origin of every VSG gene when the reference genome lacks minichromosome sequences; hence, we did not emphasize this point in the manuscript. We used the 427-2018 reference genome assembled by PacBio and Hi-C (Muller et al. 2018, Nature), which we believe is the best assembly for the 427 strain, especially with respect to the VSG genes.

      We also agree that having signaling controlling switching in vitro does not mean the switching necessarily occurs by signaling in vivo. Nevertheless, stochastic switching is an accepted model, but it has not been proved, whereas we provide molecular evidence that signaling can cause switching. To reflect the reviewer’s comment, we edited the Discussion, page 7, line 250: from “our data imply that antigenic variation is not exclusively stochastic” to “our data suggest that antigenic variation is not exclusively stochastic”.

      Most of the upregulated genes in the RNA-seq data were VSG genes/pseudogenes. Other upregulated genes included retrotransposons and DNA/RNA processing enzymes such as endonucleases and polymerases. We included in the Results, page 3, line 100: “Other genes upregulated include primarily retrotransposons, endonucleases, and polymerase proteins.”

    1. Author Response

      Reviewer #2 (Public Review):

      Associative learning assigns valence to sensory cues paired with reward or punishment. Brain regions such as the amygdala in mammals and the mushroom body in insects have been identified as primary sites where valence assignment takes place. However, little is known about the neural mechanisms that translate valence-specific activity in these brain regions into appropriate behavioral actions. This study identifies a small set of upwind neurons (UpWiNs) in the Drosophila brain that receive direct inputs from two mushroom body output neurons (MBONs) representing opposite valences. Through a series of behavioral, imaging, and electrophysiological experiments, the authors show that UpWiNs are differentially regulated by the two MBONs, i.e., inhibited by the glutamatergic MBON-α1 (encoding negative valence) while activated by the cholinergic MBON-α3 (encoding positive valence). They also show that UpWiNs control the wind-directed behavior of flies. Activation of UpWiNs is sufficient to drive flies to orient and move upwind, and inhibition of UpWiNs reduces flies' upwind movement toward the source of reward-predicting odors (CS+). These results, together with existing knowledge about the function of the mushroom body in memory processing, suggest an appealing model in which reward learning decreases and increases the responses of MBON-α1 and MBON-α3 to the CS+ odor, respectively, and these changes cause UpWiNs to respond more strongly to the CS+ odor and drive upwind locomotion. Interestingly, in the final part of the results, the authors reveal a wind-independent function of UpWiNs: increasing the probability that flies will revisit the site where UpWiNs were activated. Thus, UpWiNs guide learned reward-seeking behavior with and without airflow.

      Although the mushroom body has been extensively studied for its role in learning and memory, the downstream neural circuits that read the information from the mushroom body to guide memory-driven behaviors remain poorly characterized. This study provides an important piece of the puzzle for this knowledge gap.

      Strength

      1) Memory studies have predominantly relied on binary choice (go or no-go) assays as measures of memory performance. While these assays are convenient and efficient, they fall short of providing a comprehensive understanding of underlying behavioral structures. In an effort to overcome this limitation, the current study used video recording and tracking software to delve deeper into memory-guided behavior. This innovative approach allowed the authors to uncover novel neurons and examine their contribution to behavior with a level of detail not possible with binary choice assays.

      2) This study used electron microscopy-based Drosophila hemibrain connectome data to reveal the synaptic connection between UpWiNs and MBON-α1 and MBON-α3. Using this method, the study shows that a single UpWiN receives direct input from both MBON-α1 and MBON-α3, which is confirmed by a functional imaging experiment. The connectome dataset also reveals several neurons downstream of UpWiNs, opening avenues for further research into the neural mechanisms linking memory and behavior.

      Weakness

      1) The authors repeatedly state in the manuscript that MBON-α1 and MBON-α3 convey appetitive or aversive memories, respectively. This assertion may not be entirely accurate. Evidence from sugar reward conditioning experiments suggests that MBON-α3 is potentiated and required for sugar reward memory retrieval. Therefore, the compartmentalization for appetitive and aversive memories appears not as obvious at the level of MBONs.

      What we intended was that activation of DANs in these compartments can induce aversive and appetitive memories, respectively, when paired with odors, and that these are the sole output pathway from these compartments to read out the memories in these compartments. As we previously proposed (Aso et al., 2014a eLife), these MBONs can integrate inputs from MBONs of other compartments and their activity can reflect appetitive memory stored as synaptic plasticity in other compartments. Since DANs in the α3 compartment respond to heat, bitter and electric shock but not sugar, the observation that MBON-α3 acquires an enhanced CS+ odor response after appetitive conditioning is presumably due to these intercompartmental connections rather than plasticity of KC-MBON synapses in the α3 compartment. In any case, the fact that excitatory activity of MBON-α1 and MBON-α3 conveys opposite valence of memory still holds true since appetitive conditioning induces depression and potentiation of odor responses, respectively.

      To clarify this point, we now cite related literature in the following sentence in the final paragraph of the Introduction: “UpWiNs receive inputs from several types of lateral horn neurons and integrate inhibitory and excitatory inputs from MBON-α1 and MBON-α3, which are the output neurons of MB compartments that store long-lasting appetitive or aversive memories, respectively (Aso and Rubin, 2016; Ichinose et al., 2015; Jacob and Waddell, 2022a; Pai et al., 2013; Yamagata et al., 2015).”

      2) This study did not conclusively establish the importance of the MBON-α1/α3 to UpWiN pathways in memory-driven behavior. In the experiments shown in Figure 5, flies were trained to associate the activation of reward-related DANs with a specific odor (CS+). After conditioning, UpWiNs were observed to show enhanced responses to the CS+ odor. However, the results should be interpreted with caution because the driver line used to activate DANs (R58E02-LexAp65) labels not only DANs projecting to the MBON-α1 compartment, but all DANs in the protocerebral anterior medial (PAM) cluster. Thus, it remains unclear to what extent the observed enhanced responses are influenced by changes in inhibitory inputs from MBON-α1. While UpWiNs have been shown to play a critical role in the expression of sugar reward memory (Figure 7), it should be noted that UpWiNs receive inputs from multiple upstream neurons, making it difficult to accurately assess the contribution of MBON-α1/α3 to UpWiN pathways in UpWiN recruitment. Further research is needed to fully address this issue.

      We totally agree with this point and added a sentence to explain an alternative mechanism. “This enhancement of CS+ response can be most easily explained as an outcome of disinhibition from MBON-α1 whose output had been decreased by memory formation; MBON-α1 is inhibitory to UpWiNs (Figure 4B) and MBON-α1 response to the CS+ is reduced following the same training protocol (Yamada et al. 2023). In addition to such a mechanism, plasticity in the β1 compartment may contribute to the enhanced CS+ response in UpWiNs because the driver R58E02 contains DANs in the β1 and glutamatergic MBON from the β1 directly synapse on the dendrites of MBON-α1 and MBON-α3.”

      3) UpWind neurons (UpWiNs) were so named because their activation promotes upwind locomotion. However, when activated in the absence of airflow, flies show increased locomotor speed and an increased probability of revisiting the same location (Figure 7 and Figure 7-figure supplement 1). The revisiting behavior can be observed during the activation of UpWiNs, which is distinct from the local search behavior that typically begins after a reward stimulus is turned off (e.g., Gr64f-GAL4 results in Figure 7-figure supplement 1).

      Return probability was calculated within a 15-s time window. High return probability during LED ON period (10-20s) in Figure 7-figure supplement 1 does not necessarily mean that flies returned during LED ON period. If a fly is at the position A when t=10s, to be counted as “returned”, it needs to move more than 10mm away from A and move back to the position less than 3mm distance from A by t=25s. In the case of sugar sensory neuron activation with Gr64f-GAL4, the peak of return probability is shifted toward a later time point because flies stop and extend proboscis during activation period.
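
      The return criterion described above can be sketched in a few lines. This is a minimal illustration, not the analysis code used for the figures: the function names, the `(t, x, y)` track format, and the example trajectories are assumptions, and only the 15-s window and the 10-mm and 3-mm distances are taken from the text.

```python
import math

def returned(track, t0, window=15.0, leave_mm=10.0, return_mm=3.0):
    """Return True if the fly leaves and then revisits its position at time t0.

    track is a time-sorted list of (t, x, y) samples in seconds and mm
    (a hypothetical format). The fly counts as "returned" if, within
    `window` seconds after t0, it first moves more than `leave_mm` away
    from its position at t0 and then comes back within `return_mm` of it.
    """
    # Reference position A: the first sample at or after t0.
    start = next(((x, y) for t, x, y in track if t >= t0), None)
    if start is None:
        return False
    has_left = False
    for t, x, y in track:
        if t < t0:
            continue
        if t > t0 + window:
            break
        distance = math.hypot(x - start[0], y - start[1])
        if not has_left:
            has_left = distance > leave_mm
        elif distance < return_mm:
            return True
    return False

def return_probability(tracks, t0, **kwargs):
    """Fraction of flies (one track per fly) that satisfy the return criterion."""
    return sum(returned(track, t0, **kwargs) for track in tracks) / len(tracks)

if __name__ == "__main__":
    # Made-up trajectories: the first fly leaves and comes back, the second does not.
    fly_a = [(10, 0, 0), (15, 12, 0), (20, 1, 0)]
    fly_b = [(10, 0, 0), (15, 12, 0), (20, 8, 0)]
    print(return_probability([fly_a, fly_b], t0=10))
```

      Because the window extends 15 s beyond the reference time, a high value assigned to a time point inside the LED ON period can reflect a return that happens after the LED is switched off.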

      Because revisiting a location can also be a consequence of repeated turns, it seems more accurate to describe UpWiNs as controlling the speed and likelihood of turns and promoting upwind movement by integrating with neurons that sense the direction of airflow.

      The return probability plotted in Figure 7E is the probability of returning, within 15 s after the LED period, to the position occupied at the end of the LED period. During this post-LED window, the angular speed of SS33917>CsChrimson and SS33918>CsChrimson flies is identical to that of empty-split-GAL4>CsChrimson control flies (Figure 7-figure supplement 1). Thus, the revisiting behavior cannot be explained by a simple increase in turning probability.

      Although the functions of UpWiNs are not limited to promotion of wind-directed walking, we still think that “UpWind Neurons” is a practical name for broad readers and oral communications at the current stage of investigations, because EM neuron IDs and names (SMP348, SMP353, SMP354, SLP399 and SLP400) are too lengthy and do not contain any functional information. We initially defined a set of 11 neurons labeled by SS33917 split-GAL4 as “UpWind Neurons (UpWiNs)” based on initial optogenetic screening (Figure 2A). We found other driver lines for mushroom body interneuron cell types that can promote release of dopamine and a more robust returning phenotype (e.g. SS49755), but SS33917 remained the champion driver line for the upwind locomotion phenotype.

      Reviewer #3 (Public Review):

      Aso et al. provide insight into how learned valences are transformed into concrete memory-driven actions, using a diverse set of proven techniques.

      Here the authors use a four-armed arena to evaluate flies' preference for a reward-predicting odor and measure upwind locomotion. This behavioral paradigm was combined with the photoactivation of different memory-eliciting neurons, revealing that appetitive memories stored in different compartments of the mushroom bodies (center of olfactory memory) induce different levels of upwind locomotion. The authors then proceed to a non-exhaustive optogenetic screen of the neurons located downstream of the output neurons of the mushroom bodies (MBONs) and identify a group of 8-11 Cholinergic neurons promoting significant changes in upwind locomotion, the UpWins. By combining confocal immunolabelling of these neurons with electron microscope images, they manage to establish the UpWins' connectome within themselves and with the MBONs. Then, using two in vivo cell recording techniques, electrophysiology, and calcium imaging, they define that UpWins integrate both inhibitory and excitatory synaptic inputs from the MBONs encoding appetitive and aversive memory, respectively. In addition, they show that the UpWins' response to a reward-predicting odor is increased after appetitive training. On a behavioral level, the authors establish that the UpWins respond to wind direction only and are not involved in lower-level motor parameters, such as turning direction and acceleration. Finally, they demonstrate that the UpWins' activity is necessary for long-term appetitive memory retrieval, and even suggest a broader role for the UpWins in olfactory navigation, as their photoactivation increases the probability of revisiting behavior. In the end, the authors state that they provide new insights into how memory is translated into concrete behavior, which is fully supported by their data. Altogether, the authors present a pretty complete study that provides very interesting and reliable data, and that opens a new field of investigation into memory-driven behaviors.

      Strengths of the study:

      • To support their conclusions, the authors provide detailed data from different levels of analysis (behavioral, cellular, and molecular), using multiple sophisticated techniques.

      • The measurement of multiple parameters in the behavioral analysis supports the strong changes in upwind locomotion. In addition, taken individually these parameters provide precise insights into how upwind locomotion changes, and allow the authors to more precisely define the role of the UpWins.

      • The authors use split-Gal4 drivers instead of Gal4, allowing them to better refine neuron labelling.

      • The authors discussed and investigated all possible biases, making their data very reliable. For example, they demonstrated that the phenotypes observed in the behavioral assay were wind-directed behaviors and could not be explained by bias avoidance of the arena's center area.

      Limitations of the study:

      • In the absence of more precise drivers, the UpWins' labelling lacks precision. For example, there is no way to know exactly which UpWin is responding in the electrophysiological experiment presented in Figure 4.

      We have ongoing efforts to generate split-GAL4 and split-LexA driver lines for specific subsets of UpWiNs, but the data using those lines are not ready for this manuscript. However, we would like to point out that, historically, identification of a group of neurons with a striking phenotype has been foundational to promote follow-up studies. A good example is P1 neurons for courtship behavior.

      • The screening of neurons located downstream of the MBONs is not exhaustive, meaning that other groups of neurons might be involved in memory-driven upwind locomotion. Although, it does not diminish the authors' conclusions.

      UpWiNs are certainly not the only cell type mediating memory-driven upwind locomotion, since our and other groups’ studies (e.g. Matheson et al., 2022; PMCID: PMC9360402) identified a collection of cell types that can promote upwind locomotion upon optogenetic activation.

      In 2021, we released images and driver lines of a larger collection of split-GAL4 driver lines at https://splitgal4.janelia.org. We are preparing a manuscript to provide anatomical descriptions of these lines. This collection of new drivers will help elucidate more comprehensive views of circuits for memory-driven actions.

      • All data were obtained with walking flies. So far, there have been no experiments on flying flies.

      This is an intriguing question and we mentioned in Discussion that “Our study was limited to walking behaviors, and the role of UpWiNs in flight behaviors remains to be investigated.”

    1. Author Response

      Reviewer #1 (Public Review):

      The authors present a PyTorch-based simulator for prosthetic vision. The model takes in the anatomical location of a visual cortical prosthesis as well as a series of electrical stimuli to be applied to each electrode, and outputs the resulting phosphenes. To demonstrate the usefulness of the simulator, the paper reproduces psychometric curves from the literature and uses the simulator in the loop to learn optimized stimuli.

      One of the major strengths of the paper is its modeling work - the authors make good use of existing knowledge about retinotopic maps and psychometric curves that describe phosphene appearance in response to single-electrode stimulation. Using PyTorch as a backbone is another strength, as it allows for GPU integration and seamless integration with common deep learning models. This work is likely to be impactful for the field of sight restoration.

      1) However, one of the major weaknesses of the paper is its model validation - while some results seem to be presented for data the model was fit on (as opposed to held-out test data), other results lack quantitative metrics and a comparison to a baseline ("null hypothesis") model. On the one hand, it appears that the data presented in Figs. 3-5 was used to fit some of the open parameters of the model, as mentioned in Subsection G of the Methods. Hence it is misleading to present these as model "predictions", which are typically presented for held-out test data to demonstrate a model's ability to generalize. Instead, this is more of a descriptive model than a predictive one, and its ability to generalize to new patients remains yet to be demonstrated.

      We agree that the original presentation of the model fits might give rise to unwanted confusion. In the revision, we have adapted the fit of the thresholding mechanism to include a 3-fold cross-validation, where part of the data was excluded during the fitting, and used as test sets to calculate the model’s performance. The results of the cross-validation are now presented in panel D of Figure 3. The fitting of the brightness and temporal dynamics parameters using cross-validation was not feasible due to the limited amount of quantitative data describing temporal dynamics and phosphene size and brightness for intracortical electrodes. To avoid confusion, we have adapted the corresponding text and figure captions to specify that we are using a fit as description of the data.
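
      For readers unfamiliar with the procedure, the following minimal sketch shows what a 3-fold cross-validation of a threshold fit involves. It is an illustration under stated assumptions, not the fitting code of the simulator: the logistic form, the parameter names `q50` and `slope`, the grid search, and the noise-free synthetic data are all hypothetical.

```python
import math

def psychometric(charge, q50, slope):
    """Assumed logistic probability of perceiving a phosphene at a given charge per phase."""
    return 1.0 / (1.0 + math.exp(-(charge - q50) / slope))

def mse(data, q50, slope):
    """Mean squared error between the curve and (charge, probability) pairs."""
    return sum((psychometric(q, q50, slope) - p) ** 2 for q, p in data) / len(data)

def fit(data, q50_grid, slope_grid):
    """Grid search for the (q50, slope) pair with the lowest error on `data`."""
    candidates = [(q50, slope) for q50 in q50_grid for slope in slope_grid]
    return min(candidates, key=lambda params: mse(data, *params))

def k_fold_cv(data, k, q50_grid, slope_grid):
    """Fit on k-1 folds, score on the held-out fold, and repeat for every fold."""
    folds = [data[i::k] for i in range(k)]  # interleaved split into k folds
    scores = []
    for i, test_fold in enumerate(folds):
        train = [point for j, fold in enumerate(folds) if j != i for point in fold]
        q50, slope = fit(train, q50_grid, slope_grid)
        scores.append(mse(test_fold, q50, slope))  # error on data not used for fitting
    return scores

if __name__ == "__main__":
    # Synthetic, noise-free data generated from q50 = 50 and slope = 10 (arbitrary units).
    data = [(q, psychometric(q, 50.0, 10.0)) for q in range(10, 100, 10)]
    print(k_fold_cv(data, 3, range(30, 71, 5), [5.0, 10.0, 15.0]))
```

      The held-out errors, rather than the errors on the fitted data, are what justify calling the outputs predictions.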

      We note that the goal of the simulator is not to provide a single set of parameters that describes precise phosphene perception for all patients but that it could also be used to capture variability among patients. Indeed, the model can be tailored to new patients based on a small data set. Figure 3-figure supplement 1 exemplifies how our simulator can be tailored to several data sets collected from patients with surface electrodes. Future clinical experiments might be used to verify how well the simulator can be tailored to the data of other patients.

      Specifically, we have made the following changes to the manuscript:

      • Caption Figure 2: the fitted peak brightness levels reproduced by our model

      • Caption Figure 3: The model's probability of phosphene perception is visualized as a function of charge per phase

      • Caption Figure 3: Predicted probabilities in panel (d) are the results of a 3-fold cross-validation on held-out test data.

      • Line 250: we included biologically inspired methods to model the perceptual effects of different stimulation parameters

      • Line 271: Each frame, the simulator maps electrical stimulation parameters (stimulation current, pulse width and frequency) to an estimated phosphene perception

      • Lines 335-336: such that 95% of the Gaussian falls within the fitted phosphene size.

      • Line 469-470: Figure 4 displays the simulator's fit on the temporal dynamics found in a previous published study by Schmidt et al. (1996).

      • Lines 922-925: Notably, the trade-off between model complexity and accurate psychophysical fits or predictions is a recurrent theme in the validation of the components implemented in our simulator.

      2) On the other hand, the results presented in Fig. 8 as part of the end-to-end learning process are not accompanied by any sorts of quantitative metrics or comparison to a baseline model.

      We now realize that the presentation of the end-to-end results might have given the impression that we present novel image processing strategies. However, the development of a novel image processing strategy is outside the scope of the study. Instead, the study aims to provide an improved simulation which can be used for more realistic assessment of different stimulation protocols. The simulator needs to fit experimental data, and it should run fast (so it can be used in behavioral experiments). Importantly, as demonstrated in our end-to-end experiments, the model can be used in differentiable programming pipelines (so it can be used in computational optimization experiments), which is a valuable contribution in itself because it lends itself to many machine learning approaches which can improve the realism of the simulation.
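
      To illustrate why differentiability matters for computational optimization, the sketch below runs gradient descent through a one-parameter toy model. Everything in it is an assumption made for illustration: the sigmoid brightness function, its `threshold` and `slope` values, and the learning rate are hypothetical and are not taken from the simulator. In a PyTorch pipeline the gradient would come from automatic differentiation; here it is written by hand so that the example has no dependencies.

```python
import math

def toy_brightness(current, threshold=40.0, slope=10.0):
    """Toy differentiable mapping from stimulation current to brightness in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-(current - threshold) / slope))

def toy_brightness_grad(current, threshold=40.0, slope=10.0):
    """Analytic derivative of toy_brightness with respect to current."""
    b = toy_brightness(current, threshold, slope)
    return b * (1.0 - b) / slope

def optimize_current(target, current=20.0, learning_rate=200.0, steps=500):
    """Gradient descent on 0.5 * (brightness - target)**2 with respect to current."""
    for _ in range(steps):
        error = toy_brightness(current) - target
        # Chain rule: d(loss)/d(current) = error * d(brightness)/d(current)
        current -= learning_rate * error * toy_brightness_grad(current)
    return current

if __name__ == "__main__":
    best = optimize_current(target=0.8)
    print(best, toy_brightness(best))
```

      The same loop structure applies when the toy function is replaced by a full simulator and the single current by the parameters of an encoder network, provided every step between parameters and loss has a gradient.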

      We have rephrased our study aims in the discussion to improve clarity.

      • Lines 275-279: In the sections below, we discuss the different components of the simulator model, followed by a description of some showcase experiments that assess the ability to fit recent clinical data and the practical usability of our simulator in simulation experiments

      • Lines 810-814: Computational optimization approaches can also aid in the development of safe stimulation protocols, because they allow a faster exploration of the large parameter space and enable task-driven optimization of image processing strategies (Granley et al., 2022; Fauvel et al., 2022; White et al., 2019; Küçükoglü et al. 2022; de Ruyter van Steveninck et al., 2022; Ghaffari et al., 2021).

      • Lines 814-819: Ultimately, the development of task-relevant scene-processing algorithms will likely benefit both from computational optimization experiments as well as exploratory SPV studies with human observers. With the presented simulator we aim to contribute a flexible toolkit for such experiments.

      • Lines 842-853: Eventually, the functional quality of the artificial vision will not only depend on the correspondence between the visual environment and the phosphene encoding, but also on the implant recipient's ability to extract that information into a usable percept. The functional quality of end-to-end generated phosphene encodings in daily life tasks will need to be evaluated in future experiments. Regardless of the implementation, it will always be important to include human observers (both sighted experimental subjects and actual prosthetic implant users) in the optimization cycle to ensure subjective interpretability for the end user (Fauvel et al., 2022; Beyeler & Sanchez-Garcia, 2022).

      3) The results seem to assume that all phosphenes are small Gaussian blobs, and that these phosphenes combine linearly when multiple electrodes are stimulated. Both assumptions are frequently challenged by the field. For all these reasons, it is challenging to assess the potential and practical utility of this approach as well as get a sense of its limitations.

      The reviewer raises a valid point and a similar point was raised by a different reviewer (our response is duplicated). As pointed out in the discussion, many aspects about multi-electrode phosphene perception are still unclear. On the one hand, the literature is in agreement that there is some degree of predictability: some papers explicitly state that phosphenes produced by multiple patterns are generally additive (Dobelle & Mladejovsky, 1974), that the locations are predictable (Bosking et al., 2018) and that multi-electrode stimulation can be used to generate complex, interpretable patterns of phosphenes (Chen et al., 2020, Fernandez et al., 2021). On the other hand, however, in some cases, the stimulation of multiple electrodes is reported to lead to brighter phosphenes (Fernandez et al., 2021), fused or displaced phosphenes (Schmidt et al., 1996, Bak et al., 1990) or unpredicted phosphene patterns (Fernández et al., 2021). It is likely that the probability of these interference patterns decreases when the distance between the stimulated electrodes increases. An empirical finding is that the critical distance for intracortical stimulation is approximately 1 mm (Ghose & Maunsell, 2012).

      We note that our simulator is not restricted to the simulation of linearly combined Gaussian blobs. Some irregularities, such as elongated phosphene shapes were already supported in the previous version of our software. Furthermore, we added a supplementary figure that displays a possible approach to simulate some of the more complex electrode interactions that are reported in the literature, with only minor adaptations to the code. Our study thereby aims to present a flexible simulation toolkit that can be adapted to the needs of the user.

      Adjustments:

      • Added Figure 1-figure supplement 3 on irregular phosphene percepts.

      • Lines 957-970: Furthermore, in contrast to the assumptions of our model, interactions between simultaneous stimulation of multiple electrodes can have an effect on the phosphene size and sometimes lead to unexpected percepts (Fernandez et al., 2021, Dobelle & Mladejovsky 1974, Bak et al., 1990). Although our software supports basic exploratory experimentation of non-linear interactions (see Figure 1-figure supplement 3), by default, our simulator assumes independence between electrodes. Multi-phosphene percepts are modeled using linear summation of the independent percepts. These assumptions seem to hold for intracortical electrodes separated by more than 1 mm (Ghose & Maunsell, 2012), but may underestimate the complexities observed when electrodes are nearer. Further clinical and theoretical modeling work could help to improve our understanding of these non-linear dynamics.
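
      The default assumption of independent electrodes with linear summation can be sketched as follows. This is a simplified illustration and not the simulator's rendering code: the function names, the `(cx, cy, sigma, brightness)` tuple format, and the grid parameters are hypothetical.

```python
import math

def gaussian_blob(x, y, cx, cy, sigma, brightness):
    """Brightness contributed at (x, y) by one isotropic Gaussian phosphene."""
    squared_distance = (x - cx) ** 2 + (y - cy) ** 2
    return brightness * math.exp(-squared_distance / (2.0 * sigma ** 2))

def render(phosphenes, size, extent):
    """Render phosphenes on a size x size grid covering [-extent, extent] in x and y.

    Each phosphene is a (cx, cy, sigma, brightness) tuple. Contributions are
    simply added, i.e. the electrodes are treated as independent.
    """
    step = 2.0 * extent / (size - 1)
    image = []
    for row in range(size):
        y = -extent + row * step
        image.append([
            sum(gaussian_blob(-extent + col * step, y, *p) for p in phosphenes)
            for col in range(size)
        ])
    return image

if __name__ == "__main__":
    a = (0.0, 0.0, 0.5, 1.0)
    b = (1.0, 0.0, 0.5, 0.6)
    for row in render([a, b], size=5, extent=2.0):
        print(" ".join(f"{value:.2f}" for value in row))
```

      Under this assumption the image for two electrodes equals the pixel-wise sum of the two single-electrode images; modeling the interactions reported for nearby electrodes would require replacing the plain sum with an interaction term.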

      4) Another weakness of the paper is the term "biologically plausible", which appears throughout the manuscript but is not clearly defined. In its current form, it is not clear what makes this simulator "biologically plausible" - it certainly contains a retinotopic map and is fit on psychophysical data, but it does not seem to contain any other "biological" detail.

      We thank the reviewer for the remark. We improved our description of what makes the simulator “biologically plausible” in the introduction (line 78): “Biological plausibility, in our work's context, points to the simulation's ability to capture essential biological features of the visual system in a manner consistent with empirical findings: our simulator integrates quantitative findings and models from the literature on cortical stimulation in V1 [...]”. In addition, we mention in the discussion (lines 611 - 621): “The aim of this study is to present a biologically plausible phosphene simulator, which takes realistic ranges of stimulation parameters, and generates a phenomenologically accurate representation of phosphene vision using differentiable functions. In order to achieve this, we have modeled and incorporated an extensive body of work regarding the psychophysics of phosphene perception. From the results presented in section H, we observe that our simulator is able to produce phosphene percepts that match the descriptions of phosphene vision that were gathered in basic and clinical visual neuroprosthetics studies over the past decades.”

      5) In fact, for the most part the paper seems to ignore the fact that implanting a prosthesis in one cerebral hemisphere will produce phosphenes that are restricted to one half of the visual field. Yet Figures 6 and 8 present phosphenes that seemingly appear in both hemifields. I do not find this very "biologically plausible".

      We agree with the reviewer that contemporary experiments with implantable electrodes usually test electrodes in a single hemisphere. However, future clinically useful approaches should use bilaterally implanted electrode arrays. Our simulator can present phosphene locations in either one or both hemifields.

      We have made the following textual changes:

      • Fig. 1 caption: Example renderings after initializing the simulator with four 10 × 10 electrode arrays (indicated with roman numerals) placed in the right hemisphere (electrode spacing: 4 mm, in correspondence with the commonly used 'Utah array' (Maynard et al., 1997)).

      • Line 518-525: The simulator is initialized with 1000 possible phosphenes in both hemifields, covering a field of view of 16 degrees of visual angle. Note that the simulated electrode density and placement differ from current prototype implants and the simulation can be considered to be an ambitious scenario from a surgical point of view, given the folding of the visual cortex and the part of the retinotopic map in V1 that is buried in the calcarine sulcus.

      • Line 546-547: with the same phosphene coverage as the previously described experiment

      Reviewer #2 (Public Review):

      Van der Grinten and De Ruyter van Steveninck et al. present a design for simulating cortical-visual-prosthesis phosphenes that emphasizes features important for optimizing the use of such prostheses. The characteristics of simulated individual phosphenes were shown to agree well with data published from the use of cortical visual prostheses in humans. By ensuring that functions used to generate the simulations were differentiable, the authors permitted and demonstrated integration of the simulations into deep-learning algorithms. In concept, such algorithms could thereby identify parameters for translating images or videos into stimulation sequences that would be most effective for artificial vision. There are, however, limitations to the simulation that will limit its applicability to current prostheses.

      The verification of how phosphenes are simulated for individual electrodes is very compelling. Visual-prosthesis simulations often do ignore the physiologic foundation underlying the generation of phosphenes. The authors' simulation takes into account how stimulation parameters contribute to phosphene appearance and show how that relationship can fit data from actual implanted volunteers. This provides an excellent foundation for determining optimal stimulation parameters with reasonable confidence in how parameter selections will affect individual-electrode phosphenes.

      We thank the reviewer for these supportive comments.

      Issues with the applicability and reliability of the simulation are detailed below:

      1) The utility of this simulation design, as described, unfortunately breaks down beyond the scope of individual electrodes. To model the simultaneous activation of multiple electrodes, the authors' design linearly adds individual-electrode phosphenes together. This produces relatively clean collections of dots that one could think of as pixels in a crude digital display. Modeling phosphenes in such a way assumes that each electrode and the network it activates operate independently of other electrodes and their neuronal targets. Unfortunately, as the authors acknowledge and as noted in the studies they used to fit and verify individual-electrode phosphene characteristics, simultaneous stimulation of multiple electrodes often obscures features of individual-electrode phosphenes and can produce unexpected phosphene patterns. This simulation does not reflect these nonlinearities in how electrode activations combine. Nonlinearities in electrode combinations can be as subtle as the phosphenes becoming brighter while still remaining distinct, or as problematic as generating only a single small phosphene that is indistinguishable from the activation of a subset of the electrodes activated, or that of a single electrode.

      If a visual prosthesis happens to generate some phosphenes that can be elicited independently, a simulator of this type could perhaps be used by processing stimulation from independent groups of electrodes and adding their phosphenes together in the visual field.

      The reviewer raises a valid point and a similar point was raised by a different reviewer (our response is duplicated). As pointed out in the discussion, many aspects of multi-electrode phosphene perception are still unclear. On the one hand, the literature is in agreement that there is some degree of predictability: some papers explicitly state that phosphenes produced by multiple patterns are generally additive (Dobelle & Mladejovsky, 1974), that the locations are predictable (Bosking et al., 2018) and that multi-electrode stimulation can be used to generate complex, interpretable patterns of phosphenes (Chen et al., 2020, Fernandez et al., 2021). On the other hand, however, in some cases, the stimulation of multiple electrodes is reported to lead to brighter phosphenes (Fernandez et al., 2021), fused or displaced phosphenes (Schmidt et al., 1996, Bak et al., 1990) or unpredicted phosphene patterns (Fernández et al., 2021). It is likely that the probability of these interference patterns decreases when the distance between the stimulated electrodes increases. An empirical finding is that the critical distance for intracortical stimulation is approximately 1 mm (Ghose & Maunsell, 2012).

      We note that our simulator is not restricted to the simulation of linearly combined Gaussian blobs. Some irregularities, such as elongated phosphene shapes were already supported in the previous version of our software. Furthermore, we added a supplementary figure that displays a possible approach to simulate some of the more complex electrode interactions that are reported in the literature, with only minor adaptations to the code. Our study thereby aims to present a flexible simulation toolkit that can be adapted to the needs of the user.

      Adjustments:

      • Lines 957-970: Furthermore, in contrast to the assumptions of our model, interactions between simultaneous stimulation of multiple electrodes can have an effect on the phosphene size and sometimes lead to unexpected percepts (Fernandez et al., 2021, Dobelle & Mladejovsky 1974, Bak et al., 1990). Although our software supports basic exploratory experimentation of non-linear interactions (see Figure 1-figure supplement 3), by default, our simulator assumes independence between electrodes. Multi-phosphene percepts are modeled using linear summation of the independent percepts. These assumptions seem to hold for intracortical electrodes separated by more than 1 mm (Ghose & Maunsell, 2012), but may underestimate the complexities observed when electrodes are nearer. Further clinical and theoretical modeling work could help to improve our understanding of these non-linear dynamics.

      • Added Figure 1-figure supplement 3 on irregular phosphene percepts.
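
      The default independence assumption described in the adjusted text (each electrode yields its own percept, and multi-phosphene percepts are the linear sum of those percepts) can be illustrated with a minimal sketch. This is not the simulator's actual code or API: the function names, the isotropic Gaussian shape, the pixel units, and the final clipping step are all assumptions made for illustration.

```python
import numpy as np

def gaussian_phosphene(shape, center, sigma, brightness):
    """Render one phosphene as an isotropic 2D Gaussian blob (illustrative only)."""
    rows, cols = np.indices(shape)
    sq_dist = (rows - center[0]) ** 2 + (cols - center[1]) ** 2
    return brightness * np.exp(-sq_dist / (2.0 * sigma ** 2))

def render_percept(shape, phosphenes):
    """Sum independently rendered phosphenes linearly, then clip to [0, 1]."""
    image = np.zeros(shape)
    for center, sigma, brightness in phosphenes:
        image += gaussian_phosphene(shape, center, sigma, brightness)
    return np.clip(image, 0.0, 1.0)

# Hypothetical phosphenes: (center in pixels, sigma in pixels, brightness).
phosphenes = [((16, 16), 2.0, 0.8), ((16, 48), 3.0, 0.5)]
percept = render_percept((32, 64), phosphenes)
```

      Under this assumption the combined image never contains fused, displaced, or suppressed phosphenes, which is exactly the simplification the reviewer questions for closely spaced electrodes.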

      2) Verification of how the simulation renders individual phosphenes based on stimulation parameters is an important step in confirming agreement between the simulation and the function of implanted devices. That verification was well demonstrated. The end use of a visual-prosthesis simulation, however, would likely not be optimizing just the appearance of phosphenes, but predicting and optimizing functional performance in visual tasks. Investigating whether this simulator can suggest visual-task performance, either with sighted volunteers or a decoder model, that is similar to published task performance from visual-prosthesis implantees would be a necessary step for true validation.

      We agree with the reviewer that it will be vital to investigate the utility of the simulator in tasks. However, the literature on the performance of users of a cortical prosthesis in visually-guided tasks is scarce, making it difficult to compare task performance between simulated versus real prosthetic vision.

      Secondly, the main objective of the current study is to propose a simulator that emulates the sensory / perceptual experience, i.e. the low-level perceptual correspondence. Once more behavioral data from prosthetic users become available, studies can use the simulator to make these comparisons.

      Regarding the comparison to simulated prosthetic vision in sighted volunteers, there are some fundamental limitations. For instance, sighted subjects are exposed for a shorter duration to the (simulated) artificial percept and lack the experience and training that prosthesis users get. Furthermore, sighted subjects may be unfamiliar with compensation strategies that blind individuals have developed. It will therefore be important to conduct clinical experiments.

      To convey more clearly that our experiments are performed to verify the practical usability in future behavioral experiments, we have incorporated the following textual adjustments:

      • Lines 275-279: In the sections below, we discuss the different components of the simulator model, followed by a description of some showcase experiments that assess the ability to fit recent clinical data and the practical usability of our simulator in simulation experiments.

      • Lines 842-853: Eventually, the functional quality of the artificial vision will not only depend on the correspondence between the visual environment and the phosphene encoding, but also on the implant recipient's ability to extract that information into a usable percept. The functional quality of end-to-end generated phosphene encodings in daily life tasks will need to be evaluated in future experiments. Regardless of the implementation, it will always be important to include human observers (both sighted experimental subjects and actual prosthetic implant users) in the optimization cycle to ensure subjective interpretability for the end user (Fauvel et al., 2022; Beyeler & Sanchez-Garcia, 2022).

      3) A feature of this simulation is being able to convert stimulation of V1 to phosphenes in the visual field. If used, this feature would likely only be able to simulate a subset of phosphenes generated by a prosthesis. Much of V1 is buried within the calcarine sulcus, and electrode placement within the calcarine sulcus is not currently feasible. As a result, stimulation of visual cortex typically involves combinations of the limited portions of V1 that lie outside the sulcus and higher visual areas, such as V2.

      We agree that some areas (most notably the calcarine sulcus) are difficult to access in a surgical implantation procedure. A realistic simulation of state-of-the-art cortical stimulation should only partially cover the visual field with phosphenes. However, it may be predicted that some of these challenges will be addressed by new technologies. We chose to make the simulator as generally applicable as possible and users of the simulator can decide which phosphene locations are simulated. To demonstrate that our simulator can be flexibly initialized to simulate specific implantation locations using third-party software, we have now added a supplementary figure (Figure 1-figure supplement 1) that displays a demonstration of an electrode grid placement on a 3D brain model, generating the phosphene locations from receptive field maps. However, the simulator is general and can also be used to guide future strategies that aim to e.g. cover the entire field with electrodes, compare performance between upper and lower hemifields etc.

      Reviewer #3 (Public Review):

      The authors are presenting a new simulation for artificial vision that incorporates many recent advances in our understanding of the neural response to electrical stimulation, specifically within the field of visual prosthetics. The authors succeed in integrating multiple results from other researchers on aspects of V1 response to electrical stimulation to create a system that more accurately models V1 activation in a visual prosthesis than other simulators. The authors then attempt to demonstrate the value of such a system by adding a decoding stage and using machine-learning techniques to optimize the system to various configurations.

      1) While there is merit to being able to apply various constraints (such as maximum current levels) and have the system attempt to find a solution that maximizes recoverable information, the interpretability of such encodings to a hypothetical recipient of such a system is not addressed. The authors demonstrate that they are able to recapitulate various standard encodings through this automated mechanism, but the advantages to using it as opposed to mechanisms that directly detect and encode, e.g., edges, are insufficiently justified.

      We thank the reviewer for this constructive remark. Our simulator is designed for more realistic assessment of different stimulation protocols in behavioral experiments or in computational optimization experiments. The presented end-to-end experiments are a demonstration of the practical usability of our simulator in computational experiments, building on a previously existing line of research. In fact, our simulator is compatible with any arbitrary encoding strategy.

      As our paper is focused on the development of a novel tool for this existing line of research, we do not aim to make claims about the functional quality of end-to-end encoders compared to alternative encoding methods (such as edge detection). That said, we agree with the reviewer that it is useful to discuss the benefits of end-to-end optimization compared to e.g. edge detection.

      We have incorporated several textual changes to give a more nuanced overview and to acknowledge that many benefits remain to be tested. Furthermore, we have restated our study aims more clearly in the discussion to clarify the distinction between the goals of the current paper and the various encoding strategies that remain to be tested.

      • Lines 275-279: In the sections below, we discuss the different components of the simulator model, followed by a description of some showcase experiments that assess the ability to fit recent clinical data and the practical usability of our simulator in simulation experiments.

      • Lines 810-814: Computational optimization approaches can also aid in the development of safe stimulation protocols, because they allow a faster exploration of the large parameter space and enable task-driven optimization of image processing strategies (Granley et al., 2022; Fauvel et al., 2022; White et al., 2019; Küçükoglü et al. 2022; de Ruyter van Steveninck, Güçlü et al., 2022; Ghaffari et al., 2021).

      • Lines 842-853: Eventually, the functional quality of the artificial vision will not only depend on the correspondence between the visual environment and the phosphene encoding, but also on the implant recipient's ability to extract that information into a usable percept. The functional quality of end-to-end generated phosphene encodings in daily life tasks will need to be evaluated in future experiments. Regardless of the implementation, it will always be important to include human observers (both sighted experimental subjects and actual prosthetic implant users) in the optimization cycle to ensure subjective interpretability for the end user (Fauvel et al., 2022; Beyeler & Sanchez-Garcia, 2022).

      2) The authors make a few mistakes in their interpretation of biological mechanisms, and the introduction lacks appropriate depth of review of existing literature, giving the reader the mistaken impression that this simulator is the only attempt ever made at biologically plausible simulation, rather than merely the most recent refinement that builds on decades of work across the field.

      We thank the reviewer for this insight. We have improved the coverage of the previous literature to give credit where credit is due, and to address the long history of simulated phosphene vision.

      Textual changes:

      • Lines 64-70: Although the aforementioned SPV literature has provided us with major fundamental insights, the perceptual realism of electrically generated phosphenes and some aspects of the biological plausibility of the simulations can be further improved by integrating existing knowledge of phosphene vision and its underlying physiology.

      • Lines 164-190: The aforementioned studies used varying degrees of simplification of phosphene vision in their simulations. For instance, many included equally-sized phosphenes that were uniformly distributed over the visual field (informally referred to as the ‘scoreboard model’). Furthermore, most studies assumed either full control over phosphene brightness or used binary levels of brightness (e.g. 'on' / 'off'), but did not provide a description of the associated electrical stimulation parameters. Several studies have explicitly made steps towards more realistic phosphene simulations, by taking into account cortical magnification or using visuotopic maps (Fehervari et al., 2010; Li et al., 2013; Srivastava et al., 2009; Paraskevoudi et al., 2021), simulating noise and electrode dropout (Dagnelie et al., 2007), or using varying levels of brightness (Vergnieux et al., 2017; Sanchez-Garcia et al., 2022; Parikh et al., 2013). However, no phosphene simulations have modeled temporal dynamics or provided a description of the parameters used for electrical stimulation. Some recent studies developed descriptive models of the phosphene size or brightness as a function of the stimulation parameters (Winawer et al., 2016; Bosking et al., 2017). Another very recent study has developed a deep-learning based model for predicting a realistic phosphene percept for single stimulating electrodes (Granley et al., 2022). These studies have made important contributions to improve our understanding of the effects of different stimulation parameters. The present work builds on these previous insights to provide a full simulation model that can be used for the functional evaluation of cortical visual prosthetic systems.

      • Lines 137-140: Due to the cortical magnification (the foveal information is represented by a relatively large surface area in the visual cortex as a result of variation of retinal RF size) the size of the phosphene increases with its eccentricity (Winawer & Parvizi, 2016, Bosking et al., 2017).

      • Lines 883-893: Even after loss of vision, the brain integrates eye movements for the localization of visual stimuli (Reuschel et al., 2012), and in cortical prostheses the position of the artificially induced percept will shift along with eye movements (Brindley & Lewin, 1968, Schmidt et al., 1996). Therefore, in prostheses with a head-mounted camera, misalignment between the camera orientation and the pupillary axes can induce localization problems (Caspi et al., 2018; Paraskevoudi & Pezaris, 2019; Sabbah et al., 2014; Schmidt et al., 1996). Previous SPV studies have demonstrated that eye-tracking can be implemented to simulate the gaze-coupled perception of phosphenes (Cha et al., 1992; Sommerhalder et al., 2004; Dagnelie et al., 2006; McIntosh et al., 2013, Paraskevoudi & Pezaris, 2021; Rassia & Pezaris 2018, Titchener et al., 2018, Srivastava et al., 2009).

      3) The authors have importantly not included gaze position compensation which adds more complexity than the authors suggest it would, and also means the simulator lacks a basic, fundamental feature that strongly limits its utility.

      We agree with the reviewer that the inclusion of gaze position to simulate gaze-centered phosphene locations is an important requirement for a realistic simulation. We have made several textual adjustments to section M1 to improve the clarity of the explanation and we have added several references to address the simulation literature that took eye movements into account.

      In addition, we included a link to some demonstration videos in which we illustrate that the simulator can be used for gaze-centered phosphene simulation. The simulation models the phosphene locations based on the gaze direction, and updates the input with changes in the gaze direction. The stimulation pattern is chosen to encode the visual environment at the location where the gaze is directed. Gaze contingent processing has been implemented in prior simulation studies (for instance: Paraskevoudi et al., 2021; Rassia et al., 2018; Titchener et al., 2018) and even in the clinical setting with users of the Argus II implant (Caspi et al., 2018). From a modeling perspective, it is relatively straightforward to simulate gaze-centered phosphene locations and gaze contingent image processing (our code will be made publicly available). At the same time, however, seen from a clinical and hardware engineering perspective, the implementation of eye-tracking in a prosthetic system for blind individuals might come with additional complexities. This is now acknowledged explicitly in the manuscript.

      Textual adjustment:

      Lines 883-910: Even after loss of vision, the brain integrates eye movements for the localization of visual stimuli (Reuschel et al., 2012), and in cortical prostheses the position of the artificially induced percept will shift along with eye movements (Brindley & Lewin, 1968, Schmidt et al., 1996). Therefore, in prostheses with a head-mounted camera, misalignment between the camera orientation and the pupillary axes can induce localization problems (Caspi et al., 2018; Paraskevoudi & Pezaris, 2019; Sabbah et al., 2014; Schmidt et al., 1996). Previous SPV studies have demonstrated that eye-tracking can be implemented to simulate the gaze-coupled perception of phosphenes (Cha et al., 1992; Sommerhalder et al., 2004; Dagnelie et al., 2006, McIntosh et al., 2013; Paraskevoudi et al., 2021; Rassia et al., 2018; Titchener et al., 2018; Srivastava et al., 2009). Note that some of the cited studies implemented a simulation condition where not only the simulated phosphene locations, but also the stimulation protocol depended on the gaze direction. More specifically, instead of representing the head-centered camera input, the stimulation pattern was chosen to encode the external environment at the location where the gaze was directed. While further research is required, there is some preliminary evidence that such gaze-contingent image processing can improve the functional and subjective quality of prosthetic vision (Caspi et al., 2018; Paraskevoudi et al., 2021; Rassia et al., 2018; Titchener et al., 2018). Some example videos of gaze-contingent simulated prosthetic vision can be retrieved from our repository (https://github.com/neuralcodinglab/dynaphos/blob/main/examples/). Note that an eye-tracker will be required to produce gaze-contingent image processing in visual prostheses and there might be unforeseen complexities in the clinical implementation thereof. The study of oculomotor behavior in blind individuals (with or without a visual prosthesis) is still an ongoing line of research (Caspi et al., 2018; Kwon et al., 2013; Sabbah et al., 2014; Hafed et al., 2016).
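
      As a rough illustration of the gaze-contingent image processing described above, where the stimulation pattern encodes the part of the scene at which the gaze is directed, the sketch below crops a fixed-size window around the gaze position before any encoding step. The function name, the pixel-coordinate gaze input, and the zero-padding at the borders are assumptions for illustration; they are not taken from the dynaphos repository.

```python
import numpy as np

def gaze_contingent_crop(frame, gaze_rc, window):
    """Return a window x window patch of `frame` centred on the gaze position.

    `gaze_rc` is a hypothetical (row, col) gaze estimate in pixel coordinates,
    and `window` is assumed to be odd. The frame is zero-padded so that
    patches near the border keep their full size.
    """
    half = window // 2
    padded = np.pad(frame, half, mode="constant")
    row, col = gaze_rc
    # After padding, original pixel (row, col) sits at (row + half, col + half),
    # so slicing from (row, col) centres the patch on the gaze position.
    return padded[row:row + window, col:col + window]

frame = np.arange(25).reshape(5, 5)
center_patch = gaze_contingent_crop(frame, (2, 2), 3)
corner_patch = gaze_contingent_crop(frame, (0, 0), 3)
```

      In a full pipeline the cropped patch, rather than the whole head-centered camera frame, would be passed to whichever encoder generates the stimulation pattern.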

      4) Finally, the computational capacity required to run the described system is substantial and is not one that would plausibly be used as part of an actual device, suggesting that there may be difficulties with converting results from this simulator to an implantable system.

      The software runs in real time with affordable, consumer-grade hardware. In Author response image 1 we present the results of performance testing with a 2016 model MSI GeForce GTX 1080 (priced around €600).

      Author response image 1.

      Note that the GPU is used only for the computation and rendering of the phosphene representations from given electrode stimulation patterns, which will never be part of any prosthetic device. The choice of encoder to generate the stimulation patterns will determine the required processing capacity that needs to be included in the prosthetic system, which is unrelated to the simulator’s requirements.

      The following addition was made to the text:

      • Lines 488-492: Notably, even on a consumer-grade GPU (e.g. a 2016 model GeForce GTX 1080) the simulator still reaches real-time processing speeds (>100 fps) for simulations with 1000 phosphenes at 256x256 resolution.

      5) With all of that said, the results do represent an advance, and one that could have wider impact if the authors were to reduce the computational requirements, and add gaze correction.

      We appreciate the kind compliment from the reviewer and sincerely hope that our revised manuscript meets their expectations. Their feedback has been critical to reshape and improve this work.

    1. Author Response

      Review #1 Public Review:

      This is an interesting study which attempts to assess the effect of the pandemic on diagnoses of pancreatic cancer. The authors have used a large national database to evaluate this, however, it should be noted that this database only captures 40% of the population in England. The authors have looked at specific parameters including Body Mass Index (BMI) as well as markers of diabetes and liver function. Only BMI had a difference in the frequency of measurements during the pandemic, presumably due to reduced face-to-face visits to allow weight and height to be captured.

      Interestingly the authors noticed a reduction in surgery for pancreatic cancer by 25%, yet reported that there were no differences in the frequency of death within 6 months following the diagnosis of pancreatic cancer. The reduction in surgery is likely related at least in part to the loss of operating lists due to pandemic restrictions, however, this paper is not equipped to address another important possibility behind this, which is that pancreatic cancers were presenting too late for surgical intervention. It is not sufficient to comment that pancreatic cancer treatment was not affected by the pandemic based on the data presented on deaths within 6 months of the diagnosis of pancreatic cancer alone, as the median survival of patients diagnosed with pancreatic cancer within the pandemic has not been captured and compared to that of patients diagnosed in the preceding 5 years.

      Therefore while the study can conclude no difference in pancreatic cancer diagnoses before and during the pandemic, more work needs to be done to truly assess if the pandemic had any effect on the outcomes from pancreatic cancer for patients diagnosed within this timeframe.

      Thank you for taking time to undertake the review and for all the constructive comments. This study was designed to assess the effect of the pandemic on pancreatic cancer services in England. We focused on the quantity of healthcare.

      We acknowledge and understand the comments by the reviewer with regards to the limitations of this study in relation to the effect of the COVID-19 pandemic on diagnosis and survival. We did not assess the effect of the pandemic on the staging information and survival length.

    1. Author Response

      Reviewer #1 (Public Review):

      This research aimed to discern the pattern of methylation changes that occur during aging, distinguishing between a unified specific mechanism and stochastic changes. To date, no unified hypothesis exists to guide our understanding of the changes in chromatin geography observed during the aging of cells. This work analysed six different types of purified blood-borne white blood cells allowing comparison across different immune cell subsets to determine if similar patterns occurred in all cell populations. Intriguingly, each subset exhibited its own distinct differential methylation rather than a single program. However, a core set of gene changes close to age-associated CpGs was identified suggesting that a central program existed, but that individual cell type function and metabolism shaped the overall chromatin landscape for the population. These findings establish a new framework for considering the aging process and open new questions about how the individual clocks of different populations might be regulated. While circulating cells are readily accessible for evaluation in humans, the majority of immune cells that regulate immune homeostasis are found within the tissues of the body. Whether these cells exhibit a similar profile to circulating cells or are rather shaped by their tissue or organ-specific ecosystem remains to be determined. In this setting, these tissue-resident cells are exposed to very different oxygen tensions and metabolic substrates. Furthermore, although the genes identified have been associated with aging, they concurrently appear to be associated with inflammation; thus it is not clear whether aging and low-grade inflammation are inherently linked, or whether these two pathways can be segregated. Thus a number of questions remain warranting further investigation.

      The reviewer makes a very good point regarding different tissue-resident cells being exposed to different oxygen and metabolic stress. In the reviewed manuscript we have Arid3a coming up as one of the transcription factors with motifs in and around probes hypermethylated with age in monocytes. Arid3a is known to target inflammatory genes, but future research is warranted to establish the link between aging and low-grade inflammation. To address the comment about the connection between aging and low-grade inflammation, in the revised manuscript we have incorporated a new analysis of SomaScan array-derived protein levels of seven cytokines from the same cohort of donors. We tested the hypothesis that part of the age-associated changes in DNA methylation are connected with the well-known age-related proinflammatory state. We have now added the details in the Results and Methods sections. Briefly, we ran two regression models (CpGi~age+sex and CpGi~age+sex+analytej, where i is each CpG probe from the EPIC array and j is each of the seven cytokines). We find that changes in DNA methylation levels at nearly 7000-9000 CpG sites in CD4 cells and 124 CpG sites in B cells that were originally age-associated are also associated with increasing levels of TNFRSF1A, TNFRSF1B and TNF-alpha, thereby indicating a link between DNA methylation change and aging as well as inflammatory cytokine levels.
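
      The two nested models named above (CpGi~age+sex and CpGi~age+sex+analytej) can be sketched for a single probe and a single cytokine as follows. The response does not state which software or test statistic was used, so this is only an illustration using ordinary least squares on synthetic data; the variable names, effect sizes, and the helper fit_ols are assumptions.

```python
import numpy as np

def fit_ols(predictors, response):
    """Ordinary least squares with an intercept.

    Returns the coefficient vector and the residual sum of squares.
    """
    design = np.column_stack([np.ones(len(response))] + list(predictors))
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    rss = float(np.sum((response - design @ coef) ** 2))
    return coef, rss

# Synthetic data for one CpG probe i and one cytokine j (not real measurements).
rng = np.random.default_rng(0)
n = 200
age = rng.uniform(20, 90, n)
sex = rng.integers(0, 2, n).astype(float)
analyte = 0.02 * age + rng.normal(0, 0.5, n)  # cytokine level rising with age
cpg = 0.3 + 0.002 * age + 0.05 * analyte + rng.normal(0, 0.02, n)

coef_base, rss_base = fit_ols([age, sex], cpg)           # CpG_i ~ age + sex
coef_full, rss_full = fit_ols([age, sex, analyte], cpg)  # CpG_i ~ age + sex + analyte_j
```

      Comparing the two fits (for example, the drop in residual sum of squares or the analyte coefficient in the second model) indicates whether the cytokine explains methylation variance beyond age and sex; in practice this would be repeated for every probe and cytokine with multiple-testing correction.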

    1. Author Response

      Reviewer #1 (Public Review):

      The authors convincingly show in this study the effects of the fas5 gene on changes in the CHC profile and the importance of these changes toward sexual attractiveness.

      The main strength of this study lies in its holistic approach (from genes to behaviour) showing a full and convincing picture of the stated conclusions. The authors succeeded in putting a very interdisciplinary set of experiments together to support the main claims of this manuscript.

      We appreciate the kind comments from the reviewer.

      The main weakness stems from the lack of transparency behind the statistical analyses conducted in the study. Detailed statistical results are never mentioned in the text, nor is it always clear what was compared to what. I also believe that some tests that were conducted are not adequate for the given data. I am therefore unable to properly assess the significance of the results from the presented information. Nevertheless, the graphical representations are convincing enough for me to believe that a revision of the statistics would not significantly affect the main conclusions of this manuscript.

      We apologize for neglecting a detailed description of the statistical tests that were performed. We have written additional paragraphs in the Methods section specifically explaining the statistical analyses (lines 435-445; 489-502; 559-561; 586-591).

      The second major problem I had with the study was how it brushes over the somewhat contradicting results they found in males (Fig S2). These are only mentioned twice in the main text and in both cases as being "similarly affected", even though their own stats seem to indicate otherwise for many of the analysed compound groups. This also should affect the main conclusion concerning the effects of fas5 genes in the discussion, a more careful wording when interpreting the results is therefore necessary.

      Thank you for pointing this out. Although our focus clearly lay on the female CHC profiles, as a function in sexual signaling has thus far only been described for them, we have now elaborated on the results and discussion for the fas5 RNAi males (lines 167-178; 258-268).

      Reviewer #2 (Public Review):

      Insects have long been known to use cuticular hydrocarbons for communication. While the general pathways for hydrocarbon synthesis have been worked out, their specificity and in particular the specificity of the different enzymes involved is surprisingly little understood. Here, the authors convincingly demonstrate that a single fatty acid synthase gene is responsible for a shift in the positions of methyl groups across the entire alkane spectrum of a wasp, and that the wasp males recognize females specifically based on these methyl group positions. The strength of the study is the combination of gene expression manipulations with behavioural observations evaluating the effect of the associated changes in the cuticular hydrocarbon profiles. The authors make sure that the behavioural effect is indeed due to the chemical changes by not only testing live animals, but also dead animals and corpses with manipulated cuticular hydrocarbons.

      I find the evidence that the hydrocarbon changes do not affect survival and desiccation resistance less convincing (due to the limited set of conditions and relatively small sample size), but the data presented are certainly congruent with the idea that the methyl alkane changes do not have large effects on desiccation.

      We appreciate the kind comments from the reviewer.

      Reviewer #3 (Public Review):

      In this manuscript, the authors are aiming to demonstrate that a fatty-acyl synthase gene (fas5) is involved in the composition of the blend of surface hydrocarbons of a parasitoid wasp and that it affects the sexual attractiveness of females for males. Overall, the manuscript reads very well, it is very streamlined, and the authors' claims are mostly supported by their experiments and observations.

      We appreciate the kind comments from the reviewer.

      However, I find that some experiments, information and/or discussion are absent to assess how the effects they observe are, at least in part, not due to other factors than fas5 and the methyl-branched (MB) alkanes. I'm also wondering if what the authors observe is only a change in the sexual attractiveness of females and not related to species recognition as well.

      We appreciate the interesting point that the reviewer raises in sexual attractiveness and species recognition and now expand upon this potential aspect in the discussion (lines 327-330). However, in this manuscript, we very much focused on the effect of fas5 knockdown on the conveyance of female sexual attractiveness in a single species (Nasonia vitripennis). Therefore, we argue that species recognition constitutes a different communication modality here, and we currently cannot infer whether and how species recognition is exactly encoded in Nasonia CHC profiles despite some circumstantial evidence for species-specificity (Buellesbach et al. 2013; Mair et al. 2017). Thus, we would like to refrain from any further speculation on species recognition before this can be unambiguously demonstrated, and remain within the mechanism of sexual attractiveness within a single species which we clearly show is mediated by the female MB-alkane fraction governed by the fatty acid synthase genes. We however still consider potential alternative explanations (e.g., n-alkenes acting as a deterrent of homosexual mating attempts).

      The authors explore the function of cuticular hydrocarbons (CHCs) and a fatty-acyl synthase in Nasonia vitripennis, a parasitic wasp. Using RNAi, they successfully knockdown the expression of the fas5 gene in wasps. The authors do not justify their choice of fatty-acyl synthase candidate gene. It would have been interesting to know if that is one of many genes they studied or if there was some evidence that drove them to focus their interest in fas5.

      In a previous study, 5 fas candidate genes orthologous to Drosophila melanogaster fas genes were identified and mapped in the genome of Nasonia vitripennis (Buellesbach et al. 2022). We actually investigated the effects of all of these fas genes on CHC variation, but only fas5 led to such a striking, traceable pattern shift. We are currently preparing another manuscript discussing the effects of the other fas genes, but decided to focus exclusively on fas5 here, due to its significance for revealing how sexual attractiveness can be encoded and conveyed in complex chemical profiles, maintained and governed by a surprisingly simple genetic basis.

      The authors observe large changes in the cuticular hydrocarbon (CHC) profiles of males and females. These changes are mostly a reduction of some MB alkanes and an increase in others, as well as an increase of n-alkenes in fas5 knockdown females. For male fas5 knockdowns, the overall quantity of CHCs is increased and consequently, multiple types of compounds are increased compared to wild-type, with only one compound appearing to decrease compared to wild-type. Insects are known to rely on ratios of compounds in blends to recognize odors. The authors address this by showing a plot of the relative ratios, but it seems to me that they do not show statistical tests of those changes in the proportions of the different types of compounds. In the results section, the authors give percentages while referring to figures showing the absolute amount of CHCs. They should also test if the ratios are significantly different or not between experimental conditions. Similar data should be displayed for the males as well.

      We appreciate your suggestions. We kindly refer you to our response to reviewer 1, where we addressed the statistical tests. Specifically, we generated separate subplots to display the proportions of different compound classes and performed statistical tests to compare these proportions between different treatments for both males and females. Additionally, we have revised the results section to replace relative abundances with absolute quantity, as depicted in Figure 2C-G.

      Furthermore, the authors didn't use an internal standard to measure the quantity of CHCs in the extracts, which, to me, is the gold standard in the field. If I understood correctly, the authors check the abundance measured for known quantities of n-alkanes. I'm sure this method is fine, but I would have liked to be reassured that the quantities measured through this method are good by either testing some samples with an internal standard, or referring to work that demonstrates that this method is always accurate to assess the quantities of CHC in extracts of known volumes.

      We actually did include 7.5 ng/μl dodecane (C12) as an “internal” standard in the hexane resuspensions of all of our processed samples (line 456, Materials and Methods). This was primarily done to allow for visually inspecting and comparing the congruence of all chromatograms in the subsequent data analysis and to immediately detect any variation from sample preparation, the injection process and instrument fluctuation. In our study, we used a very elaborate and standardized CHC extraction method in which the volume of solvent and the duration of extraction are strictly controlled to minimize variation from the sample preparation steps. Furthermore, we calibrated each individual CHC compound quantity with a dilution series of external standards (C21-C40) of known concentration. By constructing a calibration curve based on this dilution series, we achieved the most accurate compound quantification, also taking into account and counteracting the generally diminishing quantities of compounds with higher chain lengths.
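
      The external-standard calibration described above can be sketched as follows for a single n-alkane standard. This is only an illustration under stated assumptions: the concentrations and peak areas are hypothetical, the detector response is assumed to be linear, and the function names (`fit_line`, `quantify`) are not taken from the manuscript. In practice, one curve per standard chain length would be fitted in the same way.

```python
def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def quantify(area, slope, intercept):
    """Invert the calibration curve: peak area -> concentration."""
    return (area - intercept) / slope

# Hypothetical dilution series of one external standard.
conc = [1.0, 2.5, 5.0, 10.0, 20.0]              # known concentrations, ng/ul
area = [2.1e4, 5.0e4, 1.01e5, 1.99e5, 4.02e5]   # measured peak areas (arbitrary units)

slope, intercept = fit_line(conc, area)

# Concentration of a sample peak with a measured area of 1.5e5.
sample_conc = quantify(1.5e5, slope, intercept)
print("estimated concentration (ng/ul):", round(sample_conc, 2))
```

      Fitting a separate curve for each chain length is what would compensate for the weaker response of longer-chain compounds mentioned above.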

      The authors provide a sensible control for their RNAi experiments: targeting an unrelated gene, absent in N. vitripennis (the GFP). This allows us to see if the injection of RNAi might affect CHC profiles, which it appears to do in some cases in males, but not in females. The authors also show to the reader that their RNAi experiments do reduce the expression of the target gene. However, one of the caveats of their experiments, is that the authors don't provide evidence or information to allow the (non-expert) reader to assess whether the fas5 RNAi experiments did affect the expression of other fatty-acyl synthase genes. I'm not an expert in RNAi, so maybe this suggestion is not relevant, but it should, at least, be addressed somewhere in the manuscript that such off-target effects are very unlikely or impossible, in that case, or more generally.

      We acknowledge the reviewer’s concern about potential off-target effect of the fas5 knockdown. We actually did check initially for off-target effects on the other four previously published fas genes in N. vitripennis (Lammers et al. 2019; Buellesbach et al. 2022) and did not find any effects on their respective expressions. We now include these results as supplementary data (Figure 2-figure supplement 1). However, as mentioned in the cover letter to the editor, we discovered a previously uncharacterized fas gene in the most recent N. vitripennis genome assembly (NC_045761.1), fas6, most likely constituting a tandem gene duplication of fas5. These two genes turned out to have such high sequence similarity (> 90 %, Figure 2-figure supplement 2) that both were simultaneously downregulated by our fas5 dsRNAi construct, which we confirmed with qPCR and now incorporated into our manuscript (Fig. 2H). Therefore, we now explicitly mention that the knockdown affects both genes, and either one or both could have the observed phenotypic effects. Recognizing this RNAi off-target effect, we have now also incorporated a discussion of this issue in the appropriate section of the manuscript (line 364-377), as well as the potential off-target effects of our GFP dsRNAi controls (line 262-274).

      The authors observe that the modified CHCs profiles of RNAi females reduce courtship and copulation attempts, but not antennation, by males toward live and (dead) dummy females. They show that the MB alkanes of the CHC profile are sufficient to elicit sexual behaviors from males towards dummy females and that the same fraction from extracts of fas5 knockdown females does so significantly less. From the previous data, it seems that dummy females with fas5 female's MB alkanes profile elicit more antennation than CHC-cleared dummy females, but the authors do not display data for this type of target on the figure for MB alkane behavioral experiments.

      Actually, similar proportions of males performed antennation behavior towards female dummies with the MB alkane fraction of fas5 RNAi females and towards CHC-cleared female dummies (55% and 50%, respectively; see Author response image 1 for the corresponding parts of sub-figures 3 E and 4 D). We did not deem it necessary to show the same data on CHC-cleared female dummies in Figure 3 as well.

      Author response image 1.

      Unfortunately, the authors don't present experiments testing the effect of the non-MB alkanes fractions of the CHC extracts on male behavior toward females. As such, they are not able to (and didn't) conclude that the MB-alkane is necessary to trigger the sexual behaviors of males. I believe testing this would have significantly enhanced the significance of this work. I would also have found it interesting for the authors to comment on whether they observe aggressive behavior of males towards females (live or dead) and/or whether such behavior is expected or not in inter-individual interactions in parasitoids wasps.

      In our experiment, we focus on the function of the MB-alkane fraction in female CHC profiles, and we comprehensively demonstrate in Figure 4 that the MB-alkane fraction from WT females alone is sufficient to trigger mating behavior consistent with that observed towards live females and untreated female dummies. Therefore, we do not completely understand the reviewer’s concern about us not being “able to (and didn't) conclude that the MB-alkane is necessary to trigger the sexual behaviors of males”. We appreciate the reviewer's suggestion of testing the non-MB alkanes (n-alkanes and n-alkenes). However, due to the experimental procedure of separating the CHC compound class fractions through elution with molecular sieves, it was not possible for us to retrieve either the whole n-alkane or the whole n-alkene fraction, which remain bound to the sieves after separation. The role of n-alkenes in N. vitripennis is, however, considered in the discussion, as a deterrent for homosexual interactions between males (Wang et al. 2022a). Moreover, we did not observe aggressive behavior of males towards live or dead females.

      CHCs are used by insects to signal and/or recognize various traits of targets of interest, including species or groups of origin, fertility, etc. The authors claim that their experiments show the sexual attractiveness of females can be encoded in the specific ratio of MB alkanes. While I understand how they come to this conclusion, I am somewhat concerned. The authors very quickly discuss their results in light of the literature about the role of CHCs (and notably MB alkanes) in various recognition behaviors in Hymenoptera, including conspecific recognition. Previous work (cited by the authors) has shown that males recognize males from females using an alkene (Z9C31). As such, it remains possible that the "sexual attractiveness" of N. vitripennis females for males relies on them not being males and being from the right species as well. The authors do not address the question of whether the CHCs (and the MB alkanes in particular) of females signal their sex or their species. While I acknowledge that responding to this question is beyond the scope of this work, I also strongly believe that it should be discussed in the manuscript. Otherwise, non-specialist readers would not be able to understand what I believe is one of the points that could temper the conclusions from this work.

      We acknowledge the reviewer’s insight about the MB alkanes in signaling sex or species in N. vitripennis, and now include this aspect in our revised discussion (line 324-330). Moreover, we clearly demonstrate that n-alkenes have been reduced to minute trace components after our compound class separation, and the males still do not display courtship and copulation behaviors similar to WT females, thus strongly indicating that the n-alkenes do not play a role when relying solely on the changed MB-alkane patterns, further strengthening our main argument.

      References

      Benjamini, Y. and D. Yekutieli. 2001. The control of the false discovery rate in multiple testing under dependency. Ann. Stat. 29:1165-1188.

      Buellesbach, J., J. Gadau, L. W. Beukeboom, F. Echinger, R. Raychoudhury, J. H. Werren, and T. Schmitt. 2013. Cuticular hydrocarbon divergence in the jewel wasp Nasonia: Evolutionary shifts in chemical communication channels? J. Evol. Biol. 26:2467-2478.

      Buellesbach, J., C. Greim, and T. Schmitt. 2014. Asymmetric interspecific mating behavior reflects incomplete prezygotic isolation in the jewel wasp genus Nasonia. Ethology 120:834-843.

      Buellesbach, J., H. Holze, L. Schrader, J. Liebig, T. Schmitt, J. Gadau, and O. Niehuis. 2022. Genetic and genomic architecture of species-specific cuticular hydrocarbon variation in parasitoid wasps. Proc. R. Soc. B 289:20220336.

      Engl, T., N. Eberl, C. Gorse, T. Krüger, T. H. P. Schmidt, R. Plarre, C. Adler, and M. Kaltenpoth. 2018. Ancient symbiosis confers desiccation resistance to stored grain pest beetles. Mol. Ecol. 27:2095-2108.

      Ferveur, J. F., J. Cortot, K. Rihani, M. Cobb, and C. Everaerts. 2018. Desiccation resistance: effect of cuticular hydrocarbons and water content in Drosophila melanogaster adults. PeerJ 6.

      Lammers, M., K. Kraaijeveld, J. Mariën, and J. Ellers. 2019. Gene expression changes associated with the evolutionary loss of a metabolic trait: lack of lipogenesis in parasitoids. BMC Genom. 20:309.

      Mair, M. M., V. Kmezic, S. Huber, B. A. Pannebakker, and J. Ruther. 2017. The chemical basis of mate recognition in two parasitoid wasp species of the genus Nasonia. Entomol. Exp. Appl. 164:1-15.

      Wang, Y., W. Sun, S. Fleischmann, J. G. Millar, J. Ruther, and E. C. Verhulst. 2022a. Silencing Doublesex expression triggers three-level pheromonal feminization in Nasonia vitripennis males. Proc. R. Soc. B 289:20212002.

      Wang, Z., J. P. Receveur, J. Pu, H. Cong, C. Richards, M. Liang, and H. Chung. 2022b. Desiccation resistance differences in Drosophila species can be largely explained by variations in cuticular hydrocarbons. eLife 11:e80859.

    1. Author Response

      Reviewer #1 (Public Review):

      The work described herein would have an impact on the field in multiple ways. Firstly, it demonstrates a novel metabolic role for MSH in the regulation of hepatic cholesterol metabolism. This may prove to be a viable therapeutic strategy for the treatment of dyslipidemia. Furthermore, the authors demonstrate an alternative signaling cascade elicited by MSH independent of cAMP, but rather relying on AMPK. This novel interaction between AMPK and MC1R could have more widespread implications beyond the control of hepatic cholesterol metabolism.

      For the most part, the conclusions offered by the authors are supported by the data that is presented. There are, however, a number of concerns in the current version of this manuscript detailed below.

      We thank the reviewer for the encouraging and insightful comments, and we are pleased to read that the manuscript has raised considerable interest.

      1) The authors demonstrate the expression of MC1R in hepatocytes through IHC staining and western blot analysis. Furthermore, the authors show an alteration in systemic bile acid homeostasis in MC1R KO mice. However, no mention of MC1R expression or function in cholangiocytes is discussed. This is important to assess both experimentally and within the discussion given the profound role of the biliary epithelium in modulating bile acid homeostasis. Furthermore, in figure 1 the authors validate the MC1R knockdown only through mRNA expression. Given panels A and C of figure 1 shows there is clearly a functional antibody for MC1R, validation of protein knockdown is needed.

      The reviewer raises an important point, which we addressed by performing immunofluorescence staining using an antibody against the cholangiocyte marker cytokeratin 19 (CK-19). These colocalization studies demonstrate the presence of MC1-R in CK19-positive cholangiocytes (Figure 1-figure supplement 1). Furthermore, we have now added a discussion on the possible role of MC1-R in modulating bile acid homeostasis in cholangiocytes (page 12, lines 456-462). We also quantified MC1-R protein expression by Western blotting in the liver of L-Mc1r-/- mice. MC1-R protein level was significantly reduced in L-Mc1r-/- mice compared to L-Mc1+/- mice (Figure 2-figure supplement 2).

      2) Figure 2 demonstrates a steatotic effect of MC1R knockdown in hepatocytes. The authors attempt to provide mechanistic insight into this phenomenon by assessing the mRNA expression of genes involved in cholesterol and fatty acid synthesis. The data provided are modest at the gene level and no protein validation was provided to demonstrate functional alterations of these proteins in MC1R KO mice. Key proteins proposed, such as SREBP2 and HMGCR, need to be validated via a western blot or IHC analysis.

      As requested by the reviewer, we quantified the expression of key proteins in the liver of L-Mc1r-/- mice by Western blotting. We observed that the protein levels of HMGCR and DHCR7 as well as the ratio between the mature and precursor forms of SREBP2 were reduced in L-Mc1r-/- mice (Figure 2F-H, page 6/lines 182-191 & page 10-11/lines 390-401). This is likely a result of the feedback regulation, whereby cholesterol accumulation suppresses the cleavage of SREBP2 and leads to a consequent downregulation of the key cholesterol synthesis enzymes such as HMGCR and DHCR7 (Brown S & Goldstein JL, Cell. 1997 May 2;89(3):331-40).

      We discussed in the original submission (page 11) as follows: ‘In the presence of excess cellular cholesterol, transcriptional induction and posttranslational activation of SREBP-2 should be attenuated, which in turn downregulates Hmgcr and Dhcr7 and reduces cholesterol synthesis as a counterregulatory mechanism. Therefore, given the increase in hepatic cholesterol content, it was unexpected that Srebp2 expression was upregulated in the liver of L-Mc1r-/- mice’. The finding of reduced SREBP2/HMGCR protein expression is thus more logical, but admittedly, it is discordant with increased Srebp2/Hmgcr mRNA expression (as reported in the original submission), which might be a compensatory response to suppressed SREBP2 cleavage. Taking into account that activation of MC1-R did not affect the protein expression of HMGCR or DHCR7 in HepG2 cells, it is plausible that hepatic cholesterol accumulation in L-Mc1r-/- mice is driven by a defect in bile acid metabolism, rather than by a direct effect of MC1-R signaling on cholesterol synthesis. To avoid unnecessary confusion, we decided to omit the qPCR data and related text parts from the manuscript and report the protein expression data instead.

      4) The authors suggest the involvement of AMPK in mediating the cholesterol-lowering effects of MSH. However, MSH is still able to lower free cholesterol levels even in the presence of an AMPK inhibitor. This suggests that MSH does not in fact rely on the activation of AMPK to elicit these cholesterol-lowering effects. The authors' conclusions are stronger than the actual data support. Furthermore, the authors claim LD211 phenocopies the effects of MSH in the presence of an AMPK inhibitor. However, the authors only measured the phosphorylation of Akt as their outcome. This begs the question, does LD211 still lower total cholesterol in the presence of AMPK inhibitors? This experiment is essential to conclude whether or not LD211 phenocopies the effects of MSH.

      The reviewer may have missed that we postulate in the manuscript that ‘MC1-R activation engages multiple signaling mechanisms to regulate cholesterol metabolism in HepG2 cells’ (manuscript page 8, lines 310-311 & page 13, lines 498-508), since a low concentration of α-MSH was still able to lower the free cholesterol level in the presence of the AMPK inhibitor dorsomorphin. We have been careful not to claim that the effects of α-MSH are solely dependent on AMPK phosphorylation. Likewise, we have not claimed in the original submission that LD211 phenocopies the effects of MSH in the presence of an AMPK inhibitor. However, as suggested by the reviewer, we performed new experiments to investigate the effects of LD211 on cellular cholesterol levels in the absence and presence of dorsomorphin. We found that AMPK inhibition with dorsomorphin completely abolished the cholesterol-lowering effect of LD211 (Figure 7-figure supplement 2), which might indicate that this synthetic agonist has a stronger signaling bias toward the AMPK pathway compared to α-MSH.

      5) The authors initiate the project by showing high-fat diet disrupts the expression of MC1R. However, all of the subsequent experiments in hepatic MC1R KO mice are performed under normal chow. This begs the question of what is the phenotype of the hepatic MC1R KO mice fed a high-fat diet. Does KO of MC1R in the liver exacerbate HFD-induced obesity, glucose intolerance, and dyslipidemia? Inversely, can WT mice challenged with an HFD be rescued metabolically by treatment with either MSH or LD211? Providing data along these lines of investigation will provide physiological/clinical relevance to their findings.

      As suggested by the reviewer, we phenotyped the hepatic MC1R KO (L-Mc1r-/-) mice after feeding them a cholesterol- and fat-rich Western diet for 12 weeks (RD Western Diet, D12079B, Research Diets Inc, NJ, USA). This was exactly the same dietary regimen (product and duration) that was used to study the changes in hepatic MC1-R expression in wild-type C57Bl mice (Figure 1B&C). We observed that 12-week Western diet feeding induced a significant gain in body weight and total fat mass as well as an increase in plasma and hepatic cholesterol and TG levels (Figure 2-figure supplement 2). L-Mc1r-/- mice did not show a difference in body weight gain, but the weight gain was attributable to enhanced gain in fat mass and a blunted increase in lean mass compared to control Mc1rfl/fl mice (Figure 2-figure supplement 2A, D & E). Furthermore, liver weight and plasma cholesterol and TG concentrations were unchanged in HFD-fed L-Mc1r-/- mice (Figure 2-figure supplement 2B, C, F & G). Importantly, recapitulating the phenotype observed in chow-fed mice, hepatic cholesterol and TG content was significantly increased in L-Mc1r-/- mice after a HFD challenge (Figure 2-figure supplement 2H & I). Taken together, it appears that the phenotype of HFD-fed L-Mc1r-/- mice was slightly diluted compared to the phenotype observed in chow-fed L-Mc1r-/- mice. This phenotypic difference might relate to the finding that Western diet feeding reduced the hepatic expression of MC1-R, thus limiting the incremental effect of genetically induced MC1-R deficiency on hypercholesterolemia and hepatic lipid accumulation.

      We have previously studied the effects of pharmacological MC1-R activation in Western diet-fed mice and observed that chronic treatment with a selective MC1-R agonist reduced plasma cholesterol level and upregulated hepatic Ldlr expression without affecting body weight gain (Rinne P et al, Circulation. 2017 Jul 4;136(1):83-97). These findings are also discussed on manuscript page 12, lines 475-478. Although the selective MC1-R agonist was different in that particular study, it is expected that LD211 would also elicit a similar cholesterol-lowering effect in Western diet-fed mice. Chronic treatment with α-MSH, on the other hand, would likely produce wide-ranging metabolic effects. In addition to MC1-R activation in hepatocytes and its consequent effect on liver cholesterol metabolism, α-MSH would affect feeding, energy expenditure and cholesterol metabolism via MC4-R activation in the central nervous system as well as fatty acid and glucose metabolism via MC5-R activation in the skeletal muscle. Therefore, the phenotype associated with α-MSH treatment would be complex and mediated by multiple mechanisms and MC-R subtypes, thus making it difficult to interpret the exact contribution of hepatic MC1-R signaling to the observed phenotype.

      Reviewer #2 (Public Review):

      Keshav Thapa et al. investigated the role of melanocortin 1 receptor (MC1-R) in cholesterol and bile acid metabolism in the liver. First, they observed that MC1-R is present in the mouse liver and that its expression is reduced in response to a cholesterol-rich diet. To determine the role of MC1-R in the liver, they generated hepatocyte-specific MC1-R KO mice (L-Mc1r-/-). These animals exhibited a significant increase in liver weight, lipid accumulation, triglycerides and cholesterol levels, and fibrosis in comparison with control mice. By performing liquid chromatography-mass spectrometry, the authors also found that L-Mc1r-/- mice also have fewer bile acids in the plasma and faeces, but not in the liver. In accordance with these findings, mRNA/protein expression of different genes involved in these processes were altered in L-Mc1r-/- animals.

      Secondly, in an attempt to evaluate the underlying mechanisms, they measured the expression of MC1-R in HepG2 cells under different treatments (i.e., palmitic acid, LDL, and atorvastatin). Moreover, they stimulated these cells with the endogenous MC1-R agonist α-MSH, where they show that this molecule decreases the free cholesterol content, whereas increasing LDL and HDL uptake, as well as recapitulates some previously observed phenotypes in the proportions of bile acids. These effects were also encountered when using a selective agonist for MC1-R (i.e., LD211), further supporting the specific role of MC1-R. Finally, some experiments indicated that α-MSH evokes not one single, but multiple intracellular signalling cascades for which MC1-R activation effects might take place.

      Overall, this work provides novel and interesting findings on the role of MC1-R in cholesterol and bile acid metabolism in the liver, which undoubtedly will have some crucial implications for future research. Nevertheless, some experimental details should be better explained for the correct interpretation of the data. Besides, discrepant results exist regarding the molecular mechanisms behind MC1-R action that requires additional experimentation to support the conclusions drawn.

      We thank the reviewer for the encouraging and insightful comments, and we are pleased to read that the manuscript has raised considerable interest.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors aim to understand the role of clonal heterogeneity of tumors in immunogenicity of clonally expressed antigens. This is a significant problem with many basic as well as translational implications.

      The strength of the manuscript lies in the novel demonstration that a poorly immunogenic tumor antigen, when paired with a stronger tumor antigen, begins to elicit significant immune response. The weakness lies in the fact that the actual mechanism of the key demonstration is never shown. There is a lot of speculation and tangential experimentation, but little actual evidence of a mechanism.

      By making the key observation (mentioned in the strength section in the previous paragraph), the authors did achieve their objective albeit very partially. Their observation is based on excellent experimental tools and design. This study will stimulate further experiments in this important field.

      Their key observation is somewhat reminiscent of the practice of conjugating small "non-immunogenic" antigens (such as some carbohydrates) to large protein carriers (such as serum albumin) in order to elicit strong antibody response to the weaker antigen. It is interesting to contemplate if the underlying mechanisms have any commonality.

      We thank the reviewer for their consideration of our work and their constructive feedback. We concur that our study has limitations and further work will be necessary to fully deconstruct the mechanism leading to the observed phenotype. We have revised the text to better reflect the aim and scope of our study. However, the goal of our work was to establish a trackable model that would allow us to model different, albeit limited, degrees of antigen expression patterns reflecting what is observed in patients with different levels of ITH. Our key observation reproduces what is observed clinically, adding strength to the model. Next, we wanted to study what was different about the induced immune responses to develop strategies to better treat tumors with heterogeneous NeoAg expression patterns that currently do not respond to checkpoint blockade therapy. Studying KP-HetHigh and KP-HetLow tumors revealed that tumor debris-carrying cDC1 draining from KP-HetLow tumors phagocytosed both NeoAgs. This population of cDC1, carrying both NeoAgs, had a more stimulatory phenotype compared to cDC1 without tumor debris or cDC1 that had engulfed only one NeoAg. We were able to develop a targeted therapy including CD40 agonism based on our key observations: KP-HetLow had a more robust response towards the weaker NeoAg which was associated with more stimulatory cDC1 presenting both NeoAgs compared to KP-HetHigh tumors. The stronger immune response increased responsiveness to CBT.

      The reviewer makes an interesting point about conjugate vaccines, which canonically elicit greater responses because they engage multiple immune cells, namely T cells with B cells, resulting in stronger antibody responses. The prevalence of tumor debris-carrying cDC1 with both neoantigens in KP-HetLow does make us consider that this population of cDC1 may be engaging multiple immune populations, i.e., different neoantigen-specific T cells. We suggest this as a possible mechanism for greater Aatf responses, but further work is necessary to determine if the same cDC1 can directly interact with both neoantigen-specific T cells.

      Reviewer #2 (Public Review):

      There are data to suggest that intratumour mutational heterogeneity (ITH; the proportion of all mutations that are found only within cancer subclones) is associated with worse therapeutic outcomes. Specifically, patients with more mutations (and thus neoantigens) mostly expressed by subclones (high ITH) have poorer responses to checkpoint immunotherapy. The authors set out to explore the mechanisms underlying this by studying 2 dimensions of neoantigen biology: firstly, distribution (clonal vs subclonal) and secondly, immunogenicity (weak vs strong binding to MHC class I). Using a panel of lung cancer cell lines modified to express individual or dual neoantigens in order to model clonal and subclonal expression, elegant studies show that clonal co-expression with a "strong" neoantigen can boost the immunogenicity of a "weak" neoantigen and result in tumour control. Mechanistically, this is related to engulfment of both neoantigens by cross presenting type 1 conventional dendritic cells and the associated enhanced activation state of this cell type. This is an interesting and potentially important finding that may be related to mechanisms of epitope spreading as immune responses diverge from targeting more to less immunogenic epitopes. Overall, the study is thought-provoking, informative in relation to how neoantigen immunogenicity is shaped and may have practical relevance.

      We greatly appreciate the reviewer's constructive and insightful comments and questions on our work. We have edited the text in response to their feedback. We believe these changes have made the writing clearer and that it now more effectively communicates the scope of our study and our results to the reader.

    1. Author Response:

      We would like to thank the Editors and Reviewers for their positive evaluations, constructive comments, and for the opportunity to revise our manuscript. We feel that the comments and suggestions will further improve our manuscript.

      In the updated manuscript we aim to incorporate all suggested changes and considerations provided by the Reviewers. In particular, we will provide further information on the quality-control ratings per subfield, as suggested by Reviewer 1. Moreover, we will evaluate whether the training-related changes were specific to CA1-3, rather than just showing significant alterations in CA1-3 and not in the other subfields. Last, as suggested by Reviewer 2, we will additionally test for multivariate associations between hippocampal subfield structure and function, to further evaluate the specificity of hippocampal subfield change as a function of training and cortisol.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      This study is well presented and contains all the necessary experiments to support their claims. They made the interesting finding of an additional factor Dyn2. However, it is unclear whether it is present in the human complex. Hence, it would be interesting to see whether Dyn2 co-purifies when expressed with the other complex components in insect cells. Also, purification of a tagged complex from yeast would have indicated whether Dyn2 is part of the complex and whether other factors, like RBM15 or Hakai, present in humans are also present in yeast.

      We agree that the Dyn2 subunit is an exciting new finding that is worth further investigation. The IP-MS experiments suggest that Dyn2 is a subunit of the complex and that the Dyn2 interaction is mediated via Slz1. We also noticed a reduction in m6A levels (50%) in the dyn2 deletion mutant. What the function of Dyn2 is and whether it is conserved remains to be determined.

      Our IP-MS experiments with Mum2 identified the complex as described in the manuscript; however, we did not find evidence of orthologs of RBM15 and Hakai. More follow-up work using in vivo and in vitro assays is needed to determine how m6A deposition by the yeast MTC is regulated.

      P3 top: Although m6A is the most abundant internal methylation variant, it is far below the methylation levels of cap-adjacent nucleotides in mammalian mRNAs (PMID: 35970556).

      We have added the word “internal” to the first sentence of the introduction.

      A list of author contributions is missing.

      We have added this in the revised version.

      Reviewer #2 (Recommendations For The Authors):

      Most of the conclusions of this paper are well supported by data, and the text is clearly written and easy to read. Here are my suggestions and comments:

      1) In Fig.2, why not use LC-MS to measure m6A levels in Ygl036w, Dyn2, Pab1, Npl3 mutants, as in Fig.1?

      For measuring m6A levels, we use a combination of LC-MS, m6A-ELISA, and m6A-seq2 throughout the manuscript. We used ELISA in Fig. 2 because we had established this assay in the lab (Ensinck et al, RNA Journal, 2023). The m6A-ELISA technique was more accessible and easier to execute than LC-MS. Additionally, our collaborator for the LC-MS moved his lab to another country, which made it impractical to continue the use of LC-MS.

      2) The protein purification experiment described in Fig. 4D is informative. Can they include Dyn2 in the expression system as well?

      Thank you for the suggestion. Dyn2 was not the focus of the manuscript, as Dyn2 has, at best, only a minor role in m6A deposition in vivo. We are also currently aiming to dissect how Dyn2 regulates m6A and the yeast MTC in follow-up work. Hence, we decided not to add more experiments on Dyn2 to the current manuscript.

      3) Among the MTC components identified in this study, Dyn2 is a new and interesting subunit. It was shown that in C. elegans Dlc1 is involved in stabilizing the m6A writer Mett10. I wonder if yeast has a homolog of C. elegans Mett10?

      As far as we know, no ortholog of Mett10 (METTL16 in mammals) has been identified in budding yeast.

      4) The authors have emphasized "the m6A dependent and independent functions"; however, this is only based on previous observations. Is it possible that the less severe phenotype associated with ime4 catalytic mutant is due to residual catalytic activity? I think the data presented in Fig. 5 tell us that Ime4 and other MTC subunits have no additional moonlighting function. It is not entirely clear to me what "the m6A-independent function" is.

      The observation that the yeast MTC complex has m6A-dependent and m6A-independent functions is based on previous observations and the current work. In Agarwala et al 2012 PLOS Genetics, it was shown that mum2 and ime4 deletion mutants have a more severe phenotype than the slz1 deletion mutant or the catalytically inactive mutant of Ime4. We confirmed these observations in the revised manuscript (see Figure S5A and S5B). In this work, we showed that kar4 and vir1 deletion mutants have a delay in the onset of meiosis comparable to that of mum2 and ime4 deletion mutants. Also, the MTC remains intact in the absence of Slz1, but falls apart in ime4Δ, mum2Δ, and vir1Δ mutants, or shows strongly reduced RNA binding (kar4 deletion mutant). Based on this, we conclude that an m6A-independent function of the MTC exists.

      We have included data demonstrating that the catalytically inactive mutant has no residual m6A and a milder meiotic phenotype compared to the ime4 deletion mutant (see Figure S5A and S5B).

      5) In Mum2-TEV-ProA IP (1B) and Kar4-TEV-ProA IP (S1A), Slz1 was not significantly enriched; however, in the repeated Mum2-TEV-ProA IP with/without RNAse (S1B, 4C), Slz1 was strongly enriched. Why are the Slz1 results so variable?

      This is an astute observation, for which we do not have a definitive answer. One possibility is that Slz1 is the only subunit that is induced during meiosis. It is possible that induction of Slz1 varied between the different IP-MS experiments, hence leading to variability in its association with the MTC complex.

      6) The last paragraph on page 11, "Collectively...", and the first paragraph on page 12, "Collectively...", seem redundant.

      We have removed the duplicated paragraph in the revised manuscript.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      MCM8 and MCM9 are paralogues of the eukaryotic MCM2-7 proteins. MCM2-7 form a heterohexameric complex to function as a replicative helicase while MCM8-9 form another hexameric helicase complex that may function in homologous recombination-mediated long-tract gene conversion and/or break-induced replication. MCM2-7 complex is loaded during the low Cdk period by ORC, CDC6, and Cdt1, when the origin DNA may intrude into the central channel via the MCM2-MCM5 entry "gate". In the S phase, MCM2-7 complex is activated as CMG helicase with the help of CDC45 and GINS complex. On the other hand, it still remains unclear how MCM8-9 complex is loaded onto DNA and then activated.

      In this study, the authors first investigated the cryo-EM structure of chicken MCM8-9 (gMCM8-9) complex. Based on the data obtained, they suggest that the observed gMCM8-9 structure might represent the structure of a loading state with possible DNA entry "gate". The authors further investigated the cryo-EM structure of human MCM8-9 (hMCM8-9) complex in the presence of the activator protein, HROB, and compared the structure with that obtained without HROB1, which the authors published previously. As a result, they suggest that MCM8-9 complex may change the conformation upon HROB binding, leading to helicase activation. Furthermore, based on the structural analyses, they identified some important residues and motifs in MCM8-9 complex, mutations of which actually impaired the MCM8-9 activity in vitro and in vivo.

      Overall, the data presented would support the authors' conclusions and would be of wide interest for those working in the fields of DNA replication and repair. One caveat is that most of the structural data are shown only as ribbon model without showing the density map data obtained by cryo-EM, which makes accurate evaluation of the data somewhat difficult.

      We thank the reviewer for the positive comments on our work. To allow evaluation of all the structural data, we have presented the density maps of the cryo-EM structures of the gMCM8/9 complex in supplementary figures S5 and S6 of our revised manuscript. In addition, the 3D cryo-EM maps of the gMCM8/9 complex and the hMCM8/9 NTD ring have been deposited in the EMDB database with accession numbers EMD-32346 and EMD-33989, respectively. The corresponding atomic models have been deposited at the RCSB PDB under the accession codes 7W7P and 7YOX, respectively. All these data were released in May 2023.

      Reviewer #2 (Public Review):

      MCM8 and MCM9 together form a hexameric DNA helicase that is involved in homologous recombination (HR) for repairing DNA double-strand breaks. The authors have previously reported on the winged-helix structure of the MCM8 (Zeng et al. BBRC, 2020) and the N-terminal structure of MCM8/9 hexameric complex (MCM8/9-NTD) (Li et al. Structure, 2021). This manuscript reports the structure of a near-complete MCM8/9 complex and the conformational change of MCM8/9-NTD in the presence of its binding protein, HROB, as well as the residues important for its helicase activity.

      The presented data might potentially explain how MCM8/9 works as a helicase. However, additional studies are required to conclude this point because the presented MCM8/9 structure is not a DNA-bound form and HROB is not visible in the presented structural data. Taking these points into account, this work will be of interest to biologists studying DNA transactions.

      A strength of this paper is that the authors revealed the near-complete MCM8/9 structure with 3.66 Å and 5.21 Å for the NTD and CTD, respectively (Figure 1). Additionally, the authors discovered a conformational change in the MCM8/9-NTD when HROB was included (Figure 4) and a flexible nature of MCM8/9-CTD (Figure S6 and Movie 1).

      The biochemical data that demonstrate the significance of the Ob-hp motif and the N-C linker for DNA helicase activity require careful interpretation (Figures 5 and 6). To support the conclusion, the authors should show that the mutant proteins form the hexamer without problems. Otherwise, it is conceivable that the mutant proteins are flawed in complex formation. If that is the case, the authors cannot conclude that these motifs are vital for the helicase function.

      A weakness of this paper is that the authors have already reported the structure of MCM8/9-NTD utilizing human proteins (Li et al. Structure, 2021). Although they succeeded in revealing the high-resolution structure of MCM8/9-NTD with the chicken proteins in this study, the two structures are extremely comparable (Figure S2), and the interaction surfaces seem to be the same (Figure 2).

      Another weakness of this paper is that the presented data cannot fully elucidate the mechanistic insights into how MCM8/9 functions as a helicase for two reasons. 1) The presented structures solely depict DNA unbound forms. It is critical to reveal the structure of a DNA-bound form. 2) The MCM8/9 activator, HROB, is not visible in the structural data. Even though HROB caused a conformational change in MCM8/9-NTD, it is critical to visualize the structure of an MCM8/9-HROB complex.

      We appreciate the reviewer’s comments on our work. Regarding the first weakness mentioned above, the previously reported cryo-EM structure of hMCM8/9 NTD ring was achieved with a resolution of 6.6 Å. At this level of resolution, we were only able to observe the overall shape of the structure and a partial representation of the protein's secondary structure. It is hard for us to discern any specific details regarding the interaction interface between MCM8 and MCM9. In this study, we solved the structure of gMCM8/9 NTD ring with a resolution of 3.67 Å. We believe that the higher resolution of gMCM8/9 NTD structure provides a significant advantage in analyzing the interaction surface between MCM8 and MCM9. This improved resolution has enabled us to gain valuable insights into the assembly mechanism of the MCM8/9 hexamer, representing a significant step forward in our understanding of the MCM8/9 helicase complex. In response to the second weakness raised by the reviewer, we fully agree with the reviewer that high-resolution structures of the MCM8/9 complex with DNA or HROB are necessary to elucidate the mechanism of this helicase complex. We are actively working towards obtaining these complex structures using cryo-EM and X-ray crystal diffraction.

      Moreover, we would like to address the reviewer's concern regarding the mutant proteins used in the in vitro helicase assays. We have conducted additional experiments to confirm that these mutations do not impair the formation of the MCM8/9 hexamer. Specifically, we performed size exclusion chromatography (SEC) analyses of the wild-type (WT) MCM8/9 complex, as well as MCM8 and MCM9 mutant proteins (Author response image 1). The results demonstrated that all the proteins behaved consistently and displayed similar SEC profiles during the purification process. Notably, the N-C linker deletion mutant (hMCM8_Δ369-377+MCM9_Δ283-287) combining the MCM8 and MCM9 N-C linker deletions also behaved similarly to WT MCM8/9 (Author response image 2). These findings strongly suggest that the mutations in the OB-hps regions and the N-C linkers do not disrupt the hexamer formation of the MCM8/9 complex. Author response image 1 and Author response image 2 have been included in supplementary figures S8 and S11, respectively.

      Author response image 1.

      SEC profiles of WT and OB-hps mutants of MCM8/9 complex.

      Author response image 2.

      SEC profiles of WT and N-C linker mutant of MCM8/9 complex.

      Reviewer #1 (Recommendations For The Authors):

      I would like to provide some suggestions to improve the manuscript.

      1) Throughout the manuscript, more density map data obtained by the cryo-EM should be shown for accurate evaluation of the data. For example, in Figure 1C, the authors state that inner channel of the gMCM8-9 hexamer is ~28 angstrom, apparently based on the ribbon model. This is not appropriate because the space upon ribbon model is not same as that upon the density map. For Figure 1B, they state that "The domain structures of gMCM8-9 fit well into their electron map". If so, please show the actual docking data. Also for Figure 2, the docking presentation between the side chains in the ribbon model and the density map should be shown.

      We sincerely thank the reviewer for the constructive suggestions. In addition to releasing our structural data in the EMDB and PDB, we have also followed the reviewer’s suggestions and included more density map data in the supplementary material. In fact, when calculating the diameter of the inner channel of the MCM8/9 hexamer, we also measured it on the density map (Author response image 3A and B), and the result is consistent with the value reported in our manuscript. To further evaluate the structure of MCM8/9, we have included additional docking structures based on the density map (Author response image 3C-F). Moreover, for Figure 2, more docking presentations are provided and the key residues involved in the hydrophobic interactions are highlighted in bold (Author response image 4). Author response image 3 and Author response image 4 have been included in supplementary figures S5 and S6, respectively.

      Author response image 3.

      The cryo-EM structure of gMCM8/9. (A and B) Reconstructed cryo-EM map of gMCM8/9. The diameter of the inner channel of MCM8/9 was measured at ~28 Å. (C-F) Representative regions of the cryo-EM structure of gMCM8/9 NTD are shown based on their density map. C, chain A (MCM9); D, chain B (MCM8); E, chain C (MCM9); F, chain D (MCM8).

      Author response image 4.

      Representative regions of the cryo-EM structure of gMCM8/9 NTD. (A and B) The regions mediating the hydrophobic interactions shown in Figure 2B. A (MCM8), B (MCM9). (C and D) The regions mediating the hydrophobic interactions shown in Figure 2C. C (MCM8), D (MCM9). The key residues are shown in bold.

      2) Figures 4, 5, and 6: For helicase assay, more detailed experimental conditions (e.g. concentrations of DNA substrates and proteins used) should be presented. In addition, it should be described how Flag-hMCM8-9 complex (Figure 4C) was purified.

      We sincerely appreciate the constructive suggestion provided by the reviewer. In the revised manuscript, we have included more experimental details in the helicase assays, including the concentrations of DNA substrates and proteins. The following paragraph describes the updated experimental procedure and is also provided in the revised version of the manuscript.

      Helicase assays: To prepare the substrate, the oligonucleotide (5′-(dT)40-GTTTTCCCAGTCACGACGTTGTAAAACGACGGCCAGTGCC-3′) containing a 40 nt region complementary to the M13mp18(+) strand and a 40 nt oligo-dT at the 5′ end was labeled at the 3′ terminus with [α-32P] dCTP (Perkin Elmer) and annealed to the single-stranded DNA M13mp18 (24). DNA substrate (0.1 nM, in molecules) was mixed with 5 µg of recombinant MCM8/9 complex or its mutants, as indicated, in a 15 µl reaction in the helicase buffer (25 mM HEPES, pH 7.5, 1 mM magnesium acetate, 25 mM sodium acetate, pH 5.2, 4 mM ATP, 0.1 mg/ml BSA, 1 mM DTT). 2.5 µg HROB was used as an activator. To avoid re-annealing, the reaction was supplemented with a 100-fold excess of unlabeled oligonucleotide. The reactions were then incubated at 37 °C for 60 min and stopped by adding 1 µl of stop buffer (0.4% SDS, 30 mM EDTA, and 6% glycerol) and 1 µl of proteinase K (20 mg/ml, Sigma), followed by a further 10 min incubation at 37 °C. The products were separated by 15% polyacrylamide gel electrophoresis in 1× TBE buffer and analyzed with the Amersham Typhoon (Cytiva).
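
      As a reading aid only, the quantities stated in the helicase assay protocol can be converted into per-reaction amounts. The sketch below is not part of the authors' methods. It assumes that the "100-fold" unlabeled oligonucleotide means a 100-fold molar excess over the labeled substrate, and that the stop buffer and proteinase K are added directly to the 15 µl reaction, giving 17 µl in total. All variable and function names are illustrative.

```python
# Per-reaction amounts implied by the helicase assay description.
# Assumptions (not stated in the source): "100-fold" is a molar excess over
# the labeled substrate, and both 1 µl additions go into the same 15 µl tube.

REACTION_UL = 15.0          # reaction volume, µl
SUBSTRATE_NM = 0.1          # labeled DNA substrate, nM
MCM89_UG = 5.0              # MCM8/9 complex per reaction, µg
HROB_UG = 2.5               # HROB per reaction, µg
COMPETITOR_FOLD = 100.0     # assumed molar excess of unlabeled oligo
STOP_BUFFER_UL = 1.0        # stop buffer added, µl
PROTEINASE_K_UL = 1.0       # proteinase K added, µl
PROTEINASE_K_STOCK = 20.0   # proteinase K stock, mg/ml
SDS_STOCK_PERCENT = 0.4     # SDS in the stop buffer, %


def nanomolar_to_fmol(conc_nm: float, volume_ul: float) -> float:
    """1 nM x 1 µl = 1e-9 mol/L x 1e-6 L = 1e-15 mol = 1 fmol."""
    return conc_nm * volume_ul


substrate_fmol = nanomolar_to_fmol(SUBSTRATE_NM, REACTION_UL)
competitor_fmol = substrate_fmol * COMPETITOR_FOLD

# µg per µl is numerically equal to mg per ml.
mcm89_mg_per_ml = MCM89_UG / REACTION_UL
hrob_mg_per_ml = HROB_UG / REACTION_UL

# Dilution of the stop reagents into the final volume.
final_volume_ul = REACTION_UL + STOP_BUFFER_UL + PROTEINASE_K_UL
proteinase_k_final = PROTEINASE_K_STOCK * PROTEINASE_K_UL / final_volume_ul
sds_final_percent = SDS_STOCK_PERCENT * STOP_BUFFER_UL / final_volume_ul

print(f"labeled substrate:  {substrate_fmol:.2f} fmol")
print(f"unlabeled oligo:    {competitor_fmol:.0f} fmol")
print(f"MCM8/9:             {mcm89_mg_per_ml:.3f} mg/ml")
print(f"HROB:               {hrob_mg_per_ml:.3f} mg/ml")
print(f"proteinase K final: {proteinase_k_final:.2f} mg/ml")
print(f"SDS final:          {sds_final_percent:.4f} %")
```

      Molar concentrations of MCM8/9 and HROB are deliberately not computed, because the molecular weights of the purified proteins are not given in the response.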

      In addition, to describe the expression of the Flag-hMCM8/9 complex in Figure 4C, we have included the Pull-Down Assay in the “Material and Methods” section. The description is as follows: The HEK293T cells transfected with Flag-hMCM8/9-FL or Flag-hMCM8/9-NTD were cultured overnight and washed twice with cold phosphate-buffered saline (PBS). Cell pellets were resuspended in lysis buffer (20 mM Tris, pH 7.5, 150 mM NaCl, 5 mM EDTA, 0.5% NP-40, 10% glycerol, protease inhibitor cocktail (Roche, 04693132001)). After incubation for 45 min at 4 °C with gentle agitation, the whole-cell lysates were collected by centrifugation (12,000 × g for 15 min, at 4 °C). GST beads coupled with 2 μg GST-HROB or GST alone were then incubated with an equal volume of the above HEK293T cell lysates at 4 °C for 4 h. The beads were washed four times with lysis buffer. Proteins bound to the beads were separated by SDS–PAGE and subsequently immunoblotted with anti-Flag antibody (Cytiva).

      3) Figure 3C: This is just an assumed model. Please clearly state it in the manuscript.

      We appreciate the reviewer’s comment. We believe the reviewer is referring to Figure 5C, because Figure 3C depicts the top view of the gMCM8/9 hexamer structurally aligned with the MCM2-7 double hexamer (wheat) by aligning their respective C-tier rings. Figure 5C, on the other hand, represents an assumed model in which we docked a forked DNA fragment into the central channel of the gMCM8/9 hexamer. To make clear that this is an assumed model, we have added the following clarification in the revised manuscript: “We artificially docked a forked DNA into the central channel to generate a gMCM8/9-DNA model and found that the OB-hps of gMCM8 are capable to closely contact with it and insert their highly positively charged terminal loops into the major or minor grooves of the DNA strand, implying that they could be involved in substrate DNA processing and/or unwinding (Figure 5C)”.

      4) Figure S1, C and D: The coloring of the gMCM8-9 CTD appears to show higher resolution than the NTD. May this be mispresentation?

      We appreciate the reviewer's valuable feedback, and we have thoroughly re-evaluated Figure S1C and D. Initially, the local resolution distributions of the gMCM8/9 NTD and gMCM8/9 CTD were calculated using CryoSPARC. Upon re-examination, we found that the resolution of the gMCM8/9 CTD density map may be lower than 3.66 Å, because this map does not reveal more structural details than are observed in the gMCM8/9 NTD. Thus, although the map shown in Figure S1D may appear to show a greater distribution of high-resolution regions, we would like to clarify that this discrepancy could be attributed to an optical illusion. We thank the reviewer for bringing this to our attention.

      5) Figure S9: Is the "mean resolution" 5.21 angstrom identical to the Gold standard FSC? If not, please estimate the resolution using FSC, like other maps in this paper.

      We thank the reviewer for the constructive suggestion. In response to this feedback, we would like to clarify the resolution estimation process for the gMCM8/9 CTD. Initially, we calculated the resolution of the gMCM8/9 CTD using the gold standard Fourier shell correlation (FSC) method, which yielded a resolution of 3.66 Å. However, upon further analysis, we identified an issue with the GSFSC Resolution curves, which led to an overestimation of the resolution based on the density map of the gMCM8/9 CTD. To ensure a more reliable and accurate estimation, we employed the Phenix software package to calculate the mean resolution during the refinement process of the gMCM8/9 CTD structure. The calculated mean resolution was determined to be 5.21 Å, which aligns more reasonably with the characteristics of the density map. To address any potential misunderstandings and provide clarity, we have explicitly labeled and described the evaluation process for this mean resolution in the "Single particle data processing" section of the Materials and Methods.
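
      For readers unfamiliar with how a gold-standard FSC resolution is read off, a minimal sketch follows. It assumes the conventional 0.143 cutoff, which is not stated in the response, and uses a made-up FSC curve; the function name and the numbers are purely illustrative and are not the authors' data or their CryoSPARC/Phenix workflow.

```python
from typing import Optional, Sequence


def fsc_resolution(
    freqs: Sequence[float],
    fsc: Sequence[float],
    threshold: float = 0.143,
) -> Optional[float]:
    """Resolution (Å) at the first point where the FSC curve drops below `threshold`.

    `freqs` are spatial frequencies in 1/Å in increasing order, and `fsc` holds
    the half-map correlation at each frequency. The crossing is located by
    linear interpolation between the two bracketing shells. Returns None if
    the curve never falls below the threshold.
    """
    for i in range(1, len(freqs)):
        above, below = fsc[i - 1], fsc[i]
        if above >= threshold > below:
            fraction = (above - threshold) / (above - below)
            crossing = freqs[i - 1] + fraction * (freqs[i] - freqs[i - 1])
            return 1.0 / crossing
    return None


# Hypothetical FSC curve, for illustration only.
example_freqs = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
example_fsc = [0.99, 0.95, 0.80, 0.40, 0.10, 0.02]

resolution = fsc_resolution(example_freqs, example_fsc)
print(f"FSC=0.143 resolution: {resolution:.2f} Å")
```

      A single threshold crossing of this kind summarizes the whole map with one number, which is one reason a flexible region such as the CTD can look better on the FSC curve than its density actually supports.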

      Minor points:

      1) Throughout the manuscript, there are several typographical and grammatical errors, which should be corrected. For example, in "Introduction", "GNIS complex" should be "GINS complex".

      We thank the reviewer for pointing out the typographical and grammatical errors. We have corrected the grammar errors and polished our manuscript with the help of native speakers.

      Reviewer #2 (Recommendations For The Authors):

      1) "During HR repair, MCM8/9 was rapidly recruited to the DNA damage sites and colocalized with the recombinase Rad51 (21). It also interacted with the nuclease complex MRN (MRE11-RAD50-NBS1) and was required for DNA resection at DSBs to facilitate the HR repair (Introduction)."

      There is a debate about whether MCM8/9-HROB colocalizes with RAD51 and whether it works upstream or downstream of RAD51 (Park et al. MCB, 2013; Lee et al. Nat Commun., 2015; Lutzmann et al. Mol Cell, 2012; Nishimura et al. Mol Cell, 2012; Natsume et al. G&D, 2017; Hustedt et al. G&D, 2019; Huang et al. Nat Commun., 2020).

      We completely agree with the reviewer that previous studies have reported contradictory results regarding the function of MCM8/9 in homologous recombination. Based on the structural information on MCM8/9, we do not currently have direct evidence to resolve the ongoing debate. Nonetheless, based on our findings, we speculate that the MCM8/9 complex is likely involved in multiple steps within the process of homologous recombination. The structural insights provided by our study serve as a foundation for further investigations and may contribute to a better understanding of the complex and multifaceted roles of MCM8/9 in homologous recombination repair.

      2) I noted that the BioRxiv version 1 (https://www.biorxiv.org/content/10.1101/2022.01.26.477944v1?versioned=true) contains a near-complete MCM8/9 with human protein based on the crystal analysis. Because its structure is comparable to chicken MCM8/9 revealed by cryo-EM, I highly suggest including this data in the manuscript.

      We would like to thank the reviewer for this suggestion. The resolution of the hMCM8/9 crystal structure presented in our previous BioRxiv version is 6.6 Å, which is a little low. Moreover, it cannot provide more information than the present cryo-EM structures of MCM8/9. We are dedicated to optimizing the crystal quality and implementing strategies to enhance the resolution of the structure. We hope to present an improved crystal structure of hMCM8/9 in our forthcoming article.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their insightful comments. The main issue raised by the reviewers was that because E6AP depletion reduced checkpoint signaling via MASTL upregulation, this pathway is likely to be involved also in DNA damage checkpoint activation, in addition to checkpoint recovery. Hence, the proposed “timer”-like model is not fully supported. However, it is important to note that the expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP-depletion) in suppressing DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      We have also addressed other concerns raised by the reviewers, as explained in the point-by-point responses below. With the addition of new modifications and data, we believe the revised manuscript is complete and conclusive.

      Reviewer #1 (Public Review):

      In principle a very interesting story, in which the authors attempt to shed light on an intriguing and poorly understood phenomenon; the link between damage repair and cell cycle re-entry once a cell has suffered from DNA damage. The issue is highly relevant to our understanding of how genome stability is maintained or compromised when our genome is damaged. The authors present the intriguing conclusion that this is based on a timer, implying that the outcome of a damaging insult is somewhat of a lottery; if a cell can fix the damage within the allocated time provided by the "timer" it will maintain stability, if not then stability is compromised. If this conclusion can be supported by solid data, the paper would make a very important contribution to the field.

      However, the story in its present form suffers from a number of major gaps that will need to be addressed before we can conclude that MASTL is the "timer" that is proposed here. The primary concern being that altered MASTL regulation seems to be doing much more than simply acting as a timer in control of recovery after DNA damage. There is data presented to suggest that MASTL directly controls checkpoint activation, which is very different from acting as a timer. The authors conclude on page 8 "E6AP promoted DNA damage checkpoint signaling by counteracting MASTL", but in the abstract the conclusion is "E6AP depletion promoted cell cycle recovery from the DNA damage checkpoint, in a MASTL-dependent manner". These 2 conclusions are definitely not in alignment. Do E6AP/MASTL control checkpoint signaling or do they control recovery, which is it?

      Also, there is data presented that suggest that MASTL does more than just controlling mitotic entry after DNA damage, while the conclusions of the paper are entirely based on the assumption that MASTL merely acts as a driver of mitotic entry, with E6AP in control of its levels. This issue will need to be resolved.

      We thank the reviewer for his/her insightful comments. The main issue raised by the reviewers was that because E6AP depletion reduced checkpoint signaling via MASTL upregulation, this pathway is likely to be involved also in DNA damage checkpoint activation, in addition to checkpoint recovery. Hence, the proposed “timer”-like model is not fully supported. However, it is important to note that the expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP-depletion) in suppressing DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      As suggested by the reviewer, we have rephrased the statement in the abstract to “E6AP depletion reduced DNA damage signaling, and promoted cell cycle recovery from the DNA damage checkpoint, in a MASTL-dependent manner”.

      As a mitotic kinase, MASTL promotes mitotic entry and progression. This is well in line with our findings that DNA damage-induced MASTL upregulation promotes cell cycle re-entry into mitosis. MASTL upregulation could also inhibit DNA damage signaling. This kind of inhibitory feedback modulation of DNA damage signaling by mitotic kinases (e.g., PLK1, CDK) has been implicated in previous studies (reviewed in Cell & Bioscience volume 3, Article number: 20 (2013)). In the revised manuscript, we have included more discussion of this aspect of checkpoint regulation.

      Finally, the authors have shown some very compelling data on the phosphorylation of E6AP by ATM/ATR, and its role in the DNA damage response. But the time resolution of these effects in relation to arrest and recovery have not been addressed.

      Detailed time point information is now added in the figure legends for E6AP phosphorylation data. We were able to observe this event during early stages (e.g., 1 hr, or 2-4 hr) of the DNA damage response, prior to significant MASTL protein accumulation.

      Reviewer #2 (Public Review):

      This is an interesting study from Aimin Peng's laboratory that builds on previous work by the PI implicating Greatwall Kinase (the mammalian gene is called MASTL) in checkpoint recovery.

      The main claims of this study are:

      1) Greatwall stability is regulated by the E6-AP ubiquitin ligase and this is inhibited following DNA damage in an ATM dependent manner.

      2) Greatwall directly interacts with E6-AP and this interaction is suppressed by ATM dependent phosphorylation of E6-AP on S218

      3) E6-AP mediates Greatwall stability directly via ubiquitylation

      4) E6-AP knock out cells show reduced ATM/ATR activation and quicker checkpoint recovery following ETO and HU treatment

      5) Greatwall mediated checkpoint recovery via increased phosphorylation of Cdk substrates

      In my opinion, there are several interesting findings presented here but the overall model for a role of the E6-AP -Greatwall axis is not fully supported by the current data and will require further work. Moreover, there are a number of technical issues making it difficult to assess and interpret the presented data.

      Major points:

      1) The notion that Greatwall is indeed required for checkpoint recovery hinges on two experiments shown in Figures 5A and B where Greatwall depletion blocks the accumulation of HeLa cells in mitosis following recovery from ETO treatment and in G2/M following release from HU. An alternative possibility to the direct involvement of Greatwall in checkpoint recovery could be that Greatwall in HeLa cells is required for S-phase progression (as for example Charrasse et al. suggested). A simple control would be to monitor the accumulation of mitotic cells by microscopy or FACS following Greatwall depletion without any further checkpoint activation.

      We thank the reviewer for his/her insightful comments.

      Charrasse et al. showed that ENSA knockout prolonged, but did not stop, the progression of S-phase. In our experiments, MASTL (partial) knockdown did not significantly impact HeLa cell proliferation in the absence of DNA damage (Fig. 5, supplemental 1A). The reported role of MASTL in checkpoint recovery was consistently seen in response to various drugs, including etoposide, which typically induces G2 arrest. Thus, we do not believe a prolonged S-phase accounts for the checkpoint recovery phenotype.

      2) The changes in protein levels of Greatwall and the effects of E6-AP on Greatwall stability are rather subtle and depend mostly on a qualitative assessment of western blots. Where quantifications have been made (Figures 2D and 4F) the loading control and the starting conditions for Greatwall (0 timepoints in the right panel) appear saturated making precise quantification impossible. I would argue that the authors should at least quantify the immuno-blots that led them to conclude on changes in Greatwall levels and make sure that the exposure times used are in the dynamic range of the camera (or film). A more precise experiment would be to use the exogenously expressed CFP-Greatwall that is described in Figure 6 and measure the acute changes in protein levels using quantitative fluorescence microscopy in live cells. This is, in my opinion, a lot more trustworthy than quantitative immuno-blots.

      I also note here that most experiments linking Greatwall levels to E6-AP were done using siRNA, while the E6-AP ko cells would be a more reliable background for these experiments, especially with reconstituted controls.

      DNA damage-induced MASTL upregulation was observed in various cell lines and after different treatments. To further strengthen this point, as suggested by the reviewer, we have included quantification of fluorescent measurements (Fig. 2, supplemental 1 A-C). Quantification of immunoblots for MASTL upregulation was also added in Fig. 1, supplemental 1E. The effects of E6AP depletion were consistently shown for both siRNA and stable KO.

      3) This study has no data linking the effects of Greatwall to its canonical target PP2A:B55. The model shown in Figure 9 is therefore highly speculative. The possibility that Greatwall could act independently of PP2A:B55 should at least be considered in the discussion given the lack of experimental evidence.

      The role of MASTL in promoting cell cycle progression via suppressing PP2A/B55 has been well established. As suggested by the reviewer, we have included discussions to acknowledge that “The role of MASTL upregulation in promoting checkpoint recovery and cell cycle progression can be attributed to inhibition of PP2A/B55, although the potential involvement of additional mechanisms is not excluded”.

      4) The major effect of E6-AP depletion on the checkpoint appears to be a striking reduction in ATM/ATR activation, suggesting that this ubiquitin ligase is involved in checkpoint activation rather than recovery. It is not clear if this phenotype is dependent on Greatwall. If so it would be hard to reconcile with the default model that E6-AP acts via the destabilisation of Greatwall. In the permanent absence of E6-AP, increased Greatwall levels should inactivate B55:PP2A. How would this lead to a decrease in ATM/ATR activation? This is unlikely, and indeed Figure 5E shows that the reduction of MASTL in parallel to E6-AP does not result in elevated levels of ATR/ATM activation. Conversely, the S218A E6-AP mutant does have a strong rescue impact on ATR/ATM (Figure 8D).

      We do not propose that PP2A/B55 directly reverses ATM/ATR-mediated phosphorylation. In fact, PP2A/B55 dephosphorylates and inactivates mitotic kinases and substrates, which can feedback-inhibit DNA damage checkpoint signaling (as previously shown for PLK1 and CDK). We have included a discussion of this point in the revised manuscript.

      The point regarding checkpoint activation vs recovery is addressed below (point 5).

      5) In summary, I do not think that the presented experiments clearly dissect the involvement of E6-AP and Greatwall in checkpoint activation and recovery. E6-AP depletion has a strong effect on checkpoint activation while Greatwall depletion is likely to have various checkpoint-independent effects on cell cycle progression.

      It is important to note that the expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP depletion) in suppressing the DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      Reviewer #3 (Public Review):

      In this manuscript, Li et al. describe the contribution of the ATM-E6AP-MASTL pathway in recovery from DNA damage. Different types of DNA damage trigger an increase in protein levels of mitotic kinase MASTL, also called Greatwall, caused by increased protein stability. The authors identify E3 ligase E6AP to regulate MASTL protein levels. Depletion or knockout of E6AP increases MASTL protein levels, whereas overexpression of E6AP leads to lower MASTL levels. E6AP and MASTL were suggested to interact in conditions without damage and this interaction is abrogated after DNA damage. E6AP was shown to be phosphorylated upon DNA damage on Ser218 and a phosphomimicking mutant does not interact with MASTL. Stabilization of MASTL was hypothesized to be important for recovery of the cell cycle/mitosis after DNA damage.

      The identification of this novel pathway involving ATM and E6AP in MASTL regulation in the DNA damage response is interesting. However, it is surprising that the authors state that not a lot is known about DNA damage recovery while Greatwall and MASTL have been described to be involved in DNA damage (checkpoint) recovery. In addition, PP2A, a phosphatase downstream of MASTL, is a known mediator of checkpoint recovery, in addition to other proteins like Plk1 and Claspin. Although some of the publications regarding these known mediators of DNA damage recovery are mentioned, the discussion regarding the relationship to the data in this manuscript is very limited.

      We thank the reviewer for his/her insightful comments. As suggested, the previously reported role of PLK1 and other cell cycle kinases in DNA damage checkpoint recovery is discussed in more detail in the revised manuscript. As for PP2A/B55, we do not think it promotes checkpoint recovery, e.g., by dephosphorylating ATM/ATR or their substrates. Instead, this phosphatase dephosphorylates cell cycle kinases or their substrates, such as CDK1 or PLK1.

      The regulation of MASTL stability by E6AP is novel, although the data regarding this regulation and the interaction are not entirely convincing. In addition, several experiments presented in this paper suggest that E6AP is (additionally) involved in checkpoint signalling/activation, whereas the activation of the G2 DNA damage checkpoint was described to be independent of MASTL. Has E6AP multiple functions in the DNA damage response or is ATM-E6AP-MASTL regulation not as straightforward as presented here?

      Altogether, in my opinion, not all conclusions of the manuscript are fully supported by the data.

      We showed that E6AP depletion reduced checkpoint signaling via MASTL upregulation, so this pathway is likely to be involved also in DNA damage checkpoint activation, in addition to checkpoint recovery. However, it is important to note that the expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP depletion) in suppressing the DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      Reviewer #1 (Recommendations For The Authors):

      In principle a very interesting story that attempts to shed light on an intriguing and poorly understood phenomenon: the link between damage repair and cell cycle re-entry once a cell has suffered from DNA damage. The issue is highly relevant to our understanding of how genome stability is maintained or compromised when our genome is damaged. The authors present the intriguing conclusion that this is based on a timer, implying that the outcome of a damaging insult is somewhat of a lottery; if a cell can fix the damage within the allocated time it will maintain stability, if not then stability is compromised. However, the story in its present form suffers from a number of major gaps that will need to be addressed.

      Major point:

      My primary concern regarding the main conclusion is that altered MASTL regulation seems to be doing much more than simply promoting more rapid recovery after DNA damage. This concern comes from the following gaps that I noted whilst reading the paper:

      • Knockout of E6AP leads to a dramatic inhibition of ATM/ATR activation after damage (Fig. 5C, D, E); this is (partially) rescued by co-depletion of MASTL (Fig. 5E). The authors will have to show that the primary effect of altered MASTL regulation is improved recovery, rather than reduced checkpoint activation. In other words, is initial checkpoint activation in cells that have lost E6AP normal, or do these cells fail to mount a proper checkpoint response? If the latter is true, that could completely alter the take-home message of this paper, because it could mean that E6AP/MASTL do not act as a "timer", but as a "tuner" to set checkpoint strength at the start of the DNA damage response. The authors themselves conclude on page 8 "E6AP promoted DNA damage checkpoint signaling by counteracting MASTL", but in the abstract the conclusion is "E6AP depletion promoted cell cycle recovery from the DNA damage checkpoint, in a MASTL-dependent manner". These 2 conclusions are definitely not in alignment: do E6AP/MASTL control checkpoint signaling or do they control recovery?

      The expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP depletion) in suppressing the DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript. We have also clarified the statement indicated by the reviewer.

      • MASTL KD has a rather unexpected effect on cell cycle progression after HU synchronization (Fig.5B). It seems that the MASTL KD cells fail to exit from the HU-imposed G1/S arrest, an effect that is not rescued in the E6AP knock-outs. Inversely, E6AP knock-outs seem to more readily exit from the HU-imposed arrest, an effect that is completely lost after knock-down of MASTL. How do the authors interpret these results? Their conclusions are entirely based on a role for MASTL as a driver of mitotic entry, with E6AP in control of its levels, but this experiment suggests that MASTL and E6AP are controlling very different aspects of cell cycle control in their system.

      As the reviewer pointed out, our data on checkpoint signaling and cell cycle progression suggested that MASTL upregulation could also inhibit DNA damage signaling, in addition to promoting cell cycle progression. This kind of inhibitory feedback modulation of DNA damage signaling by mitotic kinases (e.g., PLK1, CDK) has been implicated in previous studies (reviewed in Cell & Bioscience volume 3, Article number: 20 (2013)). In the revised manuscript, we have included discussion of this aspect of checkpoint regulation.

      • It is not possible to evaluate the validity of the conclusions that are based on Figure 6. We need to know how long the cells were treated with HU to disrupt the interaction between E6AP and MASTL. Is the timing of this in the range of the timing of MASTL increase after damage? A time course experiment is required here.

      • The data obtained on E6AP-S218 phosphorylation and with the S218A mutant during damage and recovery look very promising. But again, the release from HU is confusing me as to what to conclude from them. Also, the authors should show how S218A expression affects MASTL levels (before and after damage). Also, a time course of ATM/ATR activation is required to decide if initial or late ATM/ATR signaling is affected.

      Detailed time point information is now added in the figure legends for E6AP phosphorylation and E6AP-MASTL dissociation data. We were able to observe these events during early stages (e.g., 1 hr, or 2-4 hr) of the DNA damage response, prior to significant MASTL protein accumulation.

      • The conclusion that "and was not likely to be caused by the completion of DNA repair, as judged by the phosphorylation of replication protein A" (page 5) is based on western blots that represent the average across the entire population. It is possible that MASTL expression is still low in the cells that have not completed repair, while its increase on blots comes from a subset of cells where repair is complete. The authors should perform immunofluorescence so that expression levels of MASTL can be directly compared to levels of phospho-RPA in individual cells. In fact, the manuscript could benefit a lot from a more in-depth single-cell (microscopy)-based analysis of the relations over time between ATM/ATR activation, E6AP phosphorylation, MASTL stabilization versus the checkpoint arrest and subsequent recovery.

      Time point analyses were provided for DNA damage-induced RPA phosphorylation and ATM/ATR substrate phosphorylation (Fig. 1). These data showed MASTL accumulation in the presence of active DNA damage checkpoint signaling. To further strengthen this point, as suggested by the reviewer, we have included quantification of fluorescent measurements (Fig. 2, supplemental 1 A-C). IF data showed MASTL upregulation in correlation with ATM/ATR activation.

      Minor points:

      It's not "ionized radiation", but "ionizing radiation" (page 5)

      We have made the correction as pointed out by the reviewer.

      Expression levels of MASTL should be quantified over time after DNA damage. In some of the experiments the increase seems to plateau relatively quickly (HU treatment, Fig. 1B, 1-2 hours), while in others the levels continue to increase over longer periods (HU treatment, Fig. 1D, 6 hours). This is relevant to the timer function of MASTL that is proposed here.

      The kinetics of MASTL upregulation is generally consistent among all cell lines. As suggested, quantification of immunoblots is provided (Fig. 1, supplemental 1E); additional quantification of IF signals is also included (Fig. 2, supplemental 1 A-C).

      The experiment executed with caffeine (page 5) should be repeated with more selective/potent ATM/ATR inhibitors that are commercially available.

      A specific ATM inhibitor was used to confirm the caffeine result in Fig. 7, supplemental 1B&C.

      "a potential binding pattern" (page 6) should be "a potential binding partner"

      We have made the correction as pointed out by the reviewer.

      Reviewer #2 (Recommendations For The Authors):

      1) All western blots require size markers. The FACS plots shown do not have any axis labels.

      We have included size markers for blots, at the first appearance of each antibody. Labels have been added to the FACS plots.

      2) The quantification of mitotic cells does not indicate how many cells were counted and if this was done by eye or using software.

      The missing experimental information is included in the figure legends, as suggested.

      3) The western blots demonstrating ubiquitylation of Greatwall (Figure 4D) are of very poor quality and impossible to interpret.

      The ubiquitination of MASTL did not show clear ladders, possibly due to its relative protein size.

      Reviewer #3 (Recommendations For The Authors):

      Specific suggestions to improve the manuscript:

      1) Include literature regarding known mediators of DNA damage checkpoint recovery, including MASTL/Greatwall and PP2A, in the manuscript and discuss the observations from this manuscript in relationship with the literature.

      The related literature is now included in the discussion.

      2) The increase in MASTL protein levels upon DNA damage is not always clear, for example in Fig. 1A. The same holds for MASTL stability after DNA damage, such as in Fig. 2C. Quantification of the westerns would help demonstrate a significant effect.

      As suggested by the reviewer, we have included quantification of fluorescent measurements (Fig. 2, supplemental 1 A-C). Quantification of immunoblots for MASTL upregulation was also added in Fig. 1, supplemental 1E.

      3) The E6AP-MASTL in vitro interaction studies shown in Fig. 3 raise doubts. First, beads only are used as negative control, whereas MBP-only beads are a better control. The westerns in top panels of 3B (MASTL), 3C (GST-MASTL) and 3D (MASTL) should be improved. In addition, in Fig. 3C, different GST-MASTL fragments are used in an MBP-E6AP pull down, but the GST-MASTL input does not show any specific band to demonstrate that these fragments are correct. The same applies to the GFP-E6AP fragments in Fig. 3 Suppl. 1C. The input does not show any proteins, there is no N fragment present in the IP and the size of the fragment N3 in the IP GFP does not seem correct.

      Altogether, it makes me doubt that the interaction between E6AP and MASTL is direct. Better data with appropriate controls should show whether the interaction is direct or mediated via another protein.

      Purified proteins used for the in vitro interaction had significant degradation, causing many bands in the input. We included a lighter exposure of the input here as Author response image 1. MBP alone did not bind MASTL, as both M and C segments of MASTL were MBP-tagged, and did not pull down MASTL. We agree with the reviewer that our direct interaction data showed rather weak MASTL/E6AP interaction, suggesting the interaction is dynamic or possibly mediated by additional binding proteins. We have included this statement in the revised manuscript “Taken together, our data characterized MASTL-E6AP association which was likely mediated via direct protein interaction, although the potential involvement of additional binding partners was not excluded”.

      Author response image 1.

      4) Fig. 4B. Overexpression of HA-E6AP results in a decrease in MASTL protein levels. Can this effect be rescued by treatment with proteasome inhibitor MG132?

      As expected, MG132 stabilized MASTL, with or without E6AP overexpression. We have added this new data in Fig. 4, supplemental 1B.

      5) Fig. 4G. MASTL interacts with HA-ubiquitin in WT, but not E6AP KO cells. These cells are treated with MG132, so if E6AP really ubiquitinates MASTL, I would expect MASTL to be polyubiquitinated. However, the "interaction signal" does not show polyubiquitination. In fact, this band actually runs lower than MASTL in input samples, which even could be an artifact. Please explain.

      The ubiquitination of MASTL did not show clear ladders, possibly due to its relative protein size. As the reviewer noted, the band position in the HA-Ub IP lanes seemed slightly shifted, compared to the input. We have noticed in many experiments that bands in the IP lanes did not perfectly align with the input lanes.

      6) The DNA damage recovery experiments measuring mitotic index after washing off etoposide (Fig. 5A and Fig. 8A): What are the time points taken? And importantly, why are there no error bars on these intermediate time points, but only on the 4 hour time point?

      As suggested, time point information and additional error bars are included.

      7) Fig. 5E. According to the authors, depletion of MASTL rescues the effect of KO of E6AP. However, no increase in pATM/ATR substrate signal is seen upon etoposide treatment in these samples so I am not convinced this experiment demonstrates a rescue.

      The rescue was evident, especially for many high molecular weight bands which were more effectively detected by this phospho-specific antibody.

      8) Fig. 5C and 8D strongly suggest that E6AP is involved in checkpoint activation. How do these data relate to DNA damage recovery? Is the recovery in E6AP KO cells faster as a consequence of reduced checkpoint signaling or is the recovery effect really specific by stabilization of MASTL? These data should be explained, also taken the data from Wong et al. (Sci. Rep. 2016) into account, that demonstrate that G2 checkpoint activation is independent of MASTL.

      The expression level of MASTL is not upregulated during the activation stage of the DNA damage checkpoint (unless E6AP is depleted). DNA damage signaling, via ATM-dependent E6AP phosphorylation, caused MASTL accumulation over time. This ultimately shifts the balance toward checkpoint recovery and cell cycle re-entry. As such, the role of MASTL (and E6AP-depletion) in suppressing DNA damage checkpoint is in harmony with the proposed role of MASTL upregulation in promoting checkpoint recovery. We have made additional clarifications about this point in the revised manuscript.

      9) The model presented in Fig. 9 is puzzling because there does not seem to be a difference between phosphorylation of E6AP and the interaction with MASTL at early versus late times after DNA damage. And this is exactly what is missing in the manuscript: a more detailed evaluation of the timing of E6AP-Ser218 phosphorylation and the E6AP-MASTL interaction in response to DNA damage.

      More clarification is given to explain this model in the figure legend of Fig. 9.

      Time point analyses were provided for DNA damage-induced RPA phosphorylation and ATM/ATR substrate phosphorylation (Fig. 1). These data showed MASTL accumulation in the presence of active DNA damage checkpoint signaling. To further strengthen this point, we have included quantification of fluorescent measurements (Fig. 2, supplemental 1 A-C). IF data showed MASTL upregulation in correlation with ATM/ATR activation. Time point information was also added for Ser-218 phosphorylation and MASTL-ENSA dissociation which were observed in early stages of the DNA damage response (1 hr, or 2-4 hr).

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Membrane receptor guanylyl cyclases are important for many physiological processes but their structures in full-length and their mechanism are poorly understood. Caveney et al. determined the cryo-EM structure of a highly engineered GC-C in a complex with endogenous HSP90 and CDC37. The structural work is solid and the structural information will be useful for the membrane receptor guanylyl cyclases field and the HSP90 field. However, a detailed characterization of the protein sample is lacking. Moreover, the physiological significance of this structure is not fully exploited by supporting experiments and the mechanistic insight is currently limited.

      We thank Reviewer #1 for constructive reviews and agree that this work forms the basis for future exploration by the guanylyl cyclase and HSP90 fields.

      1) The characterization of the protein sample is lacking. SDS-PAGE would be useful to identify potential proteolysis, leading to the dissociation of GC dimer. Further size-exclusion chromatography would be helpful to estimate the molecular weight of the complex and to determine if only GC-C monomer is purified.

      We have included a representative SDS-PAGE gel in our revised version of the manuscript (Figure 1—figure supplement 1). While we agree that SEC could be beneficial to further explore the stoichiometry of the imaged sample, we see no significant degradation of the guanylyl cyclase via SDS-PAGE, and therefore believe that the zippered construct would remain dimeric. Relatively poor yields of these samples precluded further exploration in this regard.

      2) The orientation distribution of the particles is not homogenous in Fig. S1D. It would be helpful to present the 3DFSC curve to evaluate the effect of preferred orientation on the reconstruction.

      While the orientational distribution is not perfectly uniform, the provided angles allowed for sufficient reconstruction of maps with no notable anisotropy. We have included 3DFSC curves in our revised version of Figure 1—figure supplement 1.

      3) Description of protein expression details is lacking. Did the author use transient transfection, stable cell line or virus-mediated transduction?

      We have clarified that the protein was expressed in transiently transfected ExpiCHO cells.

      4) HSP90 binds ATP and is often co-purified with endogenous ATP/ADP. Is there ATP or ADP present in the sample/cryo-EM maps? Is the conformation of NBD similar to ATP-bound HSP90? The author needs to include the description/figures about the nucleotide state of HSP90.

      There is clear density for a bound nucleotide in our reconstruction. Given the mechanistic role for ATP turnover in the release of HSP90 client (Young, Hartl, 2000 – PMID 11060043) and the resolved density, we believe this nucleotide is ATP. We have added a comment to this effect in the revised manuscript: “…the C2 pseudosymmetric, ATP bound, closed state Hsp90 dimer.”

      5) The catalytic domains of GC have to be dimerized to perform cyclase function. The presence of only one GC-PK monomer in the cryo-EM structure indicates the structure does not represent an active state of GC. These results suggest the GC expressed in this way is not functional. The authors need to explain why most of the GC protein is trapped in this inactive form.

      Indeed, we do believe that this regulatory state is non-functional, as observed for active kinases. We have clarified this in the revised manuscript: “In addition, this disruption of the native state of GC-C, as observed in our structure, would likely leave GC domains out of each other’s proximity, precluding their catalytic activity while Hsp90 is bound.”

      6) The GC-C construct used here is a highly engineered "artificial" construct, which has not been fully characterized in this work. Does this construct have similar activity as the activated wt GC-C? Does the protein (this engineered construct) expressed in CHO cells show activity?

      While our original goal in developing this construct was to create an imageable construct that was locked in the active state, our current interpretation of the data is that the leucine-zipper induced, putative active geometry leads to the majority of this construct falling into the regulatory state with HSP90 binding. We make no claim to have resolved an active conformation in this work, yet believe that this state is of note due to the previously unresolved nature of these regulatory complexes for guanylyl cyclase receptors.

      7) Are the residues on the interface between GC and HSP conserved in other members of membrane receptor guanylyl cyclases? Would mutations on this interface affect the activity of GC?

      Given the role this structure plays in our understanding that HSP90 client recruitment is largely not driven by specific residue interactions and the ~30% identity of GC-C to NPR-A and NPR-B, we do not believe that mutations that do not significantly change the stability or fold of the PK domain would significantly modify recruitment to HSP.

      8) The authors propose that targeting HSP90 would tune the activity of GC. Is there any experimental data supporting this idea?

      Based on the work of Kumar et al., 2001 (PMID 11152473), we do believe that there is a functional link between HSP90 recruitment and GC activity. We hope that this work will spark further exploration of these concepts.

      9) The model in Fig. S3 is largely speculative due to the lack of supporting functional data. In addition, it would be better to change the title to "structure of the protein kinase domain of guanylyl cyclase receptor in complex with HSP90 and cdc37" because the mechanistic insight is limited.

      We agree that our supplemental figure is more speculative. We have referenced this in the discussion section of the manuscript and put this figure in the supplement to ensure that this is understood to be more speculative in nature.

      Reviewer #2 (Public Review):

      Caveney et al have overexpressed an engineered construct of the human membrane receptor guanyl cyclase GC-C in hamster cells and co-purified it with the endogenous HSP90 and CDC37. They have then determined the structure of the resultant complex by single particle cryoEM reconstruction at sufficient resolution to dock existing structures of HSP90 and CDC37, plus an AlphaFold model of the pseudo-kinase domain of the guanylyl cyclase. The novelty of the work stems from the observation that the pseudo-kinase domain of GC-C associates with CDC37 and HSP90 similarly to how the bona fide protein kinases CDK4, CRAF and BRAF have been previously shown to interact.

      The experimentation is limited to the cryoEM analysis, and is lacking additional studies that would give deeper insight into the oligomeric nature - if any - of the GC-C when bound to HSP90-CDC37 as compared to the free protein. This is relevant, as the dimerization domain downstream of the pseudokinase is evident in the maps - albeit not well resolved - and it is not clear whether it is still able to mediate dimerization with a second free or HSP90-CDC37-bound GC-C. It would also be good to see some experimentation that asks whether association with HSP90-CDC37 inhibits the guanyl cyclase activity. It is clear from previous work that HSP90-CDC37 silence the kinase activity of their bound client kinases, but in this case the catalytic guanyl cyclase is not directly associated with the chaperone complex and may still be able to function.

      Given the geometry of the interaction, the dimerization domain of the GC would likely be monomerized, although global dimerization would remain, contributed by the ECD or, in our case, by the leucine zipper that mimics the liganded ECD. Experimentally, it has been shown in live cells (Kumar et al., 2001, PMID 11152473) that the HSP90 association is required for maximal GC-A function. This is likely because the association resets the receptor in some way to allow further activity, rather than because the receptor is active during the association. The latter is unlikely based on our structure, in which the two GC domains would not be able to form the active dimerized state. Further dissection of this, while outside the scope of the current work, is of great interest.

      Although the sequence alignment presented in SuppFig 2 shows that GC-C conserves the classic DFG motif that plays a critical role in the regulation of most kinases, the numbering of the sequence is absent, making it very difficult to relate this to the structural detail shown in Fig 2B. This needs to be clarified, as the interaction of CDC37-Trp31 with the DFG motifs and downstream activation loops in CRAF and BRAF have been proposed as important features of the selectivity of these kinases for the HSP90-CDC37 system, and it would be good to be able to see clearly how much of this is also conserved in the GC-C pseudokinase domain interaction. For example, is the much shorter activation segment (DFG -> APE) ordered in the complex or disordered?

      We have clarified Figure 2—figure supplement 1 with additional numbering. While we agree that the DFG motif may play a role in recognition, only the first residue of this motif interacts with CDC37 in our structure, so the role of this motif is likely more structural, maintaining a CDC37-complementary fold, rather than involving direct residue interactions. Additionally, many kinases that are not regulated by CDC37/HSP90 contain this motif. The shorter DFG-APE segment of GC-C is traceable with the exception of N613, S614, and I615, though the density in this region indicates that the loop is not well stabilized.

      It was not easy to follow what was in the sample used for cryoEM. The cloning of the guanylyl cyclase (GC) component is described in the methods and they have shown some illustrations in fig 1, but a proper numbered figure of the domain organisation clearly showing domain boundaries and linker segments is really needed for a reader not familiar with the structure of GCs, especially since they have replaced the ECD with a leucine zipper in their construct. It is important to show a domain figure of what this construct looks like as well, as from the illustrations in fig 1, for example, it's hard to see which are the PK, DD and GC domains. It would also be helpful to see in the supplementary a gel of the complex they put on the grids, to make it clearer what exactly the sample is and to reassure that the GC-C domains that are not resolved in the cryoEM are nonetheless present in the sample.

      We have added a gel figure to the supplement and clarified the content of the imaged construct in the methods section: “This construct contains all domains of the native GC-C, with the exception of the ECD.”

      Overall there is only minimal proposal of mechanism or biological function based on the structure. The speculation in the Discussion of two fates - PP5 dephosphorylation or E3 ligase recruitment, is not supported by any experimentation, which is reasonable for speculation, but is also not underpinned by reference to any previously published work suggesting that these additional processes may be important. In the absence of any work by the authors can they put these speculations more in context with previously published work that supports the importance of these processes specifically for GC regulation?

      We have ensured that these potential pathways appear only in the discussion section. It has been observed, for instance by Oberoi et al., 2022, that phosphatases can act on all components of an HSP90–CDC37–client system. Given that there are well-characterized phosphorylation sites for membrane GC receptors, we believe this is worth discussing in this manuscript, to stimulate further exploration of these mechanisms in the field. In addition, it has been reported that many E3 ligases are recruited to HSP90 complexes and can degrade clients rather non-specifically. It has been shown that one can generate PROTAC-like molecules to target non-specific clients to the HSP90–E3 ligase machinery for degradation (Li et al., 2023). Given the proximity-induced nature of E3-mediated degradation of HSP90 clients, it is highly likely that, at least in some cases, mGCs are degraded by this mechanism as well.

      Reviewer #3 (Public Review):

      A detailed understanding of how membrane receptor guanylyl cyclases (mGC) are regulated has been hampered by the absence of structural information on the cytoplasmic regions of these signaling proteins. The study by Caveney et al. reports the 3.9Å cryo-EM structure of the human mGC cyclase, GC-C, bound to the Hsp90-Cdc37 chaperone complex. This structure represents a first view of the intracellular functional domains of any mGC and answers without doubt that Hsp90-Cdc37 recognizes mGCs via their pseudokinase (PK) domain. This is the primary breakthrough of this study. Additionally, the new structural data reveal that the manner in which Hsp90-Cdc37 recognizes the GC-C PK domain C-lobe is akin to how kinase domains of soluble kinases dock to the chaperone complex. This is the second major finding of this study, which provides a concrete framework to understand, more broadly, how Hsp90-Cdc37 recruits a large number of other diverse client proteins containing kinase or pseudokinase domains. Finally, the Hsp90-Cdc37-GC-C structure offers clues as to how GC-C may be regulated by phosphorylation and/or ubiquitinylation by serving as a platform for recruitment of PP5 and/or E3 ligases.

      Comments:

      1) The authors used an interesting approach to obtain the GC-C-Hsp90-Cdc37 complex. Flag-tagged human GC-C was overexpressed in CHO cells with the expectation of co-purifying endogenous hamster homologs of Hsp90 and Cdc37. There are several points worth noting:

      a) It is not clear from the data presented (Figure 1C, Suppl Fig 1A) or the Methods the percentage of particles in the cryo-EM specimen that represent the GC-C-Hsp90-Cdc37 complex. Presumably, some fraction of GC-C isolated will not be associated with Hsp90-Cdc37. If a very large portion of GC-C is associated with Hsp90-Cdc37, it would be good to explain why this is to be expected. Are 2D/3D classes corresponding to the activated GC-C dimer found? If not, why?

      While we see some traces of GC-C not bound by Hsp90, there is, at the least, a significant alignment bias for the Hsp90-bound complex. We believe that the engineered construct, which we designed to be locked in a putative active conformation, goes through catalytic cycles until it reaches a point at which the regulatory mechanism engages. It may be that, for proper resetting, the receptor needs to cycle back through an unliganded, inactive conformation, which our leucine zipper construct cannot adopt, thus locking our GC in the regulatory complex, though this is speculation.

      b) Figure 1A suggests that GC-C is phosphorylated before recruitment of Hsp90-Cdc37. What is the phosphorylation status of the GC-C specimen that was imaged by cryo-EM?

      We had placed the P in grey in this figure to represent the potential for the active state to be phosphorylated. For GC-C in particular, the phosphorylation state does not affect activity as much as it does for GC-A and GC-B, for example. We have removed this P from the figure for clarity.

      c) The resolution of the cryo-EM map (3.9 Å) is too low for unambiguous identification of proteins. Please provide more precise justification for the claim that the densities observed do in fact correspond to hamster Hsp90 and Cdc37.

      While we agree that the resolution is limiting for protein identification, our use of a very stringent FLAG purification gives us confidence in the identity of our target, GC-C. For Hsp90 and Cdc37, we are confident that they are endogenous hamster Hsp90 and Cdc37, given the high structural similarity to prior Hsp90/Cdc37/client complex structures, and because the identity and register are well confirmed by the placement of bulky residues.

      d) The authors state that human GC-C pulls down hamster Hsp90-cdc37 but soluble kinases cannot, despite the high sequence identity between human and hamster Hsp90-cdc37. Is this because GC-C recognition is more promiscuous? Can this difference be understood in light of the new structural information presented?

      “This native pulldown strategy contrasts with the structures of Hsp90–Cdc37 in complex with soluble kinases (García-Alonso et al., 2022; Oberoi et al., 2022; Verba et al., 2016), for which Hsp90 and Cdc37 had to be overexpressed to obtain complex suitable for imaging.”

      It is our understanding, from reading the papers cited above, that Hsp90/Cdc37 needed to be overexpressed to obtain these samples for imaging. Our strategy differs in that our sample does not require overexpression of Hsp90 and Cdc37. This may be because of something specific to hamster cells, which were (presumably) not tested in the above studies, or it could be something specific to GC-C.

      2) A large portion of the enforced GC-C dimer was not visible in the cryo-EM maps. It is not easy to learn from Figure 1 exactly which parts of the GC-C construct were sufficiently ordered and observed structurally. Please improve Figure 1.

      We have adjusted Figure 1 to better depict what is observed in the cryoEM density.

      3) On page 4, the authors claim that they are able to orient the GC-C-Hsp90-Cdc37 complex "as it would sit on a membrane" and referred to Figure 1B. It is not clear what is implied here. Does Hsp90-Cdc37 binding constrain the complex to face the inner leaflet of the membrane in a specific orientation as shown in Figure 1B? If true, this could potentially have important functional implications. Please illustrate how this was deduced based on the information available.

      Given the observed density for the PK domain, which is membrane proximal, we can safely assume that the TM would be located immediately above this region. Given the size of Hsp90, and assuming that the soluble Hsp90 must sit below the membrane, we can determine, with some accuracy, the orientation of this complex relative to the membrane. This orientation is depicted in Figure 1B.

      4) Also on page 4, it is stated that it is sterically unlikely an additional Hsp90-Cdc37 complex is associated with the other copy of GC-C in the leucine zippered dimer. It is not obvious to the reader how this may be the case. An additional figure could help make this more clear. Additional biochemical evidence will also help. The absence of GC-C-Hsp90-Cdc37 dimers in cryo-EM micrographs can also support the argument.

      We have clarified this: “is sterically unlikely that an additional regulatory complex is forming on the second GC-C in a concurrent fashion, given the large size of the first Hsp90–Cdc37 and the requisite proximity of the second GC-C.”

      5) Some comments on Figure 2:

      a) NTD and CTD are mislabeled in Figure 2A.

      Thank you for catching this; we have fixed it.

      b) The authors should show cryo-EM density to support their modeling of GC-C in Figures 2B and C.

      We have provided maps and models to the reviewer and will release these maps and models upon publication so that all relevant densities can be interpreted to their fullest extent by readers. In addition, we have added representative density panels to Figure 1-figure supplement 2.

      6) The authors claim that Hsp90-Cdc37 clients are more similar structurally near the cdc37 interface. Please illustrate this with additional figures. Suppl. Figure 2 is inadequate for this purpose.

      We have added a structural overlay to Figure 2—figure supplement 1A to illustrate this.

      The authors can also consider adding a more detailed discussion comparing the interactions between the pseudokinase/kinase C-lobe and Cdc37 in known structures. Is shape/charge complementarity a universal feature of cdc37-dependent kinase/pseudokinase recruitment? It would be interesting to also consider if it would be possible to predict which of the ~60 human pseudokinases are possible Hsp90-Cdc37 clients. New structural findings from this study and publicly available AI-predicted protein structures could help.

      While the use of AI to predict pseudokinase interactions would indeed be interesting, we believe this is outside the scope of this work. Given that methodology is already in place for determining kinase clients of Hsp90 (Taipale et al., 2012), this could be an additional route to obtain this information in future work.

      Reviewer #2 (Recommendations For The Authors):

      In Figure 1B the authors show a large unaccounted-for region of density which they speculate may be due to the dimerization domain. That this is lost in the sharpened maps suggests that it is more mobile than the core which probably dominates the automatic mask generation used by cryoSPARC. It would be very interesting to try and resolve this region further by using focussed classification and refinement - probably in RELION. This would add further novelty, as so far in the three HSP90-CDC37 kinase complexes previously described, little is seen outside the C-terminal lobe of the kinase (or in this case pseudokinase) lobe.

      Given the structurally uncharacterized nature of the DD and GC domains for mGCs, using computational means to further our understanding of these regions was attempted. Across several software packages, these attempts were unsuccessful. We will be uploading these micrographs to EMPIAR shortly after publication, which will allow for other groups to re-process this data as they see fit and as new software techniques emerge in this rapidly developing field. We believe that the partially unfolded nature of the PK domain is providing too much of a hinge point prior to the DD for the software to be able to resolve this currently.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer # 1

      Specific comments

      1) Figure 1: it is unclear how many mice were used for the described phenotypic analyses (panels D and E). Please clarify.

      We acknowledge that we failed to describe the phenotypic analyses clearly. In Figure 1D and E, we performed statistical analysis on the number of TEBs in mammary whole mounts. One mammary whole mount per mouse was stained with Carmine-alum. Thus, “n” represents the 10 mice we analyzed. We have modified the legend of Figure 1 to "D, E. Quantification of the average number of TEBs and bifurcated TEBs in littermate Crb3fl/fl (n=10) and Crb3fl/fl;MMTV-Cre (n=10) mice at 8 weeks old" in lines 909-911.

      2) Figure 2: in panels B and C it is unclear how the data was quantified; the legend states "n=10", does this mean the experiment in B was done 10 times? And that 10 acini per condition were measured in panel C? In panel D a difference in 0.3% between NC and shCRB3 seems miniscule; do the authors mean 30% instead? And how many acini were counted per condition per (how many) experiments? Same applies to panels G and H, it is unclear how many cells were analyzed per (how many) experiments.

      Thank you for your suggestions. We did not describe the details of the statistical analysis adequately in the experimental methods. In brief, we took 3-4 random bright-field micrographs of each well in the chamber slide system and repeated the experiment three times. We then counted the number of acini in all micrographs (Figure 2B) and measured the diameter of all acini in each micrograph, using the average as the data value (Figure 2C). We also determined the percentage of aberrant acini in each micrograph, which was used as the analysis value (Figure 2D). We confirmed that the vertical axis of Figure 3D was indeed mislabeled and should read 30%, and have revised the original figure. For IF analysis of mitotic spindle orientation during lumen formation, we measured the division angle of one mitotically dividing cell per acinus; 3-4 acini were randomly examined in each well of the chamber slide system, and this experiment was repeated three times (Figure 2G and H). We have now provided a detailed description of these points in the Figure 2 legend. The revised parts are found in lines 922-924, lines 926-927, lines 929-930, and line 932.
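
      The per-micrograph quantification described above can be sketched as follows. This is only a minimal illustration under stated assumptions: the function names, the data layout, and all counts are hypothetical and are not the data or the analysis scripts used in the study.

```python
from statistics import mean

def percent_aberrant(fields):
    """Return the percentage of aberrant acini for each micrograph.

    `fields` is a list of (aberrant_count, total_count) tuples, one per
    micrograph. Micrographs with no acini are skipped.
    """
    return [100.0 * aberrant / total for aberrant, total in fields if total > 0]

def summarize(experiments):
    """Average per-micrograph percentages within each independent experiment,
    then average across experiments.

    `experiments` maps an experiment label to its list of micrograph counts.
    """
    per_experiment = {
        label: mean(percent_aberrant(fields))
        for label, fields in experiments.items()
    }
    overall = mean(list(per_experiment.values()))
    return per_experiment, overall

# Hypothetical counts: three independent experiments, three micrographs each.
example = {
    "exp1": [(3, 10), (2, 10), (4, 10)],
    "exp2": [(1, 10), (2, 10), (3, 10)],
    "exp3": [(4, 10), (4, 10), (4, 10)],
}
per_experiment, overall = summarize(example)
```

      Averaging within each experiment before averaging across experiments keeps the three independent repeats, rather than the individual micrographs, as the unit of replication; whether that matches the analysis actually performed is an assumption of this sketch.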

      3) Figure 2: it would be desirable if authors were able to quantify the data in panels E and I.

      Thank you for your comments. According to your suggestions, we performed the quantitative analysis of Figure 2E and I, which is now presented in the new Figure 2D and H.

      4) For all cell-based assays using shRNA to knock down CRB3 (Fig. 2A-H; Fig. 3A-F; Fig. 4C-E; Fig. 5G-J; Fig. 6C; Fig. 7C, D; Fig. 8E-G), it would be desirable to perform rescue experiments to ensure that the observed phenotype of CRB3 depleted cells is specific and not due to off-target effects of the shRNA.

      Yes, rescue experiments involving overexpression of CRB3 in CRB3 depleted cells can accurately account for the specific phenotype as well as eliminate the off-target effects of shRNA. However, our group has long focused on the role of the cell polarity protein CRB3 in contact inhibition and tumorigenesis. Our previous studies have ruled out the off-target effects of shRNA and reported that CRB3 regulates contact inhibition and tumorigenesis through Hippo or Wnt signaling pathways (Cell Death Dis 2017;8(1):e2546, Oncogenesis 2017;6(4):e322, J Cell Mol Med 2018;22(7):3423-33). Therefore, we will pay close attention to rescue experiments to ensure experimental integrity and phenotypic specificity in our subsequent studies.

      5) Figure 3: how many cells were counted/measured per condition (in how many experiments) in panels B, D, H, F, G and H? In panels C and D, what is the CRB3 protein level in these cells? This is of relevance as protein overexpression per se could impinge on ciliation frequency. This question could be addressed by performing a western blot analysis with CRB3 antibody.

      We did not clearly describe the measurement and statistical analysis methods in the previous manuscript. As above, we took 3-4 random IF and SEM micrographs of each sample in one experiment, and this experiment was repeated three times. Subsequently, the numbers of ciliated cells and total cells were counted, and the proportion of ciliated cells was calculated (Figure 3B, D and F). In these figures, the cilium length of representative ciliated cells was measured in each micrograph. In the knockout mouse model, we needed to find intact mammary ductal lumens and renal tubules in IF-stained mouse mammary and renal tissue sections; 5-6 random-field micrographs were taken per section, and the proportion of ciliated cells was determined by counting and averaging. A total of ten mice were analyzed in these experiments (Figure 3G and H). The legend of Figure 3G and H has therefore been partially modified, and a detailed description has been added to the Figure 3 legend. The revised parts are in lines 945-946, lines 950-951, line 953.

      Thank you for suggesting that we perform a western blot analysis with the CRB3 antibody for Figure 3C and D. We have added this western blot analysis as the new Supplementary Figure 3A.

      6) Figure 3G: it is very difficult to see that the red stained structures are primary cilia.

      Yes, the stained primary cilia in the mammary ductal lumen are less clear than those in individual cells and in the renal tubule in Figure 3G. We used the established markers acetylated tubulin and γ-tubulin to stain primary cilia, which were clearly labeled in individual cells. However, the labeled primary cilia in the renal tubule were longer and had a more pronounced structure than those in the mammary ductal lumen. In the mammary ductal lumen of the 10 mice we analyzed, the primary cilia were shorter and less strongly stained than the others shown in Figure 3G. This difference may be due to the distinct characteristics of primary cilia in different tissues.

      7) Figure 4B: how many cells were analyzed in how many experiments?

      Our statistical methods for analyzing cellular experiments using IF were essentially the same. We randomly selected 3-4 IF micrographs of each sample in one experiment, and this experiment was repeated three times. Subsequently, the numbers of cells showing colocalization and of total cells were counted, and the proportion of cells with pericentrin and CRB3 colocalization was calculated (Figure 4B). The detailed description has been added to the Figure 4 legend. The revised part is in lines 962-963.

      8) Lines 217-219: since the cells were not stained with a cilia marker, only a centrosome marker, the claim that CRB3 localizes to the base of cilia is unsubstantiated.

      Thank you for your comments. The base of the cilium is the basal body, which develops from the mother centriole of the centrosome (Cancer Res. 2006;66(13): 6463-7). Firstly, we found colocalization of CRB3 and pericentrin, a centrosome marker, in MCF10A cells (Figure 4A and B). Secondly, we verified the colocalization of CRB3 with γ-tubulin, a marker of the basal body of primary cilia, in confluent quiescent cells (Figure 4C and D). In addition, we found that CRB3 was localized at the base of primary cilia labeled with acetylated tubulin (Figure 4E and F). Because of the host species of the commercially available CRB3 antibodies, we could only show indirectly, through these experiments, that CRB3 localizes to the base of cilia.

      9) Figure 3 and Figure 4: is it problematic to use gamma tubulin as centrosome marker if CRB3 depletion causes reduced centrosomal recruitment of gamma tubulin ring complex components? Also, in Figure S3A no gamma tubulin staining can be seen in the lower panel, why?

      Thank you for your positive comments. As is well known, γ-tubulin is a marker of the centrosome, and we found that CRB3 depletion causes reduced centrosomal recruitment of gamma tubulin ring complex components. However, Figure 3 illustrates the effect of CRB3 on ciliary assembly, and Figure 4 analyzes the localization of CRB3 in primary cilia. In some reports on ciliary assembly, fluorescent double staining of acetylated tubulin and γ-tubulin has been used to label primary cilia, and the effects of target genes on ciliary number and assembly were analyzed using these markers (Nature. 2013;502(7470): 254-7, Cell. 2007;130(4): 678-90 and so on). Although CRB3 affects the recruitment of gamma tubulin ring complex components, this does not affect the analysis of ciliary number and localization in Figures 3 and 4.

      In Figure S3A, green staining labeled with γ-tubulin could be clearly found in the lower left panel. The representative area from the left amplification may have been poorly selected, resulting in no γ-tubulin staining on the right side. We have updated the lower right panel in the new Supplementary Figure 3B.

      10) Figure S4A: the grouping of indicated proteins is factually wrong. For example, FBF1, SCLT1 and ODF2 are not IFT-B components, and several of the proteins indicated as localizing to the basal body also localize to (unciliated) centrioles. In contrast, CP110 is usually only found on unciliated centrioles and not mature basal bodies. Authors should consult the relevant literature and correct the figure accordingly. Alternatively, this misleading text/grouping could be removed from the figure. Furthermore, in the legend to Figure S4 there is no information provided about this quantitative analysis (how many independent experiments, which cells were analyzed etc.).

      Thank you for your helpful suggestions. We have taken your advice and removed this misleading grouping from the manuscript, from Supplementary Figure 4A, and from its corresponding legend. We have also added detailed information about this quantitative analysis to the legend of Supplementary Figure 4A. The revised legend is shown in lines 1098-1100.

      11) Figure S4B: how do authors know which of the bands correspond to CRB3 fusion protein?

      Based on the construction strategy of the CRB3-GFP fusion protein (Figure 6D) and its base sequence, we were able to calculate its molecular weight. The molecular weight of the CRB3-GFP fusion protein was then verified by western blotting (Figure 6F and 7A). In addition, exogenous overexpression produces the CRB3-GFP fusion protein in large quantities. On this basis, we infer that the band indicated by the black arrow is most likely the CRB3-GFP fusion protein. To allow the molecular weight to be checked, we have labeled the key molecular weight markers in the new Supplementary Figure 4B.

      12) Lines 251-253: this seems like data overinterpretation.

      Thank you for your comments. We have revised this sentence in lines 252-254.

      13) Lines 260-261: the data showing perturbed gamma tubulin localization is not convincing as data was not quantified.

      According to your suggestions, we performed the quantitative analysis of Figure 4C, which is now presented in the new Figure 4E.

      14) Figure 5H and Figure 6C: to show that the GCP6 IP actually worked, these blots should be probed also for GCP6.

      Thank you for your good suggestions. We have added blots probed for GCP6 to the new Figures 5H and 6C.

      15) Figure 5I: how many cells were analyzed in how many experiments?

      Our statistical methods for analyzing cellular experiments using IF were essentially the same. We took 3-4 random IF micrographs of each sample in one experiment, and this experiment was repeated three times. The detailed description has been added to the Figure 5 legend. The revised part is in lines 992-994.

      16) Figure S5: it looks like GPC6 and Rab11 are localizing all over the cell, are the antibodies used for the IFMs specific for these proteins?

      After checking the specificity of these antibodies used for the IFMs, we have decided to delete the corresponding results in the Supplementary Figure 5 and their description in the original manuscript.

      17) Lines 43, 89, and 314-315: the claim that CRB3 directly binds Rab11 is not supported by the data. The data provided only shows that these proteins interact indirectly. To show direct interaction, yeast-2-hybrid analysis or pull-down assays with purified proteins would be required.

      Thank you for your positive comments. Since we were unable to complete the relevant experiments to demonstrate a direct interaction between the two proteins, we have revised our conclusions. We replaced "CRB3 directly binds Rab11" with "CRB3 binds Rab11" in the manuscript.

      18) Figure 6G and lines 314-315: this result is surprising as it indicates GTP- and GDP-locked versions of Rab11 have the same inhibitory effect on CRB3 binding? Please comment, and also indicate how data in Figure 6G was quantified (and how many independent experiments were used for the quantification).

      We were also puzzled by the results shown in Figure 6G. Based on the western blotting bands, we suspected that there may have been some issues with the experiment. Specifically, we believed that the inefficient transfection of Flag-Rab11aWT, Flag-Rab11a[Q70L], Flag-Rab11a[S20V], and Flag-Rab11a[S25N] plasmids, as well as the insufficient amount of GFP antibody used in the co-IP experiment, led to the corresponding bands being too weak and masking the true differences.

      To address this, we optimized the experimental conditions, tightened the experimental controls, and repeated the experiment in triplicate. The new results are shown in the revised Figure 6G. The statistics from the three independent experiments revealed that CRB3b had a stronger interaction with Rab11a[Q70L] and Rab11a[S20V], and a weaker interaction with Rab11a[S25N], compared to Rab11aWT. As a result, we revised the original manuscript in lines 308-310 and added a detailed description to the Figure 6 legend in lines 1012-1013.

      19) Figure 8G: data needs to be quantified.

      Thank you for your comments. We replaced the poor-quality bands in the western blot of Figure 8G with better-quality ones. The statistical analysis of the Figure 8G data is shown in Supplementary Figure 6.

      Further minor comments

      1) Abstract should indicate that this study describes conditional knockout of Crb3 in mouse mammary gland epithelial cells.

      This is good writing advice. We have added the relevant description in lines 40-42.

      2) Line 87: specify which gland (mammary?).

      We have changed this to "mammary gland" in line 87.

      3) Line 140: sentence states that knockout of Crb3 is essential for branching morphogenesis in mammary gland development, I do not think this is correct.

      We have removed this incorrect statement.

      4) Line 152: "formed more number" should be "formed more" or "formed higher number of".

      We modified "formed more number" to "formed more" in line 154.

      5) Lines 157-163: text and logic are difficult to follow for a non-expert.

      We have modified the logic of this paragraph, as detailed in lines 158-165.

      6) Figure 4A, C: figure resolution could be improved. It is difficult to see what the authors claim these figures are showing.

      The clarity of the original images in Figure 4A and C is acceptable, while the images on the right are electronically enlarged. Although the enlargement reduces the resolution, these images still display our findings.

      7) Figure 7D, E: images look pixelated.

      The original images in Figure 7D and E, acquired with a laser confocal microscope, are of acceptable clarity, while the images on the right are electronically enlarged.

      8) Line 222: unclear what authors mean by "detected a series".

      We modified "detected a series" to "some important" in line 226.

      9) Lines 221-225: which cells were used for the analysis in Fig. S4?

      We used MCF10A cells for the analysis in Supplementary Figure 4, and modified its legend in line 1098.

      10) Line 245: what is "cytomembrane"?

      We modified "cytomembrane" to "cell membrane" in lines 246-247.

      11) Lines 246-250: wording is unclear/difficult to understand.

      We have modified this paragraph, as detailed in lines 248-251.

      12) Line 273: should "regimented" be "sedimented"?

      We modified "regimented" to "sedimented" in line 274.

      13) Line 287-288: sentence does not make sense.

      We have removed this sentence.

      14) Figure 5A: it would be desirable to show the original dataset (Excel file) used for generating this figure.

      To maintain data integrity, we should provide the original dataset (Excel file). However, there are some unpublished data in this file that we must withhold for the time being. If needed, the corresponding author can be requested to provide the file.

      15) Lines 298-299: wording is unclear.

      We have modified this sentence, as detailed in lines 296-298.

      16) Lines 285-287: replace "instead of" with "but not".

      We modified "instead of" to "but not" in line 286.

      17) For all IFMs showing merged images of the green and red channel, please also show the red and green channel separately.

      Most of our fluorescence images are presented separately for each channel in this manuscript, with only a few merged images due to space limitations. This type of presentation is commonly used in published papers.

      18) Lines 326 and 327: replace "bonded" with "bound".

      We have replaced "bonded" with "bound" in lines 322-323.

      19) Lines 327-328 and 361-364: wording is unclear/grammatically incorrect.

      We have modified these paragraphs, as detailed in line 323 and lines 357-360.

      20) Line 342: what is meant by "the combination of"?

      We modified "the combination of" to "the binding of" in line 338.

      21) Line 365: localization of what?

      This refers to "subcellular localization"; we have clarified this in lines 360-361.

      Reviewer # 2

      Major points

      1) CRB3 is present in mammals as 2 isoforms, A and B, originating from alternative splicing. In this study, the authors never mention this fact and when using approaches to KO or KD CRB3A/B they are likely to deplete both isoforms which have been shown to have different C-terminal domains and functions (Fan et al., 2007). This is also important for the CRB3 antibodies used in the study since according to the material and methods section they are either against the extracellular domain common to both isoforms or the intracellular domain which is only similar in the domain close to transmembrane between the 2 isoforms. Since the antibodies used in each figure are not detailed it is impossible to know if the authors are detecting CRB3A or B or both. Please provide the information and correct for the actual isoform detected in the data and conclusions.

      Thanks for your positive comments. In mammals, CRB3 has two isoforms, CRB3a and CRB3b, which arise from alternative splicing within the fourth exon of the CRB3 gene and differ by 23 amino acids at the C terminus. CRB3a and CRB3b otherwise have mostly identical amino acid sequences and indistinguishable molecular weights. As a result, the knockout mouse construction strategy and the design of the RNAi sequences target both CRB3a and CRB3b. This is described in lines 100-104 and lines 149-150. Additionally, commercially available antibodies detect both CRB3a and CRB3b, as mentioned in line 123 and lines 636-637 of the revised manuscript.

      However, it should be noted that our CRB3 overexpression, as shown in the CRB3 structural domain in Figure 6D, refers specifically to the sequence of CRB3b. As a result, we have updated the original manuscript as well as the legends of Figures 3C, 3E, 4A, 5A, 5B, 6D-G, 7A, 7B and Supplementary Figure 2F-H, 3A, 4B, 6B to reflect this change. All instances of overexpressed CRB3 have been changed to CRB3b.

      2) CRB3A and B have been localized in the cilium itself (Fan et al., 2004; 2007) but in the study CRB3A/B does not enter the cilium but is localized in the basal body (figure 4). How the authors reconcile these different localizations?

      Indeed, we found that CRB3 is mainly localized at the basal body of the primary cilium, which differs from previous reports in the literature (Curr Biol. 2004;14(16):1451-61 and J Cell Biol. 2007;178(3):387-98). However, upon closer examination of one of these reports (Curr Biol. 2004;14(16):1451-61), it appears that CRB3 was actually scattered on the primary cilia, with a strong focus at the basal body. Additionally, in rat kidney collecting ducts, the localization of CRB3 on primary cilia was significantly reduced, with obvious localization at the basal body. Another study (J Cell Biol. 2007;178(3):387-98) also reported the co-localization of CRB3b and γ-tubulin in MDCK cells, which is consistent with our conclusion. We further verified the co-localization of CRB3 with the centrosome by overexpressing CRB3b in mammary epithelial cells, indicating that CRB3 mainly localizes to the basal body of the primary cilium. This information is discussed in the Discussion section of the manuscript (lines 400-410).

      3) The authors use GFP-CRB3A/B, it is not stated which isoform, over-expression to localize CRB3A/B in MCF10A cells (figure 4A). The levels of expression appear to be very high in the GFP panel and it is likely that the secretory pathway of the cells is clogged with GFP-CRB3A/B in transit from the ER to the plasma membrane. Thus, the colocalization with pericentrin might be due to the accumulation of ER and Golgi around the centrosome. This colocalization should be done with the endogenous CRB3A/B and with a better resolution.

      Thank you for your comments. We were also interested in the co-localization of endogenous CRB3 and centrosome proteins. However, the only commercial CRB3 antibody available is raised in rabbit, and the most useful pericentrin antibody (Abcam, ab4448) is also raised in rabbit. We had difficulty finding commercial centrosome-associated antibodies raised in other species. Therefore, we examined the co-localization of endogenous CRB3 with γ-tubulin in Figure 4C and combined the results with those of exogenous CRB3 to illustrate the co-localization of CRB3 with centrosomes.

      4) The staining for CRB3A/B in figure 4C (red) is striking with a very strong accumulation in an undefined intracellular structure and the authors do not provide any explanation for such a difference with the GFP-CRB3A/B just above.

      Thank you for your good suggestions. The immunofluorescence images of GFP-CRB3 in Figure 4A were obtained using a widefield fluorescence microscope, while the images of endogenous CRB3 were obtained using a laser confocal microscope. The fluorescence microscope excites the fluorophore throughout the sample and collects the full emitted signal. In Figure 4A, we can therefore see the full distribution of exogenous CRB3 in MCF10A cells, including its tight junctional localization, consistent with previous reports in the literature, and its co-localization with centrosomal proteins. Laser confocal microscopy, on the other hand, uses a laser to excite fluorescence within the sample point by point and a pinhole to reject out-of-focus light, which gives it strong optical sectioning capabilities. In the specific analysis of endogenous CRB3 co-localization with centrosomes and the primary cilium, signals at tight junctions must be excluded. Therefore, Figure 4C represents the fluorescence signal at the focal plane where intracellular CRB3 co-localizes with γ-tubulin. The two methods use different detection techniques and are not directly comparable.

      5) The staining in figure 4E is also different from those shown in figure 4F in which the CRB3A/B staining is right at the base of the axoneme while it is not the case in figure 4E where we can see a red dot close to but not right at the base of the axoneme.

      Thank you for your comments. The new Figure 4F displays the spatial relationship between CRB3 and the primary cilium, analyzed using laser confocal microscopy. Because this microscope images a single optical section, the choice of focal plane may cause the red dots to appear close to, rather than right at, the basal body of the primary cilium. However, the new Figure 4G, based on 3D reconstruction of confocal scans, clearly demonstrates the localization of CRB3 at the basal body of the primary cilium in the same cells and conditions.

      6) The authors claim that CRB3A/B interacts directly with Rab11 but they only show co-immunoprecipitation experiments from cell lysates which do not support direct interactions. The only way to show a direct interaction is to produce both proteins in vitro. Thus, the term direct interaction should be removed.

      Thank you for your positive comments. Since we were unable to complete the experiments needed to demonstrate a direct interaction between the two proteins, we have revised our conclusions. We have replaced "CRB3 directly binds Rab11" with "CRB3 binds Rab11" in the manuscript.

      7) In addition, the authors claim (Line 251/252) that Rab11 is necessary for the transport of CRB3A/B but they should KD Rab11 to show this.

      Thank you for your good suggestions. We agree that it would be important to observe CRB3 trafficking after Rab11 knockdown. However, in Figure 5C, we used the endocytosis inhibitor dynasore, which also inhibits Rab11-positive endosomes. This result shows that dynasore can significantly inhibit CRB3 trafficking in MCF10A cells. We believe that this experiment partially demonstrates that inhibiting Rab11 function can affect CRB3 trafficking.

      8) The domain of CRB3A/B that is necessary for the interaction with Rab11 is the N-terminal part of the extracellular domain. This domain is thus inside the transport vesicles and not accessible from the cytoplasm. Given that Rab11 is a cytoplasmic protein, how the 2 proteins could interact across the membrane? The authors do not even discuss this essential point for their hypothesis.

      Thank you for your positive comments. As shown in the schematic model in Figure 9, we believe that when cells form tight junctions, CRB3 is primarily located on the cell membrane. Subsequently, endosomes are involved in the intracellular degradation process of CRB3 on the cell membrane. Intracellular CRB3 can bind to Rab11 through the extracellular domain, which in turn participates in primary cilia assembly. We have made detailed modifications to lines 418-421.

      9) Figures are not numbered.

      Thank you for your comments. We have updated the numbers in the original manuscript as well as the legends of Figures 1D, 1E, 2B, 2D, 2F, 2G, 3B, 3D, 3F-H, 4B, 4E, 5I, 6, 8G and Supplementary Figure 1E, 2, 3C, 4A, 5B, 6.

      Minor points

      1) The authors cite several studies showing that a down regulation of CRB3A/B in human cells promotes cancer but other studies show the contrary: Lin et al., 2015 for example. Please discuss these discrepancies.

      Thanks for your good suggestion. We have included additional studies with contrasting results in the discussion section, specifically in lines 378-380.

      2) Line 98: "exhibit smaller" smaller than what?

      We modified "exhibit smaller" to "exhibit smaller size" in line 97.

      3) Line 152: "form more number, ..." ???

      We modified "formed more number" to "formed more" in line 154.

      4) Line 180: "Compared with the control, the number of cells with primary cilium was significantly increased". To me it is the contrary! This part is not clear at all. Please rewrite.

      We have revised the sentence in lines 183-185.

      5) Authors should check and review extensively for improvements to the use of English.

      Thanks for your good writing advice. We have carefully reviewed and revised the entire manuscript to improve its readability.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers and appreciate their recommendations to improve this work.

      Reviewer 1:

      Reviewer 1 recognizes that ‘This is an important finding that is relevant to the actions of VDR on colorectal cancer. The data presented to support the presented conclusion is convincing’.

      Reviewer 1 identifies as a major weakness ‘that the site of SIRT1 regulatory lysine acetylation is defined by mutational analysis rather than by direct biochemical analysis’.

      However, as the reviewer mentions “previous reports of K610 acetylation using mass spec (https://www.phosphosite.org/proteinAction.action?id=5946&showAllSites=true), and the absence of SIRT1 mutant K610R in the immunoprecipitates using anti-acetylated lysine antibodies presented in Fig. 4E clearly overcome this weakness”.

      In addition, overall SIRT1 acetylation is reduced by vitamin D and by the specific SIRT1 activator SRT1720, as shown by decreased SIRT1 in the anti-acetyl-lysine immunoprecipitates (Fig. 4A and B). The second weakness identified by Reviewer 1 concerns “the use of only one shRNA to deplete VDR in CRC cells.”

      We have made efforts to demonstrate that the results are specific, though we do not have results with alternative shRNAs for a variety of reasons. To mitigate this issue, we have compared two colon cancer cell lines originating from the same patient which differ in the presence/absence of VDR. SW480 cells derive from the primary tumor and express VDR, whereas SW620 cells were derived from a lung metastasis and lack VDR. Similar to the comparison of HCT116 with shVDR HCT116 cells presented in this study, VD induced SIRT1 levels in SW480 cells in contrast to a lack of induction in SW620 cells, as shown in Author response image 1. This result provides support for the specificity of the shVDR.

      Author response image 1.

      Vitamin D requires the presence of VDR to increase SIRT1 protein levels. SW480 and SW620 cell lines derive from the same patient, from primary tumor and lung metastasis respectively and differ in their VDR content. 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) was added at 100 nM for 24 h. Representative western-blot, where TBP was used as a loading control, of four biological replicates. Statistical analysis by ANOVA and values represent mean ± SEM; *p<0.05; *** p<0.001.

      The referee noticed the inclusion of an siRNA for SIRT1 in Table 1. We apologize for that, since this is an error, and no results are presented in this study with SIRT1 depletion. Table 1 has been modified accordingly.

      Concerning the third and fourth weaknesses that Reviewer 1 identifies, we agree that mapping the interacting domains in both VDR and SIRT1 and in vitro reconstitution would improve the present study. However, we believe that these would constitute long-term studies that are not strictly necessary at this stage. Consequently, we favor the publication of the present body of work. In vitro reconstitution of the present work and the putative relevance of the proposed mechanism of vitamin D action via SIRT1 in types of cancer other than colon (e.g., breast) are certainly very interesting and warrant further investigation.

      Reviewer 2:

      This reviewer acknowledges that “…this study provides very interesting and solid information on the link between vitamin D and colorectal cancer. It is likely that this study will provide insight into the importance of vitamin D in other types of cancer. It may also lead to new therapeutic strategies for specific cases. This article is convincing, although the authors can improve their study as outlined…”

      We acknowledge the proposed changes and recommendations, and have changed the text and Figures as suggested by the Reviewer as follows:

      Figure 1

      Figure 1E and F: the cell lines used were described in the figure legend, but we agree that including the name in the figure brings more clarity and these are now added.

      Figure 1G: the statistical analysis was for all panels of Figure 1 as described in the Figure legend (lines 731-32). We have amended the original omission of panels 1G and 1H. In panel G, * represents statistical analysis by ANOVA (comparing the four groups) whereas # was the analysis by Student's t test (comparing the two indicated groups), where * or #p<0.05. We hope to have clarified this point now.

      Figure 2

      Figure 2C: We originally showed the SIRT1/VDR interaction by immunoprecipitation of VDR and detection of SIRT1 in immunoprecipitates. We also showed immunoprecipitation of exogenously expressed Myc-SIRT1 (WT or mutants) and detection of VDR in immunoprecipitates (Figure 4F). The reviewer requests that we perform the inverse IP for endogenous SIRT1, that is, immunoprecipitate SIRT1 and detect VDR in the immunoprecipitates, which we now supply for the reviewer in Author response image 2.

      Author response image 2

      Immunoprecipitation of endogenous SIRT1 to show interaction with VDR. 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) was added at 100 nM for 24 h. Representative western-blots, where TBP was used as a loading control.

      Figure 3

      • Figure 3D: ‘The authors should indicate the color of the different stainings’. Immunostainings have been revealed with DAB (diaminobenzidine); thus, positiveness is highlighted by light or dark brown according to low or high protein expression. Counterstaining has been performed with hematoxylin, which stains nuclei in dark blue and cytoplasm in light blue.

      Do the authors mean that the secondary antibody marks in brown/red? If so, these results are inconsistent with the text considering that hematoxylin was used for non-tumor tissue. This part needs to be clarified.

      We thank the Reviewer for asking us to clarify this issue. Neither the primary nor anti-Ig horseradish peroxidase-conjugated secondary antibodies presented positiveness resulting from these antibodies individually. Therefore, secondary antibody does not mark in any color. Hematoxylin has been used as counterstaining for both non-tumor as well as for tumor tissues.

      What about the level of FOXO3A in these tissues/tumors?

      We did not probe the tumor sections for specific SIRT1 substrates such as FoxO3A since their levels may not entirely depend on SIRT1-specific deacetylation.

      What is the level of 1,25(OH)2D3 in these patients?

      We agree with this referee that this information would be very useful, but unfortunately, we do not have data on vitamin D levels for these patients since they were not specifically recruited for this study and vitamin D levels are not routinely measured.

      Figure 3D, the following information is missing: "A detailed amplification is shown in the lower left of each micrograph."

      We decided not to include the amplification in micrographs because the aim of the manuscript is focused on protein levels, not localization and including the amplification was more confusing than enlightening. This has been amended now in the text.

      Figure 3E, it says p=0.325, in the legend p<0.01, and in the text there is a trend. Which is the correct version?

      We really apologize for this misunderstanding. As stated in the Figure, p=0.325 and therefore it does not reach statistical significance. We have amended the main text and figure legend to report that differences between SIRT1 expression levels of healthy and cancer human colon samples are not statistically significant.

      Figure 4

      Figure 4F. The quality of the presented blots is not optimal. It needs to be improved. In addition, the number of independent biological experiments is not indicated.

      We have substituted the representative western-blot and included statistical analysis of four independent biological replicates. Since 4F is now a bigger panel, it has required a slight reorganization of the whole Figure, but the remaining panels are unchanged from the originals. Now we indicate in the figure legend that at least three independent biological replicates were analyzed. In addition, we supply below the four experiments for the reviewer in Author response image 3.

      Author response image 3

      Immunoprecipitation of exogenous myc-tagged SIRT1 to show interaction with VDR of wild type (WT) or mutants. 1α,25-dihydroxyvitamin D3 (1,25(OH)2D3) was added at 100 nM for 24 h. FT: Flow Through. TBP as a loading control.

      Regarding the last general comment concerning the number of independent experiments performed, this is indicated in the Figure legends (lines 732-36, 757-58, 823-24, 840-41). All the in vitro experiments were performed at least as three independent experiments and not by repeating a western blot. A representative western blot is shown, and the statistical analysis corresponds to the analysis of the three biological replicates. For experiments with patient samples, the number of patients appears clearly indicated in the corresponding panel.

    1. Author Response

      Reviewer #1 (Public Review):

      The objective of this investigation was to determine whether experimental pain could induce alterations in cortical inhibitory/facilitatory activity observed in TMS-evoked potentials (TEPs). Previous TMS investigations of pain perception had focused on motor evoked potentials (MEPs), which reflect a combination of cortical, spinal, and peripheral activity, as well as restricting the focus to M1. The main strength of this investigation is the combined use of TMS and EEG in the context of experimental pain. More specifically, Experiment 1 investigated whether acute pain altered cortical excitability, reflected in the modulation of TEPs. The main outcome of this study is that relative to non-painful warm stimuli, painful thermal stimuli led to an increase on the amplitude of the TEP N45, with a larger increase associated with higher pain ratings. Because it has been argued that a significant portion of TEPs could reflect auditory potentials elicited by the sound (click) of the TMS, Experiment 2 constituted a control study that aimed to disentangle the cortical response related to TMS and auditory activity. Finally, Experiment 3 aimed to disentangle the cortical response to TMS and reafferent feedback from muscular activity elicited by suprathreshold TMS applied over M1. The fact that the authors accompanied their main experiment with two control experiments strengthens the conclusion that the N45 TEP peak could be implicated in the perception of painful stimuli.

      Perhaps, the addition of a highly salient but non-painful stimulus (i.e. from another modality) would have further ruled out that the effects on the N45 are not predominantly related to intensity/saliency of the stimulus rather than to pain per se.

      We thank the reviewer for their comment on the possibility of whether stimulus salience influences the N45 as opposed to pain per se. However, we note that in Experiment 1, despite the same level of stimulus salience/intensity for all participants (46 degrees), individual differences in pain ratings were associated with the change in the N45 amplitude, suggesting that the results cannot be explained by stimulus intensity/salience.

      Reviewer #2 (Public Review):

      The authors have used transcranial magnetic stimulation (TMS) and motor evoked potentials (MEPs) and TMS-electroencephalography (EEG) evoked potentials (TEPs) to determine how experimental heat pain could induce alterations in these metrics. In Experiment 1 (n = 29), multiple sustained thermal stimuli were administered over the forearm, with the first, second, and third block of stimuli consisting of warm but non-painful (pre-pain block), painful heat (pain block) and warm but non-painful (post-pain block) temperatures respectively. Painful stimuli led to an increase in the amplitude of the fronto-central N45, with a larger increase associated with higher pain ratings. Experiments 2 and 3 studied the correlation between the increase in the N45 in pain and the effects of a sham stimulation protocol/higher stimulation intensity. They found that the centro-frontal N45 TEP was decreased in acute pain.

      The study comes from a very strong group in the pain fields with long experience in psychophysics, experimental pain, neuromodulation, and EEG in pain. They are among the first to report on changes in cortical excitability as measured by TMS-EEG over M1.

      While their results are in line with reductions seen in motor-evoked responses during pain and effort was made to address possible confounding factors (study 2 and 3), there are some points that need attention. In my view the most important are:

      1) The method used to calculate the rest motor threshold, which is likely to have overestimated its true value: calculating highly abnormal RMT may lead to suprathreshold stimulations in all instances (Experiment 3) and may lead to somatosensory "contamination" due to re-afferent loops in both "supra" and "infra" (aka. less supra) conditions.

      The method used to assess motor threshold was the TMS Motor Threshold Assessment Tool (Awiszus, 2003). This was developed as a quicker alternative for calculating motor threshold compared to the traditional Rossini-Rothwell method, which involves determining the lowest intensity that evokes at least 5 out of 10 MEPs of at least 50 microvolts. The method has been shown to determine motor threshold with the same accuracy as the traditional Rossini-Rothwell method, but with fewer pulses (Qi et al., 2011; Silbert et al., 2013). Therefore, the high RMTs in our study cannot be explained by the threshold assessment method. Instead, they are likely explained by aspects of the experimental setup that increased the distance between the TMS coil and the scalp, including the layer of foam placed over the coil, the EEG cap and the fact that the electrodes we used had a relatively thick profile.
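
      For readers unfamiliar with the relative-frequency criterion mentioned above, the following is a minimal sketch of the "at least 5 of 10 MEPs of at least 50 microvolts" rule. It is an illustration only, not the code used in the study and not the adaptive threshold-hunting procedure; the function names, the dictionary layout and the amplitudes are all hypothetical.

```python
def meets_criterion(amplitudes_uv, min_amplitude_uv=50.0, min_hits=5):
    """Return True if at least `min_hits` MEPs reach `min_amplitude_uv`."""
    hits = sum(1 for a in amplitudes_uv if a >= min_amplitude_uv)
    return hits >= min_hits


def relative_frequency_threshold(trials_by_intensity):
    """Return the lowest tested intensity that meets the criterion, else None.

    `trials_by_intensity` maps stimulator output (% of maximum) to a list of
    peak-to-peak MEP amplitudes in microvolts (hypothetical data layout).
    """
    for intensity in sorted(trials_by_intensity):
        if meets_criterion(trials_by_intensity[intensity]):
            return intensity
    return None


# Made-up amplitudes for illustration only; not data from the study.
example = {
    60: [10, 20, 55, 30, 15, 40, 12, 60, 25, 18],     # 2 of 10 reach 50 uV
    62: [55, 70, 30, 80, 20, 65, 45, 90, 35, 52],     # 6 of 10 reach 50 uV
    64: [120, 95, 80, 150, 60, 75, 110, 85, 200, 70], # 10 of 10 reach 50 uV
}
rmt = relative_frequency_threshold(example)
```

      In this toy example the threshold is the lowest tested intensity at which the criterion is met; threshold-hunting approaches instead choose each next intensity adaptively, which is why they typically need fewer pulses.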

      Awiszus, F. (2003). TMS and threshold hunting. In Supplements to Clinical neurophysiology (Vol. 56, pp. 13-23). Elsevier.

      Qi, F., Wu, A. D., & Schweighofer, N. (2011). Fast estimation of transcranial magnetic stimulation motor threshold. Brain stimulation, 4(1), 50-57.

      Silbert, B. I., Patterson, H. I., Pevcic, D. D., Windnagel, K. A., & Thickbroom, G. W. (2013). A comparison of relative-frequency and threshold-hunting methods to determine stimulus intensity in transcranial magnetic stimulation. Clinical Neurophysiology, 124(4), 708-712.

      2) The low number of pulses used for TEPs (close to ⅓ of the usual and recommended)

      We agree that increasing the number of pulses can increase the signal to noise ratio. During piloting, participants were unable to tolerate the painful stimulus for long periods of time and we were required to minimize the number of pulses per condition.

      We note that there is no set recommended number of trials in TMS-EEG research. According to a recent recommendations paper, the number of trials should be based on the outcome measure (e.g., TEP peaks vs. frequency domain measures vs. other measures) and on previous studies investigating test-retest reliability (Hernandez-Pavon et al., 2023). The choice of 66 pulses per condition was based on the study by Kerwin et al. (2018) showing that optimal concordance between TEP peaks can be found with 60-100 TMS pulses delivered in the same run (as in the present study). The concordance was particularly high for the N40 peak at prefrontal electrodes, which was the key peak and electrode cluster in our study.

      Further supporting the reliability of the TEP data in our experiment, we note that the scalp topographies of the TEPs for active TMS at various timepoints (Figures 5, 7 and 9) were similar across all three experiments, especially at 45 ms post-TMS (frontal negative activity, parietal-occipital positive activity).

      In addition to this, the intraclass correlation coefficient (two-way fixed, single measure) for the N45 to active suprathreshold TMS across timepoints for each experiment was 0.90 for Experiment 1 (across pre-pain, pain, post-pain time points), 0.74 for Experiment 2 (across pre-pain and pain conditions), and 0.95 for Experiment 3 (across pre-pain conditions). This suggests that even with the fluctuations in the N45 induced by pain, the N45 for each participant was stable across time, further supporting the reliability of our data. These ICCs will be reported in the next revision.
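
      As an illustration of how a two-way, single-measure ICC can be computed, the sketch below implements the consistency form often labelled ICC(3,1), (MS_rows - MS_error) / (MS_rows + (k - 1) MS_error), from a participants-by-timepoints table. Whether the consistency or the absolute-agreement definition was used in the study is not stated here, so the consistency form is an assumption; the function name and the toy values are hypothetical and are not data from the study.

```python
def icc_3_1(scores):
    """ICC(3,1): two-way model, single measure, consistency definition.

    `scores` is a list of rows, one per participant, each holding that
    participant's value at every time point (no missing values).
    """
    n = len(scores)       # number of participants (rows)
    k = len(scores[0])    # number of time points (columns)
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Partition the total sum of squares as in a two-way ANOVA
    # without replication.
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (ms_rows + (k - 1) * ms_error)


# Toy values: every participant shifts by the same amount at each time
# point, so the rank order is perfectly preserved across time points.
perfect = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]
icc_perfect = icc_3_1(perfect)
```

      Under the consistency definition, a uniform shift across time points (such as a pain-induced change shared by all participants) does not lower the coefficient, because only the preservation of between-participant differences is assessed.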

      Hernandez-Pavon, J. C., Veniero, D., Bergmann, T. O., Belardinelli, P., Bortoletto, M., Casarotto, S., ... & Ilmoniemi, R. J. (2023). TMS combined with EEG: Recommendations and open issues for data collection and analysis. Brain Stimulation, 16(3), 567-593.

      Kerwin, L. J., Keller, C. J., Wu, W., Narayan, M., & Etkin, A. (2018). Test-retest reliability of transcranial magnetic stimulation EEG evoked potentials. Brain stimulation, 11(3), 536-544.

      Lack of measures to mask auditory noise.

      In TMS-EEG research, various masking methods have been proposed to suppress the somatosensory and auditory artefacts resulting from TMS pulses, such as white noise played through headphones to mask the click sound (Ilmoniemi and Kičić, 2010), and a thin layer of foam placed between the TMS coil and EEG cap to minimize the scalp sensation (Massimini et al., 2005). However, recent studies have shown that even when these methods are used, sensory contamination of TEPs is still present, as shown by studies that show commonalities in the signal between active and sensory sham conditions that mimic the auditory/somatosensory aspects of real TMS (Biabani et al., 2019; Conde et al., 2019; Rocchi et al., 2021). This has led many authors (Biabani et al., 2019; Conde et al., 2019) to recommend the use of sham conditions to control for sensory contamination. To separate the direct cortical response to TMS from sensory evoked activity, Experiment 2 (n = 10) included a sham TMS condition that mimicked the auditory/somatosensory aspects of active TMS to determine whether any alterations in the TEP peaks in response to pain were due to changes in sensory evoked activity associated with TMS, as opposed to changes in cortical excitability. Therefore, the lack of auditory masking does not impact the main conclusions of the paper.

      Ilmoniemi, R. J., & Kičić, D. (2010). Methodology for combined TMS and EEG. Brain topography, 22, 233-248.

      Massimini, M., Ferrarelli, F., Huber, R., Esser, S. K., Singh, H., & Tononi, G. (2005). Breakdown of cortical effective connectivity during sleep. Science, 309(5744), 2228-2232.

      Biabani, M., Fornito, A., Mutanen, T. P., Morrow, J., & Rogasch, N. C. (2019). Characterizing and minimizing the contribution of sensory inputs to TMS-evoked potentials. Brain stimulation, 12(6), 1537-1552.

      Conde, V., Tomasevic, L., Akopian, I., Stanek, K., Saturnino, G. B., Thielscher, A., ... & Siebner, H. R. (2019). The non-transcranial TMS-evoked potential is an inherent source of ambiguity in TMS-EEG studies. Neuroimage, 185, 300-312.

      Rocchi, L., Di Santo, A., Brown, K., Ibáñez, J., Casula, E., Rawji, V., ... & Rothwell, J. (2021). Disentangling EEG responses to TMS due to cortical and peripheral activations. Brain stimulation, 14(1), 4-18.

      3) A supra-stimulus heat stimulus not based on individual HPT, that oscillates during the experiment and that lead to large variations in pain intensity across participants is unfortunate.

      The choice of whether to calibrate or fix stimulus intensity is a contentious question in experimental pain research. A recent discussion by Adamczyk et al. (2022) explores the pros and cons of each approach and recommends situations where one method may be preferred over the other. That paper suggests that the choice of methodology depends on the research question: when the main outcome of the research is objective (neurophysiological measures) and researchers are interested in the variability in pain ratings, the fixed approach is preferable. Given we explored the relationship between MEP/N45 modulation by pain and pain intensity, this question is better explored by using the same stimulus intensity for all participants, as opposed to calibrating the intensity to achieve a similar level of pain across participants.

      Adamczyk, W. M., Szikszay, T. M., Nahman-Averbuch, H., Skalski, J., Nastaj, J., Gouverneur, P., & Luedtke, K. (2022). To calibrate or not to calibrate? A methodological dilemma in experimental pain research. The Journal of Pain, 23(11), 1823-1832.

      So is the lack of report on measures taken to correct for a fortuitous significance (multiple comparison correction) in such a huge number of serial paired tests.

      Note that we used a Bayesian approach for all analyses as opposed to the traditional frequentist approach. In contrast to the frequentist approach, the Bayesian approach does not require corrections for multiple comparisons (Gelman & Tuerlinckx, 2000), given that it provides a ratio representing the strength of evidence for the null vs. alternative hypotheses as opposed to accepting or rejecting the null hypothesis based on p-values. As such, throughout the paper, we frame our interpretations and conclusions based on the strength of evidence (e.g. anecdotal/weak, moderate, strong, very strong) as opposed to referring to the significance of the effects.

      Gelman A, Tuerlinckx F. (2000). Type S error rates for classical and Bayesian single and multiple comparison procedures. Computational statistics, 15(3):373-90.
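
      The evidence-based framing described above can be sketched as a small helper that turns a Bayes factor into a verbal label. This is a minimal illustration only: the function name `evidence_label` is hypothetical, and the cut-offs (3, 10, 30, 100) are the commonly used conventional bands, assumed here because the response does not state the exact thresholds the authors applied.

```python
def evidence_label(bf10: float) -> str:
    """Map a Bayes factor (alternative vs. null) to a conventional verbal label.

    Assumption: conventional bands of 1-3 (anecdotal), 3-10 (moderate),
    10-30 (strong), 30-100 (very strong) and >100 (extreme).
    """
    if bf10 <= 0:
        raise ValueError("A Bayes factor must be positive")
    if bf10 == 1:
        return "no evidence either way"

    # Values below 1 favor the null; invert them so one scale serves both directions.
    favored = "alternative" if bf10 > 1 else "null"
    strength = bf10 if bf10 > 1 else 1.0 / bf10

    if strength < 3:
        label = "anecdotal"
    elif strength < 10:
        label = "moderate"
    elif strength < 30:
        label = "strong"
    elif strength < 100:
        label = "very strong"
    else:
        label = "extreme"
    return f"{label} evidence for the {favored} hypothesis"


# The MEP comparison quoted by Reviewer #3 (BF 1.015) falls in the weakest band.
print(evidence_label(1.015))
```

      Under these assumed bands, a Bayes factor close to 1 is read as inconclusive rather than as support for either hypothesis, which is how the response frames its conclusions.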

      Reviewer #3 (Public Review):

      The present study aims to investigate whether pain influences cortical excitability. To this end, heat pain stimuli are applied to healthy human participants. Simultaneously, TMS pulses are applied to M1 and TMS-evoked potentials (TEPs) and pain ratings are assessed after each TMS pulse. TEPs are used as measures of cortical excitability. The results show that TEP amplitudes at 45 msec (N45) after TMS pulses are higher during painful stimulation than during non-painful warm stimulation. Control experiments indicate that auditory, somatosensory, or proprioceptive effects cannot explain this effect. Considering that the N45 might reflect GABAergic activity, the results suggest that pain changes GABAergic activity. The authors conclude that TEP indices of GABAergic transmission might be useful as biomarkers of pain sensitivity.

      Pain-induced changes in cortical excitability are an interesting, timely, and potentially clinically relevant topic. The paradigm and the analysis are sound, the results are mostly convincing, and the interpretation is adequate. The following clarifications and revisions might help to improve the manuscript further.

      1) Non-painful control condition. In this condition, stimuli are applied at warmth detection threshold. At this intensity, by definition, some stimuli are not perceived as different from the baseline. Thus, this condition might not be perfectly suited to control for the effects of painful vs. non-painful stimulation. This potential confound should be critically discussed.

      In Experiment 3, we also collected warmth ratings to confirm whether the pre-pain stimuli were perceived as different from baseline. We did not include this data initially in the first submission, but will do so in the supplemental material in our next revision. This data showed warmth ratings were close to 2/10 on average. This confirms that the non-painful control condition produced some level of non-painful sensation.

      2) MEP differences between conditions. The results do not show differences in MEP amplitudes between conditions (BF 1.015). The analysis nevertheless relates MEP differences between conditions to pain ratings. It would be more appropriate to state that in this study, pain did not affect MEP and to remove the correlation analysis and its interpretation from the manuscript.

      The interindividual relationship between changes in MEP amplitude and individual pain rating is statistically independent from the overall group level effect of pain on MEP amplitude. Therefore, conclusions for the individual and group level effects can be made independently.

      It is also important to note that in the pain literature, there is now increasing emphasis placed on investigating the individual level relationship between changes in cortical excitability and pain as opposed to the group level effect (Seminowicz et al., 2019; Summers et al., 2019). As such, it is important to make these results readily available for the scientific community.

      Summers, S. J., Chipchase, L. S., Hirata, R., Graven-Nielsen, T., Cavaleri, R., & Schabrun, S. M. (2019). Motor adaptation varies between individuals in the transition to sustained pain. Pain, 160(9), 2115-2125.

      Seminowicz, D. A., Thapa, T., & Schabrun, S. M. (2019). Corticomotor depression is associated with higher pain severity in the transition to sustained pain: a longitudinal exploratory study of individual differences. The Journal of Pain, 20(12), 1498-1506.

      3) Confounds by pain ratings. The ISI between TMS pulses is 4 sec and includes verbal pain ratings. Considering this relatively short ISI, would it be possible that verbal pain ratings confound the TEP? Moreover, could the pain ratings confound TEP differences between conditions, e.g., by providing earlier ratings when the stimulus is painful? This should be carefully considered, and the authors might perform control analyses.

      It is unlikely that the verbal ratings contaminated the TEP response as the subsequent TMS pulse was not delivered until the verbal rating was complete and given that each participant was cued by the experimenter to provide the pain rating after each pulse (rather than the participant giving the rating at any time). As such, it would not be possible for participants to provide earlier ratings to more painful stimuli. We will make this part of the protocol clearer in the next revision of the manuscript.

      4) Confounds by time effects. Non-painful and painful conditions were performed in a fixed order. Potential confounds by time effects should be carefully considered.

      Previous research suggests that pain alters neural excitability even after pain has subsided. In a recent meta-analysis (Chowdhury et al., 2022) we found effect sizes of 0.55-0.9 for MEP reductions 0-30 minutes after pain had resolved. As such, we avoided intermixing pain and warm blocks given subsequent warm blocks would not serve as a valid baseline, as each subsequent warm block would have residual effects from the previous pain blocks.

      At the same time, given there was no conclusive evidence for a difference in N45 amplitude between pre-pain and post-pain conditions of Experiment 1 (Supplementary Figure 1), it is unlikely that the effect of pain was an artefact of time, i.e., the explanation that successive thermal stimuli applied to the skin result in an increase in the N45 regardless of whether they are painful or not. We will make this point in our next revision.

      Chowdhury, N. S., Chang, W. J., Millard, S. K., Skippen, P., Bilska, K., Seminowicz, D. A., & Schabrun, S. M. (2022). The Effect of Acute and Sustained Pain on Corticomotor Excitability: A Systematic Review and Meta-Analysis of Group and Individual Level Data. The Journal of Pain, 23(10), 1680-1696.

      5) Data availability. The authors should state how they make the data openly available.

      We will upload the MEP, TEP and pain data to the Open Science Framework at the time of the next revision.

    1. Author Response

      Reviewer #1 (Public Review):

      Sun and colleagues investigated the cross-reactive antibodies between E. coli and the host in severe alcoholic hepatitis (SAH). The study found that IgA and IgG were deposited in the liver of SAH patients. Complements C3d and C4d were also deposited in the SAH patient's liver. Moreover, they found that the Ig accumulated in the SAH liver, but not in the SAH serum, induced hepatocyte killing, suggesting that liver Ig is important. Then, they found that these Ig can recognize both human and E. coli antigens. Very interestingly, SAH-derived Ig shows cross-reactivity to both human and E. coli antigens, suggesting E. coli-primed Ig in SAH may damage hepatocytes through host antigen recognition. These Ig are not observed in alcoholic cirrhosis patients. The liver RNA-seq data suggested that Ig was also produced in the liver, not only gut-derived Ig. This is a very interesting study showing the novel mechanism of SAH mediated by the Ig with the cross-reactivity with bacteria and host antigens, which is not observed in AC patients. Overall, the study design is reasonable and the data are consistent to support their central hypothesis. There are a few comments.

      We thank the Reviewer for his/her positive comments on our manuscript!

      Specific comments:

      1) Figures 1 and 2 show Ig deposition in the liver (it seems on hepatocytes). Not only Ig reaction to the specific antigen but also non-specific Fc receptor-mediated binding to hepatocytes could have contributed.

      2) Similarly, in Figure 2G Ig-mediated hepatocyte killing, Fc receptor-mediated hepatocyte killing may be involved.

      Anti-IgG antibody (ab200699) recognizes a protein of 75 kDa, identified as gamma heavy chain of human immunoglobulins. It is possible that non-specific Fc receptor-mediated binding to hepatocytes in the SAH liver can also be recognized by this anti-IgG antibody staining.

      However, no IgG or IgA deposition in the healthy donor livers was identified by anti-IgG or anti-IgA staining. These results suggest that there was no antigen-specific or Fc receptor-mediated binding to healthy hepatocytes.

      In the ADCC assay, hepatocytes isolated from healthy donor livers were used as the target cells. Immune cell (NK) mediated ADCC is mainly triggered by IgG (binding to antigens of hepatocytes) through the interaction between IgG Fc fragment and Fc-receptors (FcγRs) of NK cells. If IgG deposition in the SAH liver were mainly due to non-specific Fc receptor-mediated binding to hepatocytes, we would expect IgG binding to FcγRs of hepatocytes and no activation of NK cells. Ig-mediated hepatocyte killing (Figure 2G) indicates the Ig (from SAH liver) reaction to the specific antigens.

      3) The study examined the possibility of liver resident B cell and plasma cell-mediated Ig. As the authors mentioned in the discussion, B cells may be translocated from the intestine to the liver. Or the resident B cells (not from the gut) are also involved.

      We agree with the Reviewer on this point. The resident B cells may also be involved in the Ig production.

      Reviewer #2 (Public Review):

      In this paper, Ahmadi et al demonstrated that antibodies produced locally in the liver by infiltrating B cells can enhance liver damage caused by fat accumulation. The main finding is that human samples extracted from severe alcoholic hepatitis showed antibody accumulation that may be related to an enhanced immune response to self-antigens, which could ultimately fuel liver damage - which was already present due to alcohol consumption. Their data are corroborated by arrays and gene ontology assays, and I strongly believe that these data could add to the future options we have to treat patients.

      We thank the Reviewer for his/her positive comments on our manuscript!

    1. Author Response:

      We thank the reviewers and eLife editorial team for their valuable assessment. While additional experiments could further strengthen the theoretical framework proposed in this study, we believe that we have successfully established the delayed nuclear export of hemagglutinin and neuraminidase mRNAs by quantifying the FISH observation with the mathematical model. We agree that this finding raises a further important question to be addressed regarding the molecular mechanism underlying the prolonged nuclear retention of these segments. Our ongoing investigation is focusing on identifying potential cis-elements that contribute to the delay of these segments.

    1. Author Response:

      We are grateful to the three referees for their overall positive evaluation of our work and valuable constructive suggestions. We will address their public reviews with utmost care, as well as their private recommendations.

      To Reviewer #1: thanks for the positive comments and for the appreciation of our « impressive approaches »

      • We will add a more comprehensive section of neuronal migration analysis in the Material and Methods section. Sorry for that regrettable lack of precision.

      • We will address the comments about the sinuosity index definition and interpretations.

      • We will enhance the clarity of our writing and delve deeper into the discussion. As mentioned to Reviewer #3, the brevity of the text was influenced by the Short report format.

      To Reviewer #2 : thanks for the overall positive appreciation.

      We will also consider the recommendations for authors with care.

      To Reviewer #3 : thanks a lot for the feedback.

      • We will further develop the introduction and discussion sections as suggested. Regrettably and as mentioned to Reviewer #1, we had to significantly condense them due to the space constraints imposed by the Short Report format.

      • We will attempt to overexpress Map1B in order to assess the potential phenotypic similarity to the Fmr1 null condition, as suggested. However, it is important to acknowledge that this experiment may not yield a definitive answer due to potential differences in the level of Map1B expression driven by a CMV promoter compared to its endogenous expression in Fmr1 null neurons, as well as variations in the subcellular distribution of the overexpressed Map1B.

      • Regarding the anatomical consequences of aberrant migration, we acknowledge that neither our present work nor our previous study by Scotto-Lomassesse et al. provides evidence in this regard, as pointed out by the reviewer. Indeed, the delayed neurons do reach the olfactory bulb based on our findings. However, other studies have demonstrated that a delay in migration can have important functional consequences (e.g., Bocchi et al., 2017, doi: 10.1038/s41467-017-01046-w). Accordingly, we will revise and moderate our conclusions on this specific point.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors examine the role of the K700E mutation in the Sf3B1 splicing factor in PDAC and report that this Sf3B1 mutation promotes PDAC by decreasing sensitivity to TGF-b resulting in decreased EMT and decreased apoptosis as a result. They propose that the Sf3b1 K700E mutant causes decreased expression of Map3K7, a known mediator of TGF-β signaling and also known to be alternately spliced in other systems by the Sf3b1 K700E mutation. The role of splicing defects in cancer is relatively understudied and could identify novel targets for therapeutic intervention so this work is of potential significance. However, the data is over-interpreted in many instances and it is not clear the authors can make the claims they do based on the data shown. In particular, the data showing that decreased Map3k7 underlies the effects of the Sf3b1K700E mutant is very weak. Does over-expression of Map3k7 promote the EMT signature and induce apoptosis? Do the Map3k7 expressing organoids form tumors more effectively when transplanted into mice? Also, the novelty of the work is a concern since aberrant Map3k7 splicing due to SF3B1 mutation was seen previously in other systems. The authors also do not address the apparent conundrum of Sf3b1 K700E mutation promoting tumorigenesis despite there being less EMT which is also required for progression to metastasis in PDAC.

      Major Concerns.

      1) The analysis of the effect of Sf3b1K700E expression on normal pancreas and on PanINs in KC mice and PDAC in KPC mice is superficial and could be enhanced by staining for amylase, cytokeratin-19 and insulin. In particular, the data quantified in figure 1L should be accompanied by staining for CK19, Mucin5AC or some other marker of ductal transformation. Also, are any effects seen at older ages in normal mice?

      We performed staining of normal and cancerous mouse pancreata using Ck19, MUC5AC and b-amylase antibodies. In line with our hypothesis that Sf3b1K700E mainly plays a role in early stages of PDAC formation, we observed significant differences in CK19 (increase), MUC5AC (increase) and b-amylase (decrease) expression in early stage KPC-Sf3b1K700E vs. KPC tumors (Fig. 1G-J), but not in late stage tumors (see Figure 1-figure supplement 1F-I). In addition, no differences were observed in normal mice. We added these data to the revised manuscript (see Figure 1-figure supplement 1D, E).

      2) The invasion assays used are limited and should be complemented by more routine quantification of cell migration and invasion including such assays as a scratch assay, Boyden chamber assays and use of the IncuCyte system to quantify. As it stands the image in Figure 3B is difficult to interpret since it is very poorly described in the figure legend. Additional evidence is needed to make the claims made by the authors.

      During the revisions we performed wound healing/scratch assays using PANC-1 cells with inducible SF3B1 WT/K700E overexpression. We observed a significant difference in migratory capacity between SF3B1 WT- and SF3B1 K700E overexpressing cells stimulated with TGF-β. We added this data to the revised manuscript (Fig. 2I, J). We also describe the abovementioned figure 3B in more detail (revised manuscript Fig. 2G, H; line 759-767).

      3) The authors should show the actual CC3 staining quantified in Suppl. Figure 2G.

      We added a representative image of CC3 staining (see Figure 3-figure supplement 1A) for the quantified data (see Figure 3-figure supplement 1B in the revised manuscript).

      4) The graph in Figure 3L should show WT and Sf3b1K700E expressing organoids number both with and without TGF-b.

      Since organoids have to be split in a 1:3 ratio every 5 days without TGF-β supplementation, we could not follow the same passaging regimen as in experiments with TGF-β supplementation (split in a 1:2 ratio every 20 days, Fig. 3I). However, we assessed the number of organoids grown in control medium without TGF-β for 4 passages (20 days) in a 1:3 ratio, and observed no difference in organoid number between WT and Sf3b1K700E expressing organoids (Author response image 1). In the revised manuscript we show with a highly quantitative read-out (CellTiterGlo) that Sf3b1K700E expressing organoids do not grow faster than Sf3b1 WT expressing organoids in the absence of TGF-β (see Figure 3-figure supplement 1E). Taken together, we can exclude that Sf3b1K700E organoids outgrow Sf3b1 WT organoids in medium with TGF-β supplementation because they generally have a growth advantage.

      Author response image 1. WT and Sf3b1K700E expressing organoids were cultured without TGF-β supplementation. Organoids were split in a 1:3 ratio every 5 days. Data points show organoid number before splitting, assessed for 4 passages.

      Reviewer #2 (Public Review):

      The manuscript has several areas of strength; it functionally explores a mutant that is detected in a portion of pancreatic cancers; it conducts mechanistic investigation and it uses human cell lines to validate the findings based on mouse models. Some areas for improvement are described below.

      1) TGF-b is known to act as a tumor suppressor early in carcinogenesis, and as a tumor promoter later. The authors should extend their analysis of mouse models to determine whether the effect of SF3B1K700E is specific to promoting initiation (e.g. more, early acinar ductal metaplasia) or faster progression of PanINs following their formation. Another way to address this could be acinar cultures, to determine whether an increased propensity to ADM exists.

      To further disentangle the effect of Sf3b1K700E on tumor progression, we analyzed our autochthonous model at an early and a late stage of tumor progression. Histological examination at 5 weeks revealed an increased propensity to ADM (see Figure 1-figure supplement 1J, K), increased PanIN formation (shown by MUC5AC and CK19 IF staining, Fig. 1G, I, J) and a concomitant decrease in acinar cells (shown by b-amylase staining) in KPC-Sf3b1K700E vs. KPC tumors (Fig. 1G, H). Tumors analyzed at 9 weeks of age showed no differences in CK19 staining or fibrosis. We added these data to the revised manuscript (see Figure 1-figure supplement 1F-I).

      2) Given that the effect of SF3B1K700E expression is more prominent in KC mice, rather than in KPC mice, the authors should explain the rationale for using the latter for RNA sequencing.

      In KC mice, pre-invasive PanIN lesions only infrequently progress to PDAC (spontaneous progression, see Gabriel et al., Pancreatology, 2020). Therefore, it would have been difficult to collect enough material for cell sorting and downstream RNA sequencing of tumor cells. The KPC mouse model develops PDAC with 100% penetrance, allowing the collection of sufficient material.

      3) Given that this mutation is found in about 3% of human pancreatic cancer, it would be interesting to know whether these tumors have any unique feature, and specifically any characteristic that could be harnessed therapeutically.

      Unfortunately, the size of published datasets is too small for a meaningful differential gene expression analysis of SF3B1-WT vs. SF3B1-K700E PDAC tumors (due to the low occurrence of SF3B1-K700E PDAC). However, harnessing the K700E mutation therapeutically by increasing missplicing through splicing inhibitors has previously been suggested, and it was shown that SF3B1-K700E mutated cancer cells are more prone to apoptosis when splicing is chemically targeted than SF3B1-WT cells. We tested a similar approach in murine pre-cancerous organoids, demonstrating that Sf3b1-WT organoids show higher survival than Sf3b1K700E expressing organoids when treated with the splicing-inhibitor Pladienolide B (Author response image 2). However, since this concept is not novel and not within the topic of our manuscript, we would prefer to not integrate this data into our manuscript.

      Author response image 2. 33 nM of the splicing inhibitor Pladienolide B was added to the cell culture medium for 48 hours and the viability was assessed by normalizing organoid numbers to untreated control organoids. The line indicates WT and Sf3b1K700E organoids assessed in the same replicate.

      4) It would be interesting to know whether this mutation is mutually exclusive with other mutations affecting response to TGF-b. Further, while the data might not be widely available, it would be interesting to know whether in human patients the mutation occurs in precursor lesions (PanIN might be difficult to assess, but IPMN might be doable) or at later stages.

      We performed a mutual exclusivity analysis in PDAC samples available at www.cbioportal.org, but did not find mutual exclusivity of SF3B1-K700E with genes of the TGF-β pathway. Of note, the value of the analysis is limited by the small sample size of SF3B1-K700E PDAC (n=7). Moreover, to our knowledge there is no public tissue biobank for PDAC which would allow us to assess the stage of SF3B1-K700E mutated PDAC tumors. Thus, unfortunately, we cannot histologically assess whether the mutations already occur in early stages of human tumor development.

      Author response table 1: Mutual exclusivity analysis of public PDAC databases (ICGC, CPTAC, QCMG, TCGA, UTSW), including 910 patients. Mutation frequency is 25% for SMAD4, 5% for TGF-βR2, 3% for SMAD2, 2.6% for TGF-βR1, 1.4% for SMAD3, 0.7% for SF3B1-K700E, 0.7% for TGF-βR3, 0.4% for SMAD1. Analysis was performed on cbioportal.org.
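
      To illustrate why a small number of mutation carriers limits a mutual exclusivity analysis, the following is a minimal sketch of a one-sided Fisher's exact test on a 2x2 co-occurrence table. It is not taken from the response: the function name `mutual_exclusivity_p` is hypothetical, the choice of Fisher's exact test is an assumption about how such an analysis might be run, and the counts in the usage line are illustrative (roughly 7 carriers and 228 SMAD4-mutated cases among 910 patients, with zero overlap assumed because the response does not report co-occurrence counts).

```python
from math import comb


def mutual_exclusivity_p(both: int, only_a: int, only_b: int, neither: int) -> float:
    """One-sided Fisher's exact p-value for mutual exclusivity.

    Returns the probability of observing `both` or fewer co-mutated samples
    if mutations in gene A and gene B occurred independently
    (lower tail of the hypergeometric distribution).
    """
    n = both + only_a + only_b + neither   # total samples
    n_a = both + only_a                    # samples mutated in gene A
    n_b = both + only_b                    # samples mutated in gene B
    total = comb(n, n_b)                   # ways to place the B mutations
    # math.comb returns 0 when k > n, so impossible overlaps contribute nothing.
    tail = sum(comb(n_a, k) * comb(n - n_a, n_b - k) for k in range(both + 1))
    return tail / total


# Hypothetical counts: 7 gene-A carriers, 228 gene-B carriers, no overlap, 910 samples.
p_value = mutual_exclusivity_p(both=0, only_a=7, only_b=228, neither=675)
print(f"one-sided p = {p_value:.3f}")
```

      With so few carriers, even a complete absence of co-occurrence is compatible with chance in this sketch, which matches the caveat that the analysis is limited by the small number of SF3B1-K700E cases.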

      Reviewer #3 (Public Review):

      Alternative splicing as a result of mutations in different components of the splicing machinery has been associated with a variety of cancer types, including hematological malignancies, where this has been most extensively studied, but also solid tumors such as breast cancer and pancreatic ductal adenocarcinoma (PDAC). Here the authors analyze genome sequencing data in human PDAC samples and identify a recurring mutation in the SF3B1 subunit that substitutes lysine for glutamate at residue 700 (SF3B1K700E) in PDACs. This mutation has been identified and its molecular role in disease progression in other diseases has been studied, but the mechanism for promoting disease progression in pancreatic cancer has not been as well characterized.

      To study how SF3B1K700E contributes to PDAC pathology, the authors generate a novel genetically modified mouse model of a pancreas specific SF3B1K700E mutation and explore its oncogenicity and tumor promoting potential. The authors find that SF3B1K700E is not oncogenic, but potentiates the oncogenic potential of Kras and p53 (KP) driver mutations commonly found in PDAC tumors. The authors then proceed to characterize the molecular mechanisms that might drive this phenotype. By transcriptomic analysis, the authors find KP-SF3B1K700E tumors have downregulation of epithelial-to-mesenchymal transition (EMT) genes compared to KP tumors. The cytokine TGFβ has previously been found to limit PDAC initiation and progression by causing lethal EMT in PDAC and PDAC precursor cells. Thus, the authors propose SF3B1K700E inhibition of EMT blocks the tumor suppressive activity of TGFβ and this underpins the tumor promoting role of SF3B1K700E mutation in PDAC. Consistent with this finding, SF3B1K700E mutation blocks TGFβ-induced toxicity in a variety of cell culture models of PDAC and PDAC precursor models.

      Lastly, the authors seek to identify how altered splicing reduces EMT activity in PDAC cells. The authors identify misspliced genes consistent in both KP and human SF3B1K700E mutant cancer samples and find Map3k7 as one of 11 consistently misspliced genes. MAP3K7 has previously been identified as a positive regulator of EMT. Thus the authors speculated that Map3k7 missplicing would lead to reduced MAP3K7 activity and a reduction in EMT, and that this underpins the TGFβ resistance of SF3B1K700E mutant PDAC cells. Consistent with this, the authors find that inhibition of MAP3K7 reduces TGFβ toxicity in SF3B1 WT cells and that overexpression of MAP3K7 in SF3B1K700E mutant PDAC cells induces TGFβ toxicity. Altogether, this suggests activity of Map3k7 is responsible for altered EMT activity and TGFβ sensitivity in SF3B1K700E mutant PDAC.

      Altogether, the authors generate a valuable model to study the role of a recurring splicing mutation in PDAC and provide compelling evidence that this mutation accelerates disease. The authors then perform both: (1) an open-ended investigation of how this mutation alters PDAC cell biology, where they identify altered EMT activity, and (2) rigorous mechanistic studies showing suppressed EMT provides PDAC cells with resistance to TGFβ, which has previously been shown to be tumor suppressive in PDAC, suggesting a possible mechanism by which SF3B1K700E mutation is oncogenic in PDAC that future animal studies can confirm. This work generates valuable models and datasets to advance the understanding of how mutations in the splicing machinery can promote PDAC progression and suggests alternative splicing of MAP3K7 is one such possible mechanism by which altered splicing promotes PDAC progression in vivo.

      • One major concern about the manuscript is that the proposed mechanism by which SF3B1K700E mutation accelerates PDAC progression (MAP3K7 inhibition -> EMT inhibition -> reduced TGF-β toxicity) is only tested in ex vivo culture models, and there is very limited and correlative data to suggest that this is the operative mechanism by which SF3B1K700E mutant tumors are accelerated. This is especially important because of recent findings that IFN-α signaling, which the authors also found to be high in SF3B1K700E mutant tumors, also promotes PDAC progression (https://www.biorxiv.org/content/10.1101/2022.06.29.497540v1). Thus, while I am thoroughly convinced by the rigorous ex vivo work that SF3B1K700E does lead to MAP3K7 inhibition -> EMT inhibition -> reduced TGF-β toxicity, further in vivo experiments would be needed to convince me that this mechanism is critical to tumor progression. For example, would forced expression of MAP3K7 slow orthotopic KP-SF3B1K700E tumor growth while leaving IFN-α signaling unperturbed?

      We thank the reviewer for raising these important points. To first test whether the upregulation of IFN-α signaling seen in our RNA-seq data of sorted KPC-Sf3b1K700E cells was directly caused by the Sf3b1-K700E mutation, we assessed the 5 most deregulated genes of the IFN-α signature in in-vitro activated KPC and KPC-Sf3b1K700E organoids (analogous to the experiments on the EMT gene signature in Figure 2-figure supplement 1D). However, in contrast to EMT marker genes, IFN-α signature genes were not differentially expressed in KPC-Sf3b1K700E vs. KPC organoids (Author response image 3). Thus, increased IFN-α signaling in KPC-Sf3b1K700E tumors in mice is likely an indirect consequence of further progressed cancers rather than an effect directly caused by Sf3b1K700E mediated missplicing.

      Author response image 3. Expression of the 5 most deregulated genes of the IFN-α gene set identified in sorted KPC-Sf3b1K700E cells in in-vitro activated KPC-Sf3b1K700E and KPC organoids. 4 biological replicates were performed. For analysis, Ct-values of the indicated genes were normalized to Actb and a two-tailed unpaired t-test was used to compute the indicated p-values.

      To next examine the effect of Map3k7 on tumors in vivo, we established orthotopic transplantation models with KPC and KPC-Sf3b1K700E cells, with overexpression or knockdown of Map3k7 (Author response image 4). However, in contrast to the autochthonous mouse model, orthotopically transplanted KPC and KPC-Sf3b1K700E cells already showed no differences in tumor size (see Figure 1-figure supplement 1M, N). These data support our hypothesis that Sf3b1-K700E plays an important role primarily during early stages of PDAC (KPC cells are isolated from fully developed PDAC tumors, and orthotopic KPC transplantation thus represents a late-stage PDAC model).

      Unfortunately, these data also demonstrate that orthotopic transplantation of KPC cells is not a suitable model for studying the impact of Map3k7 in PDAC development. As expected, neither Map3k7 overexpression in transplanted KPC-Sf3b1K700E cells nor shRNA mediated knockdown of Map3k7 (shMap3k7) in transplanted KPC cells led to differences in growth compared to their control groups (Author response image 4). In line with these results, the EMT genes that were found to be differentially expressed in our autochthonous mouse model (KPC vs. KPC-Sf3b1K700E) were expressed at similar levels upon Map3k7 downregulation or overexpression.

      Since establishment of an autochthonous KPC PDAC mouse model with a knock-down of MAP3K7 is beyond the scope of a revision, in the revised manuscript we discuss the limitation of our study that the molecular link between Sf3b1K700E, Map3k7 and TGF-β resistance has only been studied in vitro in organoids and cell lines. We also adapted the abstract and the title of the manuscript accordingly (formerly “Mutant SF3B1 promotes PDAC malignancy through TGF-β resistance”, now “Mutant SF3B1 promotes malignancy in PDAC”).

      Author response image 4. (A) Relative gene expression of Map3k7 in KPC cells transduced with shRNA targeting Map3k7 (shMap3k7), normalized to KPC cells transduced with scrambled control shRNA (shCtrl). 3 biological replicates are shown. (B) Weight of tumors derived by orthotopic transplantation of shMap3k7 and shCtrl KPC cells. 5 biological replicates are shown. (C) Relative gene expression of EMT genes in tumors derived by orthotopic transplantation of shCtrl and shMap3k7 cells. 4 biological replicates are shown. (D) Relative gene expression of Map3k7 in KPC-Sf3b1K700E cells transduced with an overexpression vector of Map3k7 (OE Map3k7), normalized to control KPC cells without Map3k7 overexpression. 3 biological replicates are shown; a two-sided Student’s t-test was used to calculate significance. (E) Weight of tumors derived by orthotopic transplantation of Map3k7 overexpressing KPC-Sf3b1K700E cells (n=5) and control KPC-Sf3b1K700E cells (n=4). (F) Relative gene expression of EMT genes in tumors derived by orthotopic transplantation of KPC-Sf3b1K700E cells with and without overexpression of Map3k7. 4 biological replicates are shown. A two-sided Student’s t-test was used to calculate significance in Fig. 2A-F.

    1. Author Response

      Reviewer #1 (Public Review):

      Cedillo et al. address the critically important question of how biguanides exert their positive effects on longevity using the powerful C. elegans model. Biguanides metformin and phenformin have been widely prescribed in the clinic to address metabolic challenges of diabetes; more recently the value of metformin in addressing specific cancers has emerged, and testing for impact on healthy human aging is getting underway. The need to understand the mechanism of biguanide action and the metabolic consequences of biguanide administration is clear.

      The authors report that three genes that suppress longevity associated with metformin or phenformin treatment affect a common pathway for ether lipid biosynthesis; this ether lipid biosynthesis pathway is required for mitochondrial lifespan extension, eat-2 mediated dietary restriction longevity, and TOR inhibition-associated longevity, but not insulin pathway mediated longevity. Authors document with lipid profiling how ether lipids and some other lipids are impacted by phenformin vs. genetic disruption of ether lipid biosynthesis, define the tissue primarily responsible for the ether lipid biosynthesis, show that over-expression of enzyme fard-1 is sufficient to confer most of the phenformin effect, and implicate conserved stress transcription factor SKN-1 as a downstream outcome of the ether lipid change.

      Strengths include the exploitation of the nematode model to address requirements not readily discerned in other models, the rigor of genetic documentation, the inclusion of metabolic profiling, the testing of multiple potential pathways that have been in the general discourse regarding metformin action, and the elaboration of a reasonably supported model that ether lipid biosynthesis is required for phenformin to activate longevity-promoting metabolic defenses downstream of conserved stress-responsive transcription factor SKN-1/NRF2. The novelty includes that ether lipids are directly linked to lifespan, ether lipid biosynthesis is needed for specific longevity pathways, and that ether lipids might play a role in a shift to pro-longevity metabolism.

      There are some points that require clarification and could benefit from additional study, some wording and presentation issues, and a few missing points of potential discussion.

      Overall, the data reported in this paper contribute a highly valuable advance in the biguanide field and adds stimulating hypotheses to the scientific community for moving forward in this biomedically important area.

      We thank Reviewer #1 for their positive feedback regarding our work, and for their insightful suggestions to improve the rigor and impact of this manuscript.

      Reviewer #2 (Public Review):

      This manuscript pulls together a series of integrated genetic and metabolomic data sets to examine the molecular basis for biguanide action in C. elegans. Biguanides such as Metformin are important anti-diabetic drugs as well as being explored as a therapeutic mechanism for increasing human longevity. Understanding the molecular basis of biguanide action is of general interest to those in the ageing and age-related health fields as well as to those studying metabolism and obesity. The work here has been carried out in C. elegans but the work can be picked up by those working in mammalian systems. More could be done to highlight the conserved aspects of the mechanisms involved to assist with this translatability.

      The methodology used is in general standard in the field and experiments are reported in detail. The successful use of metabolomics in C. elegans and its associated protocols is helpful as more labs expand to do this type of work.

      Strengths: In general all the experiments presented are logical and well executed with the conclusions supported by the data. I am convinced that: 1) Metformin and Phenformin extend C. elegans lifespan (although that has previously been shown), 2) biguanides induce changes in ether lipids, 3) genes required for ether lipid biogenesis are required for the lifespan incurred with biguanide treatment and, in the case of fard-1 oe, can also promote longevity when levels are increased, 4) ether lipid biogenesis is also needed for other specific key longevity processes to extend lifespan, and 5) that some key ageing regulators (skn-1, aak-2 and daf-16) are required for fard-1 oe to extend lifespan.

      Weaknesses: I was less convinced by the fat accumulation data and felt that the link between skn-1 gain of function and ether lipid genes was not clear and that the results were more correlative than mechanistic. If age-associated somatic depletion of fat is important for the lifespans seen here then this is interesting and important and identifying an epistatic, genetic link between the implicated genes and fat levels is desirable. Additionally, biguanides are reported to have major effects on the metabolism and growth of bacteria. As C. elegans grows on and eats E. coli, it is important that the biguanides in question do not alter the worm's food source. If bacterial growth is restricted or metabolically altered this would have a major impact on fat metabolism and the other outputs examined here (see Cabreiro et al 2013). Therefore the impact of these biguanide treatments on the C. elegans foods used here should be clearly addressed. Additionally, biguanide treatment is subject to dose dependence. Different concentrations of biguanide are used for different types of experiments to make correlative points e.g. growth inhibition at 160mM metformin, and metformin uptake measured in C. elegans treated with 50mM. It is not clear why, or whether this could impact the results. Can the authors be sure that these different doses do not alter metformin action and/or uptake either by the worms or the way the bacteria metabolise it? I appreciate that it is interesting and important to understand what biguanides are doing in the organism irrespective of whether this is a direct or indirect effect but knowing how the effects are achieved could be important for treatment strategies moving forwards.

      We thank Reviewer #2 for their favorable comments on our manuscript and for their helpful feedback regarding the weaknesses in our initial manuscript submission. We address the major comments below:

      1. Regarding the genetic link between SKN-1 and ether lipid biosynthetic machinery in regulation of fat accumulation, we have performed Asdf analysis in skn-1(zu135) total loss-of-function animals, rigorously indicating that biguanides require SKN-1 to drive somatic lipid depletion (Figure 6D-E). We additionally show that biguanides activate the innate immune response sensor dod-24, previously shown by us to be activated by a transcriptionally redirected SKN-1 metabolic stress response program (ref. 2), in a manner that requires both SKN-1 and all ether lipid biosynthetic machinery (Figure 6F and Figure 6 – figure supplement 1C). Combined with our previous result showing that fard-1 (oe3) requires SKN-1 to extend lifespan (Figure 5D), and our observation that SKN-1 gain-of-function animals do not mimic the ether lipid pattern seen in FARD-1 overexpressing animals (Reviewer Response 1), our results rigorously corroborate that biguanides activate SKN-1 downstream of ether lipid machinery to exert a metabolic stress defense response. This activation results in alterations of somatic lipid homeostasis, innate immune response, and pro-longevity outcomes.

      2. Regarding possible indirect effects of biguanides on bacterial growth and metabolism to modulate ether lipid biosynthetic activity, we performed FAME GC/MS of Adult Day 1 nematodes treated with or without phenformin and grown on live or dead, metabolically inactive OP50-1 E. coli food sources using a rigorously established 1% PFA treatment protocol (ref. 3; Figure 6 – figure supplement 2). We additionally performed lifespan analyses in the same experimental design, with the inclusion of lifespan-extending doses of metformin (Figure 6 – figure supplement 3). Both experiments show, with biological replication, that biguanide-mediated induction of ether lipid synthesis, biguanide-mediated lifespan extension, and the dependency of biguanide-mediated lifespan extension on ether lipid machinery all operate through direct interactions in the worm, as opposed to indirect effects on bacterial growth and metabolism.

      3. Regarding the use of different doses of biguanides: this point was also raised by Reviewer 1 and is responded to above in Author Response 4. Briefly, the goal of the 160 mM dosage of metformin used in our prior genetic screens (ref. 10) and subsequently highlighted in Figure 1 – figure supplement 1A is to enhance the sensitivity and specificity of our discovery approach to identify effectors of the biological action of biguanides. The 160 mM dose causes potent growth inhibition in C. elegans. Our prior published work indicates that use of this dose to identify growth inhibitory effectors of biguanides can also identify longevity effectors of metformin (ref. 10). Thus, we used a similar strategy here to identify fard-1 and acl-7, which were initially identified as gene knockdowns that block the growth inhibitory effects of 160 mM metformin. The justification for the different biguanide concentrations used in this work is now included in the text for clarity (lines 135 to 153).

    1. Author Response

      Reviewer #2 (Public Review):

      "... the fact that MGN-BLA circuit disruptions were done during the conditioning phase of associative threat learning, and not during the recall phase only, complicates the side-by-side comparison: it could be argued that in this case what is disturbed is the processing of the unconditioned innately aversive stimulus in the task, the foot shock, instead of the learnt threat of the sound".

      In our previous email to the editors, we mentioned work by Barsy et al., showing that indeed the inhibition of this input during the recall phase reduces freezing response (Please see Fig. 8 in Barsy et al). In the new revision, we refer to this experiment.

      Specific comments (weaknesses):

      e) There are not enough analysis and method descriptions to demonstrate the specificity of the targeting approach

      We have included these data as supplementary figures (S2A and B, S5B, S7, S9A and S10K) and added a more detailed methodology in the method section.

      f) …the authors administer blockers of beta-adrenergic receptors systemically. This reveals differences between MGN-BLA projecting neurons, BLA neurons, and innate and learnt threat, but the mechanistic implications are not clear and should be discussed.

      In the revised manuscript, we extensively discuss these points: (This indicates that the looming stimulus conveyed through the thalamic input…may contribute to the variability in the effect of the drug in freezing response); (...in mice injected with propranolol, the defensive responses…The differences in species or strains used, or experimental parameters may contribute to the variability in the effect of the drug in freezing response.)

    1. Author Response

      Reviewer #2 (Public Review):

      Mahbub et al further elucidate the structural and functional consequences of the ARL15-CNNM2 interaction for divalent cation transport. They show that ARL15 has low GTP binding affinity and could not detect GTPase activity, questioning whether ARL15 functions as a GTPase. Although the interaction of ARL15 and CNNMs has been demonstrated by multiple groups before, this study addresses some of the key questions that are central within the TRPM-CNNM-PRL-ARL15 field. Particularly, the authors have identified residues in both ARL15 and CNNM proteins which are required for their binding to one another. In addition, they have also illustrated how PRL proteins compete with ARL15 for their binding to CNNMs. Lastly, the functional consequences of ARL15 binding to CNNMs are shown by TRPM7-mediated Zn2+ transport assays.

      We thank the reviewer for the many positive comments.

      However, the current dataset also comes with limitations. Previous studies demonstrated that PRLs interact with the CBS domains of CNNMs and lock them in their so-called "flat" conformation. It remains unclear how ARL15 affects the structure of the CBS domains, especially in the presence of ATP. The subcellular localisation of these interactions has not been examined. Moreover, the consequences of ARL15 on TRPM7 activity are not completely elucidated. It remains unclear whether this functional effect is CNNM-dependent. Moreover, how the zinc uptakes translate to other divalent ion transport, such as magnesium, has not been examined. These questions should be answered to confirm the model as presented in Figure 7.

      We agree that CBS-pair domain dimerization is important. Structural studies of a prokaryotic CNNM homolog from our group showed large conformational changes in an ATP-binding mutant (Chen et al., Nat Comm, 2021).

      While most crystal structures of PRL-CNNM complexes do indeed show the flat conformation, it is unclear if that is a consequence of crystal packing or PRL binding. We do not see an effect of ATP on PRL binding affinity. The CBS-pair domain dimerization interface appears to be very adaptable; our recent structure of PRL-CNNM proteins from flies shows a completely different dimerization interface (Fakih et al, JBC, 2023).

      In contrast, the ARL15-CNNM interaction is affected by ATP. As suggested by the reviewer, we have examined ARL15 binding to a CNNM2 mutant (T568I) that is unable to bind ATP. These results confirm the roughly two-fold improvement in affinity is due to ATP binding to the CNNM2 CBS-pair domain and resulting conformational changes.

      As requested by all the reviewers, we have added experiments to Figure 7 that investigate the effect of ARL15 on Mg2+ transport.

    1. Author Response

      Reviewer #1 (Public Review):

      It has recently been shown that the HIV-1 protease can cleave and activate the inflammasome-forming sensor CARD8 upon treatment of infected cells with non-nucleoside reverse-transcriptase inhibitors (Wang et al., Science 2021). Here, Kulsuptrakul and colleagues show that the high susceptibility to proteolytic activation by the HIV-1 protease is a specific feature of human CARD8. They show that changes in the human-specific F-F motif render the CARD8 protein of non-human primates largely resistant to cleavage. Interestingly, the protease of SIVcpz, the direct precursor of pandemic HIV-1 strains, is also capable of cleaving human but not chimpanzee CARD8. Thus, the authors propose that a human-specific CARD8 motif may contribute to the increased levels of inflammation and disease progression in HIV-infected humans compared to non-human primates that are naturally infected with SIV.

      Strengths of the study are that the authors convincingly show that a single human-specific amino acid change in CARD8 determines its susceptibility to cleavage by the HIV-1 protease and that the results shown are well controlled and presented. It is also interesting that SIVcpz can cleave human CARD8 and activate an inflammatory response. The major weakness is that it remains unclear whether HIV-1 or SIVcpz may induce CARD8-dependent inflammatory responses in primary CD4+ T cells or macrophages. The most relevant setting in the study was the infection of THP-1 cells with the T cell line-adapted X4-tropic HIV-1 LAI molecular clone. However, the effects on cell death were modest (Figure 3A) and the effect on IL-1β secretion was not dose-dependent (Figure 3B). Altogether, stronger effects were observed with VSV-G-pseudotyped HIV-1 and only those were used in subsequent experiments involving human CARD8 cleavage mutants (Figure 4). Additional evidence that primary HIV-1 molecular clones and/or SIVcpz may indeed induce CARD8-dependent inflammatory responses in primary viral target cells would greatly increase the significance of the study. In the absence of such data, conclusions about the potential role of CARD8 sensing of the viral protease for the pathogenesis of AIDS should be cautioned throughout.

      We have now added an experiment using the HIV-1 strain BG505, which uses a distinct co-receptor and is from a different clade than LAI. The results show that BG505 infection also induces CARD8-dependent inflammasome activation (Figure 3E).

      We have also more specifically measured caspase-1 activation using a FLICA assay (which specifically measures active CASP1) in WT, CARD8 KO and CASP1 KO THP-1 cells (Figure 3D, right panel). In experiments with both VSV-G-pseudotyped and infectious virus, we observed increased FLICA signal in WT but not CASP1 KO THP-1 cells. Moreover, the FLICA signal and other readouts of inflammasome activation in CARD8 KO THP-1 cells were indistinguishable from those in CASP1 KO THP-1 cells (Figure 3D). Thus, our results are consistent with HIV-1 infection inducing CASP1-dependent pyroptosis downstream of CARD8.

      While we agree with the reviewers that primary cell data would be informative, we believe that this is not the main point of our paper. Moreover, others have already shown CARD8-dependent cell death after infection of primary T cells with HIV-1 (Wang et al., 2021, Science; Clark et al. 2022, Nature Chem Biol; Balibar et al. 2023, Science Trans Med; Wang & Shan, 2023, BioRxiv). We therefore have not extensively pursued primary cell experiments in this manuscript and instead have elected to use a more easily manipulatable cell line to focus on the evolutionary and mechanistic basis of CARD8 activation by simian lentiviruses.

    1. Author Response

      Reviewer #1 (Public Review):

      Point 1: Many of the initial analyses of behavior metrics, for instance predicting reaction times, number of fixations, or fixation duration, use value difference as a regressor. However, given a limited set of values, value differences are highly correlated with the option values themselves, as well as the chosen value. For instance, in this task the only time when there will be a value difference of 4 drops is when the options are 1 and 5 drops, and given the high performance of these monkeys, this means the chosen value will overwhelmingly be 5 drops. Likewise, there are only two combinations that can yield a value difference of 3 (5 vs. 2 and 4 vs 1), and each will have relatively high chosen values. Given that value motivates behavior and attracts attention, it may be that some of the putative effects of choice difficulty are actually driven by value.
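
      The confound described in Point 1 can be made concrete with a short enumeration. The sketch below assumes juice values of 1 to 5 drops, two distinct options per trial, and that the better option is always chosen (an idealization of the "high performance" mentioned above); the function names are illustrative and not taken from the study.

```python
from collections import defaultdict
from itertools import combinations


def pairs_by_difference(values):
    """Group unordered option pairs by their absolute value difference."""
    groups = defaultdict(list)
    for low, high in combinations(sorted(values), 2):
        groups[high - low].append((high, low))
    return dict(groups)


def mean_best_value(groups):
    """Mean value of the better option at each value difference.

    This equals the mean chosen value only under the assumption that
    the better option is always chosen.
    """
    return {diff: sum(high for high, _ in pairs) / len(pairs)
            for diff, pairs in groups.items()}


groups = pairs_by_difference(range(1, 6))  # assumed values: 1..5 drops
means = mean_best_value(groups)
for diff in sorted(groups):
    print(diff, groups[diff], means[diff])
```

      Under these assumptions the mean best-option value rises with every step in value difference, which is why a regression on value difference alone cannot separate difficulty from chosen value.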

      To address this question, we have adapted the methods of Balewski and colleagues (Neuron, 2022) to isolate the unique contributions of chosen value and trial difficulty to reaction time and the number of fixations in a given trial (the two behaviors modulated by difficulty in the original paper). This new analysis reveals a double dissociation in which reaction time decreases as a function of chosen value but not difficulty, while the number of fixations in a trial shows the opposite pattern. Our interpretation is that reaction time largely reflects reward anticipation, whereas the number of fixations largely reflects the amount of information required to render a decision (i.e., choice difficulty). See lines 144-167 and Figure 2.

      Point 2: Related to point 1, the study found that duration of first fixations increased with fixated values, and second (middle) fixation durations decreased with fixated value but increased with relative value of the fixated versus other value. Can this effect be more concisely described as an effect of the value of the first fixated option carrying over into behavior during the second fixation?

      This is a valid interpretation of the results. To test this directly, we now include an analysis of middle fixation duration as a function of the not-currently-viewed target. Note that the vast majority of middle fixations are the second fixation in the trial, and therefore the value of the unattended target is typically the one that was viewed first. The analysis showed a negative correlation between middle fixation duration and the value of the unattended target which is consistent with the first fixated value carrying over to the second fixation. See lines 243-246.

      Point 3: Given that chosen (and therefore anticipated) values can motivate responses, often measured as faster reaction times or more vigorous motor movements, it seems curious that terminal non-decision times were calculated as a single value for all trials. Shouldn't this vary depending at least on chosen values, and perhaps other variables in the trial?

      In all sequential sampling model formulations we are aware of, nondecision time is considered to be fixed across trial types. Examples can be found for perceptual decisions (e.g., Resulaj et al., 2009) and in the “bifurcation point” approach used in the recent value-based decision study by Westbrook et al. (2020).

      To further investigate this issue, we asked whether other post-decision processes were sensitive to chosen value in our paradigm. To do so, we measured the interval between the center lever lift and the left or right lever press, corresponding to the time taken to perform the reach movement in each trial (reach latency). We then fit a mixed effects model explaining reach latency as a function of chosen value. While the results showed significantly faster reach latencies with higher chosen values, the effect size was very small, showing on average a ~3ms decrease per drop of juice. In other words, between the highest and lowest levels of chosen value (5 vs. 1), there is only a difference of approximately 12ms. In contrast, the main RT measure used in the study (the interval between target onset and center lever lift) is an order of magnitude more sensitive to chosen value, decreasing ~40ms per drop of juice. These results are shown in Author response image 1.

      Author response image 1.

      This suggests that post-decision processes (NDT in standard models and the additive stage in the Westbrook paper) vary only minimally as a function of chosen value. We are happy to include this analysis as a supplemental figure upon request.
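
      The effect-size comparison above reduces to simple arithmetic, sketched here. The slopes are the approximate values quoted in the response (~3 ms and ~40 ms per drop); the function name and the linear extrapolation across the 1 to 5 drop range are illustrative assumptions, not the fitted mixed effects model itself.

```python
def predicted_span_ms(slope_ms_per_drop, low_drops=1, high_drops=5):
    """Predicted latency change between the lowest and highest chosen value,
    assuming a linear effect of chosen value."""
    return slope_ms_per_drop * (high_drops - low_drops)


reach_span = predicted_span_ms(3)   # reach latency (lever lift to lever press)
rt_span = predicted_span_ms(40)     # main RT measure (target onset to lever lift)
print(reach_span, rt_span, rt_span / reach_span)
```

      With these approximate slopes the reach-latency span is about 12 ms and the RT span about 160 ms, roughly a thirteen-fold difference.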

      Point 4: The paper aims to demonstrate similarities between monkey and human gaze behavior in value-based decisions, but focuses mainly on a series of results from one group of collaborators (Krajbich, Rangel and colleagues). Other labs have shown additional nuance that the present data could potentially speak to. First, Cavanagh et al. (J Exp Psychol Gen, 2014) found that gaze allocation and value differences between options independently influence drift rates on different choices. Second, gaze can correlate with choice because attention to an option amplifies its value (or enhances the accumulation of value evidence) or because chosen options are attended more after the choice is implicitly determined but not yet registered. Westbrook et al. (Science, 2020) found that these effects can be dissociated, with attention influencing choice early in the trial and choice influencing attention later. The NDTs calculated in the present study allot a consistent time to translating a choice into a motor command, but as noted above don't account for potential influences of choice or value on gaze.

      The two-stage model of gaze effects put forth by Westbrook et al. (2020) is consistent with other observations of gaze behavior and choice (e.g., Thomas et al., 2019, Smith et al., 2018, Manohar & Husain, 2013). In this model, gaze effects early in the trial are best described by a multiplicative relationship between gaze and value, whereas gaze effects later in the trial are best described with an additive model term. To test the two-stage hypothesis, Westbrook and colleagues determined a ‘bifurcation point’ for each subject that represented the time at which gaze effects transitioned from multiplicative to additive. In our data, trial durations were typically very short (<1s), making it difficult to divide trials and fit separate models to them. We therefore took a different approach: we reasoned that if gaze effects transition from multiplicative to additive at the end of the trial, then the transition point could be estimated by removing data from the end of each trial and assessing the relative fit of a multiplicative vs. additive model. If the early gaze effects are predominantly multiplicative and late gaze effects are additive, the relative goodness of fit for an additive model should decrease as more data are removed from the end of the trial. To test this idea, we compared the relative model fit of additive vs. multiplicative models in the raw data, and for data in which successively larger epochs were removed from the end of the trial (50, 100, 150, 200, 300, and 400ms). The relative fit was assessed by computing the relative probability that each model accurately reflects the data. In addition, to identify significant differences in goodness of fit, we compared the WAIC values and their standard errors for each model (Supplemental File 3).

      As shown in Figure 4, the relative fit probability for both models is nonzero in the raw data (0 truncation), indicating that neither model provides a definitive best fit, potentially reflecting a mixture of the two processes. However, the relative fit of the additive model decreases sharply as data are removed, reaching zero at 100ms truncation. 100ms is also the point at which multiplicative models provide a significantly better fit, indicated by non-overlapping standard error intervals for the two models (Supplemental File 3). Together, these results suggest that the transition between early- and late-stage gaze effects likely occurs approximately 100ms before the RT.
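
      One common way to turn information-criterion values into relative model probabilities is Akaike-type weighting, sketched below. Whether the authors computed their relative fit probabilities this way is an assumption; the WAIC values and standard errors used here are invented for illustration and are not taken from Supplemental File 3.

```python
import math


def waic_weights(waic_by_model):
    """Akaike-type weights from WAIC values on the deviance scale
    (lower is better). The weights sum to 1 across models."""
    best = min(waic_by_model.values())
    raw = {name: math.exp(-0.5 * (waic - best))
           for name, waic in waic_by_model.items()}
    total = sum(raw.values())
    return {name: value / total for name, value in raw.items()}


def intervals_overlap(waic_a, se_a, waic_b, se_b):
    """True if the +/- 1 standard error intervals of two WAIC values overlap."""
    return (waic_a - se_a) <= (waic_b + se_b) and (waic_b - se_b) <= (waic_a + se_a)


# Hypothetical values: a near tie, then a clear multiplicative advantage.
tie = waic_weights({"additive": 1000.0, "multiplicative": 1000.0})
clear = waic_weights({"additive": 1040.0, "multiplicative": 1000.0})
print(tie, clear)
print(intervals_overlap(1000.0, 5.0, 1040.0, 5.0))
```

      In this scheme a weight near 0.5 for each model corresponds to the "no definitive best fit" situation, and a weight that collapses toward zero for the additive model corresponds to the pattern described at 100ms truncation.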

      To minimize the influence of post-decision gaze effects, the main results use data truncated by 100ms. However, because 100ms is only an estimate, we repeated the main analyses over truncation values between 0 and 400ms, reported in Figure 6 - figure supplement 1 & Figure 7 - figure supplement 1. These show significant gaze duration biases and final gaze biases in data truncated by up to 200ms.
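
      The truncation step itself can be sketched as follows. The representation of a trial as a list of (start_ms, end_ms, target) fixations and the function name are hypothetical; the response does not describe how the fixation data are stored.

```python
def truncate_fixations(fixations, rt_ms, cut_ms):
    """Remove the final cut_ms before the reaction time from one trial.

    fixations: list of (start_ms, end_ms, target) tuples in temporal order.
    Fixations starting inside the removed window are dropped; a fixation
    that spans the cutoff is clipped to end at the cutoff.
    """
    cutoff = rt_ms - cut_ms
    kept = []
    for start, end, target in fixations:
        if start >= cutoff:
            continue  # lies entirely within the removed window
        kept.append((start, min(end, cutoff), target))
    return kept


# Hypothetical trial with an 800 ms reaction time.
trial = [(0, 300, "L"), (300, 650, "R"), (650, 800, "L")]
for cut in (0, 100, 200):
    print(cut, truncate_fixations(trial, rt_ms=800, cut_ms=cut))
```

      Note that larger truncation values can change which fixation counts as the final one, which is one reason the results are reported across a range of truncation values rather than at a single cutoff.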

      Reviewer #2 (Public Review):

      Recommendation 1: The only real issue that I see with the paper is fairly obvious: the authors find that the last fixations are longer than the rest, which is inconsistent with a lot of the human work. They argue that this is due to the reaching required in this task, and they take a somewhat ad-hoc approach to trying to correct for it. Specifically, they take the difference between final and non-final, second fixations, and then choose the 95th percentile of that distribution as the amount of time to subtract from the end of each trial. This amounts to about 200 ms being removed from the end of each trial. There are several issues with this approach. First, it assumes that final and non-final fixations should be the same length, when we know from other work that final fixations are generally shorter. Second, it seems to assume that this 200ms is "the latency between the time that the subject commits to the movement and the time that the movement is actually detected by the experimenter". However, there is a mismatch between that explanation and the details of the task. Those last 200ms are before the monkey releases the middle lever, not before the monkey makes a left/right choice. When the monkey releases the middle lever, the stimuli disappear and they then have 500ms to press the left or right lever. But, the reaction time and fixation data terminate when the monkey releases the middle lever. Consequently, I don't find it very likely that the monkeys are using those last 200ms to plan their hand movement after releasing the middle lever.

      Thanks for the opportunity to clarify these points. There are three related issues:

      First, with regard to fixation durations, in the updated Figure 3 we now show durations as a function of both the absolute order in the trial (first, second, third, fourth, etc.) and the relative order (final/nonfinal). We find that durations decrease as a function of absolute order in the trial, an effect also seen in humans (see Manohar & Husain, 2013). At the same time, while holding absolute order constant, final fixations are longer than non-final fixations. To explain the discrepancy with human final fixation durations, we note that monkeys make many fewer fixations per trial (~2.5) than humans do (~3.7, computed from publicly available data from Krajbich et al., 2010). This means that compared to humans, monkeys’ final fixations occur earlier in the trial (e.g., second or third), and are therefore comparatively longer in duration. Note that studies with humans have not independently measured fixation durations by absolute and relative order, and therefore would not have detected the potential interaction between the two effects.

      Second, the comment suggests that the final 200ms before lever lift is not spent planning the left/right movement, given that the monkeys have time after the lever lift in which to execute the movement (400 or 500ms, depending on the monkey). The presumption appears to be that 400/500ms should be sufficient to plan a left/right reach. However, we think that these two suggestions are unlikely, and that our original interpretation is the most plausible. First, the 400/500ms deadline between lift and left/right press was set to encourage the monkeys to complete the reach as fast as possible, to minimize deliberations or changes of mind after lifting the lever. More specifically, these deadlines were designed so that on ~0.5% of trials, the monkeys actually fail to complete the reach within the deadline and fail to obtain a reward. This manipulation was effective at motivating fast reaches, as the average reach latency (time between lift and press) was 165ms (SEM 20ms) for Monkey K, and 290ms (SEM 100ms) for Monkey C.

      Therefore, given the time pressure imposed by the task, it is very unlikely that significant reach planning occurs after the lever lift. In addition to these empirical considerations, the idea that the final moments before the RT are used for motor planning is a standard assumption in many theoretical models of choice (including sequential sampling models, see Ratcliff & McKoon 2008, for review), and is also well-supported by studies of motor control and motor system neurophysiology. Based on these, we think the assumption of some form of terminal NDT is warranted.

      Third, we have changed our method for estimating the NDT interval. In brief, we sweep through a range of NDT truncation values (0-400ms) and identify the smallest interval (100ms) that minimizes the contribution of “additive” gaze effects, which are thought to reflect late-stage, post-decision gaze processes. See the response to Point 4 for Reviewer 1 above, Figure 4 and lines 267-325 in the main text. In addition, we report all of the major study results over a range of truncation values between 0 and 400ms.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      […] Overall, the conclusions of this study are mostly well supported by the data. The concept of placental aging has been controversial, with several prior studies with conflicting viewpoints on whether placental aging occurs at all, is a normal process during gestation, or rather only a pathologic phenomenon in abnormal pregnancies. This has been rather difficult to study given the difficulty of obtaining serial placental samples in late gestation. The authors used both a mouse model of serial placental sampling and human placental samples obtained at preterm, but non-pathologic deliveries, which is an impressive accomplishment as it provides insight into a previously poorly understood timepoint of pregnancy. The data clearly demonstrate changes in the HIF-1 pathway and cellular senescence at increasing gestational ages in the third trimester, which is consistent with the process of aging in other tissues.

      Weaknesses of this study are that although the authors attribute alterations in HIF-1 pathways in advanced gestation to hypoxia, there are no experiments directly assessing whether the changes in HIF-1 pathways are due to hypoxia in either in vitro or in vivo experiments. HIF-1 has both oxygen-dependent and oxygen-independent regulation, so it is unclear which pathways contribute to placental HIF-1 activity during late gestation, especially since the third-trimester placenta is exposed to significantly higher oxygen levels compared to the early pregnancy environment. Additionally, the placenta is in close proximity to the maternal decidua, which consists of immune and stromal cells, which are also significantly affected by HIF-1. Although the in vitro experimental data in this study demonstrate that HIF-1 induction leads to a placenta senescence phenotype, it is unclear whether the in vivo treatment with HIF-1 induction acts directly on the placenta or rather on uterine myometrium or decidua, which could also contribute to the initiation of preterm labor.

      We thank Reviewer #1 for the thoughtful analysis offered here. We agree that our study has not determined whether placental HIF-1 activation occurring during late gestation is due to oxygen-dependent or oxygen-independent regulation; both possibilities are outlined in paragraph 3 of the Discussion. We used a pharmacological approach in our experiments characterizing the effects of HIF-1 stabilization in trophoblasts because it allows superior command of experimental conditions, but in future studies using hypoxic growth conditions we will determine whether oxygen sensing is a critical component of the aging effects on mitochondrial abundance, metabolism, and cellular senescence in the placenta.

      Reviewer #1 also appropriately highlights the possibility that extra-placental effects of DMOG may contribute to the initiation of preterm labor in our mouse model. Future studies making use of mice with placenta-specific transgenes will allow clarification of the specific contributions of placental HIF-1 signaling to labor onset.

      Reviewer #2 (Public Review):

      […] The major strength of this study is the use of multiple model systems to address the question at hand. The consistency of findings between mouse and human placenta, and the validation of mechanisms in vitro and in vivo modeling are strong support for their conclusions. The rationale for studying the term placentas to understand the abnormal process of preterm birth is clearly explained. Although the idea that hypoxic stress and placental senescence are triggers for labor is not novel, the comprehensiveness of the approach and idea to study the normal aging process are appreciated.

      There are some areas of the manuscript that lack clarity and weaknesses in the methodology worth noting. The rationale for focusing on senescence and HIF-1 is not clear, given that other pathways were more significantly altered in the WGCNA analysis. The placental gene expression data were from bulk transcriptomic analyses, yet the authors do not explicitly discuss the limitations of this approach. Although the reader can assume that the authors attribute the mRNA signature of aging to trophoblasts - of which, there are different types - clarification regarding their interpretation of the data and the relevant cell types would strengthen the paper. Additionally, while the inclusion of human placenta data is a major strength, the differences between mouse and human placental structure and cell types make highlighting the specific cells of interest even more important; although there are correlations between mouse and human placenta, there are also many differences, and the comparison is further limited when considering the whole placenta rather than specific cell populations.

      Additional details regarding methods and the reasons for choosing certain readouts are needed. Trophoblasts are sensitive to oxygen tension which varies according to gestational age, and it is unclear if this variable was taken into consideration in this study. Many of the cellular processes examined are well characterized in the literature yet the rationale for choosing certain markers is unclear (e.g., Glb1 for senescence; the transcripts selected as representative of the senescence-associated secretory phenotype; mtDNA lesion rate).

      Overall, the findings presented are a valuable contribution to the field. The authors provide a thoughtful discussion that places their findings in the context of current literature and poses interesting questions for future pursuit. Their efforts to be comprehensive in the characterization of placental aging is a major strength; few placental studies attempt to integrate mouse and human data to this extent, and the validation and presentation of a potential mechanism by which fetal trophoblasts signal to maternal uterine myocytes are exciting.

      Nevertheless, a clear discussion of the methodologic limitations of the study would strengthen the manuscript.

      We thank Reviewer #2 for careful consideration of our data and for the valuable feedback.

      We chose to focus on HIF-1 signaling, mitochondrial function and abundance, and cellular senescence among the pathways that emerged from WGCNA based on our testable hypothesis that these three phenomena could be linked, with HIF-1 upstream of mitochondrial changes and cellular senescence (noted in Lines 166-169 with references to studies on aging establishing this connection in other systems). The other pathways not studied here (FOXO, AMPK, mTOR signaling) are important stress-response mediators which likely play additional key roles in the biology we have begun to describe; extensive future studies are warranted to explore this fully.

      While we focused on establishing new mechanistic insights for aging in the placenta as a whole, localization of the effects described here to specific placental cell populations will be important to clarify in future studies, as is proposed in the Discussion (lines 316-319, which has been updated for emphasis). To our knowledge, no single-cell transcriptomics studies of the placenta have been published to date describing gene expression changes across advancing gestational age in healthy pregnancies, and the quantitative value of immunolocalization studies of candidate proteins in isolation would be limited.

      We do not dispute the limitations of mouse placenta as an imperfect model for the human organ; we have provided parallel data from human specimens wherever possible. We agree that this will continue to be critical in future studies, especially those aiming to achieve cell-type localization of these signaling pathways.

      As mentioned in the response to Reviewer #1, we utilized pharmacological HIF-1 induction in our experimental models rather than manipulation of oxygen tension but acknowledge the value of follow-up studies utilizing hypoxic growth conditions in the Discussion.

      SA-b-Gal activity is a key biomarker of cellular senescence, and this is most commonly assessed histochemically. Unfortunately, detecting b-galactosidase enzyme activity was not possible in the biobanked human specimens we accessed in this study (not collected/stored in a suitable format for histochemical processing), which is why we instead quantified expression of the lysosomal enzyme b-D-galactosidase, encoded by GLB1, the gene responsible for SA-b-Gal activity (Lee BY et al. Senescence-associated β-galactosidase is lysosomal β-galactosidase. Aging Cell 2006 – cited in line 106). A host of other senescence markers exists, but their appearance in senescent cells depends on the cell type and underlying drivers of the senescent phenotype (reference #45), with SA-b-Gal activity among the most universal. Similarly, the specific SASP components depend on cell type and senescence stimulus; we selected the markers in Figure 5H based on their previously established roles as mediators of placental signaling. As noted in the text (lines 120-121 with references to the relevant literature), mtDNA damage has previously been implicated as a driver of age-related loss-of-function in other tissues, which led us to explore whether mtDNA damage accompanies the other signs of mitochondrial dysfunction and dysregulation that were emerging in our data.

      Reviewer #3 (Public Review):

      In this study, Ciampa and colleagues demonstrate that HIF-1α activity is increased with gestation in humans and mice placentas and use several in vitro models to indicate that HIF activation in trophoblasts may release factors (yet to be identified) which promote myometrial contraction. Previous studies have linked placental factors to the preparation of the myometrium for labour (e.g. prostaglandins), but HIF-1α has not been implicated. Due to several issues regarding the experimental design, the results do not currently support the conclusions.

      Major concerns:

      1)  The hypothesis states that placental aging promotes parturition via HIF-1a activation, the study does not provide any evidence of an aged placenta. Aging is considered a progressive and irreversible loss of functional capacity, inability to maintain homeostasis, and decreased ability to repair the damage. The placenta retains all these abilities throughout pregnancy [PMID: 9462184], and there's no evidence that the placenta functionally declines between 35-39 weeks, otherwise, it wouldn't be able to support fetal development. However, there is evidence of a functional decline in post-term placentas (i.e. >40 weeks in humans) but the authors compare preterm placentas with E17.5 mice placentas or 39-week human placentas, both these gestational periods are prior to the onset of parturition in most pregnancies (human = 40wkGA, mice=E18.5).

      We thank Reviewer #3 for careful consideration of our manuscript and the thoughtful feedback.

      Our stance that the placenta ages across its normal lifespan is based on the appearance of cellular senescence as an emerging pathway in latter gestational timepoints in the WGCNA, with subsequent validation of cellular senescence markers accumulating in placental samples from the advanced gestational age cohort. Although functional deficits stemming from the appearance of cellular senescence late in pregnancy may not be appreciable at the whole-system level until post-dates, we propose that the subclinical cellular aging that we have detected even before labor onset may be relevant in the setting of a “second hit” stressor — eg, impaired ability to maintain homeostasis, repair damage.

      Future studies will examine functional deficits at the cellular level in response to HIF-1 stabilization (eg. Seahorse assay) and in early- versus late-gestational age primary cells. We hypothesize such studies will reveal impaired resistance to metabolic stressors in the senescent phenotype. Further, there will be value in exploring the impact of senolytics in restoring function to aged tissue.

      In both mouse and human, our selection of placentas that had not yet been exposed to spontaneous labor was deliberate, in order to avoid confounding from the inflammatory effects of labor and delivery itself (due to large swings in perfusion pressure and local ischemia-reperfusion events).

      2)  While the authors provide evidence that HIF-1α activity increases in both the human and mice placenta as gestation progresses, the mechanistic link between placental HIF-1α and parturition is not strongly supported. For example, most of the evidence is based on in vitro studies showing that conditioned media from trophoblasts treated with CoCl2 increased the contraction of myometrial cells. The specific factor responsible was not identified but the authors allude to pro-inflammatory factors such as cytokines. It was therefore interesting to note that the conditioned media had undergone a filtration step that removes all substances >10kDa, which includes the majority of cytokines and hormones.

      We appreciate the opportunity to clarify that in the filtration step, we retained the >10 kDa fraction, allowing us to clear CoCl2 itself among other <10kDa molecules. A 10kDa cutoff was chosen to allow for retention of cytokines including those previously implicated as signals that can promote contractility in uterine myocytes. As mentioned in the discussion, future studies will aim to identify specific factors within the secretome that are necessary and sufficient to induce the contractile changes.

      3) An alternative explanation is that CoCl2 treatment-induced trophoblast differentiation and the effects on myometrial contraction may be related to differences in secreted factors produced by cytotrophoblasts versus syncytiotrophoblast. Although JAR cells do not spontaneously differentiate, they can be induced to syncytialise upon cAMP stimulation. Ref 39 the authors cite shows this. Indeed, the morphology of the cells in Fig5F that were exposed to CoCl2 indicates that they may be syncytialised. Syncytialised trophoblasts also express markers of senescence including increased SA-β-gal activity and reductions in mitochondrial activity.

      The following is taken from Reference 39, final paragraph:

      For instance, among the tested cell lines the choriocarcinoma cell line BeWo is best suited for studies on syncytial fusion. However, ACH-3P, JAR and Jeg-3 cells react to forskolin treatment with elevated levels of hCG they do not form syncytia [73] and are therefore poor models for syncytialization over a period of 7 days.

      4)  The in vivo experiment showing reduced gestation length in pregnant mice receiving DMOG injection is interesting. However, we cannot exclude the effects of DMOG on non-placental tissues (both maternal and fetal) or the non-specific effects of DMOG (i.e. HIF-1α independent). Furthermore, previous studies using a more direct approach to alter HIF-1α activity in the placenta using trophoblast-specific overexpression of HIF-1α in mice do not lead to changes in gestation length [PMID: 30808910].

      As stated in the response to Reviewer #1, we acknowledge the possibility that extra-placental effects of DMOG may contribute to the initiation of preterm labor in our mouse model. Future studies making use of mice with placenta-specific transgenes will allow clarification of the specific contributions of placental HIF-1 signaling to labor onset.

      Regarding PMID 30808919, as noted in our Discussion (lines 326-335), an important distinction is that the referenced paper studied effects of trophoblast-specific expression of a constitutively active HIF1 from the beginning of pregnancy, and their findings highlight markedly abnormal placental development in that context. By contrast, we describe effects of HIF-1 stabilization late in pregnancy in a normally developed placenta.

      5)  Lack of appropriate experimental models. E.g. JAR choriocarcinomas are not an ideal model of the human trophoblast as they are malignant. Much better models are available e.g. primary human trophoblasts from term placentas or human trophoblast stem cells from first-trimester placentas. Similarly, the mouse model is also not specific as discussed above.

      We agree with the Reviewer that the JAR cell line has important differences from human trophoblasts, nonetheless as stated in the Results section (Lines 181-184) they were used in order to model long-term exposure to HIF-1 induction without underlying syncytialization confounding the findings, as would be the case with primary cells.

      6)  Lack of cohesion between the different experimental models. E.g. CoCl2 was used to induce hypoxia/HIF1α in mouse TBs, but DMOG was used in vivo in mice. SA-β Gal staining was carried out in cells but not in mouse or human tissues.

      We used two distinct prolyl hydroxylase inhibitors (CoCl2 and DMOG) in our in vitro studies (Figures 4, 5, and 5 Supplement) to demonstrate reproducibility across models and to help attribute the effects to HIF-1 stabilization rather than off-target effects. DMOG was chosen for the in vivo studies because of its prior use in mice.

      As mentioned in response to Reviewer 2, detecting b-galactosidase enzyme activity was not possible in the biobanked human specimens we accessed in this study (not collected/stored in a suitable format for histochemical processing), which is why we instead quantified expression of the lysosomal enzyme b-D-galactosidase, encoded by GLB1, the gene responsible for SA-b-Gal activity (Lee BY et al. Senescence-associated β-galactosidase is lysosomal β-galactosidase. Aging Cell 2006 – cited in line 106).

      7)  Evidence of senescence and mitochondrial abundance could be strengthened by providing additional markers. E.g. only GLB1 mRNA expression is provided as evidence of senescence, and COX IV protein for mitochondrial abundance in mouse and human placentas.

      As mentioned in response to Reviewer 2, the appearance of other senescence markers depends on the cell type and underlying drivers of the senescent phenotype (reference #45), with SA-b-Gal activity among the most universal. Future studies will further probe which markers accompany cellular senescence in aging placenta to define the senescence phenotype in this setting.

      8)  Given that the main goal of this study was to investigate the role of hypoxia, hypoxia (i.e. low oxygen) was never directly induced and the results were based on chemical inducers of HIF-1α which have multiple off-target effects.

      As mentioned in response to Reviewer 1, we agree that our study has not determined whether placental HIF-1 activation occurring during late gestation is due to oxygen-dependent or oxygen-independent regulation; both possibilities are outlined in paragraph 3 of the Discussion. We used a pharmacological approach in our foundational experiments characterizing the effects of HIF-1 stabilization in trophoblasts because it allows superior command of experimental conditions, but in future studies using hypoxic growth conditions we will determine whether oxygen sensing is a critical component of the aging effects on mitochondrial abundance, metabolism, and cellular senescence in the placenta. We are encouraged by the consistency of the senescence phenotype in JAR cells following administration of two distinct prolyl hydroxylase inhibitors, CoCl2 and DMOG, suggesting that the effects seen are more likely attributable to HIF-1 stabilization (versus off-target effects).

      Reviewer #1 (Recommendations For The Authors):

      This is a very interesting and well-written study that supports the concept of placental aging using a combination of a mouse model, in vitro cell lines, and human placental samples.

      Overall this is an important contribution to our current understanding of placental biology highlighting the role of the HIF-1 pathway and merits publication.

      This study would be strengthened by the following addition:

      - As stated in the Public Review, the authors attribute HIF-1 induction at increased gestation to hypoxia, however, this has not been demonstrated experimentally and HIF-1 has both O2-dependent and independent regulation. The authors could perform an in vitro culture of primary placental cells or JAR cells under hypoxic conditions and assess the HIF-1 pathway/mitochondria activity to provide support for a hypoxia-dependent mechanism.

      We thank Reviewer #1 for the thoughtful analysis offered here. We agree that our study has not determined whether placental HIF-1 activation occurring during late gestation is due to oxygen-dependent or oxygen-independent regulation; both possibilities are outlined in paragraph 3 of the Discussion. We used a pharmacological approach to characterize effects of HIF-1 stabilization in trophoblasts because it allows superior command of experimental conditions, but in future studies using hypoxic growth conditions we will determine whether oxygen sensing is a critical component of the aging effects on mitochondrial abundance, metabolism, and cellular senescence in the placenta.

      Reviewer #2 (Recommendations For The Authors):

      Major comments:

      1. The rationale for the pursuit of HIF-1 and cellular senescence after initial WGCNA was weakly supported, though this avenue led to interesting and impactful results. The text could provide a stronger rationale for pursuing these pathways as opposed to the top-upregulated and downregulated pathways, perhaps by emphasizing previously published work in the field.

      We thank Reviewer #2 for careful consideration of our data and for the valuable feedback.

      We chose to focus on HIF-1 signaling, mitochondrial function and abundance, and cellular senescence among the pathways that emerged from WGCNA based on our testable hypothesis that these three phenomena could be linked, with HIF-1 upstream of mitochondrial changes and cellular senescence (noted in Lines 166-169 with references to studies establishing this connection in other systems). The other pathways not studied here (FOXO, AMPK, mTOR signaling) are important stress-response mediators which likely play additional key roles in the biology we have begun to describe; extensive future studies are warranted to explore this fully.

      2.  Validation of the gene expression data with placental histology and immunolocalization of proteins of interest would bolster the study by identifying the relevant cell types and showing changes in protein levels over time. Additionally, single-cell RNA-seq data from mouse and human placenta are available. Exploration of these published datasets would also be interesting.

      While we focused on establishing new mechanistic insights for aging in the placenta as a whole, localization of the effects described here to specific placental cell populations will be important to clarify in future studies, as is proposed in the Discussion (lines 316-319, which has been updated for emphasis). To our knowledge, no single-cell transcriptomics studies of the placenta have been published to date describing gene expression across advancing gestational age timepoints, and the value of single timepoint “snapshots” that exist in the literature is limited for the purpose of validating the aging mechanisms we have proposed here.

      3. In Figure 2, all of the data have a gestational age-dependent trend except for Fig 2F where the mtDNA lesion rate drops at e15.5. What is the authors' interpretation of these results?

      A testable hypothesis to explain this result is that as mtDNA damage begins to accumulate, cells are initially able to respond via mitophagy, removing those mitochondria with damaged DNA (e15.5), until that response is overwhelmed, allowing the detectable mtDNA lesion rate to spike at e17.5.

      4. In paragraph three of the results, the authors conclude that there is an accumulation of ROS stress, yet there is no direct measurement of ROS. Measuring ROS directly in this setting would strengthen this conclusion (similar to what is done in Figure 5E).

      We interpreted the accumulation of mtDNA damage as a marker of ROS stress, but the Reviewer correctly points out that we did not measure ROS directly in this model. We have updated the language (line 126) to be more accurate.

      5. There is a discrepancy in the length of CoCl2 treatment in primary trophoblasts vs. JAR cells (48 hours vs. 6 days). Treatment with DMOG in JAR cells also differed (4 days). Do the authors have any evidence that longer vs. shorter stabilization of HIF-1 has secondary effects in these cells that could affect the results of the study?

      We preliminarily explored the timecourse of the effects of HIF-1 stabilization in JAR cells, as shown in Fig 5 – Supp 1, and also found that the decline in mitochondrial abundance precedes the appearance of senescence markers (data not shown). JAR cells are a much better model for exploring effects of chronic exposure to HIF-1 stabilization because they do not syncytialize as primary trophoblasts do. For this reason we limited our studies in primary cells to a 48h timepoint, because the effects of syncytialization would confound longer studies. With the aim of simply validating our CoCl2 findings with a separate prolyl hydroxylase inhibitor, we picked an intermediate timepoint for convenience. The reviewer correctly pinpoints the value of future studies that further dissect the kinetics of these phenomena, which could also potentially identify at which phases the effects are reversible.

      6. The authors evaluated mitochondrial effects in a time course experiment (Figure 5 Supplement 1) and found that the effects of HIF-1 stabilization emerged after three days of treatment, but no such experiment was conducted to determine the timing of senescence with SA-βGal. It would be interesting to correlate the mitochondrial effects and onset of senescence caused by HIF-1 stabilization.

      In future studies we will continue to explore the relative dynamics of HIF1 stabilization vs mitochondrial effects and senescence. In doing so it will be important to explore other markers of senescence; while SAbGal is the most universal senescence marker, others (such as p16 or p21 induction), if present, may lend themselves to more precise quantification and a clearer definition of senescence “start time”.

      7. IL-1β is used in experiments testing the effect of JAR-conditioned media on uterine myocytes. The conclusion of this experiment is that conditioned media from JAR cells treated with CoCl2, but not from untreated JAR cells, results in myocyte contraction (Figure 6E) and expression of contraction-associated genes (Figure 6A-D). Although the figure shows that IL-1β + conditioned media increases expression of these genes compared to IL-1β alone, an added control condition where conditioned media is used in the absence of IL-1β would underscore this conclusion and show whether the components in the conditioned media are sufficient to induce gene expression and contraction. There is also no justification for the 10 kDa cutoff in this experiment.

      We did test whether conditioned media could induce contractile changes in myocytes in the absence of IL-1b co-stimulation, and indeed found that the CoCl2-stimulated conditioned media does elicit this effect on its own. We eliminated these conditions from the published figure in an aim to limit its complexity, but present them here (*, p< 0.05 vs no treatment):

      Author response image 1.

      The filtration step was implemented to concentrate the conditioned media prior to applying it to the myocytes. A 10kDa cutoff was chosen to ensure retention of most cytokines, especially those previously implicated in contractile switching of uterine myocytes (eg. IL1b, IL1a, TNFa each approximately 18 kDa, IL6 approximately 21 kDa). The filtration and wash steps also ensured clearance of CoCl2 out of the conditioned media before it was applied to myocytes.

      8. Figure 7 shows the use of DMOG in vivo to stabilize HIF-1, which induces preterm labor. Is there a way to inhibit HIF-1 signaling downstream to show that preterm labor in vivo is specifically due to HIF-1 stabilization and not an off-target effect of DMOG? Rescue experiments either in vitro or in DMOG-treated mice using HIF-1 inhibitors would be very compelling although we recognize these may not be feasible. Regardless, a comment on the translational impact of this study and the potential of targeting the HIF pathway to treat or prevent SPTB should be considered.

      There is considerable research into HIF inhibitors as cancer therapeutics (and FDA approval of a HIF2a inhibitor, belzutifan, for von Hippel Lindau disease). Future studies into the ability of HIF-1 inhibitors to rescue preterm labor are certainly of interest, though translational potential may be limited by systemic toxicity unless a targeted placenta-specific delivery system can be achieved. Genetic approaches using placenta-specific knockout might also be useful, particularly if conditional knockout can be achieved to limit the effects on HIF-1 signaling to late pregnancy, after placental development is complete.

      9. The effect of JAR-conditioned media on uterine myocytes is very interesting. The authors might consider additional discussion of what the putative mediators are and what is suggested in the preterm birth literature (e.g., Sheller-Miller, PMID: 30679631). Assessment of other SASP factors, for example using ELISA, would strengthen the study, as would at least a rationale for the genes evaluated.

      We agree that follow-up studies should be done to identify which components of the secretome are key for mediating the contractile effect in myocytes, as noted in the Discussion (Lines 271-273), now updated for emphasis and with the suggested references.

      Additional minor comments:

      10.  For Figure 1A, without reading the figure legend it is unclear that the vertical color graph represents different gene clusters; consider labeling the y-axis with 'Gene clusters.' Also, blue and turquoise clusters could be labeled as "upregulated" or "downregulated" for simplicity and clarity.

      Updated, thank you for the suggestions.

      11. For mRNA expression wherever relevant, state in the figure legends and main text the method used (i.e., qPCR) and what the reference timepoint and normalization strategy was. For instance, in Figure 2 (and supplement 1), we were of the impression that the e15.5 and e17.5 values were normalized to e13.5.

      Updated, thank you for the suggestions.

      12.  For Figure 5, can the authors explain in the main text what Mtsox is and how it is a marker for mitochondrial depolarization? In 5E, it would be helpful to mention what TMRE and FCCP are and how they measure mitochondrial ROS.

      Updated, thank you for the suggestions.

      13.  Figure 5 Supplement 2 and Figure 5 Supplement 3 appear to be missing labels indicating black vs. blue vs. red datasets.

      Updated, thank you for the suggestion.

      14.  Figure 7c, what is the n in each group?

      Gestational length data in Figures 7c and 7d each reflect the same n=8 mice per group.

      15.  Minor edits are needed for inconsistent use of terms (pre-term vs. preterm, for example) and grammar.

      Updated, thank you for the suggestion.

      Suggested additions to the Methods section to improve reproducibility:

      16.    Include more detail re: cell culture conditions, including % oxygen.

      Updated, thank you.

      17.  Collagen lattice contraction assay - include details on how measurements of collagen discs were performed. Was this automated?

      Updated, thank you.

      18.  Immunoblots. Details, such as the amount of protein loaded, gel composition, protein extraction method, etc., would be helpful.

      Updated, thank you.

      Reviewer #3 (Recommendations For The Authors):

      Minor comments:

      1.  It is unclear why 2-way ANOVA was performed in figure 3 when there are only 2 groups under comparison: <35 wks vs >39 wks

      As in Figure 2D, multiple genes are analyzed together in Figure 3A using 2-way ANOVA with the two factors being 1) gestational age and 2) individual gene targets (GLB1, HK2, GLUT1). This approach allows us to define the combined effect of gestational age on expression level of all of the genes whose expression is increasing.
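      For readers unfamiliar with this design, a balanced two-way ANOVA with gestational age and gene target as the two factors can be sketched as below. This is a minimal, assumption-laden illustration and not the analysis actually run for Figure 3A: the function name `two_way_anova_f`, the data layout, and every expression value are hypothetical, the design is assumed to be balanced and fully crossed, and only F statistics (not p-values) are computed.

```python
from statistics import mean


def two_way_anova_f(cells):
    """Balanced two-way ANOVA with interaction.

    cells: dict mapping (age_level, gene) -> list of replicate values.
    Returns a dict of (F, df_effect, df_error) for the "age", "gene" and
    "interaction" terms. Assumes at least 2 replicates per cell and
    non-zero within-cell variance.
    """
    ages = sorted({a for a, _ in cells})
    genes = sorted({g for _, g in cells})
    n = len(next(iter(cells.values())))
    if len(cells) != len(ages) * len(genes) or any(len(v) != n for v in cells.values()):
        raise ValueError("a balanced, fully crossed design is required")

    grand = mean(x for v in cells.values() for x in v)
    age_mean = {a: mean(x for g in genes for x in cells[(a, g)]) for a in ages}
    gene_mean = {g: mean(x for a in ages for x in cells[(a, g)]) for g in genes}
    cell_mean = {k: mean(v) for k, v in cells.items()}

    # Sums of squares for each main effect, the interaction, and the residual.
    ss_age = n * len(genes) * sum((age_mean[a] - grand) ** 2 for a in ages)
    ss_gene = n * len(ages) * sum((gene_mean[g] - grand) ** 2 for g in genes)
    ss_int = n * sum(
        (cell_mean[(a, g)] - age_mean[a] - gene_mean[g] + grand) ** 2
        for a in ages for g in genes
    )
    ss_err = sum((x - cell_mean[k]) ** 2 for k, v in cells.items() for x in v)

    df_age, df_gene = len(ages) - 1, len(genes) - 1
    df_int = df_age * df_gene
    df_err = len(ages) * len(genes) * (n - 1)
    ms_err = ss_err / df_err
    return {
        "age": (ss_age / df_age / ms_err, df_age, df_err),
        "gene": (ss_gene / df_gene / ms_err, df_gene, df_err),
        "interaction": (ss_int / df_int / ms_err, df_int, df_err),
    }


# Hypothetical relative expression values, two replicates per cell.
example = {
    ("<35 wks", "GLB1"): [1.0, 1.2], (">39 wks", "GLB1"): [2.0, 2.2],
    ("<35 wks", "HK2"): [0.9, 1.1], (">39 wks", "HK2"): [1.9, 2.1],
    ("<35 wks", "GLUT1"): [0.9, 1.1], (">39 wks", "GLUT1"): [1.9, 2.1],
}
result = two_way_anova_f(example)
```

      The "age" term pools the evidence across all gene targets, which is why two gestational-age groups can still call for a two-factor model rather than separate two-group tests.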

      2.  Scale bars missing in some figures - Fig4E, Fig 5D, 5F, Fig5 - Suppl 3C.

      Scale bars were not captured with the original images; we regret this omission.

    1. Author Response:

      Reviewer #1 (Public Review):

      […] Collective variable choice:

      The explanation for the choice of CVs on page 5 is not sufficient to understand the process and its likely success. How were the most important and unimportant CVs identified exactly? Table 2 on page 19 shows only gate distances, cavity-filter distances and a single variable related to filter structure itself (77 CA - 77 CA) representing a pinch. Is that pinching really the only slow variable associated with inactivation changes in the filter? Why are there no variables, say for carbonyl flipping, E71 or D80 movements or even for ion and water occupancy (although water may be sampled with control of other interactions, such as involving L81)?

      CVs for steering simulations were selected based on structural comparisons between the X-ray structures as well as the information about the inactivation available in the literature. These steering CVs were later used as CVs for the string method with the exception of those found to be irrelevant in preliminary string simulations (see methods for details). For example, we discarded CVs that would just oscillate freely and thus represent fast equilibrating CVs. We will add additional explanations to the methods section of the manuscript in revisions.

      Carbonyl flipping, E71 and D80 movement and SF occupancy were observed in the initial steering simulation to correlate with the 77 CA - 77 CA opening and the opening of the L81-W67 contact. They were not biased but followed the expected path as a consequence of the motion of the imposed selectivity filter constriction. Therefore, they did not need to be explicitly biased. The same can be said with respect to water occupancy behind the selectivity filter, which correlates with the opening of the L81-W67 contact.

      I understand that the X-ray structure is the one source of information used to define an inactivated structure and is one with just a pinch and no complete carbonyl flipping away from the pore, as has been identified in past studies and discussed as being involved by the authors on page 14. Key changes like carbonyl flipping surely are part of the story and may be slow variables. At the very least, if not part of the CV space, could be analysed.

      Indeed, the reviewer is correct in stating that there are molecular motions of interest aside from the ones included in the CV space. Figure 3 and associated supplementary figures extensively investigate the probability distributions of many of those as the system progresses along the inactivation pathway. These results are presented in the section titled “Free energy landscapes offer insights into atomistic-resolution mechanistic details”. Carbonyl flipping seemed to be highly correlated with the 77 CA - 77 CA distance and this analysis was therefore not presented.

      On page 10 the authors discuss possible differences in Amber and Charmm involving the extent to which the 4 subunits change in respect to the L81-W67 water pathway and W67-D80 hydrogen bond, arguing the different results for force field could be to do with different numbers of subunits doing different things. If I understand, the chosen CVs are all tetramer-based distances (including across subunits) and not subunit-based CVs, so that random and incomplete changes may occur to subunits for a given point in CV space.

      In fact, some of the CVs represent intrasubunit distances, for example L81-W67 while others represent distance across subunits. This distinction never represented a criterion to select CVs.

      There is thus potential for the string to converge on a local minimum pathway with partial changes to its interactions within and between subunits, and may not be a unique global solution. Can the authors please explain whether or not this is possible and what analysis has been done to check it?

      This indeed represents a well-recognized shortcoming of all string-based enhanced sampling methods. The string-of-swarms method used herein assumes that there is a dominant minimum free energy path and requires a reasonable starting path. One major advantage of this methodological choice, however, is that the path can be described in high dimension, thus avoiding the stark dimensionality reduction required by many collective-variable-based methods such as metadynamics.

      We do note that though the initial path was the same for the two force fields, the final pathway is different, which tends to indicate that the results do not only depend on the initial path but also on the force field guiding the dynamics of the process.

      X-ray endpoints and initial pathway:

      The string was created from a pulling/steered MD between existing X-ray structures for the closed (5VKH), partially open (3FB5), fully open (5VK6) and finally inactivated (5VKE) states. The authors write on page 12 that "The block of conduction during inactivation appears to result from pinching at the selectivity filter...", but given the end point was forced to be the X-ray structure with pinching, wasn't this outcome predetermined? This raises a significant point of how much has choice of endpoints predetermined the final states of the string? i.e. How much is an end state actually allowed to drift away from the initial Xray structure? Was a bead placed at the very endpoint and allowed to update via swarms, or was it fixed and all beads just interpolate between those fixed end states? The reason this is important is that it is plausible the inactivated crystal structure with pinching but not other changes (such as complete V76 carbonyl flipping or outer filter splaying), may not be the actual free energy minimum structure for that state and that force field.

      The reviewer is right to point out that this observation is most likely a consequence of the choice of the end points of the initial string. The string method assumes that the end points of the string are fairly representative of the initial and final states of the process studied. In this case, for ease of use, the endpoints of the simulation were fixed. When endpoints are left free to relax, they drift towards the closest minima, which makes comparisons between force fields, simulation conditions, etc. more difficult.

      We do agree that the selection of initial and final states as well as the starting string are important modeling choices. For this reason, we were very mindful and made these choices based on the existing published evidence (available at the time).

      We will make these details explicit in a revised version of the manuscript.

      Another obvious concern is the possible reliance on the initial pulling procedure used before string optimisation began. Fig.2 Supp 1 shows generally that the Amber path stayed pretty close to the initial steered MD path, whereas Charmm drifted downward away from that path. One could justifiably ask, if a very different initial path was chosen, might different local minimum pathways result, including Amber sampling a path like Charmm? How does one test whether or not the final path has not been trapped in some local trough of free energy? e.g. Imagine starting the Amber string using an initial path like the more diagonal Charmm-like path, or even a more extreme unphysiological one, such as a steered trajectory that initially inactivates before opening the gate. Would the final results be the same? I appreciate the simulations are very expensive and such trials may not be possible, but what evidence is there that the final path has not been trapped away from the global minimum?

      As stated above, the reviewer is right to point out the weakness of the method of converging to the closest local minimum free energy path. It is unfortunately computationally infeasible to test many possible paths. For this reason, we chose to initiate our calculations with a pathway based on experimental data; in this case based on available X-ray structures. In addition, it is necessary to contrast the results of the simulation with available experimental evidence: the string method with swarms of trajectories, when aptly used, has a history of bringing useful insights to several biological systems (Lev et al. 2017b; Suh et al. 2019; Fleetwood et al. 2019, 2021; McComas et al. 2022).

      As already noted, it is evident that the two force fields yield very different energy landscapes, since they would otherwise converge to the same final pathway given the same initial pathway guess.

      One test offered by the authors is a set of unbiased MD simulations launched from points on the string. The authors ran 200ns simulations and write on page 5 that "These simulations have the expected stability based on their starting values. This is a good quality test to check the correct estimation of the general features of the free energy surface". While this sounds reasonable, 200ns MD may only be sufficient to begin to explore locally within the solved free energy trough, much like the swarms in the iterations were able to do. My own examination of Fig2 Supp 5 is that some of these simulations linger around the expected states and some drift away within the general trough of sampling, which is a good sign. What those 200ns simulations may not be able to do is escape that trough and see evidence of other possible solutions, beyond what was sampled with the string that was tied to Xray endpoints and trapped in the solution pathway that was already formed after 100-300 iterations. Overall, the string involved 800 iterations of 10ps swarms (80ns around each bead; albeit 32 trajectories in parallel), allowing good local sampling around the beads in the free energy trough, but in terms of ability to diffuse away from that point, only being comparable in contiguous trajectory time to the unbiased MD tests. It therefore would have been interesting to see if longer simulations remain in this trough; though I understand the challenges in running so much MD. Such simulations may, however, lead to exploration beyond what was seen in the string solutions.

      We agree with the reviewer that longer simulations would be very interesting for understanding how the string-of-swarms method behaves for this intricate FES. Note, however, that 80 ns divided over 32 trajectories yields a contiguous trajectory length that is ~two orders of magnitude below a single 200 ns-long simulation. We thus still stand by our statement that the fact that these simulations behave as expected from the free energy landscapes is a good quality check of the CVs and of the resulting free energy landscapes.

      Force field effects and origin:

      Regarding the effect of the chosen force field, the authors state that "Given that our simulations were conducted under activating conditions, we had expected the open states to be more populated than the closed ones. Simulations carried out at higher pH may be able to resolve this inconsistency". Also running at high pH would be a nice thing to do to prove the method is in fact sensitive to conditions to see a shift in the distribution of states.

      Indeed this is the logical next step for future work.

      But the question is why were open states not more occupied under low pH and 50mM K+? From my analysis of the figures, the results show that the Charmm force field tends to allow for opening of the channel somewhat (at least with similar free energy for partially and fully open to closed) whereas Amber tends to close the channel more (with more uphill energy as the channel opens than Charmm; Fig 2). i.e. at low pH and 50 K+, isn't the Amber model incorrectly reporting fairly strong bias against opening? Moreover, regarding the free energy of the inactivated state itself, why should we not expect equilibrated channels under activating conditions to eventually fall into an inactivated state, in which case we should expect low free energy of that state (as found with Charmm and not Amber in Fig2), but with a slow rate. While much discussion in the manuscript appears to discuss limitations in Charmm (although on page 12 discussion leans either way), these factors may seem to favour Charmm over Amber.

      We would like to thank the reviewer for raising these points. We can only speculate about what might be the reasons for these discrepancies, and we have tried to be as honest as possible in our manuscript and avoid overinterpretation of our results. It is interesting that Reviewer 2 gathered from our data that the AMBER results were more consistent with expectations while this reviewer thought the opposite. This does reinforce our decision to avoid taking sides and present both options. Our personal opinion is currently that both force fields are imperfect at describing all the aspects of the activation-inactivation gates coupling. We will include more discussion in the revisions of the manuscript.

      On page 12 the authors explain the possible causes for force field dependence, although this seems limited to ion interactions, glutamate charges and dihedrals. But it would be nice to get a bit more insight into what terms may have influenced the pathway, in particular involving interactions between TM2 and the base of the selectivity filter and hydration behind the filter. Regarding ion interactions, is there a good reason to believe ions are key to the difference seen? i.e. How were ions involved differently in the state transitions involving Amber and Charmm? The authors have noted a role for ion-carbonyl interactions.

      We agree that this would be interesting, but judged that this would be better done in a separate study. We do note that the K-carbonyl interactions have been reported as candidates for these discrepancies, as mentioned and cited in the manuscript. Very recent simulations using ab initio MD support the view that overestimation of the K-carbonyl interaction is the reason for the low conductance of potassium channels in classical MD; see Hui et al., Biophysical Journal, vol. 122, issue 3, p. 520a. We will add this reference in revisions.

      It is important that the authors explain which of the two competing models has been used and why. i.e. Off-the-shelf Charmm36 force field includes strong K+-backbone carbonyl interaction, previously seen to promote high ion occupancy, similar to Amber, whereas Lennard-Jones parameters modified to match N-methyl-acetamide and water partitioning (such as early Berneche, Noskov and Roux work) reduce ion occupancy and increase water content inside the filter.

      We have used “off-the-shelf” or conventional CHARMM36 as described in the literature cited.

      Reviewer #2 (Public Review):

      […] The study is impressive and interesting. However, I have a number of concerns that the authors may wish to address in a revised version of the manuscript.

      First, concerning a set of unbiased simulations spawned at different regions of the investigated free energy landscapes, the authors write: "These simulations have the expected stability based on their starting values".

      Fig 2.c shows a rather smooth downhill slope in the free energy curve towards the closed state for AMBER, so wouldn't the expected behavior in that case be that all unbiased trajectories end up in the closed state, or at least travel a substantial amount in that direction? However, that is not observed. This should be further investigated.

      It is true that this would be the effect we should observe after a significant simulation time. Resorting to 200ns-long simulations, our goal was to test whether the local free energy basins identified by the string-of-swarms method were indeed metastable. If that were the case, we would expect the trajectories to remain within the basins on medium timescales due to the kinetic barriers that would need to be overcome to transfer to other basins. Of course, if simulations were long enough, all basins would eventually be explored by the trajectory with a probability related to the relative free energy of the basins.

      Second, "This suggests that stabilization of the partially open state by the removal of bound lipids can explain the increase in open probability" is an odd statement, as "stabilization of the partially open state" means almost the same as "increase in open probability".

      It is true that one appears to necessarily imply the other. An increase in open probability could potentially come from two effects: a stabilization of the open state or a destabilization of the closed one. In a two-state system, the two cases are indistinguishable since only relative differences in free energy matter. However, this is a three-state system: if one takes the free energy of the inactivated state as a reference, stabilizing the open state and destabilizing the closed state are physically distinct outcomes.
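      The distinction can be illustrated with Boltzmann populations. The following is a minimal sketch under stated assumptions: the three free energies are hypothetical placeholders rather than values from the study, and `populations` is an illustrative helper. Both perturbations leave the open/closed ratio identical, yet they shift population to or from the inactivated state in opposite directions.

```python
import math

KT = 0.593  # kcal/mol, approximate thermal energy near 298 K

def populations(free_energies):
    """Boltzmann populations from a mapping state -> free energy (kcal/mol)."""
    weights = {state: math.exp(-g / KT) for state, g in free_energies.items()}
    z = sum(weights.values())
    return {state: w / z for state, w in weights.items()}

# Hypothetical reference landscape; values are illustrative only.
reference = {"closed": 0.0, "open": 1.0, "inactivated": 0.5}

# Two ways to shift the open/closed balance by the same 1 kcal/mol.
stabilized_open = dict(reference, open=0.0)        # lower the open state
destabilized_closed = dict(reference, closed=1.0)  # raise the closed state

p_stab = populations(stabilized_open)
p_destab = populations(destabilized_closed)

# The open/closed ratio is the same in both scenarios...
ratio_stab = p_stab["open"] / p_stab["closed"]
ratio_destab = p_destab["open"] / p_destab["closed"]

# ...but the populations measured against the inactivated state differ.
print(ratio_stab, ratio_destab)
print(p_stab["open"], p_destab["open"])
print(p_stab["inactivated"], p_destab["inactivated"])
```

      With only two states the third line of output would not exist, and the two scenarios could not be told apart.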

      The statement "both force fields yield inactivation barriers that are orders of magnitude lower than what is expected from electrophysiology experiments" seems inaccurate. Perhaps the authors mean "inactivation rates that are orders of magnitude lower" rather than barriers?

      Yes, this was a mistake on our part. We will amend the manuscript.

      In addition, the assertion "The CHARMM force field, on the other hand, results in landscapes in agreement with the fact that one of the dominant states in activating conditions is the partially open state, as revealed by a combination of ssNMR+MD." seems to hold for the AMBER force field without PG lipids rather than for CHARMM?

      AMBER simulations with or without bound PG lipids have a fully open state basin within the minimum free energy path (Fig 4a, 4b), which is not the case for CHARMM (Fig 2b). In that sense, the CHARMM force field seems to be in better agreement with the ssNMR data. The ssNMR+MD study, however, suggests that the PO state basin should be the lowest in free energy. In both cases, however, the C basin is lower in free energy than the PO. We can only speculate about why that may be.

      Together with the higher barrier towards the inactivated state as well as covering most known x-ray structures along the inactivation pathway, this would seem to point all in the direction that the studied AMBER force field provides a more faithful picture of the inactivation pathway than CHARMM. I, therefore, find the somewhat inconclusive summary as presented in Fig. 5 a bit uninformative, as it suggests that both mechanisms might be equally likely.

      Although the X-ray structures do suggest an AMBER-like path, structural information in isolation is not sufficient to fully understand a phenomenon of dynamical nature. Obtaining X-ray structures of metastable states, particularly open states, requires engineered mutations and other techniques to trap these states. We of course do not question that a lot of very valuable information can be derived from them, but they should be considered in the context of other computational and experimental techniques. We believe we are very explicit in the text in discussing the weaknesses and strengths of either possibility. In fact, we find it interesting that Reviewer 1 gathered from our data that the CHARMM results were more consistent with expectations. This does reinforce our decision to avoid taking sides and present both options. Our personal opinion is currently that both force fields are imperfect at describing all the aspects of the activation-inactivation gates coupling.

      Overall, the study would benefit from a follow-up step to become more conclusive. This could be either in the form of the suggested L81 mutation or changing the simulation conditions to inactivating conditions such as low salt, in which case the inactivated state would be expected to become a minimum, which would provide an additional reference point for validation. Either of these would narrow down the spectrum of possible mechanisms.

      We absolutely agree with this reviewer. These are great suggestions for further investigations that will definitely be considered in future studies.

      Reviewer #3 (Public Review):

      […] The analysis is careful and is state-of-the-art. The results reveal remarkable differences between the CHARMM and AMBER force fields.

      Unfortunately, the "elephant in the room" with regards to K+ channel inactivation is the significance of the dilated structures more recently obtained by Xray and EM. While it is worthwhile doing our best to really understand the constriction mechanism of KcsA, and the present manuscript does an excellent job at that, the ground has shifted and understanding finer points about KcsA constriction has become, unfortunately, not the most prominent issue in the field at the present time.

      Let's discuss the current situation about the inactivation of K+ channels. The situation is fairly unsettled. The KcsA channel was the first for which some atomic structure and mechanism, centered on a constriction of the selectivity filter, were proposed. The constricted conformation really does not conduct because the filter is too narrow. More recently a few structures (Xray and EM) for channel mutants known to have more propensity to inactivate have revealed a different conformation of the filter which appears to be dilated toward the extracellular side. This is a conformation that had never been seen previously. Different "camps" co-exist in the K+ channel community about inactivation. Those who were very skeptical about the constricted conformation claim that the new dilated structures are the final truth. The dilated structures are certainly part of the body of information that we have now, but their significance remains somewhat unclear, if only because they are not perfectly occluded and they allow ion conduction!

      We appreciate the reviewer’s comments and we are also grateful for the contextualization of the current state of the literature with respect to KcsA inactivation.

      Although we acknowledge the importance of these new findings and look forward to a lively debate in the literature regarding the importance of this alternative mechanism, this information was not available at the time when this study was started. In any case, for an initial study with a novel technology and with methodological choices such as the force field choice, studying the more established path still seems a valid choice. Of course, the techniques used to study this mechanism can be used to study new hypotheses and contrast them with our current work. This will be an important line of work going forward. We will add further literature discussion to the manuscript and better outline how we decided on the scope of this study.

    1. Author Response:

      Weaknesses:

      1) In vivo studies are limited to select outcomes of recovery and do not validate or address mechanism of action in vivo.

      2) Known activities of DMAPT beyond microtubule detyrosination, such as oxidative stress, mitochondrial function and NFkB inhibition, are not considered in experimental examinations or in the interpretation of findings.

      Response: Our research indicates that parthenolide exhibits a regenerative effect within a nanomolar range and with a bell-shaped concentration-response curve in culture. Moreover, we demonstrate a close correlation between the inhibition of detyrosinated microtubules and regeneration and consider the effects of hIL-6 or PTEN-KO on detyrosination in mouse and human RGCs. Therefore, we offer a coherent and satisfactory mechanistic explanation for the effects of parthenolide. We, therefore, feel the request to experimentally explore additional, somewhat speculative possibilities is not reasonable or helpful, and this issue should not be considered as a weakness.

      Moreover, to the best of our knowledge, no evidence suggests that DMAPT or parthenolide has profound antioxidative effects within these low-concentration ranges, or that such effects would affect axon regeneration. Antioxidative effects may also not explain the observed bell-shaped curve. Furthermore, we have already considered the effect of NFkappaB in our previous work (Gobrecht et al., 2016) and shown that NFkappaB remains unaffected by low concentrations of parthenolide. Hence, conducting additional experiments addressing oxidative stress or other speculative causes will not strengthen our findings and does not justify the additional sacrifice of animal lives. Nevertheless, we will consider discussing these points in a revised version.

    1. Author Response:

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      - There were no mechanistic or causation-focused investigations that could have greatly strengthened the study. The study is ultimately providing two prioritized candidate genes that may be causative, reactive, or independent of the disease.

      Answer: We thank the reviewer for their positive assessment and agree that our study lacks formal causal analyses. We are aware of this limitation and have made it clear throughout the text. Through triangulation of evidence across tissues and species, we point to very interesting candidates that merit further study, which is the usual scope of such systems genetics investigations. Nevertheless, to introduce some causal inference and reinforce the human relevance of our results, we have performed Mendelian randomization (MR) analysis to investigate the potential associations between MUC4’s gene expression in human colons and the risk of IBD. EPHA6 lacks detectable eQTLs in human colon so we could not include it in this analysis. We found suggestive evidence that increased expression of MUC4 in the sigmoid, but not transverse, colon may increase the risk of IBD (nominal p = 0.033).

      The description in the manuscript:

      However, it is unclear through what mechanisms the genetic variants in the candidate genes affect IBD susceptibility. One possibility is that genetic variation leads to altered levels of expression of the gene, ultimately affecting disease susceptibility. To test this possibility, we examined the GTEx resource (GTEx Consortium, 2013) and found that MUC4, but not EPHA6, has cis-eQTLs in the sigmoid and transverse colon. To establish likely causal links with IBD incidence, we used these associations as instruments in a two-sample Mendelian randomization (MR) (Hemani, Tilling and Smith, 2017; Hemani et al., 2018) analysis. Using publicly available GWAS summary statistics for IBD, Crohn’s disease, and ulcerative colitis (Liu et al., 2015; Elsworth et al., 2020) as outcomes, we found suggestive evidence that increased expression of MUC4 in the sigmoid, but not transverse, colon may increase the risk of IBD (nominal P value = 0.033, Appendix 1 - Table 6). No eQTLs were reported for EPHA6 in the colon, precluding us from investigating the potential consequences of changes in its expression in these tissues.
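      For a single cis-eQTL instrument, a two-sample MR estimate reduces to a ratio of summary statistics. The following is a minimal sketch of the standard Wald ratio estimator, not the pipeline used in the manuscript: the effect sizes are hypothetical placeholders, `wald_ratio` is an illustrative name, and the standard error uses a first-order approximation that ignores uncertainty in the eQTL effect.

```python
import math

def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Wald ratio MR estimate from one genetic instrument.

    beta_exposure: SNP effect on gene expression (eQTL effect size)
    beta_outcome:  SNP effect on disease risk (GWAS log odds ratio)
    se_outcome:    standard error of beta_outcome
    """
    estimate = beta_outcome / beta_exposure
    # First-order standard error: treats the eQTL effect as known exactly.
    se = abs(se_outcome / beta_exposure)
    z = estimate / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return estimate, se, p_value

# Hypothetical summary statistics for one cis-eQTL (illustrative only).
estimate, se, p_value = wald_ratio(beta_exposure=0.5, beta_outcome=0.05, se_outcome=0.02)
print(estimate, se, p_value)
```

      A positive estimate here would be read as higher expression being associated with higher risk, subject to the usual MR assumptions about instrument validity.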

      - Figures 3 and its supplement Figure 1: Among the 39 modules, the authors have only focused on significantly overlapping up-regulated IBD-related gene modules in both CD (M28 and M32) and HFD (M9 and M28) for their follow up analyses in Figures 4 and 5 to prioritize candidate genes. However, this reviewer thinks there is great value in also focusing on significantly overlapping down-regulated IBD-related gene modules in both CD (M17) and HFD (M15 and M26) for their follow up candidate gene prioritization analyses.

      Answer: Thank you for your suggestion. We had initially performed overrepresentation analyses in HFD_M15, HFD_M26 and CD_M17, but did not find enrichments related to inflammation (see Author response image 1 below). We did not include this result in the manuscript.

      Author response image 1.

      Dot plot showing the enrichment of IBD-related modules in hallmark genesets. Gene ratios higher than 0.1 are shown and represented by dot size. Dots are colored by -Log10(BH-adjusted P values).

      We also checked the module QTL mapping for the significantly overlapping down-regulated IBD-related gene modules in both CD and HFD. We did not find any loci that are significantly associated with these modules, indicating that they are not modulated by genetic variation and hence are less likely to inform on IBD susceptibility.

      The description in the manuscript:

      The ModQTL analysis was also performed on the modules that are significantly enriched in IBD-downregulated genes (HFD_M15, HFD_M24, and HFD_M26), but no significant or suggestive QTLs were detected. Therefore, we focused on the QTL for IBD-induced genes in HFD_M28 and annotated its candidate genes based on three criteria (Figure 5B).

      Reviewer #2 (Recommendations For The Authors):

      - One small addition that would be nice would be to indicate if the two candidate genes have cis eQTL in human tissues and/or have any protein-coding variants in humans. This would provide nice additional evidence of causality for these two genes.

      Answer: Thank you for your positive assessment and suggestion. MUC4 and EPHA6 both have protein-coding variants in humans, which are listed in Appendix – Table 3 and Table 4. In addition, cis-eQTLs have been found for MUC4 in both the sigmoid and transverse colon in humans (GTEx, https://gtexportal.org/home/locusBrowserPage/ENSG00000145113.21). As indicated in our response to the first comment of Reviewer #1, we have now performed Mendelian randomization on human eQTLs for MUC4. However, no eQTLs were reported for EPHA6 in the colon, preventing us from performing MR analysis on its expression.

      - Also, it would be helpful to include the size of the modules in the text of the manuscript. Especially the two modules that were followed up on.

      Answer: Thank you for your suggestion, we have indicated the size of IBD-related modules in the text of the manuscript.

      The description in the manuscript:

      Enrichment analyses indicated that modules HFD_M9 (484 genes), HFD_M16 (328 genes), and HFD_M28 (123 genes) were enriched with genes that are upregulated by DSS-induced colitis, while HFD_M15 (368 genes), HFD_M24 (159 genes), and HFD_M26 (135 genes) were significantly enriched with downregulated genes (Figure 3C). Of note, more than 20% of genes involved in HFD_M9 and HFD_M28 were part of the dysregulated genes of the acute phase of mouse UC (day6 and day7) (Figure 3C). Interestingly, genes perturbed during IBD pathogenesis in humans were also enriched in HFD_M9 and HFD_M28 (Figure 3C).

      While IBD-related genes were predominantly found in HFD modules, we also found that two modules, CD_M28 (185 genes) and CD_M32 (142 genes), in CD-fed mouse colons were associated with IBD (Figure 3—figure supplement 1A). These two modules significantly overlapped with the IBD-related HFD_M9 and HFD_M28 modules, respectively (BH-adjusted P value < 0.05) (Figure 3—figure supplement 1B). Moreover, the molecular signatures underlying human UC and Crohn’s disease were also clustered in these two modules (CD_M28 and CD_M32) under CD (Figure 3—figure supplement 1C). Collectively, the co-expression and enrichment analyses identify HFD_M9 and HFD_M28 as IBD-related modules on which we focus our subsequent investigation.
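      Module overlap and enrichment of this kind are commonly assessed with a hypergeometric test followed by Benjamini-Hochberg (BH) adjustment. The following is a minimal sketch of that general approach, assuming a one-sided hypergeometric test; it is not confirmed to be the exact procedure used in the manuscript, and the overlap counts, gene universe, and raw P values are hypothetical.

```python
from math import comb

def hypergeom_sf(overlap, module_size, set_size, universe):
    """P(X >= overlap) when drawing module_size genes from a universe
    that contains set_size genes of interest."""
    upper = min(module_size, set_size)
    total = comb(universe, module_size)
    hits = sum(
        comb(set_size, k) * comb(universe - set_size, module_size - k)
        for k in range(overlap, upper + 1)
    )
    return hits / total

def benjamini_hochberg(pvalues):
    """BH-adjusted P values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest to the smallest P value, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical example: 30 genes shared between a 123-gene module and a
# 185-gene module, drawn from an assumed universe of 15,000 expressed genes.
p_overlap = hypergeom_sf(overlap=30, module_size=123, set_size=185, universe=15000)

# Hypothetical raw P values from several module comparisons.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
print(p_overlap, adjusted)
```

      The module sizes in the example are taken from the text above, but the overlap count and universe size are placeholders chosen only to make the sketch run.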

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      Jamge et al. sought to identify the relationships between histone variants and histone modifications in Arabidopsis by systematic genomic profiling of 13 histone variants and 12 histone modifications to define a set of "chromatin states". They find that H2A variants are key factors defining the major chromatin types (euchromatin, facultative heterochromatin, and constitutive heterochromatin) and that loss of the DDM1 chromatin remodeler leads to loss of typical constitutive heterochromatin and replacement of this state with features common to genes in euchromatin and facultative heterochromatin. This study deepens our understanding of how histone variants shape the Arabidopsis epigenome and provides a wealth of data for other researchers to explore.

      Strengths:

      1) The manuscript provides convincing evidence supporting the claims that: A) Arabidopsis nucleosomes are homotypic for H2A variants and heterotypic for H3 variants, B) that H3 variants are not associated with specific H2A variants, and C) H2A variants are strongly associated with specific histone post-translational modifications (PTMs) while H3 variants show no such strong associations with specific PTMs. These are important findings that contrast with previous observations in animal systems and suggest differences in plant and animal chromatin dynamics.

      2) The authors also performed comprehensive epigenomic profiling of all H2A, H2B, and H3 variants and 12 histone PTMs to produce a Hidden Markov Model-based chromatin state map. These studies revealed that histone H2A variants are as important as histone PTMs in defining the various chromatin states, which is unexpected and of high significance.

      3) The authors show that in ddm1 mutants, normally heterochromatic transposable element (TE) genes lose H2A.W and gain H2A.Z, along with the facultative heterochromatin and euchromatin signatures associated with H2A.Z at silent and expressed genes, respectively.

      Weaknesses:

      1) Following up on the finding that H2A.Z replaces H2A.W at TE genes in ddm1 mutants, the authors provide in vitro evidence that DDM1 binds to H2A.Z-H2B dimers. These results are taken together to conclude that DDM1 normally removes H2A.Z-H2B dimers from nucleosomes at TE genes and replaces them with H2A.W-H2B dimers. However, the evidence for this model is circumstantial and such a model raises a variety of other questions that are not addressed by the authors.

      The Reviewer raises a series of interesting questions. We proposed that DDM1 replaces H2A.Z with H2A.W because it is the simplest model and also because LSH, the mammalian ortholog of DDM1, replaces H2A with macroH2A. However, we do stress in the revised manuscript that this is a model and that other models, which could involve chaperones and additional remodelers, are possible. Addressing why the loss of DDM1 results in a net replacement of H2A.W by H2A.Z is not the purpose of this study. Here we use the perturbation caused by ddm1 as a means to address the importance of the dynamic exchange of H2A variants in setting up the chromatin states. We do observe that perturbing this dynamic exchange causes an important perturbation of chromatin states. This further supports our main conclusion: H2A variant dynamics are one important factor that organizes chromatin states.

      For example: if DDM1 does remove H2A.Z from TE genes, how does H2A.Z normally come to occupy these sites, given that they are highly DNA methylated and that H2A.Z is known to anticorrelate with DNA methylation in plants and animals?

      The anticorrelation between H2A.Z and DNA methylation is observed at steady state. The exchange of H2A.Z to H2A.W that results from the action of DDM1 would indeed remove unwanted H2A.Z from regions occupied by DNA methylation as suggested by the Reviewer.

      Given that H2A.Z does not accumulate in TEs in h2a.w mutants, how would H2A.X and H2A instead become enriched at these sites if DDM1 cannot bind these forms of H2A?

      This is a valid question: We envisage that H2A.X and H2A are deposited by remodelers and chaperones other than DDM1 in the h2a.w mutant.

      Given that there are no apparent regions with common sequence between H2A.Z and H2A.W variants that are not also shared with other H2A classes, how would DDM1 selectively bind to H2A.W-H2B and H2A.Z-H2B dimers to the exclusion of H2A(.X)-H2B dimers?

      It was shown by the Muegge Lab, both in vitro and in vivo, that LSH, the mammalian ortholog of DDM1, binds to macroH2A and H2A, and these two H2A variants do not share a specific region of similarity. Which regions of H2A.Z and H2A.W bind to DDM1 remains to be determined and is beyond the scope of this study.

      Reviewer #2 (Public Review):

      Jamge et al. set out to delineate the relationship between histone variants, histone modifications and chromatin states in Arabidopsis seedlings and leaves. A strength of the study is its use of multiple types of data: the authors present mass-spec, immunoblotting and ChIPseq from histone variants and histone modifications. They confirm the association between certain marks and variants, in particular for H2A, and nicely describe the loss of constitutive heterochromatin in the ddm1 mutant.

      The support for some of the conclusions is weak. The title of the discussion, "histone variants drive the overall organization of chromatin states" implies a causation which wasn't investigated, and overstates the finding that some broad chromatin states can be further subdivided when one considers histone variants (adding variables to the model).

      We have removed subtitles in the discussion and have taken care to avoid oversimplified statements.

      Adding variables to a ChromHMM model naturally increases the complexity of the models that can be built, however it is difficult to objectively define which level of complexity is optimal. The differences between states may be subtle to the point that they may be considered redundant. The authors claim that the sub-states they define are biologically important, but provide little evidence to support this claim. It is not obvious whether the 26 states model is much more useful than a 9-states model. Removing variables naturally affects the definition of states that depend on these variables, but it is also hard to define the biological significance of that change. This sensitivity analysis is thus not very developed.

      We agree that adding more input tracks/data will increase the complexity.

      However, we would like to point out the differences between this study and the 9-state model:

      1) We have included the histone variants which have been previously missed in chromatin state definition.

      2) The previous 9-state model used data from different tissue types. In this study all the data generated and analyzed is from seedlings.

      3) Increasing the number of states allowed us to resolve heterochromatin states that were previously missed in the 9-state model. (BioRXiv)

      4) The biological relevance of the 26 states model is analyzed and described in depth (States BioRxiv paper).

      In addition, we have now updated Figure 2F to include a more direct comparison of the marks used in both models. We have also expanded the description in the methods section and our reasoning for analyzing the 26-state model in depth.

      There are issues with the logical sequence of arguments in Fig1 and Fig3. Fig1A shows that nucleosomes often contain both H3.1 and H3.3. Therefore pulling-down H3.1-containing nucleosomes also pulls down H3.3 and whether specific H2A variants associated with H3.1 cannot be answered in this way (Fig1B).

      We thank the Reviewer for pointing this out. If 60% of nucleosomes are homotypic and if they associated with a specific H2A variant, this would be clearly visible on WB as a much stronger band. Also, the MS data presented in Figure 1 figure supplement 1D clearly show that all H2A variants associate with both H3.1 and H3.3. We have included a more detailed explanation in the revised version to clarify this point.

      The same issue likely carries to the investigation of the association with H3 modifications in Fig1C and 1D, since the H3.1-HA pull-down also pulls down endogenous H3.1 (so presumably the rest of the nucleosome, with H3.3, as well).

      We disagree on this point. The H3 band corresponding to the transgene copy is either H3.1 or H3.3, so all signals on the upper band (T) in Figure 1C are associated with either H3.1 (H3.1 IP) or H3.3 (H3.3 IP), thus unambiguously showing that all modifications we analyzed are present on both H3.1 and H3.3. Furthermore, the data shown in Figure 1D and E, where we analyzed modifications on K27 and K36, which lie in the H3 region that can be distinguished between H3.1 and H3.3 by MS, clearly demonstrate that these modifications are present on both H3.1 and H3.3. To make this clearer, we also extended the description of this part in the Results section.

      In Fig3, the conclusion that it is the loss of H2A.Z -> H2A.W exchange in the ddm1 mutant that causes loss of constitutive heterochromatin is rushed. The fact that the h2a.w mutant does not recapitulate the loss of constitutive heterochromatin seen in ddm1 argues against this interpretation.

      We agree that at first the minimal impact of the loss of H2A.W alone is surprising. However, we point to the preprint https://www.biorxiv.org/content/10.1101/2022.05.31.493688v1. There it is shown that the joint loss of H2A.W and H3K9 methylation (also observed in ddm1) affects silencing of a large range of transposons that also lose silencing in ddm1.

      It's also difficult to conclude about the importance of dynamic exchanges when the ddm1 mutation has been present for generations and the chromatin landscape has fully readapted. Further work is needed to support the authors' hypothesis.

      We apologize that the Reviewer could not find the information regarding the origin of the ddm1 mutant material. We did not use a mutant in which the ddm1 mutation had been maintained for generations. We were in fact very careful on this point and used leaves from first-generation homozygous ddm1 plants segregated from ddm1 plants maintained in the heterozygous state.

      The study also relies on a large number of custom (polyclonal) antibodies with no public validation data. Lack of specificity, a common issue with antibodies, would muddle the interpretation of the data.

      We added information about validation of custom-made antibodies to the Methods: “Specificities of custom made polyclonal antibodies against Arabidopsis H2A.Z.9, H2A.X, H2A.W.6, H2A.13, H2A.W.7, H2Bs, and linker histone H1 were validated in previous publications (Yelagandula et al., 2014; Lorkovic et al., 2017; Jiang et al., 2020; Osakabe et al., 2021).” For H2A.2 and H2A.Z.11 antibodies we provide validation data as Figure 2 figure supplement 1.

      Overall, this study nicely illustrates that, in Arabidopsis, histone variants (and H2A variants in particular) display specificity in modifications and genomic locations, and correlate with some chromatin sub-states. This encourages future work in epigenomics to consider histone variants with as much attention as histone modifications.

      Reviewer #3 (Public Review):

      How chromatin state is defined is an important question in the epigenetics field. Here, Jamge et al. proposed that the dynamics of histone variant exchange control the organization of histone modifications into chromatin states. They found 1) there is a tight association between H2A variants and histone modifications; 2) H2A variants are major factors that differentiate euchromatin, facultative heterochromatin, and constitutive heterochromatin; 3) the mutation in DDM1, a remodeler of H2A variants, causes the mis-assembly of chromatin states in TE region. The topic of this paper is of general interest and results are novel.

      Overall, the paper is well-written and results are clearly presented. The biochemical analysis part is solid.

      Reviewer #4 (Public Review):

      This work aims at analyzing the impact of histone variants and histone modifications on chromatin states of the Arabidopsis genome. Authors claim that histone variants are as significant as histone modifications in determining chromatin states. They also study the effect of mutations in the DDM1 gene on the exchange of H2A.Z to H2A.W, which convert the silent state of transposons into a chromatin state normally found on protein coding genes.

      This is an interesting and well done study on the organization of the Arabidopsis genome in different chromatin states, adding to the previous reports on this issue.

      Reviewer #1 (Recommendations For The Authors):

      1) The rationale for switching from using 10-day old seedlings for chromatin profiling to using mature leaves in Figure 3 and beyond is not explained and introduces additional complexity into the analyses. The reasoning should be clearly explained in the text, and there are several additional suggestions or questions related to this that should be addressed:

      This was done for practical reasons. We had already obtained some profiles of marks in ddm1 mutants and extended the dataset using the same stage of development because this tied this study with our previous study. Using different stages of development provides an additional benefit. The same chromatin states are observed in 10 day old seedlings and leaves of older plants. Constitutive heterochromatin is occupied by the same chromatin states and logically euchromatin is positioned on different genes as expected by the distinct pattern of gene expression at the two stages of development.

      A) In the 16-state model (Figure 3A), euchromatin states were not well defined compared to the 26-state model. Why did the authors not profile these marks also, and could this explain why ddm1 mutants did not show a significant effect on euchromatin states in this model?

      We apologize for the lack of detailed explanation. In our previous study we used leaves of five-week-old plants to show the impact of ddm1 on the profiles of H2A.W.6, H2A.X, H1, H3K9me2, H3K36me3 and H3K27me3 (Jamge, Osakabe et al., 2021). This study showed that DDM1 causes the deposition of H2A.W.6 in heterochromatin, and we thus used leaves to extend this investigation to other marks of heterochromatin (constitutive or facultative): H3K9me1, H2A.W.7, H2A.Z.9 and H2A.Z.11.

      B) The authors state that the tissue types do not impact the definition of chromatin states. However, there is a clear difference in the portion of the genome occupied by each chromatin state between leaf and seedling (states 1, 5, 8, 13, and 14; Figure S3A).

      We had missed a comment on supFig3B and have now provided more explanation: “Although the composition of the chromatin states did not vary significantly between seedlings and leaves, each state occupied a similar proportion of the genome in seedlings and leaves, with the exception of state 5 present primarily in leaves and state 13 only present in seedlings (Figure 3 figure supplement 3A, right column with green bars), and the euchromatin states occupied different genes (Figure 3 figure supplement 3B) as expected from the dissimilar transcriptomes of these two developmental stages.”

      2) The naming of supplemental figures throughout the text is confusing as the legends refer to them as "Figure SX" but they are called out in the text as "Figure X figure supplement XA-B". The eLife convention is "Figure X figure supplement XA-B".

      This was changed.

      3) In Figure 4, Panel D is mislabeled as C in the figure, and C is lacking a label.

      4) Please remove the word "the" from the title.

      This was done.

      Reviewer #2 (Recommendations For The Authors):

      Fig1D legend should also mention K37.

      This was corrected.

      Fig2F legend should say "no H3 modifications" rather than "no histone modifications".

      This was corrected.

      Fig4 labels C/D do not correspond to the legend. D is missing and C should go to the ddm1 stacked barplot.

      This was corrected.

      H3 variants analysis: Taking the relative abundance of H3.1 and H3.3 (and transgenes) into account would be useful to interpret the nucleosome composition results. If they are at equivalent amounts, the null hypothesis of independent association would give 50% heterotypic nucleosomes and 50% homotypic.

      This is a valid comment. In an ideal system the last statement would be correct, but this does not take into account chromatin dynamics associated with replication, transcription, etc. Also, the total amounts of H3.1 and H3.3 in the tissue we used for the experiment are not known. They could possibly be inferred from RNA-seq data, but whether this would reflect the real amounts of protein is highly questionable. In Arabidopsis there are 5 H3.1 genes and 3 H3.3 genes. Nevertheless, we recalculated the data for H3.1 and H3.3 and updated the main text accordingly (~60% of H3.1 and ~42% of H3.3 immunoprecipitated nucleosomes contained both H3 variants). Thus, these numbers are the best estimates we can obtain from the available data.
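
The null hypothesis raised in this exchange can be written down as a short Python sketch. It assumes that the two H3 copies of a nucleosome are drawn independently from a pool in which a fraction `p_h31` is H3.1; this simple binomial model and the function names are illustrative assumptions, not the authors' analysis. The sketch also separates two quantities that are easy to conflate: the heterotypic fraction among all nucleosomes, and the heterotypic share among nucleosomes recovered by an H3.1 pull-down.

```python
def nucleosome_fractions(p_h31):
    """Expected nucleosome classes when two H3 copies are incorporated
    independently, with p_h31 the fraction of H3.1 in the H3 pool."""
    p = p_h31
    q = 1.0 - p_h31
    return {
        "homotypic_H3.1": p * p,
        "homotypic_H3.3": q * q,
        "heterotypic": 2.0 * p * q,
    }


def heterotypic_share_in_h31_ip(p_h31):
    """Share of heterotypic nucleosomes among those containing at least
    one H3.1 copy, i.e. those an ideal H3.1 pull-down would recover.
    Requires p_h31 > 0."""
    f = nucleosome_fractions(p_h31)
    pulled_down = f["homotypic_H3.1"] + f["heterotypic"]
    return f["heterotypic"] / pulled_down


if __name__ == "__main__":
    for p in (0.3, 0.5, 0.7):
        print(p, nucleosome_fractions(p), heterotypic_share_in_h31_ip(p))
```

Under this model, equal amounts of H3.1 and H3.3 give 50% heterotypic nucleosomes overall, while the expected heterotypic share in an H3.1 pull-down is higher because homotypic H3.3 nucleosomes are not recovered. As the response notes, replication- and transcription-coupled deposition may violate the independence assumption.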

      p. 5 bottom paragraph. Repetition.

      This was corrected.

      p12. The reference to LSH is dropped in without making clear how it is relevant. Expand on mechanism to suggest similar DDM1 mechanism?

      This section was expanded to provide more background in the interpretation of the results.

      p13. inversion between H2A.W and H2A.Z in "the loss of DDM1 prevents the replacement of H2A.W by H2A.Z".

      This was corrected.

      p13. make it clear that the last sentence of the results is a working model, not a fully backed up conclusion.

      Alternative models are mentioned in this section and in the discussion in the revised version.

      p14 middle paragraph. Not clear what "in silico simulation" refers to. Simply chromatin-state classification with ChromHMM?

      This refers to the Jaccard index calculation in Fig. 2F that models the impact of the loss of H2A variants (or other elements of chromatin) on the definition of chromatin states by ChromHMM. This is now clarified.
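
For readers unfamiliar with the metric, the following Python sketch shows how a Jaccard index between two state assignments can be computed. It assumes that each chromatin state is represented as a set of genomic bin identifiers; the bin sets in the usage example are hypothetical, and this is not the authors' code.

```python
def jaccard(bins_a, bins_b):
    """Jaccard similarity index |A ∩ B| / |A ∪ B| between two collections
    of genomic bins. Two empty collections are treated as identical."""
    a = set(bins_a)
    b = set(bins_b)
    union = a | b
    if not union:
        return 1.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    # Hypothetical bins assigned to one heterochromatic state in the full
    # model and in a model learned after removing the H2A variant tracks.
    full_model = {("chr1", 1000), ("chr1", 1200), ("chr1", 1400)}
    reduced_model = {("chr1", 1200), ("chr1", 1400), ("chr1", 1600)}
    print(jaccard(full_model, reduced_model))
```

A value near 1 means that removing a set of tracks barely changes which bins are assigned to the state, while a value near 0 means that the state is largely redefined.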

      p14 bottom paragraph: the H2A.Z tail repression of ubiquitin ligase but its being the favoured substrate for H2AK121Ub is apparently contradictory. Can this be explained?

      This refers to H2B ubiquitination and is now clarified.

      p15. Correlation between variants and modifications/chromatin states does not necessarily mean causation.

      We agree and have improved the revised version in this respect.

      p15 "forward feedback loop" is ambiguous (is it a feed-forward loop? A feedback loop?), just use "positive feedback loop".

      This was corrected.

      p23 top "$(Ingouff et al)" doesn't seem properly formatted.

      This reference did not belong there and has been removed.

      Data availability: GSE226469 is not public. The manuscript also mentions availability of source data for all the main figures, but I could not find it. It would be great to make the code publicly available too.

      All the data and code will be public upon posting the revised version of the manuscript.

      Reviewer #3 (Recommendations For The Authors):

      My major concern is authors only used DDM1 as an example to show that the exchange of the histone variant contributes to definition and distribution of chromatin state on transposons (i.e., constitutive heterochromatin regions associated with H2A.W). Readers may wonder whether similar mechanisms also work at the euchromatin region. This point should be clearly discussed and mentioned in the Results (for example, cite recent work on INO80).

      We discuss the impact of other remodelers in the Discussion of the revised version. We hope that the reviewer will understand that a study of the impact of other remodelers on chromatin states would require dozens of new ChIP profiles and is clearly beyond the scope of revising a manuscript.

      Minor:

      1) Fig. 2A and 2B, what does color mean? I guess the color code is referred to chromatin states (Fig. 2F).

      We have clarified on Figure 2A the attribution of a specific color to each chromatin state. This same color is used also in other panels of Figures 2 and S2.

      2) Supplemental Figures: All the figure panels should be on the same page.

      We rearranged supplemental figures so that each figure fits on one page. In places where this was not possible, we created additional supplemental figures.

      3) "We observed that increasing state numbers from 26 to 27 gave rise to biologically redundant states.": Where are the data? Fig S2A? This figure is hard to understand.

      In the updated manuscript, we have described the legend and the methods for FigS2A in more detail.

      Reviewer #4 (Recommendations For The Authors):

      A general concern refers to the text that frequently falls into excessive oversimplifications and/or overstatements, with the danger of being misleading for the reader. This needs to be thoroughly revised.

      We added more careful statements and proposed alternative models when it was possible.

      Specific comments.

      1) Fig 1A. Authors found that ~40% of nucleosomes contained both H3.1 and H3.3. This is a significant finding that deserves a more detailed comment.

      We now provide a more detailed description of IP and MS data presented in Figure 1. This should also help to avoid oversimplifications and/or overstatements as criticized in a general comment.

      2) Fig 1C. "H3.1 and H3.3 bore the same sets and comparable levels of methylation and acetylation...". Too general statement, please specify. Is this also the case for H3K9me2? Others?

      We now describe this part in more detail to state more precisely what Figure 1 shows. We also included data on K9me in Figure 1 figure supplement 1H.

      3) Fig 1D. Could you confirm the high level of H3K27me1 on H3.3?

      H3K27me1 data are shown both by WB (Figure 1C) and Mass spectrometry (Figure 1D and E). We also provide a possible explanation for high levels of this mark on H3.3 by taking into account the fact that H3K27me1 is also produced by demethylation of H3K27me3 by JMJ demethylases.

      4) All WB in Fig 1. They need to be quantified and normalized (plus statistical analysis) in order to provide strong support to the conclusions.

      The conclusions of all WBs are supported by quantified mass spectrometry data, and many WBs were repeated with the same results in Figure 1F (for example, IPs for H2A variants and a large set of H3 marks used for WBs). Also, the association of H3K4me3 and H3K36me3 with H2A variants was analyzed in both ways (Figure 1F): IPs of variants followed by WBs of variants and marks, and IPs of marks followed by WBs of marks and variants. For most of the data we do not have more than two repeats, so statistical analysis may not be possible.

      Nevertheless, we are convinced that our major conclusions from the data presented in Figure 1 and Supporting figure 1 (that H3 variants form both homotypic and heterotypic nucleosomes, that H3 marks do not preferentially associate with H3 variants but some of them do so with H2A variants, and that H3 modifications show a very complex pattern of associations with each other) are fully valid, as they were drawn from two orthogonal approaches and are further supported by the chromatin states identified.

      5) Fig. 2A. Authors focus on "the most parsimonious model" based on 26 chromatin states. This needs to be justified in a more explicit manner. It is surprising that this number emerges from an analysis of 27 independent variants and marks. What are the differences in the conclusions when other numbers of states are used? See also below (reduced number of states derived from the "concatenated model").

      Why 26 states were chosen is now explained in great detail in the methods section. Since, with the exception of H2A variants that are invariably homotypic, nucleosomes can be heterotypic for all other histone variants and histone modifications, the number of random combinations of the 27 marks in one nucleosome representing one state is 4 H2A (without the subtypes) x 4 H3 x 2 H1 x 2^16 (one factor of 2 for each mark), which is well above the circa 26 states observed. This shows that our probabilistic model reduces the potential complexity of a theoretical random association in a remarkable manner.
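
The order of magnitude of this combinatorial argument can be checked with a few lines of Python. The factors are taken directly from the response (4 H2A classes, 4 H3 options, 2 H1 options, and 16 marks treated as present or absent); the variable names are illustrative, and the sketch only reproduces the arithmetic as stated, without judging whether these factors are the right way to count.

```python
def random_combinations(h2a_options=4, h3_options=4, h1_options=2, binary_marks=16):
    """Number of distinct mark combinations if every factor varied freely,
    following the product given in the response: 4 x 4 x 2 x 2^16."""
    return h2a_options * h3_options * h1_options * 2 ** binary_marks


if __name__ == "__main__":
    total = random_combinations()
    observed_states = 26
    print(total)                    # 2097152
    print(total / observed_states)  # roughly 80,000-fold reduction
```

The roughly two million theoretical combinations, compared with 26 learned states, are what the response refers to as a remarkable reduction in complexity.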

      6) As a summary, it would be very helpful to generate a table (or similar) where each proposed chromatin state is ascribed to functional genomic elements.

      This aspect of the work is presented in a preprint where the biological association with the chromatin states is described in detail. See Jamge et al 2022, https://www.biorxiv.org/content/10.1101/2022.06.02.494419v1

      7) Fig 2F (and S2B). A comprehensive comparison of various approaches should include others and estimate the Jaccard similarity index for: (1) the same set of marks and variants used in the Sequeira-Mendes et al paper, and (2) the subset of marks and variants added in this study. In this way, a direct evaluation of the contributions could be more properly made.

      We thank the reviewer for this suggestion and have now included a new column with the combination of marks and variants as used in Sequeira-Mendes et al., 2014 (see Figure 2F). These data clearly demonstrate that adding histone variants significantly contributes to the definition of chromatin states.

      8) Fig. 3. Explain in more detail the concatenated model used here. Does the reduction in the number of chromatin states mean that the other do not add new information?

      The ChromHMM concatenated model allows the identification of a common definition of chromatin states across multiple tissue types. Here, multiple cell types are concatenated, leading to a definition of chromatin states that is shared, while the state assignments remain specific to each cell type.

      In our paper we used the concatenated model to identify common chromatin states in two different genotypes (WT and ddm1). The data for WT and ddm1 were obtained from leaves. As we had a limited number of ChIP-seq profiles in the leaf dataset, the complexity of the concatenated model was reduced compared to the extensive 26-state model. We chose to analyze 16 states in the concatenated model because this was the minimal number of states that gave rise to a similar complexity of heterochromatic states.
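
One way to picture the concatenated setup is through the tab-delimited cell-mark-file table that ChromHMM reads during binarization, in which each row names a cell type (here, a genotype), a mark, and a data file. The Python sketch below builds such a table for two genotypes. The file-naming pattern is a hypothetical placeholder and the mark list is only a subset of the leaf profiles mentioned in the responses; this is not the authors' pipeline.

```python
def cell_mark_file_table(genotypes, marks, pattern="{genotype}_{mark}.bam"):
    """Build a tab-delimited table with one row per (genotype, mark) pair:
    identifier, mark name, and data file. Treating each genotype as a
    separate 'cell type' is what lets a single model with shared state
    definitions produce a separate segmentation per genotype."""
    rows = []
    for genotype in genotypes:
        for mark in marks:
            filename = pattern.format(genotype=genotype, mark=mark)
            rows.append("\t".join([genotype, mark, filename]))
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Hypothetical file names; marks taken from the leaf profiles named above.
    table = cell_mark_file_table(
        genotypes=["WT", "ddm1"],
        marks=["H2A.W.6", "H2A.Z.9", "H3K9me2", "H3K27me3"],
    )
    print(table, end="")
```

Because both genotypes share the same mark names, the learned emission parameters are common to WT and ddm1, and differences between genotypes appear as changes in which state each genomic bin is assigned to.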

      9) The ddm1 mutant. The text in page 14 is a bit confusing. It seems that H2A.Z is deposited on TEs and then exchanged by H2A.W.

      We have provided additional alternative models that could explain our observations.

      10) Page 15: link between H2A.Z and H3K27me3. Gomez-Zambrano et al (2018, cited in the text) found that only a relatively small subset of (putative) targets are common to H2A.Z and H3K27me3. How do authors reconcile this with their statement supporting a link between both of them?

      We refer to Gomez-Zambrano et al to illustrate the link between H2A.Z and H2AK121ub, so we do not understand this comment. The strong link between H2A.Z and H3K27me3 is shown without ambiguity by our work and also by Carter et al., 2018.

  2. Jun 2023
    1. Author Response:

      Reviewer #1 (Public Review):

      The study investigates the nature of "trailblazer" cells in distinct tumor models, including luminal B (MMTV/PyMT) and triple negative (TNBC) tumors (C3-TAg). The authors note that the trailblazer phenotypes in the TNBC model are more complex relative to the Luminal B model and represent distinct EMT programs associated with the expression of distinct EMT-TFs (Zeb1, Zeb2 and Fra-1). They demonstrated that of numerous EMT-TFs, Zeb1 and Fra-1 were required for increased cancer cell migration and invasion. They reveal that TGF-beta and EGF-mediated signaling are required for the diverse EMT states that are required for trailblazer cell activity and increased cell migration/invasion. TGF-beta signaling engaged Zeb1 and Zeb2 while EGF signaling activated Fra-1. Indeed, inhibitors of either TGF-beta or EGF signaling could impair cell migration/invasion. While both pathways contributed to trailblazer phenotypes, EGF signaling was shown to interfere with certain TGF-beta induced transcriptional responses, including the expression of genes encoding extracellular matrix proteins.

      One concern was the heavy reliance on the C3-TAg as the sole TNBC model in which the distinct trailblazer phenotypes were described. The data in Fig. 3 of the submission reveals that the phenotypes observed in the C3-TAg model could be recapitulated in a TNBC patient-derived xenograft model (PDX). Using this PDX, the authors were able to show vimentin expression in lung metastatic TNBC cells that were intravascular, those that had extravasated and clusters of cancer cells fully within the lung parenchyma. This was an important addition to the manuscript. The additional experiments to investigate the role of Zeb1 and Zeb2 more fully, beyond the focus on Fra-1 in the initial submission, were an additional strength of the new submission. Additional clarifications to the discussion also clarified the concepts articulated in the study. The study employs multiple breast cancer models, utilizes numerous in vitro and in vivo assessments of the trailblazer phenotypes, and the experimental design is rigorous and the interpretation of the data is sound. The manuscript will be of general interest to the research community.

      Thank you for the supportive comments. We are glad that the revisions addressed your prior concerns.

      Reviewer #2 (Public Review):

      This represents an important study that demonstrates a high degree of heterogeneity within trailblazer cells in clusters that participate in collective migration. Solid methods highlight this heterogeneity and show that in TNBC cancers, trailblazer cells are defined by vimentin (and not Keratin 14) and are dependent on both TGFbeta and EGFR signaling. Additional single cell studies would further support this work.

      Thank you for the suggestion. Our current data establishes that trailblazer cells are heterogeneous using FACS, immunostaining and functional studies of fresh tumor organoids and established tumor organoid lines. In addition, our RNA-seq experiments provided deep insight into the nature of gene expression changes that corresponded with the evolution of new trailblazer states. This discovery of trailblazer cell heterogeneity was one of multiple key new discoveries in this manuscript, along with revealing a Krt14-independent invasion mechanism, the regulation of trailblazer cells by Tgfβ and Egfr signaling and a new compromise mode of signal integration. We agree that our results support further investigation of the nature and function of basal-like breast cancer heterogeneity during the progression to metastasis. However, a comprehensive implementation of scRNA-seq is most likely required to further unravel new aspects of heterogeneity that substantially advance upon the conclusions supported by our current data. Such an undertaking is beyond the scope of this investigation.

      We agree that scRNA-seq would be confirmatory of trailblazer cell heterogeneity that has been demonstrated with multiple approaches rather than a new discovery of heterogeneity.

      Strengths:

      The paper highlights that collective migration, and the nature of trailblazer cells can be highly heterogeneous. This is important as it suggests that the ability to move between states may supersede a singular phenotype.

      The paper uses animal models and organoids and in several areas attempts to correlate findings to human tissues.

      The experiments are logically described.

      Reviewer #3 (Public Review):

      Cancer is a disease of many faces and in particular, the ability of cancer cells to change their phenotypes and cell behaviors - cancer cell plasticity - is a major contributor to cancer lethality and therapeutic challenge of treating this disease. In this study, Nasir, Pearson et al. investigate tumor cell plasticity through the lens of invasive heterogeneity, and in particular in models of triple-negative breast cancer (TNBC), a subtype of breast cancer with particularly poor clinical prognosis and more limited treatment modalities. Using organoid models in a variety of matrix systems, microscopy, and signaling pathway inhibitors, they find that invading TNBC breast tumors, primarily in the C31-Tag genetically engineered mouse model of TNBC, are composed of heterogeneous invasive/"trailblazer" type tumor cells that in many cases express vimentin, a classical intermediate filament marker of epithelial-mesenchymal transition, and reduced keratin-14, another filament marker of basal epithelial cells associated with collective invasion in different breast cancer models. Supportive genetic and pharmacologic evidence is provided that generation of these cells is TGF-beta signaling pathway driven, likely in vivo from the surrounding tumor microenvironment, in accord with published studies in this space. Another important aspect of this study is the good transcriptional evidence for multiple migratory states showing differing degrees of partial overlap with canonical EMT programs, dependent on TGF-beta, and suggestive but at present incomplete understanding of a parallel program involving Egfr/Fra-1 mediated effects on invasion. When taken in context with other recent studies (Grasset et al. Science Translational Medicine 2022), these data are broadly supportive of the concept of targeting vimentin-dependent invasion programs in TNBC tumors.

      The core conclusions of this paper are generally supported by the data, but there are some conceptual and technical considerations that should be taken into account when interpreting this study. Specific comments:

      1) The contribution of the different vimentin-positive trailblazer cells to distant metastasis was not directly confirmed in vivo in this study. Given the limited proliferative potential of many fully EMT'd cells and in light of recent studies indicating that invasion can be uncoupled from metastatic potential, it seems important to directly test whether the different C31-tag isolates, varying in invasive potential in this study, produce metastases and, if so, whether metastasis abundance correlates with the invasive potential in 3D culture. The collection of lungs at 34 days post injection described in methods is too short to evaluate metastatic frequency.

      We agree that it is important to determine the contribution of trailblazer cells towards metastatic dissemination. In this manuscript, we show that Vimentin expressing cells in a triple negative breast cancer (TNBC) PDX model disseminate to the lungs (Figure 3F). We have also shown that Vimentin expressing SUM159 breast cancer (BC) trailblazer cells spontaneously metastasize to the lungs in previous publications (Fig. 2–figure supplement 1C) and (Westcott et al, J Clin Invest, 2015, 10.1172/JCI77767 and Maine et al, Oncotarget, 2016, 10.18632/oncotarget.7408). Notably, the depletion of genes specifically expressed in trailblazer cells reduced spontaneous metastasis without significantly impinging on primary tumor growth (Westcott et al, J Clin Invest, 2015, 10.1172/JCI77767 and Maine et al, Oncotarget, 2016, 10.18632/oncotarget.7408). Our new results in Figure 5D show that Tgfβ activates genes that define the trailblazer state in the metastatic SUM159 trailblazer cell model. Thus, features of the Tgfβ regulated trailblazer program in the C3-TAg cells are active in the SUM159 trailblazer model of spontaneous metastasis. In addition, commonly employed BC cell line metastasis models, such as MDA-MB-231 derivatives, are highly mesenchymal (Fig. 2–figure supplement 1C) and (Kang et al, Cell, 2003, 10.1016/S1535-6108(03)00132-6 and Minn et al, Nature, 2005, 10.1038/nature03799, as examples).

      It is not technically feasible to establish a correlation between the relative invasion of the C3-TAg GEMM primary tumors and spontaneous metastasis. C3-TAg GEMM primary tumors develop rapidly and the mice must be euthanized prior to the detection of metastasis. This limitation of the model is mentioned in the Results section “Trailblazer cells are specified by Vimentin expression in basal-like breast cancer patient tumors”. The aggressive primary tumor growth and limited spontaneous metastasis of the C3-TAg model has also been previously reported by others (Green et al, Oncogene, 2000, 10.1038/sj.onc.1203280). Surgical resection of the original primary tumor is not a feasible option to allow metastases to form since additional tumors develop in multiple mammary glands.

      In response to reviewer requests, we initiated the growth of orthotopic primary tumors from control or Tgfβ treated 1339-org cells to address the relationship between induction of the trailblazer state and primary tumor cell dissemination. We had to euthanize the mice at day 34 (d34) because tumors within both cohorts had reached the maximum permitted diameter of 2 cm. This will be indicated in the Methods section with revised text. We detected CTCs from the mice bearing control and Tgfβ treated 1339-org cell tumors. However, no micrometastases were detected, which is indicated in the text describing Figure 4–figure supplement 3A-B. Thus, performing surgical resection in new experiments would not be expected to allow the later detection of metastasis, as there did not appear to be DTCs in the lungs that could initiate colonization. In addition, we would have to resect the tumors prior to d34 to successfully and humanely remove the primary tumors, further reducing the odds of metastases developing. We will continue our work to identify an experimental balance that permits sufficient primary tumor growth to initiate spontaneous metastasis. However, the time scale of resolving this technical challenge is uncertain and we believe that our published analysis of trailblazer cell metastasis and new findings here showing the dissemination of Vimentin expressing cells in a PDX model address the question of whether Vimentin expressing trailblazer cells metastasize.

      We agree that certain cell states induced by EMT programs can limit the proliferative potential of tumor cells. As described in the Introduction, we previously found that the induction of a trailblazer state in a subset of breast cancer cell line models triggers a collateral cost in fitness that limits the ability of trailblazer cells to initiate tumor growth (Westcott et al, Cancer Res, 2020, 10.1158/0008-5472.CAN-20-0014). The traits that distinguish trailblazer cells which are capable of tumor initiation and metastasis versus trailblazer cells with reduced fitness have begun to be delineated. Our prior report suggested that cells that were dependent on p63 for growth lost their proliferative capacity when converting to a trailblazer state (Westcott et al, Cancer Res, 2020, 10.1158/0008-5472.CAN-20-0014). C3-TAg cells are not dependent on p63 for growth, which is indicated by the vast majority of the tumor cells lacking p63 expression in primary tumors and primary tumor organoids (Westcott et al, Cancer Res, 2020, 10.1158/0008-5472.CAN-20-0014), similar to the metastatic SUM159 breast cancer cell line model. We were also able to derive clonal trailblazer cell lines that lacked detectable p63 expression from a C3-TAg tumor (Figure 2–figure supplement 1B) and grow organoids even when the limited extent of p63 expression was further reduced by Tgfβ (Figure 5C). Additionally, the persistent Tgfβ treated 1339-org cells, which were enriched for trailblazer cells and had reduced p63 expression, were capable of initiating primary tumor growth (Figure 4F). Together, these results indicate that C3-TAg trailblazer cells are capable of initiating metastatic colonization. However, given the heterogeneity in trailblazer states that we discovered, it is possible that a subset of trailblazer cell states have reduced proliferative capacity.

      Our analysis approach in this manuscript would not necessarily detect these low fitness trailblazer cells if they were a relatively small fraction of the total trailblazer population. We will clarify this point in the Discussion section in the revised manuscript. Our results have begun to reveal mechanisms for the transcriptional regulation of trailblazer cell heterogeneity. We plan to continue delineating the regulatory programs conferring specific transcription states, defining approaches for the prospective isolation of distinct trailblazer subpopulations and determining trailblazer subpopulation specific biomarkers to understand the specific contribution of distinct trailblazer subpopulations towards metastasis. Given the scope of this analysis, it is not feasible to incorporate these future studies into this manuscript.

      2) The invasion of cancer cells is dependent on 3D matrix composition. In other studies, collective cancer invasion is performed in exclusively collagen type 1 gels or in other instances entirely in 3D reconstituted basement membrane gel, e.g. lung cancer invasion studies. In this study, the authors use a mixture composed of both matrices. Given the invasion suppressive effects of Matrigel, particularly for epithelial type cells, further studies would be important to determine whether the invasion phenotypes seen in this study are generalizable across matrix environments.

      The invasion of C3-TAg and PyMT organoids embedded in a 100% pure reconstituted basement membrane is shown in Fig. 1–figure supplement 1G. We will emphasize that trailblazer invasion was evaluated in multiple ECM compositions with revised text and figure graphic. We also provide images for the reviewer showing that C3-TAg organoids collectively invade in a pure Collagen I ECM. Importantly, these findings are consistent with our results showing that Vimentin expressing cells are associated with basal-like mammary tumor cell invasion in the complex ECM of C3-TAg GEMM primary tumors (Figure 2G) and patient primary tumors (Figure 3D). Moreover, Vimentin expressing cells disseminated to the lungs in the TNBC PDX that we evaluated (Figure 3F).

      The ECM composition selected for experiments is dictated by the experimental question(s) being addressed. It is unlikely that mammary tumor cells would only ever collectively invade through an ECM that is either pure Collagen I or pure reconstituted basement membrane (BM). Indeed, it has been proposed that mixtures of Collagen I and BM proteins best reconstitute the complexity of primary tumor ECM (Hooper et al, Methods Enzymol, 2006, 10.1016/S0076-6879(06)06049-6). In line with this observation, mixtures of Collagen I and BM proteins have been routinely used for the past 20 years to define mechanisms of 3D invasion (Xiang and Muthuswamy, Methods Enzymol, 2006, 10.1016/S0076-6879(06)06054-X; Calvo et al, Nat Cell Biol, 2013, 10.1038/ncb2756; and Kato et al, eLife, 2023, 10.7554/eLife.76520, as examples).

      Consistent with the known complexity of the ECM in the tumor microenvironment (TME), we detect Collagen I and Collagen IV (a key component of experimental BM) in the TME of primary breast cancer tumor models (Westcott et al, J Clin Invest, 2015, 10.1172/JCI77767). Importantly, we have found that a mixture of collagen I and experimentally derived BM proteins reliably reveals breast cancer trailblazer cell invasion mechanisms that promote the malignant progression and metastasis of primary tumors and whose expression correlates with poor patient outcome (Westcott et al, J Clin Invest, 2015, 10.1172/JCI77767 and Westcott et al, Cancer Res, 2020, 10.1158/0008-5472.CAN-20-0014, as examples). Notably, the relative differences in trailblazer and opportunist cell invasive phenotypes are not dictated by the ECM composition used in our 3D assays. We have previously tested the invasion of trailblazer and opportunist subpopulations in different ECM compositions using both spheroid and vertical invasion assays (Westcott et al, J Clin Invest, 2015, 10.1172/JCI77767). Increasing collagen I concentration enhanced the relative rate of trailblazer cell invasion, with trailblazer cells always showing a significantly enhanced invasion relative to opportunist cells.

      The relationship between trailblazer and opportunist cells that we have detected in primary tumors is recapitulated when using mixtures of Collagen I and BM proteins in our past publications and in this manuscript. The clonal opportunist cell lines derived from a C3-TAg tumor expressed high levels of the transcription factor p63 (Figure 2–figure supplement 1A-B). We previously showed that p63 restricts induction of a trailblazer state in human breast cancer trailblazer cell lines (Westcott et al, Cancer Res, 2020, 10.1158/0008-5472.CAN-20-0014). Notably, we showed that p63 expressing C3-TAg cells were not able to initiate collective invasion in the same ECM composition used in our current manuscript. Moreover, p63 cells in primary C3-TAg tumors were noninvasive opportunist cells that were limited to trailing p63-low trailblazer cells when collectively invading in primary tumors and in organoids (Westcott et al, Cancer Res, 2020). We now show that p63 expressing opportunist cell lines are limited to invading behind primary C3-TAg trailblazer cells and trailblazer cell lines in our 3D invasion assays (Figure 1B and Figure 1–figure supplement 1D-E). Together, these results indicate that the ECM employed in our 3D assays reveals the mechanistic underpinnings of both trailblazer and opportunist cell invasion in primary tumors.

      With respect to lung cancer invasion, leader cells that we would classify as trailblazer cells have been isolated from 2 non-small cell lung cancer (NSCLC) cell line spheroid models grown in pure reconstituted BM extract (Konen et al, Nat Comm, 2017, 10.1038/ncomms15078). However, it is unclear whether these cell line derived NSCLC trailblazer cells are more intrinsically invasive than non-trailblazer siblings in primary NSCLC tumors or if the traits associated with cell line NSCLC trailblazer cells are required for metastasis. These tests have never been reported to the best of our knowledge. Similarly, it is not clear whether these NSCLC cell line derived trailblazer cells reflect features of primary NSCLC tumor cells, as we are unaware of any such comparisons being reported. Thus, there is no reason to consider pure reconstituted BM to be an equivalent or preferred experimental option to define trailblazer cell features. Nevertheless, as we mentioned before, our discovery approach identifies trailblazer cells that are intrinsically more invasive than opportunist siblings across multiple ECM conditions, including pure reconstituted BM and, importantly, in primary tumors.

      3) TGF-beta is well known to induce EMT. Although this study identifies potential transcriptional mediators of the invasion/trailblazer program, is this program reversible?

      We have previously shown that breast cancer trailblazer cells can convert to an opportunist state, demonstrating that trailblazer states are reversible (Westcott et al, J Clin Invest, 2015, 10.1172/JCI77767). In this manuscript, we show that C3-TAg organoid lines derived in the Tgfbr1 inhibitor A83-01 have few if any cells with a trailblazer phenotype relative to C3-TAg primary tumors, suggesting a reversion of the trailblazer state (Fig. 4C and Figure 4–figure supplement 2A-C). However, our results do not entirely rule out the possibility that only non-trailblazer cells grew to establish the organoid lines. Indeed, the problem of tracing phenotypic conversions when evaluating heterogeneous populations is a systemic challenge that extends beyond our analysis of trailblazer cells. Clearly defining the conversion rates for trailblazer cells will require multiple genetic markers to distinguish the different trailblazer states we have now identified, in addition to phenotypic and molecular analysis over multiple days, or possibly weeks. Thus, further definition of the rate of reversion of different trailblazer cells is a worthy line of future investigation rather than a feasible objective of this study.

    1. Author Response:

      We thank the reviewers for their careful and overall positive assessment of our work.

      Reviewer #1 (Public Review):

      This paper describes the discovery, functional analysis and structure of TcaP, a protein encoded by the Vibrio phage satellite PLE that forms a size-determining scaffold around PLE procapsids made from helper phage ICP1 structural proteins. The system displays a fascinating similarity to the P2/P4 system, which had previously been unique in its use of a size-determining external scaffolding protein, Sid. The work is interesting, comprehensive and of high quality. The presentation could be improved as listed in the suggestions below.

      An interesting observation is that PLE appears to be dependent on small capsids for efficient transduction. This is not completely surprising if the element uses a cos site type mechanism for packaging, since this requires an integer number of genomes to be packaged when the capsid is full, and this might be more difficult to accomplish when the helper capsid is much larger than the satellite, as is the case with ICP1. The authors mention in a few places that this is the first known satellite to have this requirement. However, this is not quite correct: a similar defect was seen in phi12/SaPIbov5, where the large phi12 capsid was not quite the right size for either two or three copies of the wildtype ("unevolved") SaPIbov5 (Carpena et al. 2016).

      We thank the reviewer for bringing up this point. First, we agree that for cos type packaging systems, this would not be surprising. However, ICP1 is a pac type phage and we have evidence that PLE is also a pac rather than a cos type packaging satellite; therefore, PLE is the first headful satellite to show such a defect. For cos packaging elements, both SaPIbov5 and P4, non-integer genome lengths have been shown to pack less efficiently into capsids as pointed out above and shown in Carpena et al 2016 and Shore 1978. However, in both of these cases, the genomes were manipulated to change their size, suggesting that naturally occurring cos satellites maintain their genome sizes to be proportional to their capsid sizes or in integer proportion to their helper capsids. We will include a short summary of these previous findings in the main text to provide context for the rare decreases in transduction efficiency reported in the cos satellites.

      The authors present several micrographs showing capsids formed in the presence or absence of wildtype or mutant TcaP and CP (Fig. 1, Fig. 2, Fig. 3). However, each micrograph shows only a handful of particles of the "correct" size, in addition to a few shells that are aberrant or of a different size. I miss a more statistically rigorous enumeration of shells of different size (PLE or ICP1 sized, or different), empty vs. full, aberrant shells etc. This could be presented as a size distribution graph, a histogram or in table form.

      We thank the reviewer for this recommendation and agree that it would add to the manuscript. We will quantify these particles and present the data in the main text.

      In the abstract, the term "divergent satellite P4" is vague and unclear. Divergent from what? Probably they mean distinct from or unrelated to PLE. Please clarify.

      Yes, we did mean unrelated to PLE, and we will clarify in the text.

      How do they know that gp123 is a decoration protein? Was this previously determined, does it have (sequence) similarity to other known decoration proteins, or is it simply the most likely designation based on its position in the genome?

      Gp123 was annotated based on its position. While there is sequence similarity to other annotated Vibrio phages’ decoration proteins, we will clarify in the text that Gp123 is a putative decoration protein.

      Although the reconstruction and modeling statistics are good, it is difficult to assess the quality of the map and the model from the presented figures. Details of the density and FSC curves (half-map and model-to-map) should be shown. It is also difficult to see the TcaP structure and how it compares to Sid from the figures presented.

      We will address this concern in the revised manuscript.

      Introduction, Paragraph 3: "...which is the number of coat proteins divided by 60" is not strictly speaking the definition of T number. The T number corresponds to the number of subtriangles that one triangular face of the icosahedron is divided into. It corresponds to the number of coat proteins divided by 60 in the canonical case, but in tailed phages, 5 copies are removed to make way for the portal protein. (Other viruses could be described as having architecture corresponding to a specific T number, but with divergent numbers of subunits, e.g. adenoviruses or polyomaviruses.)

      We agree that our simplified explanation of the T number is not entirely accurate and will modify the sentence appropriately.
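      As a brief illustration of the distinction the reviewer draws, the relation can be written out explicitly. This is a sketch using the standard Caspar-Klug definition; the symbols h, k and N are introduced here for illustration, and the T values in the example are generic rather than taken from the manuscript.

```latex
% Triangulation number from the lattice indices (h, k), non-negative integers not both zero
T = h^{2} + hk + k^{2}

% Canonical icosahedral shell: 60 subunits per unit of T
N_{\text{canonical}} = 60\,T

% Tailed phage: one pentamer (5 coat protein copies) is replaced by the portal
N_{\text{tailed}} = 60\,T - 5

% Generic examples: T = 4 gives 240 - 5 = 235; T = 7 gives 420 - 5 = 415
```

      On this reading, "the number of coat proteins divided by 60" recovers T only in the canonical case, which is the reviewer's point.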

      Reviewer #2 (Public Review):

      Phage satellites are fascinating elements that have evolved to hijack phages for induction, packaging, and transfer, promoting their widespread dissemination in nature. It is remarkable how different satellites use conserved strategies of parasitism, utilising unrelated proteins that perform similar roles in their cognate elements. In the current manuscript, Dr. Seed and coworkers elucidated the mechanism used by one family of satellites, the PLEs, to produce small capsids, a process that inhibits phage reproduction while increasing PLE transmission. The work is presented beautifully, and the results are astonishing. The authors identified the gene responsible for generating the small capsids, characterised its role in the PLE transfer and phage inhibition, and determined the structure of the PLE-sized small capsids. It is a truly impressive piece of work.

      We thank the reviewer for their positive evaluation of our work.

      Reviewer #3 (Public Review):

      The manuscript by Boyd and co-authors "A Vibrio cholerae viral satellite maximizes its spread and inhibits phage by remodelling hijacked phage coat proteins into small capsids" reports important results related to self-defense mechanisms that bacteria use against the phages that infect them. It has been shown previously that bacteria carry phage-inducible chromosomal island-like elements (PLE), integrated into the bacterial genome, that encode proteins. These proteins are used by bacteria to amend the phage capsids and to create phage-like particles (satellites) that move between cells and transfer the genetic material of PLE to other bacteria. This study highlights the interactions between a PLE-encoded protein, TcaP, and capsid proteins of the phage ICP1.

      The manuscript is well written, provides a lot of new information and the results are supported by biochemical analysis.

      We thank the reviewer for their supportive evaluation of our work.

    1. Author Response:

      We would like to thank the reviewers for their time in evaluating our manuscript. The reviewers provided constructive comments and suggested changes to improve our manuscript. The main comment was about the framing. We agree with the reviewers and will rewrite the manuscript to focus more on migration patterns than conservation. We will add and expand the paper's theoretical framework and include the studies and descriptions of migration patterns of individual species suggested by the reviewers. At the same time, some of the reviewers' comments (especially on the terms and suggestions for changing the title of the paper) are mutually exclusive. We will pay particular attention to this issue and improve the paper's theoretical basis.

    1. Author Response

      Joint Public Review

      Strengths

      Overall, the idea that the PAG interacts with the BLA via the midline thalamus during a predator vs. foraging test is new and quite interesting. The authors have used appropriate tools to address their questions. The major impact in the field would be to add evidence to claims that the BLA can be downstream of the dPAG to evoke defensive behaviors. The study also adds to a body of evidence that the PAG mediates primal fear responses.

      Weaknesses

      (Anatomical concerns)

      1) The authors claim that the recordings were performed in the dorsal PAG (dPAG), but the histological images in Fig. 1B and Supplementary S2 for example show the tip of the electrode in a different subregion of PAG (ventral/lateral). They should perform a more careful histological analysis of the recording sites and explain the histological inclusion and exclusion criteria. Diagrams showing the sites of all PAG and BLA recordings, as well as all fiber optics, would be helpful.

      The PAG is composed of dorsomedial (dm), dorsolateral (dl), lateral (l), and ventrolateral (vl) columns that extend along the rostro-caudal axis of the aqueduct. The term “dorsal PAG” (dPAG) generally encompasses dmPAG, dlPAG, and lPAG, as substantiated by tract-tracing, neurochemical, and immunohistochemical techniques (e.g., Bandler et al., 1991; Bandler & Keay, 1996; Carrive, 1993). As Bandler and Shipley (1994) summarized, “These findings suggest that what has been traditionally called the 'dorsal PAG' (a collective term for regions dorsal and lateral to the aqueduct), consists of three anatomically distinct longitudinal columns: dorsomedial and lateral columns…and a dorsolateral column…" Similarly, Schenberg et al. (2005) clarified in their review that, “According to this parcellation...the defensive behaviors (freezing, flight or fight) and aversion-related responses (switchoff behavior) were ascribed to the DMPAG, DLPAG, and LPAG (usually named the ‘dorsal’ PAG).” In our study, all recordings were conducted within the dPAG. Also, Figures 1B and S2 in our manuscript correspond to the -6.04 mm template from Paxinos & Watson’s atlas (1998), which is shown in the left panel in Author response image 1 and is considerably anterior to the location where the vlPAG emerges, as shown in the right panel. In our revised manuscript, we will provide a detailed definition of the dPAG, inclusive of dmPAG, dlPAG, and lPAG, and support this with the referenced literature.

      Author response image 1.

      2) Prior studies investigating the role of BLA neurons during a foraging vs. robot test similar to the one used in this study should be also cited and discussed (e.g., Amir et al 2019; Amir et al 2015). These two studies demonstrated that most neurons in the basal portion of the BLA exhibit inhibitory activity during foraging behavior and only a small fraction of neurons (~4%) display excitatory activity in response to the robot (in contrast to the 25% reported in the present study). A very accurate histological analysis of BLA recording sites should be performed to clarify whether distinct subregions of the BLA encode foraging and predator-related information, as previously shown in the two described studies.

      In the revised manuscript, we will discuss papers by Amir et al. (2015) and Amir et al. (2019) that utilized a similar 'approach food-avoid predator' paradigm. These studies found a correlation between the neuronal activities in the basolateral amygdala (BL) and the velocity of animal movement during foraging, regardless of the presence or absence of predators. Specifically, the majority of BL neurons were inhibited in both conditions, with only 4.5% being responsive to predators. Consequently, Amir et al. posited that amygdala activity predominantly aligns with behavioral output such as foraging, rather than with responses to threats.

      In contrast, our body of work (Kim et al., 2018; Kong et al., 2021; the present study) reveals that the majority of neurons in the BA/BLA displayed distinct responses in pre-robot and robot sessions. Kong et al. (2021) discussed in depth several factors that may account for this discrepancy, given that both Amir et al. and our research used similar behavioral paradigms. Differences in apparatus features, experimental procedures, and data analysis methodologies (refer to Amir et al., 2019) could be contributing to the conflicting results and interpretations concerning the significance of amygdalar neuronal activities.

      Additionally, our studies uniquely monitored the same set of amygdalar neurons during pre-robot and robot sessions, affording us the opportunity for a direct comparison of neuronal activities under different threat conditions.

      Another salient difference lies in the foraging success rates, which were markedly higher in Amir et al (~80%) compared to our studies (<3-4%). We hypothesize that there may be an inverse relationship between the pellet procurement rate and the intensity of fear. The high foraging success rate in Amir et al., which correlates with subdued amygdalar activity, stands in contrast to our findings of heightened amygdalar activity associated with a lower foraging success rate. Supporting this notion, optogenetically induced amygdalar activity led naïve rats to abandon foraging and escape to the nest (Kong et al., 2021, the present study).

      3) An important claim of this study is that the PAG sends predator-related signals to BLA via the PVT (Fig. 4). The authors stated that PVT neurons labeled by intra-BLA injection of the retrograde tracer CTB were activated by the predator, but a proper immunohistochemical quantification with a control group was not provided to support this claim. To provide better support for their claim, the authors should quantify the double-labeled PVT neurons (cFos plus CTB positive neurons) during the robot test.

      As recommended, we will include a revised Fig. 4 in the manuscript to present the quantification of neurons that are double-labeled with c-Fos and CTB in the PVT. This updated figure will provide a more rigorous analysis and visual representation of the data.

      4) The AAV anterograde tracer deposit spread to a large part of the PAG, including dorsolateral and lateral PAG, and supraoculomotor regions (Fig. 4B). Is the projection to the PVT from the dPAG or other regions of the PAG?

      As previously addressed in response to Comment #1, the dPAG comprises the dmPAG, dlPAG, and lPAG. In the revised manuscript, we will acknowledge the diffusion of the AAV to the adjacent deep gray layer of the superior colliculus. Additionally, we are considering conducting more restricted AAV injections into the dPAG to verify terminal expressions in the PVT.

      (Concerns about the strength of the evidence supporting a role for the PVT)

      5) The authors conclude in the discussion section that the dPAG-amygdala pathway is involved in generating antipredatory defensive behavior. However, the current results are entirely based on correlational analyses of neural firing rate and there is no direct demonstration that the PAG provides information about the robot to the BLA. Therefore, the authors should tone down their interpretation or provide more evidence to support it by performing experiments applying inhibitory tools in the dPAG > PVT > BLA pathway and examining the impact on behavior and downstream neural firing.

      As suggested, we will moderate the assertions about the functional implications of the PVT, based on the data from anterograde and retrograde tracers, to present a more measured interpretation in the manuscript.

      (Other concerns)

      6) One of the main findings of this study is the observation that BLA neurons that are responsive to PAG photostimulation are preferentially recruited during the foraging vs. robot test (Fig. 3). However, the experimental design used to address this question is problematic because the laser photostimulation of PAG neurons preceded the foraging vs. robot test. Prior photoactivation of PAG may have caused indirect short-term synaptic plasticity in BLA cells, which would favor the response of these cells to the robot. Please see Oishi et al, 2019 PMID: 30621738, which demonstrated that 10 trains of 20Hz photoactivation (300 pulses each) was sufficient to induce LTP in brain slices.

      After approximately eight photostimulation trials of the dPAG, with 40 pulses each, the animals entered a post-photostimulation testing phase (referred to as "Post"; Fig. 3C), lasting 10-15 minutes over an average of eight trials before robot testing. Although the PAG does not directly project to the BLA, the remote possibility of trans-synaptic plasticity in the BLA cannot be completely excluded and will be acknowledged. Additionally, it is noteworthy that Oishi et al's (2019) study applied a total of 3,000 pulses (i.e., 10 15-s trains of 20-Hz pulses) and investigated CA3-CA3 synaptic plasticity, as opposed to a total of 320 pulses (i.e., 8 2-s trains of 20-Hz pulses) in our study.
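      The pulse totals compared above follow from trains × duration × frequency. A minimal sketch of that arithmetic, using only the numbers stated in this response (the function and variable names are hypothetical, chosen for illustration):

```python
def total_pulses(n_trains: int, train_duration_s: float, frequency_hz: float) -> int:
    """Total light pulses delivered across all trains (hypothetical helper)."""
    return round(n_trains * train_duration_s * frequency_hz)


# Oishi et al. (2019), as described above: 10 trains, 15 s each, 20 Hz
oishi_2019 = total_pulses(n_trains=10, train_duration_s=15, frequency_hz=20)

# Present study, as described above: 8 trains, 2 s each, 20 Hz
present_study = total_pulses(n_trains=8, train_duration_s=2, frequency_hz=20)

print(oishi_2019)                  # 3000
print(present_study)               # 320
print(present_study // 8)          # 40 pulses per trial
print(oishi_2019 / present_study)  # 9.375
```

      Under these stated parameters, the stimulation in the cited study is roughly nine times the total delivered here.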

      7) The authors should perform a longitudinal analysis of the behavioral responses of the rats across the trials to clarify whether the animals habituate to the robot or not. In Figure 1E, it appears that PAG neurons fire less across the trials, which could be associated with behavioral habituation to the predator robot. If that is the case, the activity of many other PAG and BLA neurons will also most likely vary according to the trial number, which would impact the current interpretation of the results.

      In Figure 1E, the y-axis represents the Z scores of individual dPAG neurons, instead of representing repeated tests of the same neuron across multiple trials. The raster plot in Figure 1F clearly depicts that the same dPAG neurons consistently display heightened neural activity in response to the approaching robot across successive trials.

      8) In Figure 1, it is unclear why the authors compared the activity of neurons that respond to the robot activation against the activity of the neurons during the retrieval of the food pellets in the pre-robot and post-robot sessions. The best comparison would be aligning the cells that were responsive to the activation of the robot with the moment in which the animals run back to the nest after consuming the pellets during the pre-robot or post-robot sessions. This would enable the authors to demonstrate that the PAG responses are directly associated with the expression of escaping behavior in the presence of the robot rather than associated with the onset of goal-directed movement toward the nest during the pre- and post-robot sessions. A graphic showing the correlation between PAG firing rate and escape response would also be informative.

      Figure 1E compares the dPAG neural activity when animals enter a designated pellet zone (time-stamped by camera tracking) during both pre-robot and post-robot trials to the dPAG neural activity when entering the robot trigger zone (time-stamped by robot activation). We wish to clarify that rats carry the large (0.5 g) pellet back to the nest for consumption rather than consume it in the open arena before returning to the nest.

      In our study, we aimed to investigate the direct response of dPAG neurons to the looming predator and explore the communication between dPAG and BLA in relation to antipredatory defensive responses. To build upon our previous research that suggests a potential role of dPAG in conveying such responses to the BLA (Kim et al., 2013) and the immediate firing of BLA neurons in response to predatory threats (Kim et al., 2018; Kong et al., 2021), we chose to narrow our testing window to a short latency period (< 500 ms) following robot activations. This specific time window allowed us to focus on the initial stages of the threat stimulus processing and minimize potential confounding factors such as the presence of residual firing activity triggered by the robot during the animals’ escape or any activity changes induced by the animals' behavior.

      Furthermore, Figure S1C clearly demonstrates that (i) increased activity of dPAG robot cells preceded the animals’ actual turning and fleeing behavior toward the nest, as indicated by the peak values of movement speed (dark yellow), and (ii) the presence of pellets did not affect activity changes of the robot cells during pre- and post-robot sessions. These observations suggest that the heightened activity of dPAG robot cells was not due to movement changes or pellet motivation.

      Lastly, as stated in the original manuscript, the vast majority of robot cells (90.9%) did not show significant correlations between movement speed and firing rates, lending further support to the interpretation that the dPAG activity observed was not merely a reflection of movement changes.

      References

      Bandler, R., Carrive, P., & Depaulis, A. (1991). Emerging principles of organization of the midbrain periaqueductal gray matter. The midbrain periaqueductal gray matter: functional, anatomical, and neurochemical organization, 1-8.

      Bandler, R. & Keay, K. A. (1996). Columnar organization in the midbrain periaqueductal gray and the integration of emotional expression. Progress in brain research, 107, 285-300.

      Bandler, R. & Shipley, M. T. (1994). Columnar organization in the midbrain periaqueductal gray: modules for emotional expression? Trends in Neurosciences, 17(9), 379-89.

      Carrive, P. (1993). The periaqueductal gray and defensive behavior: functional representation and neuronal organization. Behavioural brain research, 58(1-2), 27-47.

      Oishi, N., Nomoto, M., Ohkawa, N., Saitoh, Y., Sano, Y., Tsujimura, S., ... & Inokuchi, K. (2019). Artificial association of memory events by optogenetic stimulation of hippocampal CA3 cell ensembles. Molecular brain, 12, 1-10.

      Paxinos, G. & Watson, C. (1998). The Rat Brain in Stereotaxic Coordinates. Academic Press, San Diego.

      Schenberg, L. C., Póvoa, R. M. F., Costa, A. L. P., Caldellas, A. V., Tufik, S., & Bittencourt, A. S. (2005). Functional specializations within the tectum defense systems of the rat. Neuroscience & Biobehavioral Reviews, 29(8), 1279-1298.

    1. Author Response

      We are grateful for the constructive feedback and the possibility of further improving our manuscript in terms of quality and clarity. Below, we have prepared a brief answer to the points raised in the reviewers’ feedback. We plan to address all these issues fully in the revised version of the manuscript.

      We agree that some of our claims were overly enthusiastic. We will rewrite parts of the manuscript to temper our statements. Additionally, we are thankful for the comments on the use of language, which we will certainly apply while editing the manuscript. Below, we focus on the main comments.

      Both reviewers: We appreciate advice on possible confounding factors. We should note here that there is substantial evidence on the effects of alpha rhythm amplitude on the excitability of a neuronal network and, as a consequence, on the amplitude of evoked responses (Baumgarten et al., 2016 Cerebral Cortex; Iemi et al., 2017 eLife; Stephani et al., 2021 eLife). This effect is due to changing the gain for evoked responses, and it is quite different compared to the baseline-shift mechanism (BSM). In BSM, the changes in the amplitude of evoked responses occur due to the generation of an additional evoked response component, which we tried to reveal in our current work. Still, we agree with suggestions to test additional factors, such as earlier evoked responses, baseline window, and head size, and we will test those.

      Reviewer #2 Comment 2: Certainly, for low-density recordings, some method of data transformation is required. Here we would like to explain why we did not use current-source density (CSD) but rather utilised other approaches. First, the CSD transform performs well for spatially localised activities since it is a spatial high-pass filter. In our case, P300 and alpha amplitude dynamics are fairly widespread with low spatial frequency, and we believe we would not benefit from applying CSD. Second, CSD has been shown to be more sensitive to surface sources in the crowns of gyri. For activity in the P300 window, we have no reason to believe that this is the case. Third, as we completely agree that a low-density montage is a limitation, we used source reconstruction with eLORETA (Fig. 5) to refine the spatial localisation of potential sources of P300 and alpha amplitude change.

      Reviewer #1 Comment 4: Our study is indeed based on a sample of older participants. However, in our previous work (Studenova et al., 2022), we compared young and elderly participants using resting-state data and measured the baseline-shift index (BSI). We found that BSIs for elderly participants were lower than those for young participants. Despite this limitation, in the current study we were still able to detect a correspondence between BSIs and evoked responses in elderly participants. We therefore believe that the results should not be different for a sample of young participants.

      Reviewer #2 Comment 4: We agree that mediation analysis will provide additional insights, and we will add it to the revised version of the manuscript.

      Overall, we found the reviewers' comments very helpful. We will update the manuscript accordingly.

    1. Author Response:

      We would like to thank the reviewers for their comments on the manuscript. The primary concern that they raised is that the imaging data are largely qualitative. This is a fair assessment, and we agree that a careful quantitative characterization of TF clustering with and without IDRs using high-resolution imaging would provide valuable insight that would extend our findings. Our goal for this study was to conduct a high-level survey of IDR localization, for which we believe a qualitative overview was sufficient. We hope that this work can serve as a useful foundation for future studies of the complex roles that IDRs play in TF function.

    1. Author Response

      Reviewer #1 (Public Review):

      1) Only one PITAR siRNA was tested in majority of the experiments, which compromises the validity of the results. Some results are inconsistent. For example, Fig 2G indicates that PITAR siRNA caused G1 arrest. However, PITAR overexpression in the same cell line did not show any effect on cell cycle progression in Fig 5I.

      We thank the reviewer for this comment. Indeed, we have used two siRNAs in experiments related to Fig. 2C, 2D, and 2E. Keeping the reviewer’s comment in mind, we plan to reproduce the results of Fig. 2F, 2G, 2H, 2I, 5A, 5B, 5E, and supplementary Fig. 5A using an additional siRNA targeting PITAR.

      The reason that “PITAR silencing showed a robust G1 arrest, but PITAR overexpression failed to show any effect on the cell cycle profile” is as follows. Because glioma cells overexpress PITAR (which keeps p53 suppressed), silencing PITAR (which elevates p53 levels) in glioma cells produces a robust cell cycle phenotype in the form of increased G1 arrest. In contrast, overexpressing PITAR in glioma cells (which already have high levels of PITAR and hence drastically reduced p53 levels) is unlikely to change the cell cycle profile significantly. However, a phenotype for PITAR overexpression can be shown in DNA-damaged glioma cells, in which p53 is induced. Indeed, we have done this experiment in Fig. 5L, which shows that the G2/M arrest induced by DNA damage (42.34%) is reduced significantly (19%) in the PITAR-overexpressed condition (34.42%). Nevertheless, taking the reviewer's comments in the right spirit, we plan to repeat this experiment with appropriate modifications to arrive at a more robust phenotype for PITAR overexpression.

      2) The conclusion that PITAR inactivates p53 through regulating TRIM28, which is highlighted in the title of the manuscript, is not supported by convincing results. Although the authors showed that a PITAR siRNA increased while PITAR overexpression decreased p53 level, the siRNA only marginally increased the stability of p53 (Fig 5E). The p53 ubiquitination level was barely affected by PITAR overexpression in Fig 5F. To convincingly demonstrate that PITAR regulates p53 through TRIM28, the authors need to show that this regulation is impaired/compromised in TRIM28-knockout conditions. The authors only showed that TRIM28 overexpression suppressed PITAR siRNA-induced increase of p53, which is not sufficient. Note that only one cell line was investigated in Fig 5.

      To address this issue, we will overexpress PITAR in TRIM28-silenced cells to show that TRIM28 is required for PITAR to inhibit p53. In addition, we plan to carry out PITAR silencing and overexpression experiments in another glioma cell line, as recommended by the reviewer.

      3) Another major weakness of this manuscript is that the authors did not provide any evidence indicating that the glioblastoma-promoting activities of PITAR were mediated by its regulation of p53 or TRIM28 (Fig 6 and Fig 7). Thus, the regulation of glioblastoma growth and the regulation of TRIM28/p53 appear to be disconnected.

      We would like to respectfully disagree with the reviewer on this particular point. In the current version of the manuscript, we have indeed provided the following evidence that the glioblastoma-promoting activities of PITAR are mediated by its regulation of p53 or TRIM28.

      A) In Fig. 6, we demonstrate that PITAR silencing-induced reduction in the neurosphere growth is accompanied by a reduction in TRIM28 RNA and an increase in the CDKN1A RNA without a change in p53 RNA levels. We also demonstrate that PITAR overexpression-induced neurosphere growth is accompanied by an increase in the TRIM28 RNA, and a decrease in CDKN1A RNA without a change in p53 RNA levels.

      B) To add strength to the above results, we plan to do western blot experiments under similar conditions to demonstrate the appropriate changes in TRIM28, p53, and CDKN1A levels. Also, we will do a TRIM28 rescue experiment in RG5 neurosphere cells.

      C) In supplementary Fig. 6 (related to Fig. 6), we show that PITAR silencing failed to decrease neurosphere growth in a mutant p53-containing GSC line (MGG8).

      D) In supplementary Fig. 7 (related to Fig. 6), we show that PITAR silencing failed to inhibit colony growth of p53-silenced U87 glioma cells (U87/shp53#1). We also show that while PITAR silencing decreased TRIM28 RNA levels in U87/shNT and U87/shp53#1 glioma cells, it failed to increase CDKN1A and MDM2 (p53 targets) at the RNA level.

      E) In Fig. 7, we show that the TRIM28 protein level is drastically reduced in small tumors formed by U87/siPITAR cells.

      F) In supplementary Fig. 8 (related to Fig. 7), we show that glioma tumors formed by U87/PITAR OE cells express high levels of TRIM28 protein but reduced levels of p21 protein.

      G) We also plan to do additional experiments, as described below, to demonstrate that the glioblastoma-promoting activities of PITAR are indeed mediated by its regulation of p53 or TRIM28. We will demonstrate the inability of PITAR overexpression to induce the growth of glioma tumors initiated by TRIM28-silenced U87 cells.

      4) It is not clear what kind of message the authors tried to deliver in Fig 7F/G. Based on the authors' hypothesis, DNA-damaging agents like TMZ would induce PITAR to inactivate p53, which would compromise TMZ's anti-cancer activity. However, the data show that TMZ was very effective in the inhibition of U87 growth. The authors may need to test whether PITAR downregulation, which would increase p53 activity, have any effects on TMZ-insensitive tumors. Such results are more therapeutically relevant.

      Reviewer #1 rightly pointed out that TMZ induces PITAR expression, which should compromise TMZ's anti-cancer activity. In addition, overexpression of PITAR also promotes glioma tumor growth. Figures 7F and 7G demonstrate the following two facts: (1) PITAR overexpression increases glioma tumor growth (Figure 7G, compare the red line with the blue line), and (2) PITAR-overexpressing glioma tumors are resistant to TMZ chemotherapy (Figure 7G, compare the pink line with the green line).

      In addition, in Figure 2I, we indeed show that PITAR-silenced cells are more sensitive to TMZ and Adriamycin chemotherapy.

      However, considering reviewers’ comments, we plan to repeat Figure 7A, combining TMZ chemotherapy and PITAR silencing to demonstrate that TMZ chemotherapy-induced PITAR indeed promotes chemo-resistance.

      5) Lastly, the model presented in Fig 7H is confusing. It is not clear what the exact role of PITAR in the DNA damage response based on this model. If DNA damage would induce PITAR expression, this would lead to inactivation of p53 as revealed by this manuscript. However, DNA damage is known to activate p53. Do the authors want to imply that PITAR induction by DNA damage would help to bring down the p53 level at the end of DNA damage response? The presented data do not support this role unfortunately.

      We appreciate Reviewer #1's comments. Based on our model in Fig. 7H, we believe DNA damage-induced PITAR attenuates the DNA damage response by increasing TRIM28 protein levels. TRIM28 ubiquitinates p53 in an MDM2-dependent manner (Wang et al., 2005). Based on this, we hypothesised that PITAR-induced TRIM28 also contributes to the MDM2-mediated ending of the DNA damage response.

      Considering the reviewers' comments, we plan to do the following experiment.

      The kinetics of p53, TRIM28, p21, MDM2 protein levels, and PITAR RNA levels after DNA damage will be monitored in PITAR-silenced conditions. It is known that reduction in the DNA damage-induced p53 levels coincides with high levels of MDM2 accumulation. We believe that in PITAR-silenced cells, p53 levels will remain high for a longer time compared to control cells because of the lack of PITAR-induced TRIM28-mediated degradation of p53.

    1. Author Response:

      Reviewer #1 (Public Review):

      […] The major strength of the study is the elegant and well-powered data set. Longitudinal data on this scale is very difficult to collect, especially with patient cohorts, so this approach represents an exciting breakthrough. Analysis is straightforward and clearly presented. However, no multiple comparison correction is applied despite many different tests. While in general I am not convinced of the argument in the citation provided to justify this, I think in this case the key results are not borderline (p<0.001) and many of the key effects are replications, so there are not so many novel/exploratory hypothesis and in my opinion the results are convincing and robust as they are. The supplemental material is a comprehensive description of the data set, which is a useful resource.

      The authors achieved their aims, and the results clearly support the conclusion that the AD and mean confidence in a perceptual task covary longitudinally. I think this study provides an important impact to the project of computational psychiatry. Specifically, it shows that the relationship between transdiagnostic symptom dimensions and behaviour is meaningful within as well as across individuals.

      Response: We thank the reviewer for their appraisal of our paper and positive feedback on the main manuscript and supplementary information. We agree with the reviewer that the lack of multiple comparison corrections can also be justified by the key findings being replications and not of borderline significance. We have added this additional justification to the manuscript (Methods, Statistical Analyses, page 15, line 568: “Adjustments for multiple comparisons were not conducted for analyses of replicated effects”).

      Reviewer #2 (Public Review):

      […] The major strength and contribution of this study is the use of a longitudinal intervention design, allowing the investigation of how the well-established link between underconfidence and anxious-depressive symptoms changes after treatment. Furthermore, the large sample size of the iCBT group is commendable. The authors employed well-established measures of metacognition and clinical symptoms, used appropriate analyses, and thoroughly examined the specificity of the observed effects.

      However, due to the small effect sizes, the antidepressant and control groups were underpowered, reducing comparability between interventions and the generalizability of the results. The lack of interaction effect with treatment makes it harder to interpret the observed differences in confidence, and practice effects could conceivably account for part of the difference. Finally, it was not completely clear to me why, in the exploratory analyses, the authors looked at the interaction of time and symptom change (and group), since time is already included in the symptom change index.

      Response: We thank the reviewer for their succinct summary of the main results and strengths of our study. We apologise for the confusion in how we described that analysis. We examine state-dependence, i.e., the relationship between symptom change and metacognition change, in two ways in the paper, perhaps somewhat redundantly: (1) by correlating change indices for both measures (e.g., as plotted in Figure 3D) and (2) by doing a very similar regression-based repeated-measures analysis, i.e., mean confidence ~ time*anxious-depression score change, where mean confidence is entered with two datapoints (one for pre- and one for post-treatment, i.e., within-person) and anxious-depression change is a single value per person (between-person change score). This allowed us to test if those with the biggest change in depression had a larger effect of time on confidence. This has been added to the paper for clarification (Methods, Statistical Analysis, page 14, lines 553-559: “To determine the association between change in confidence and change in anxious-depression, we used (1) Pearson correlation analysis to correlate change indices for both measures and, (2) regression-based repeated-measures analysis: mean confidence ~ time*anxious-depression score change, where mean confidence is entered with two datapoints (one for pre- and one for post-treatment i.e., within-person) and anxious-depression change is a single value per person (between-person change score)”).

      The analyses have also been reported as regression in the Results for consistency (Treatment Findings: iCBT, page 5, line 197-204: ‘To test if changes in confidence from baseline to follow-up scaled with changes in anxious-depression, we ran a repeated measure regression analyses with per-person changes in anxious-depression as an additional independent variable. We found this was the case, evidenced by a significant interaction effect of time and change in anxious-depression on confidence (b=-0.12, SE=0.04, p=0.002)… This was similarly evident in a simple correlation between change in confidence and change in anxious-depression (r(647)=-0.12, p=0.002)”).
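      The two views of state-dependence described above can be sketched on toy data. The following is a minimal illustration only: the numbers are hypothetical and not the study's data, the helper names are invented here, and the regression is simplified by assuming a per-person intercept rather than whatever repeated-measures model the authors actually fitted. Under that assumption, with two timepoints coded 0 and 1, the time × symptom-change interaction reduces to the slope of confidence change regressed on symptom change.

```python
from math import sqrt


def mean(values):
    return sum(values) / len(values)


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def ols_fit(xs, ys):
    """Least-squares (slope, intercept) for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, my - slope * mx


# Hypothetical per-person values (NOT the study's data).
conf_pre = [3.0, 3.5, 2.8, 4.0, 3.2, 3.6]   # mean confidence at baseline
conf_post = [3.6, 3.6, 3.5, 4.1, 3.9, 3.7]  # mean confidence at follow-up
ad_pre = [1.2, 0.4, 1.5, 0.2, 1.1, 0.5]     # anxious-depression at baseline
ad_post = [0.5, 0.3, 0.6, 0.2, 0.3, 0.4]    # anxious-depression at follow-up

# (1) Change indices: one value per person for each measure.
d_conf = [post - pre for pre, post in zip(conf_pre, conf_post)]
d_ad = [post - pre for pre, post in zip(ad_pre, ad_post)]
r = pearson_r(d_ad, d_conf)

# (2) Differencing confidence ~ time * symptom change within person leaves
#     d_conf = time_effect + interaction_slope * d_ad.
interaction_slope, time_effect = ols_fit(d_ad, d_conf)
```

      In this toy example a negative `r` and a negative `interaction_slope` express the same pattern: larger reductions in anxious-depression go with larger increases in confidence.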

      This longitudinal study informs the field of metacognition in mental health about the changeability of biases in confidence. It advances our understanding of the link between anxiety-depression and underconfidence consistently found in cross-sectional studies. The small effects, however, call the clinical relevance of the findings into question. I would have found it useful to read more in the discussion about the implications of the findings (e.g., why is it important to know that the confidence bias is state-dependent; given the effect size of the association between changes in confidence and symptoms, is the state-trait dichotomy the right framework for interpreting these results; suggestions for follow-up studies to better understand the association).

      Response: Thank you for this comment. We have elaborated on the implications of our findings in the Discussion, including the relevance of the state-trait dichotomy to future research and how more intensive, repeated testing may inform our understanding of the state-like nature of metacognition (Discussion, Limitations and Future Directions, page 10, line 378-380: “More intensive, repeating testing in future studies may also reveal the temporal window at which metacognition has the propensity to change, which could be more momentary in nature.”).

      Reviewer #3 (Public Review):

      […] I think these findings are exciting because they directly relate to one of the big assumptions when relating cognition to mental health - are we measuring something that changes with treatment (is malleable), so might be mechanistically relevant, or even useful as a biomarker?

      This work is also useful in that it replicates a finding of heightened confidence in those with compulsivity, and lowered confidence in those with elevated anxious-depression.

      One caveat to the interest of this work is that it doesn't allow any causal conclusions to be drawn, and only measures two timepoints, so it's hard to tell if changes in confidence might drive treatment effects (but this would be another study). The authors do mention this in the limitations section of the paper.

      Another caveat is the small sample in the antidepressant group.

      Some thoughts I had whilst reading this paper: to what extent should we be confident that the changes are not purely due to practice? I appreciate there is a relationship between improvement in symptoms and confidence in the iCBT group, but this doesn't completely rule out a practice effect (for instance, you can imagine a scenario in which those whose symptoms have improved are more likely to benefit from previously having practiced the task).

      Response: We thank the reviewer for commenting on the implications of our findings and we agree with the caveats listed. We also thank the reviewer for raising this point about practice effects. A key thing to note is that this task does not have a learning element with respect to the core perceptual judgement (i.e., accuracy), which is the target of the confidence judgment itself. While there is a possibility of increased familiarity with the task instructions and procedures with repeated testing, the task is designed to adjust the difficulty to account for any improvements, so accuracy is stable. We see that we may not have made this clear in some of our language around accuracy vs. perceptual difficulty and have edited the Results to make this distinction clearer (Treatment Findings: iCBT, pages 4-5, lines 184-189: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved. This was reflected as the overall increase in task difficulty to maintain the accuracy rates from baseline (dot difference: M=41.82, SD=11.61) to follow-up (dot difference: M=39.80, SD=12.62), (b=-2.02, SE=0.44, p<0.001, r2=0.01)”.)

      However, it is true that there can be a ‘practice’ effect in the sense that one may feel more confident (despite the same accuracy level) due to familiarity with a task. One reason we do not subscribe to the proposed explanation for the link between anxious-depression change and confidence change is that the other major aspect of behaviour that improved with practice did so in a manner unrelated to clinical change. As noted above in the quoted text, participants’ discrimination improved from baseline to follow-up, reflected in the need for a higher difficulty level to maintain accuracy around 70%. Crucially, this was not associated with symptom change. This speaks against a general mechanism where symptom improvement leads to increased practice effects in general. Only changes in confidence specifically are associated with improved symptoms. We have provided more detail on this in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up.”).

      Relatedly, to what extent is there a role for general task engagement in these findings? The paper might be strengthened by some kind of control analysis, perhaps using (as a proxy for engagement) the data collected about those who missed catch questions in the questionnaires.

      Response: Thank you for your comment. We included the details of data quality checks in the Supplement. Given the small number of participants that failed more than one attention check (1% of the iCBT arm) and that all those participants passed the task exclusion criteria, we made the decision to retain these individuals for analyses. We have since examined whether excluding this small number of individuals impacts our findings. Excluding those that failed more than one catch item did not affect the significance of results, which has now been added to the Supplementary Information (Data Quality Checks: Task and Clinical Scales, page 5, lines 181-185: “Additionally, excluding those that failed more than one catch item in the iCBT arm did not affect the significance of results, including the change in confidence (b=0.16, SE=0.02, p<0.001), change in anxious-depression (b=-0.32, SE=0.03, p<0.001), and the association between change in confidence and change in anxious-depression (r(638)=-0.10, p=0.011)”).

      I was also unclear what the findings about task difficulty might mean. Are confidence changes purely secondary to improvements in task performance generally - so confidence might not actually be 'interesting' as a construct in itself? The authors could have commented more on this issue in the discussion.

      Response: Thank you for this comment and sorry it was not clear in the original paper. As we discussed in a prior reply, accuracy, i.e., the proportion of correct selections (the target of confidence judgements), is different from the difficulty of the dot discrimination task that each person receives on a given trial. We had provided more details on task difficulty in the Supplement. Accuracy was tightly controlled in this task using a ‘two-down one-up’ staircase procedure, in which equally sized changes in dot difference occurred after each incorrect response and after two consecutive correct responses. The task is more difficult when the dot difference between stimuli is lower, and less difficult when the dot difference between stimuli is greater. Therefore, task difficulty refers to the average dot difference between stimuli across trials. Crucially, task accuracy did not change from baseline to follow-up, only task difficulty. Moreover, changes in task difficulty were not associated with changes in anxious-depression, while changes in confidence were, indicating that confidence is the clinically relevant construct for change in symptoms.

      We appreciate that this may not have been clear from the description in the main manuscript, and have added more detail on task difficulty to the Methods (Metacognition Task, page 14, lines 540-542: “Task difficulty was measured as the mean dot difference across trials, where more difficult trials had a lower dot difference between stimuli.”) and Results (Treatment Findings: iCBT, pages 4-5, lines 184-186: “Although overall accuracy remained stable due to the staircasing procedure, participants’ ability to detect differences between the visual stimuli improved.”). We have also elaborated more on how improvements in symptoms are associated with change in confidence, not task performance in the Discussion (page 9, lines 324-326: “This association with clinical improvements was specific to metacognitive changes, and not changes in task performance, suggesting that changes in confidence do not merely reflect greater task familiarity at follow-up”).
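      The ‘two-down one-up’ rule and the difficulty measure described above can be sketched as follows. This is an illustrative sketch only, not the task's actual code: the starting dot difference, step size, and floor are assumed values, and the function name is hypothetical.

```python
def run_staircase(responses, start=40, step=2, floor=1):
    """Return the dot difference presented on each trial.

    Two-down one-up rule: the dot difference shrinks by `step` (harder) after
    two consecutive correct responses and grows by `step` (easier) after each
    incorrect response. `start`, `step`, and `floor` are assumed values.
    """
    dot_diff = start
    streak = 0          # consecutive correct responses since the last change
    presented = []
    for correct in responses:
        presented.append(dot_diff)
        if correct:
            streak += 1
            if streak == 2:
                dot_diff = max(floor, dot_diff - step)  # make the task harder
                streak = 0
        else:
            dot_diff += step                            # make the task easier
            streak = 0
    return presented


# Hypothetical response sequence (True = correct, False = incorrect).
responses = [True, True, False, True, True, True, True]
presented = run_staircase(responses)

# Task difficulty as defined in the response: mean dot difference across trials
# (a lower mean dot difference means a harder task).
mean_dot_difference = sum(presented) / len(presented)

# Accuracy is the proportion of correct responses, a separate quantity.
accuracy = sum(responses) / len(responses)
```

      Because the rule pushes the dot difference down whenever performance improves, accuracy is held roughly constant while the mean dot difference tracks perceptual ability, which is why the two measures can dissociate.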

      To make code more reproducible, the authors could have produced an R notebook that could be opened in the browser without someone downloading the data, so they could get a sense of the analyses without fully reproducing them.

      Response: Thank you for your comment. We appreciate that an R notebook would be even better than how we currently share the data and code. While we will consider using Notebooks in future, we checked and converting our existing R script library into R Notebooks would require a considerable amount of reconfiguration that we cannot devote the time to right now. We hope that nonetheless the commitment to open science is clear in the extensive code base, commenting and data access we are making available to readers.

      Rather than reporting full study details in another publication I would have found it useful if all relevant information was included in a supplement (though it seems much of it is). This avoids situations where the other publication is inaccessible (due to different access regimes) and minimises barriers for people to fully understand the reported data.

      Response: We agree this is good practice – the Precision in Psychiatry study is very large, with many irrelevant components with respect to the present study (Lee et al., BMC Psychiatry, 2023). For this reason, we tried to provide all that was necessary and only refer to the Precision in Psychiatry study methods for fine-grained detail. Upon review, the only thing we think we omitted that is relevant is information on ethical approval in the manuscript, which we have now added (Methods, Participants, page 11, lines 412-417: “Further details of the PIP study procedures that are not specific to this study can be found in a prior publication (21). Ethical approval for the PIP study was obtained from the Research Ethics Committee of School of Psychology, Trinity College Dublin and the Northwest-Greater Manchester West Research Ethics Committee of the National Health Service, Health Research Authority and Health and Care Research Wales”). If any further information is lacking, we are happy to include it here also.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      She et al studied the evolution of gene expression reaction norms when individuals colonise a new environment that exposes them to physiologically challenging conditions. Their objective was to test the "plasticity first" hypothesis, which suggests that traits that are already plastic (their value changes when facing a new environment compared to the original environment) facilitate the colonisation of novel environments, which, if true, would be predicted to result in the evolution of gene expression values that are similar in the population that colonised the new environment and evolved under these particular selection pressures. To test this prediction, they studied gene expression in cardiac and muscle tissues in individuals originating from three conditions: lowland individuals in their natural environment (ancestral state), lowland individuals exposed to hypoxia (the plastic response state), and a highland population facing hypoxia for several generations (the coloniser state). They classified gene expression patterns as maladaptive or adaptive in lowland individuals responding to short term hypoxia by classifying gene expression patterns using genes that differed between the ancestral state (lowland) and colonised state (highland). Genes expressed in the same direction in lowland individuals facing hypoxia (the plastic state) as what is found in the colonised state are defined as adaptive, while genes with the opposite expression pattern were labelled as maladaptive, using the assumption that the colonised state must represent the result of natural selection. Furthermore, genes could be classified as representing reversion plasticity when the expression pattern differed between the plasticity and colonised states and as reinforcement when they were in the same direction (for example more expressed in the plastic state and the colonised state than in the ancestral state). They found that more genes had a plastic expression pattern that was labelled as maladaptive than adaptive. Therefore, some of the genes have an expression pattern in accordance with what would be predicted based on the plasticity-first hypothesis, while others do not.

      Thank you for a precise summary of our work. We appreciate the very encouraging comments recognizing the value of our work. We have addressed concerns from the reviewer in greater detail below.

      Q1. As pointed out by the authors themselves, the fact that temperature was not included as a variable, which would make the experimental design much more complex, misses the opportunity to more accurately reflect the environmental conditions that the colonizer individuals face at high altitude. Also pointed out by the authors, the acclimation experiment in hypoxia lasted 4 weeks. It is possible that longer term effects would be identifiable in gene expression in the lowland individuals facing hypoxia on a longer time scale. Furthermore, a sample size of 3 or 4 individuals per group depending on the tissue for wild individuals may miss some of the natural variation present in these populations. Stating that they have a n=7 for the plastic stage and n= 14 for the ancestral and colonized stages refers to the total number of tissue samples and not the number of individuals, according to supplementary table 1.

      We share the reviewer's concerns. They arise partly because it is quite challenging to bring wild birds into captivity for hypoxia acclimation experiments. We had to work hard to perform the acclimation experiments, which involved keeping lowland sparrows in a hypoxic condition for a month. We recognized the same set of limitations that the reviewer pointed out and have discussed them in the study, i.e., considering the hypoxic condition alone, the short acclimation period, etc. Regarding sample sizes, we collected cardiac muscle from nine individuals (three individuals for each stage) and flight muscle from 12 individuals (four individuals for each stage). We have clarified this in Supplementary Table 1.

      Q2. Finally, I could not find a statement indicating that the lowland individuals placed in hypoxia (plastic stage) were from the same population as the lowland individuals for which transcriptomic data was already available, used as the "ancestral state" group (which themselves seem to come from 3 populations, Qinhuangdao, Beijing, and Tianjin, according to supplementary table 2), nor whether they were sampled at the same time of year (pre-reproduction, during breeding, after, or if they were juveniles, proportion of males or females, etc). These two aspects could affect both gene expression (through neutral or adaptive genetic variation among lowland populations that can affect gene expression, or environmental effects other than hypoxia that differ in these populations' environments or because of their sexes or age). This could potentially also affect the FST analysis done by the authors, which they use to claim that strong selective pressure acted on the expression level of some of the genes in the colonised group.

      The reviewer asked how the individual tree sparrows used in the transcriptomic analyses were collected. The individuals used for the hypoxia acclimation experiment and those representing the ancestral lowland population were collected from the same locality (Beijing) and in the same season (i.e., pre-breeding) of the year. They are all adults and weigh approximately 18 g. We have clarified this in Supplementary Table S1 and the Methods. We did not distinguish males from females (both sexes look similar), under the assumption that both sexes respond similarly to hypoxia acclimation in their cardiac and flight muscle gene expression.

      Supplementary Table 2 lists the individuals that were used for sequence analyses. These individuals were used only for sequence comparisons and not for the transcriptomic analyses. The population genetic structure analyzed in a previously published study showed that there is no clear genetic divergence within the lowland population (i.e., individuals collected from Beijing, Tianjin and Qinhuangdao) or the highland population (i.e., Gangcha and Qinghai Lake). In addition, there was no clear genetic divergence between the highland and lowland populations (Qu et al. 2020).

      Q4. Impact of the work

      There has been work showing that populations adapted to high altitude environments show changes in their hypoxia response that differ from the short-term acclimation response of lowland populations of the same species. For example, in humans, see Erzurum et al. 2007 and Peng et al. 2017, where they show that the hypoxia response cascade, which starts with the gene HIF (Hypoxia-Inducible Factor) and includes the EPO gene, which codes for erythropoietin, which in turn activates the production of red blood cells, is LESS activated in high altitude individuals compared to the activation level in lowland individuals (which gives it its name). The present work adds to this body of knowledge showing that the short-term response to hypoxia and the long term one can affect different pathways and that acclimation/plasticity does not always predict what physiological traits will evolve in populations that colonize these environments over many generations and additional selection pressure (UV exposure, temperature, nutrient availability). Altogether, this work provides new information on the evolution of reaction norms of genes associated with the physiological response to one of the main environmental variables that affects almost all animals, oxygen availability. It also provides an interesting model system to study this type of question further in a natural population of homeotherms.

      Erzurum, S. C., S. Ghosh, A. J. Janocha, W. Xu, S. Bauer, N. S. Bryan, J. Tejero et al. "Higher blood flow and circulating NO products offset high-altitude hypoxia among Tibetans." Proceedings of the National Academy of Sciences 104, no. 45 (2007): 17593-17598.

      Peng, Y., C. Cui, Y. He, Ouzhuluobu, H. Zhang, D. Yang, Q. Zhang, Bianbazhuoma, L. Yang, Y. He, et al. 2017. Down-regulation of EPAS1 transcription and genetic adaptation of Tibetans to high-altitude hypoxia. Molecular biology and evolution 34:818-830.

      Thank you for highlighting the potential novelty of our work in the context of the wider field. We found it very interesting to discuss our results (from a bird species) together with similar findings from humans. In the revised version of the manuscript, we have discussed the short-term acclimation response and long-term adaptive evolution to a high-elevation environment, as well as how our work informs understanding of the relative roles of short-term plasticity and long-term adaptation. We appreciate the two important works pointed out by the reviewer and have cited them in the revised version of the manuscript.

      Reviewer #2 (Public Review):

      This is a well-written paper using gene expression in tree sparrow as model traits to distinguish between genetic effects that either reinforce or reverse initial plastic response to environmental changes. Tree sparrow tissues (cardiac and flight muscle) collected in lowland populations subject to hypoxia treatment were profiled for gene expression and compared with previously collected data in 1) highland birds; 2) lowland birds under normal condition to test for differences in directions of changes between initial plastic response and subsequent colonized response. The question is an important and interesting one but I have several major concerns on experimental design and interpretations.

      Thank you for a precise summary of our work and constructive comments to improve this study. We have addressed your concerns in greater detail below.

      Q1. The datasets consist of two sources of data. The hypoxia treated birds collected from the current study and highland and lowland birds in their respective native environment from a previous study. This creates a complete confounding between the hypoxia treatment and experimental batches that it is impossible to draw any conclusions. The sample size is relatively small. Basically correlation among tens of thousands of genes was computed based on merely 12 or 9 samples.

      We appreciate the critical comments from the reviewer. The reviewer raised the concerns about the batch effect from birds collected from the previous study and this study. There is an important detail we didn’t describe in the previous version. All tissues from hypoxia acclimated birds and highland and lowland birds have been collected at the same time (i.e., Qu et al. 2020). RNA library construction and sequencing of these samples were also conducted at the same time, although only the transcriptomic data of lowland and highland tree sparrows were included in Qu et al. (2020). The data from acclimated birds have not been published before.

      In the revised version of the manuscript, we also compared log-transformed transcripts per million (TPM) across all genes and determined the most conserved genes (i.e., coefficient of variation ≤ 0.3 and average TPM ≥ 1 for each sample) for the flight and cardiac muscles, respectively (Hao et al. 2023). We compared the median expression levels of these conserved genes and found no difference among the lowland, hypoxia-exposed lowland, and highland tree sparrows (Wilcoxon signed-rank test, P<0.05). As these results suggested little batch effect on the transcriptomic data, we used TPM values to calculate gene expression level and intensity. This methodological detail has been further clarified in the Methods, and we also provide a new supplementary figure (Figure S5) to show the comparative results.
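
      As a rough illustration of the conserved-gene check described above, the following is a minimal sketch and not the pipeline used in the study. It assumes a toy table of TPM values per gene, computes the coefficient of variation as the population standard deviation divided by the mean, and applies the two cutoffs quoted in the text. The function names, the choice of population standard deviation, and the toy data are all assumptions made for this example.

```python
from statistics import mean, median, pstdev


def conserved_genes(tpm, max_cv=0.3, min_mean_tpm=1.0):
    """Return gene IDs whose TPM is stable across samples.

    tpm maps a gene ID to a list of TPM values, one per sample.
    A gene is kept when its mean TPM is at least min_mean_tpm and its
    coefficient of variation (sd / mean) is at most max_cv.
    """
    kept = []
    for gene, values in tpm.items():
        avg = mean(values)
        if avg < min_mean_tpm:
            continue  # too weakly expressed to be informative
        cv = pstdev(values) / avg  # assumption: population sd
        if cv <= max_cv:
            kept.append(gene)
    return kept


def group_medians(tpm, groups, genes):
    """Median TPM of the given genes within each group of sample indices."""
    return {
        name: median(tpm[g][i] for g in genes for i in idx)
        for name, idx in groups.items()
    }


# Toy data (hypothetical): three genes, six samples.
tpm = {
    "geneA": [10, 11, 9, 10, 10, 12],          # stable, well expressed
    "geneB": [0.1, 0.2, 0.1, 0.3, 0.2, 0.1],   # below the TPM cutoff
    "geneC": [5, 50, 1, 30, 2, 80],            # highly variable
}
groups = {"lowland": [0, 1], "hypoxia": [2, 3], "highland": [4, 5]}

stable = conserved_genes(tpm)
medians = group_medians(tpm, groups, stable)
print(stable, medians)
```

      Similar medians across the three groups for such stable genes would be consistent with little batch effect; a formal comparison would still need a statistical test on the real data.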

      The reviewer also raised the issue of sample size. We certainly would have liked to have more individuals in the study, but this was not possible due to the logistical problem of keeping wild birds in a common garden experiment for a long time. We have acknowledged this in the manuscript. To mitigate this, we tested the hypothesis of plasticity followed by genetic change using two different tissues (cardiac and flight muscles) and two different datasets (co-expressed gene-set and muscle-associated gene-set). As all these analyses show similar results, they indicate that the main conclusion drawn from this study is robust.

      Q2. Genes are classified into two classes (reversion and reinforcement) based on arbitrarily chosen thresholds. More "reversion" genes are found and this was taken as evidence reversal is more prominent. However, a trivial explanation is that genes must be expressed within a certain range and those plastic changes simply have more space to reverse direction rather than having any biological reason to do so.

      Thank you for the critical comments. Two questions were raised, and we would like to address them separately. The first concern centered on the issue of arbitrarily chosen thresholds. In our manuscript, we used a range of thresholds, i.e., 50%, 100%, 150% and 200% of change in the gene expression levels of the ancestral lowland tree sparrow, to detect genes with reinforcement and reversion plasticity. With this design we wanted to explore the magnitudes of gene expression plasticity (i.e., Ho & Zhang 2018), and whether the strength of selection (i.e., genetic variation) changes with the magnitude of gene expression plasticity (i.e., Campbell-Staton et al. 2021).

      As the reviewer pointed out, we now realize that this threshold selection is arbitrary. We have thus implemented two other categorization schemes to test the robustness of the observation of unequal proportions of genes with reinforcement and reversion plasticity. Specifically, we used a parametric bootstrap procedure as described in Ho & Zhang (2019), which aims to identify genes resulting from genuine differences rather than random sampling errors. Bootstrap results suggested that genes exhibiting reversing plasticity significantly outnumber those exhibiting reinforcing plasticity, suggesting that our inference of an excess of genes with reversion plasticity is robust to random sampling errors. We have added these analyses to the revised version of the manuscript, and provided the results in Figure 2d and Figure 3d.

      In addition, we adapted a bin scheme (i.e., 20%, 40% and 60% bin settings along the spectrum of the reinforcement/reversion plasticity). These analyses based on different categorization schemes revealed similar results, and suggested that our inference of an excess of genes with reversion plasticity is robust. We have provided these results in the Supplementary Figure S2 and S4.
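
      For readers unfamiliar with the threshold-based categorization, a minimal sketch under stated assumptions follows; it is not the code used in the study. It assumes the general scheme of Ho & Zhang (2018), in which the plastic change (PC) is the difference between the plastic and ancestral expression levels and the genetic change (GC) is the difference between the colonized and plastic levels, and a gene is classified only when both changes exceed a chosen fraction of the ancestral level. The function name, argument names, and example values are hypothetical.

```python
def classify_gene(l_ancestral, l_plastic, l_colonized, threshold=0.5):
    """Classify one gene as 'reinforcement', 'reversion', or 'unclassified'.

    threshold is the minimum change, as a fraction of the ancestral
    expression level, that both PC and GC must exceed (0.5 means 50%).
    """
    pc = l_plastic - l_ancestral    # plastic change (assumed definition)
    gc = l_colonized - l_plastic    # genetic change (assumed definition)
    cutoff = threshold * l_ancestral
    if abs(pc) <= cutoff or abs(gc) <= cutoff:
        return "unclassified"
    # Same sign: evolution pushes expression further in the plastic direction.
    # Opposite sign: evolution moves expression back toward the ancestral level.
    return "reinforcement" if pc * gc > 0 else "reversion"


# Hypothetical expression levels: (ancestral, plastic, colonized).
examples = {
    "gene1": (10, 20, 35),  # up, then further up
    "gene2": (10, 20, 11),  # up, then back down
    "gene3": (10, 12, 30),  # plastic change too small at a 50% threshold
}
labels = {g: classify_gene(*levels) for g, levels in examples.items()}
print(labels)
```

      Re-running the same function with threshold set to 1.0, 1.5 and 2.0 mirrors the idea of checking whether the excess of reversion genes persists as the cutoff becomes stricter.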

      The second issue that the reviewer raised is that the plastic changes simply have more space to reverse direction rather than having any biological reason to do so. While the causal reason why more genes have their expression levels reversed than reinforced at the late stages is still contentious, an increasing number of studies show that gene expression plasticity at the early stage may be functionally maladapted to the novel environment that the species have recently colonized (i.e., lizard, Campbell-Staton et al. 2021; Escherichia coli, yeast, guppies, chickens and babblers, Ho and Zhang 2018; Ho et al. 2020; Kuo et al. 2023). Our comparisons based on the two gene sets that are associated with muscle phenotypes corroborated these previous studies and showed that initial gene expression plasticity may be nonadaptive to the novel environments (i.e., Ghalambor et al. 2015; Ho & Zhang 2018; Ho et al. 2020; Kuo et al. 2023; Campbell-Staton et al. 2021).

      Q3. The correlation between plastic change and evolved divergence is an artifact due to the definitions of adaptive versus maladaptive changes. For example, the definition of adaptive changes requires that plastic change and evolved divergence are in the same direction (Figure 3a), so the positive correlation was a result of this selection (Figure 3d).

      The reviewer raised the issue that the correlation between plastic change and evolved divergence is an artifact of the definition of adaptive versus maladaptive changes, for example in Figure 3d. We agree with the reviewer that the correlation analysis is circular, because the definition of adaptive and maladaptive plasticity depends on whether the direction of plastic change matched or opposed that of the colonized tree sparrows. We have thus removed the previous Figure 3d-e and related text from the revised version of the manuscript. We have also changed Figure 3a to further clarify the schematic framework.

      Reviewer #1 (Recommendations For The Authors):

      Q1. Here are private recommendations that I think could help improve the manuscript. West-Eberhard was a pioneer back in 2003 in explicating the hypothesis of "plasticity first". I think it is important to cite their main work in the first paragraph of introduction and to use the term "plasticity-first", which is widely known among evolutionary biologists studying phenotypic plasticity, instead of "plasticity followed by genetic change", since the three papers cited in paragraph 1 call it « plasticity first ».

      West-Eberhard, M.J. (2003) Developmental Plasticity and Evolution, Oxford University Press.

      Thank you for suggesting West-Eberhard (2003) and we have cited this important work. We have also changed “plasticity followed by genetic change” to “plasticity first”.

      Q2. Introduction. Line 5, Change for « On the one hand, if plasticity changes ... »

      We have modified as suggested.

      Q3. Line 52, Change for « ...same direction as adaptive evolution does ...»

      We have modified as suggested.

      Q4. Line 66, When presenting papers that address the plasticity and evolution of gene expression in response to environmental variables, paper by Morris et al is another example that could be useful to include (but this is only a suggestion in case the authors missed it).

      Thank you for suggesting this nice work. We have cited Morris et al. (2014).

      Q5. Line 94, Change for "We acclimated"

      We have modified as suggested.

      Q6. In Figure 3, the figure in panel A and B is labelled "normaxia", but I think that "normoxia" is usually the term used.

      Thank you for spotting the typo. We have modified Figure 3a and no longer use the term “normaxia”.

      Material and methods

      It would be important to merge supplementary table 1 and 2 and only present the individuals that were used with their respective cardiac and muscle libraries (if they come from the same individual?). Also, the origin of the individuals used in the hypoxia experiment should be explained at the beginning of the methods section and explicated in the supplementary table. Information on sex or stage of development (juvenile? Adult? Male? female?) and time of year (in breeding stage? Pre-migration (if any), etc) would allow the reader to see that individuals from lowland differed only in their exposure to hypoxia or not, or if other variables may affect gene expression patterns. Similarly, if all individuals from the highland are males and the lowland hypoxia exposed individuals are females (or juveniles versus breeders, or different time of year, etc) this should be stated in the methods. Gene expression is labile so the reader should know if other variables influence the results presented or not.

      Thank you for the suggestion. We have added detailed information (i.e., age, collecting time and season) to Supplementary Table 1. We have also added this information to the Methods. Because the birds used in the transcriptomic analysis (Supplementary Table 1) were different individuals from those used in the sequence analyses (Supplementary Table 2), these two tables cannot be merged.

      References:

      Campbell-Staton SC, Velotta JP, Winchell KM. 2021. Selection on adaptive and maladaptive gene expression plasticity during thermal adaptation to urban heat islands. Nat. Commun. 12: 6195.

      Ghalambor CK, Hoke KL, Ruell EW, Fischer EK, Reznick DN, Hughes KA. 2015. Non-adaptive plasticity potentiates rapid adaptive evolution of gene expression in nature. Nature 525:372–375.

      Hao et al. 2023. Divergent contributions of coding and noncoding sequences to initial high-altitude adaptation in passerine birds endemic to the Qinghai–Tibet Plateau. Mol. Ecol. Doi: 10.1111/mec.16942.

      Ho WC, Zhang J. 2018. Evolutionary adaptations to new environments generally reverse plastic phenotypic changes. Nat. Commun. 9: 350.

      Ho WC, Zhang J. 2019. Genetic gene expression changes during environmental adaptations tend to reverse plastic changes even after correction for statistical nonindependence. Mol. Biol. Evol. 36: 604–612.

      Ho WC, Li D, Zhu Q, Zhang J. 2020. Phenotypic plasticity as a long-term memory easing readaptations to ancestral environments. Sci. Adv. 6: eaba3388.

      Kuo KC, Yao CT, Liao BY, Weng MP, Dong F, Hsu YC, Hung CM. 2023. Weak gene-gene interaction facilitates the evolution of gene expression plasticity. BMC Biol. 21: 57.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      I would recommend the authors check the results section, it seems to me that the first two paragraphs are not results, but methods.

      We would like to express our appreciation to both reviewers for bringing this to our attention. Indeed, we discussed this in detail, but decided that because the methods come after the results section, we believe that providing the basic methodological approach to readers before the results is essential for better comprehension. Once again, we sincerely thank the reviewers for their valuable feedback, however, we would prefer to leave this part as it is.

      In Figure 3B, why there is not male and female shown in different lines, as in the rest of figures? I recommend following the same pattern everywhere.

      Has been changed accordingly, and the respective sex-specific lines were also added to Figure 4.

      I recommend checking carefully all the articles included in Table 2. Maybe some of the included information here is not precise.

      We thank the reviewer for highlighting this. We carefully checked the articles again, and made some small adjustments.

      In Material and methods: just note that when ages are estimated, usually there is a variable accounting for the amount of estimated years, that should be included in the model, and see that it has no effect on the dependent variable. I recommend including this variable.

      We sincerely appreciate the helpful comment from the reviewer, which we have carefully considered and implemented in our manuscript. However, we would like to highlight that addressing age estimation error is complex, as it involves measurement error. Thus, simply adding it as an independent variable may not fully capture its potential impact, as the effect may be positive or negative depending on the individual. Hence, the potential effect would be better accounted for by the implementation of individual random intercepts and smooths to adjust the confidence intervals, which is part of our model structures. Furthermore, we would like to emphasize that we have also conducted analyses on a reduced dataset that only included zoo-born individuals with precisely known birthdates, and the results remained consistent. So instead of changing our analyses, we now emphasize how our approach also addresses this aspect.

      Creatinine: Is there any other reference, more recent and in English, to complement the original one cited?

      We have now supplemented the original citation with an additional English citation: Anestis et al. 2009.

      Reviewer #2 (Recommendations For The Authors):

      Minor corrections

      Please, in Study population, the citation of table 2 is in fact Table 3. For table 3 (in Methodology), please provide the units. Body weight has a mean of 32.4; does it have a median of 9?

      Please, provide results separately for males and females

      We changed the table as requested, though the table only reports sample sizes and thus only numbers without units. The values for body weight are accurate.

      In Results

      The two first paragraphs have to be included in methods and structured with those already present.

      We would like to express our appreciation to both reviewers for bringing this to our attention. Indeed, we discussed this in detail, but decided that because the methods come after the results section, we believe that providing the basic methodological approach to readers before the results is essential for better comprehension. Once again, we sincerely thank the reviewers for their valuable feedback, however, we would prefer to leave this part as it is.

      In Table 1, indicate what 'Est' means.

      Has been changed accordingly.

    1. Author Response

      Reviewer #1 (Public Review):

      The cerebral cortex, or surface of the brain, is where humans do most of their conscious thinking. In humans, the grooves (sulci) and bumps (convolutions) have a particular pattern in a region of the frontal lobe called Broca's area, which is important for language. Specialists study features imprinted on the internal surfaces of braincases in early hominins by casting their interiors, which produces so-called endocasts. A major question about hominin brain evolution concerns when, where, and in which fossils a humanlike Broca's area first emerged, the answer to which may have implications for the emergence of language. The researchers used advanced imaging technology to study the endocast of a hominin (KNM-ER 3732) that lived about 1.9 million years ago (Ma) in Kenya to test a recently published hypothesis that Broca's remained primitive (apelike) prior to around 1.5 Ma. The results are consistent with the hypothesis and raise new questions about whether endocasts can be used to identify the genus and/or species of fossils.

      We would like to thank Rev. 1 for their comments on our paper.

      Reviewer #2 (Public Review):

      The authors tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap (the region in fossil endocasts corresponding to Broca's area in the brain), being more similar to the condition in chimpanzees than in humans. The evidence from the described individual points to this direction but there are some flaws in the argumentation.

      We are grateful to Rev. 2 for their comments, although we partially agree with some of them.

      First, we would like to rectify the statement of Rev. 2 that we “tried to support the hypothesis that early Homo still had a primitive condition of Broca's cap”. Our aim was to test this hypothesis, not to try to validate it.

      First, only one human and one chimpanzee were used for comparison, although we know that patterns of brain convolutions (and in addition how they leave imprints in the endocranial bones) are very variable.

      We understand the point raised by Rev. 2 about the variation of brain convolutions in humans and chimpanzees. We used atlases published by Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022) to analyse the endocast of KNM-ER 3732 and compare it to the extant human and chimpanzee cerebral conditions. However, in Figure 2, for the sake of clarity only two Homo and Pan specimens were used to illustrate the comparison (as has been done in other published papers, e.g., Carlson et al., 2011, Science; Gunz et al., 2020, Sci Adv). In the revised version, we modified the manuscript to explain our approach further (line 156) “We used brain and endocast atlases published in Connolly (1950), Falk et al. (2018) and de Jager et al. (2019, 2022; see also www.endomap.org) for comparing the pattern identified in KNM-ER 3732 to those described in extant humans and chimpanzees. To the best of our knowledge, these atlases are the most extensive atlases of extant human and chimpanzee brains/endocasts available to date and are widely used in the literature to explore variability in sulcal patterns. In Figure 2, the extant human and chimpanzee conditions are illustrated by one extant human (adult female) and one extant chimpanzee (adult female) specimens from the Pretoria Bone Collection at the University of Pretoria (South Africa) and in the Royal Museum for Central Africa in Tervuren (Belgium), respectively (Beaudet et al., 2018).”.

      Second, the evidence from this fossil specimen adds to the evidence of previously described individuals but still does not fully prove the hypothesis.

      We tempered our discussion by concluding that (line 116) “Overall, the present study not only demonstrates that Ponce de León et al.’s (2021) hypothesis of a primitive brain of early Homo cannot be rejected, but also adds information […]”.

      Third, there is a vicious circle in using primitive and derived features to define a fossil species and then using (the same or different) features to argue that one feature is primitive or derived in a given species. In this case, we expect members of early Homo to be derived compared to their predecessors of the genus Australopithecus and that's why it seems intriguing and/or surprising to argue that early Homo has primitive features. However, we should expect that there is some kind of continuum or mosaic in a time in which a genus "evolves into" another genus. This discussion requires far more discussions about the concepts we use, maybe less discussion about what is different between the two groups but more discussion about the evolutionary processes behind them.

      We fully agree with Rev. 2 on this aspect. We believe that identifying these differences/similarities between fossil and extant hominids constitute the first step of a better understanding of the evolutionary mechanisms. Our work suggests indeed a certain continuity between genera and raises questions on the genus concept and how to interpret the specimens currently attributed to early Homo. In the revised version of the manuscript we included a reference to this possible scenario (line 134): “[…] or to the absence of a definite threshold between the two genera based on the morphoarchitecture of their endocasts (Wood and Collard, 1999).”.

      Fourth, the data of convolutional imprints presented are rather subjective when identifying which impressions represent which brain convolutions. Not seeing an impression does not necessarily mean that the corresponding brain feature did not exist. Interestingly, the manuscript does not mention and discuss at all the frontoorbital sulcus. This is a sulcus that usually runs from the orbital surface of the frontal lobe up to divide the inferior frontal gyrus in chimpanzees, a condition totally different than in humans who do not have a frontoorbital sulcus. Could such a sulcus be identified, this would provide a far more convincing argument for a primitive condition in this specimen. In Australopithecus sediba, e.g., the condition in this region seems to be a mosaic in which some aspects of the morphology seem to be more modern while one of the sulcal impressions can well be interpreted as a short frontoorbital sulcus. For this specimen, by the way, I would come back to my third point above: some experts in the field might argue that this specimen could belong to Homo rather than Australopithecus...

      We agree that the presence of a fronto-orbital sulcus would be more conclusive. However, this sulcus has not been identified in KNM-ER3732 and the region in which we would expect to find it is not preserved. As demonstrated by Ponce de León et al. (2021), because of the topographic relationships between sulci (and cranial structures), it is possible to interpret imprints on endocasts and the evolutionary polarity of some traits even in the absence of landmarks such as the fronto-orbital sulcus. In Australopithecus sediba the main derived feature of the endocast corresponds to the ventrolateral bulge in the left inferior frontal gyrus, and not to the sulcal pattern itself (Carlson et al., 2011, Science). However, the discussion around the taxonomic status of this taxon confirms the urgent need for reconsidering specimens from that time period and clarifying the mosaic-like or concerted evolution of the derived Homo-like traits within our lineage. Regarding the subjective nature of this approach, we invite readers to examine the specimen on MorphoSource (https://www.morphosource.org/concern/media/000497752?locale=en) and to request access from the National Museums of Kenya to the physical or virtual specimen to falsify our hypothesis.

      According to my arguments above, I think that this manuscript might revive interesting discussions about this topic but it is not likely to settle them because the data presented are not strong enough to fully support the hypothesis.

      We would be more than happy to consider new/other specimens with similar chronological and geographical contexts and investigate further this hypothesis in the future.

      Reviewer #3 (Public Review):

      The authors provide a detailed analysis of the sulcal and sutural imprints preserved on the natural endocast and associated cranial vault fragments of the KNM-ER3732 early Homo specimen. The analyses indicate a primitive ape-like organization of this specimen's frontal cortex. Given the geological age of around 1.9 million years, this is the earliest well-documented evidence of a primitive brain organization in African Homo.

      In the discussion, the authors re-assess one of the central questions regarding the evolution of early Homo: was there species diversity, and if yes, how can we ascertain it? The specimen KNM-ER1470 has assumed a central role in this debate because it purportedly shows a more advanced organization of the frontal cortex compared to other largely coeval specimens (Falk, 1983). However, as outlined in Ponce de León et al. 2021 (Supplementary Materials), the imprints on the ER1470 endocranium are unlikely to represent sulcal structures and are more likely to reflect taphonomic fracturing and distortion. Dean Falk, the author of the 1983 study, basically shares this view (personal communication). Overall, I agree with the authors that the hypothesis to be tested is the following: did early Homo populations with primitive versus derived frontal lobe organizations coexist in Africa, and did they represent distinct species?

      I greatly appreciate that the authors make available the 3D surface data of this interesting endocast.

      We are grateful to Rev. 3 for their comments and for contextualizing our finding. We would also like to point out that, although the 3D surface can be viewed on MorphoSource, permission from the National Museums of Kenya has to be requested for studying the specimen and getting access to the physical specimen and/or the 3D model.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their positive and constructive evaluations. Based upon the reviewers’ helpful comments, we have performed complementary experiments. In particular, we additionally show that:

      • a complete analysis of CXCR1/2 binding chemokines in the secretions of tissular CD8+ T cells reinforces the key role of CXCL8 in CD8+ T cell-induced fibrocyte chemotaxis (new panel D in Figure 2)

      • a direct contact between fibrocytes and CD8+ T cells triggers CD8+ T cell cytotoxicity against primary basal bronchial epithelial cells (new Figure 6)

      • the interaction between CD8+ T cells and fibrocytes is bidirectional, with CD8+ T cells triggering the development of fibrocyte immune properties (new Figure 7)

      • the characteristic time to reach a stationary state reminiscent of a resolution of the COPD condition was estimated to be about 2.5 years using the simulations. Interfering with chemotaxis and adhesion processes by inhibiting CXCR1/2 and CD54, respectively was not sufficient to reverse the COPD condition, as predicted by the mathematical model (new Figure 9)

      • the massive proliferation effect induced by fibrocytes is specific to CD8+ T cells and not CD4+ T cells (new Figure 3-figure supplement 2), and that fibrocytes moderately promote the death of unactivated CD8+ T cells in direct co-culture (new Figure 3-figure supplement 3)

      We have graphically summarized our findings (new Figure 10), which suggest the existence of a positive feedback loop playing a role in the vicious cycle that promotes COPD. A new table describing patient characteristics for basal bronchial epithelial cell purification has also been added (new Supplementary File 9), and Supplementary Files 7 and 8 have been updated to take into account the new experiments.

      The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium (http://proteomecentral.proteomexchange.org) via the PRIDE partner repository with the dataset identifier PXD041402.  

      Reviewer #1 (Recommendations For The Authors):

      The experimental approaches are all rationally designed and the data clearly presented, with appropriate analyses and sample sizes. I could find no technical or interpretative concerns. The interrelationship between the observational data (histology) with the quantitative live cell imaging and the follow-on functional investigations is especially laudable. The data nicely unifies several years of accumulated data regarding the (separate) participation of CD8 T cells and fibrocytes in COPD.

      We thank the reviewer for his/her comments.

      I have only minor comments:

      1) Line 79: The observation that T cells may influence fibrocyte differentiation/function was initially made some years earlier by Abe et al (J Immunol 2001; 7556), and should be cited in addition to the follow-on work of Niedermeyer.

      This reference has been added to acknowledge this seminal work.

      2) Line 632: Corticosteroids originate from the cortex of the adrenal gland. Budesonide and fluticasone are glucocorticoids, not corticosteroids.

      This mistake has been corrected in the discussion of the revised manuscript (see line 802 in the revised manuscript).

      3) Given the state of T cell immunotherapies, cytokine/chemokine antagonists, and emerging fibrocyte-targeted drugs, can the authors possibly speculate as to desired pathways to target therapeutically?

      Chemokine-receptor based therapies could be used to inhibit fibrocyte recruitment into the lungs, such as CXCR4 blockade. We have very recently shown that using the CXCR4 antagonist, plerixafor, alleviates bronchial obstruction and reduces peri-bronchial fibrocyte density (Dupin et al., 2023). Because CXCR4 expression in human fibrocytes is dependent on mTOR signaling and is inhibited by rapamycin in vitro (Mehrad et al., 2009), alternative strategies consisting of targeting fibrocytes via mTOR have been proposed. This target has proven effective in bronchiolitis obliterans, idiopathic pulmonary fibrosis, and thyroid-associated ophthalmopathy, using rapamycin (Gillen et al., 2013; Mehrad et al., 2009), sirolimus (Manjarres et al., 2023) or an insulin-like growth factor-1 (IGF-I) receptor blocking antibody (Douglas et al., 2020; Smith et al., 2017). Inhibiting mTOR is also expected to have effects on CD8+ T cells, ranging from an immunostimulatory effect by activation of memory CD8+ T-cell formation, to an immunosuppressive effect by inhibition of T cell proliferation (Araki et al., 2010). Last, chemokine-receptor based therapies could also include strategies to inhibit the CD8+ T cell-induced fibrocyte chemotaxis, such as dual CXCR1-CXCR2 blockade. We were able to test this latter strategy in our mathematical model, see response to point 6 of reviewer 2.

      Immunotherapies directly targeting the interaction between fibrocytes and CD8+ T cells could also be considered, such as CD86 or CD54 blockade. The use of abatacept and belatacept, that interfere with T cell co-stimulation, is effective in patients with rheumatoid arthritis (Pombo-Suarez & Gomez-Reino, 2019) and in kidney-transplant recipients (Vincenti et al., 2016), respectively. Targeting the IGF-I receptor by teprotumumab in the context of thyroid-associated ophthalmopathy also improved disease outcomes, possibly by altering fibrocyte-T cell interactions (Bucala, 2022; Fernando et al., 2021).

      We also tested this CD86 and CD54 blocking strategy for COPD treatment by simulations, see response to point 6 of reviewer 2.

      However, such therapies should be used with caution as they may favour adverse events such as infections, particularly in the COPD population (Rozelle & Genovese, 2007). Additionally, the fibrocyte-lymphocyte interaction has recently been shown to promote anti-tumoral immunity via the PD1-PDL1 immunological synapse (Afroj et al., 2021; Mitsuhashi et al., 2023). Therefore, care should be taken in the selection of patients to be treated and/or the timing of treatment administration with regard to the increased risk of lung cancer in COPD patients.

      The discussion section has been altered accordingly.

      4) The authors may want to consider mentioning (and citing) recent insight into the immune-mediated fibrosis in thyroid-associated ophthalmopathy

      These important publications are now cited in a dedicated paragraph about the possible therapeutic interventions (see answer to point 3, and discussion in the revised manuscript).

      Reviewer #2 (Recommendations For The Authors):

      Specific comments

      1) The rationale for the selection of chemokines overexpressed by CD8+ T cells in COPD is based on literature data of n=2 patients per group. This is limited and risky. I am less concerned about false positives given the selection of chemokines and the available literature but am worried about the possibility that many chemokines may not have been selected based on insufficient power to do meaningful stats on this comparison. For example, many other CXCR1/2 binding CXCL chemokines exist and these could contribute to the migration effect in Fig 2C as well. Given the currently available single-cell resources it should be possible to extend these observations and to investigate CXCL chemokine expression in COPD CD8 T cells to the benefit of Fig 2A in full detail.

      We agree with the reviewer that the rationale for the selection of chemokines of interest could be reinforced by the analysis of supplementary single-cell resources. We used data from the COPD cell atlas (Gene Expression Omnibus GSE136831 (Sauler et al., 2022)) to perform such an analysis of chemokine expression by CD8+ CD103+ and CD8+ CD103- T cells. However, the expression level of all chemokines was globally very low, and was not different between control and COPD patients (see Author response image 1).

      Author response image 1.

      Expression of CXC chemokines in lung CD8+ CD103+ and CD8+ CD103- T cells from patients with COPD (n=18 independent samples) in comparison with healthy control subjects (n=29 independent samples) under resting conditions by Single-Cell RNA sequencing analysis (GEO accession GSE136831). The heatmaps show the normalized expression of genes (horizontal axes) encoding CXC chemokines. PF4=CXCL4, PPBP= CXCL7.

      The latter results are at odds with those resulting from transcriptomic analysis of microarray data obtained on purified lung CD8+ CD103+ and CD8+ CD103- T cells, showing a significant level of chemokine expression (Hombrink et al., 2016), and a differential expression of CCL2, CCL26, CXCL2, CXCL8 and CCL3L1 between CD8+ T lymphocytes of control and COPD patients (Figure 2A in the revised manuscript). The reason for these differences is unclear, and could be attributed to biological differences (samples obtained from different patients) or, more likely, to differences in sample processing (cell sorting by flow cytometry for microarray analysis, which could minimally activate CD8+ cells) and/or methodological differences (differences of sensitivity between microarray and scRNA-seq).

      Nevertheless, microarray data regarding CXCL8 expression are in good agreement with our in vitro experiments, showing an enhanced CXCL8 expression by CD8+ T cells purified from COPD lungs, in comparison with that of control subjects. In addition, the CXCL8 blocking antibody fully abrogates the increase of migration induced by secretion of COPD CD8+ T cells, to the same extent as the blocking of CXCR1/2 by reparixin. This suggests that this supplementary chemotaxis is mainly due to CXCL8 and not other CXCR1/2 binding CXCL chemokines, and links the CXCL8 measurements to the functional experiments. This clarification has now been added in the results section of the revised version.

      2) Equally, it would strengthen the work if multiplex ELISA assays could be provided on the supernatants used in Fig 2D to provide a more comprehensive view of CXCR1/2 binding chemokines.

      In order to have a complete view of CXCR1/2 binding chemokines, we have now performed supplementary ELISA assays to measure the concentrations of CXCL1, 3, 5, 6 and 7, in addition to the measurements of CXCL2 and CXCL8 already presented in the previous version of the manuscript (Figure 2D). Results of these new assays are now presented in the revised version of Figure 2. Concentrations of CXCL1, 3, 5, 6 and 7 were unchanged between the control and COPD conditions.

      3) In the functional analyses, I missed information on the activation of the fibrocytes. Equally, the focus on CD8 T cells was mainly on proliferation in the functional work. RNAseq analyses on the cells, comparing CD8 T cells and fibrocytes, alone and in co-culture to each other would help to identify interaction patterns in comprehensive detail. Such an experiment would bolster the significance of the studies by providing impact analysis not only on the T cells beyond proliferation but by expanding on the effect of the interaction on the fibrocyte as well.

      Regarding the activation state of fibrocytes, we apologize if this was not clear: in our in vitro co-culture experiments, we chose not to activate the fibrocytes. This setting is in agreement with previous findings, demonstrating an antigen-independent T cell proliferation effect driven by fibrocytes (Nemzek et al., 2013), and it is now explicitly written in the results of the revised manuscript.

      Regarding the focus of the functional analyses:

      First, we have extended the analysis of the consequences of the interaction beyond CD8+ T cell proliferation. In particular, having shown that fibrocytes promote CD8+ T cell expression of cytotoxic molecules such as granzyme B, we decided to investigate the cytotoxic capacity of CD8+ T cells against primary basal bronchial epithelial cells (see new Supplementary File 9 in the revised manuscript for patient characteristics).

      Direct co-culture with fibrocytes increased total and membrane expression of the cytotoxic degranulation marker CD107a, which was only significant in non-activated CD8+ T cells (see new Figure 6A-E in the revised manuscript). A parallel increase of cytotoxicity against primary epithelial cells was observed in the same condition (see new Figure 6F-H in the revised manuscript). This demonstrates that following direct interaction with fibrocytes, CD8+ T cells have the ability to kill target cells such as bronchial epithelial cells. This is now included in the results section of the revised manuscript.

      Second, we have now performed proteomic analyses on fibrocytes, alone or in co-culture for 6 days with CD8+ T cells either non-activated or activated (see new Figure 7A in the revised manuscript). Of the top ten pathways that were most significantly activated in co-cultured vs mono-cultured fibrocytes, the most strongly upregulated were the dendritic cell maturation box, the multiple sclerosis signaling pathway, the neuroinflammation signaling pathway and the macrophage classical signaling pathway, irrespective of the activation state of CD8+ T cells (see new Figure 7B in the revised manuscript). The changes were globally identical in the two conditions of CD8+ T cell activation, with some upregulation more pronounced in the activated condition. They were mostly driven by up-regulation of a core set of Major Histocompatibility Complex class I (HLA-B, C, F) and II (HLA-DMB, DPA1, DPB1, DRA, DRB1, DRB3) molecules, and of co-stimulatory and adhesion molecules (CD40, CD86 and CD54). Another notable proteomic signature was that of increased expression of the IFN signaling mediators IKBE and STAT1, and the IFN-responsive genes GBP2, GBP4 and RNF213. We also observed a strong downregulation of CD14, suggesting fibrocyte differentiation, and an upregulation of the matrix metalloproteinase-9 (MMP9) in the non-activated condition only. Altogether, these changes suggest that the interaction between CD8+ T cells and fibrocytes promotes the development of fibrocyte immune properties, which could subsequently impact CD4+ T cell activation.

      Up-regulated pathways identified in proteomic profile of fibrocytes co-cultured with CD8+ T cells are very consistent with a shift towards a proinflammatory phenotype rather than towards a reparative role. The activation of IFN-γ signaling could be triggered by CD8+ T cell secretion of IFN upon fibrocyte interaction, suggesting the existence of a positive feedback loop (see new Figure 10). Additionally, the priming of fibrocytes by CD8+ T cells could also induce CD4+ T cell activation.

      4) I suggest rewording the abstract to capture the main storyline and wording more. The abstract is good, but I see so many novelties in the paper that are not well sold in the abstract, particularly the modelling aspects.

      As suggested by the reviewer, we revised the abstract, as shown below and in the revised manuscript. The changes are indicated in red:

      Revised abstract:

      Bronchi of chronic obstructive pulmonary disease (COPD) are the site of extensive cell infiltration, allowing persistent contacts between resident cells and immune cells. Tissue fibrocytes interaction with CD8+ T cells and its consequences were investigated using a combination of in situ, in vitro experiments and mathematical modeling. We show that fibrocytes and CD8+ T cells are found in vicinity in distal airways and that potential interactions are more frequent in tissues from COPD patients compared to those of control subjects. Increased proximity and clusterization between CD8+ T cells and fibrocytes are associated with altered lung function. Tissular CD8+ T cells from COPD patients promote fibrocyte chemotaxis via the CXCL8-CXCR1/2 axis. Live imaging shows that CD8+ T cells establish short-term interactions with fibrocytes, that trigger CD8+ T cell proliferation in a CD54- and CD86-dependent manner, pro-inflammatory cytokines production, CD8+ T cell cytotoxic activity against bronchial epithelial cells and fibrocyte immunomodulatory properties. We defined a computational model describing these intercellular interactions and calibrated the parameters based on our experimental measurements. We show the model’s ability to reproduce histological ex vivo characteristics, and observe an important contribution of fibrocyte-mediated CD8+ T cell proliferation in COPD development. Using the model to test therapeutic scenarios, we predict a recovery time of several years, and the failure of targeting chemotaxis or interacting processes. Altogether, our study reveals that local interactions between fibrocytes and CD8+ T cells could jeopardize the balance between protective immunity and chronic inflammation in bronchi of COPD patients.

      5) The probabilistic model appears to suggest that reduced CD8 T cell death may also explain the increase in the pathology in COPD. Did the authors find that fibrocytes reduce cell death of the CD8 T cells?

      Taking advantage of the staining of CD8+ T cells with the death marker Zombie NIR™, we have quantified CD8+ T cell death in our co-culture assay. The presence of fibrocytes in the indirect co-culture assay did not affect CD8+ T cell death (see new Figure 3-figure supplement 3A-B in the revised manuscript). In direct co-culture, the death of CD8+ T cells was significantly increased in the non-activated condition but not in the activated condition (see new Figure 3-figure supplement 3C-D in the revised manuscript). Of note, these results are in agreement with a recent study showing the existence of CD8+ T cell-population-intrinsic mechanisms regulating cellular behavior, with induction of apoptosis to avoid an excessive increase in T cell population (Zenke et al., 2020). This is taken into account in our mathematical model by an increased probability p_(dC+) of dying when a CD8+ T cell is surrounded by many other T cells in its neighborhood. It also suggests that the reduced CD8+ T cell death evidenced in tissues from patients with COPD (Siena et al., 2011) might not be due to the specific interplay between fibrocyte and CD8+ T cells, but rather to a global pro-survival environment in COPD lungs.

      These new data have been described in the results section.
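
      The density-dependent death rule mentioned above can be illustrated with a minimal sketch. It is not the implementation used in the manuscript: the grid encoding, the function names, the neighbourhood radius, the crowding threshold and both probability values are hypothetical and chosen only to show the idea that a CD8+ T cell surrounded by many other T cells is assigned a higher probability of dying than an isolated one.

```python
# Toy lattice: "C" = CD8+ T cell, "F" = fibrocyte, "." = empty site.
# All numerical values below are illustrative assumptions, not fitted parameters.
example_grid = [
    ["C", "C", "C"],
    ["C", "C", "F"],
    [".", "C", "."],
]


def count_tcell_neighbors(grid, row, col, radius=1):
    """Count CD8+ T cells in the square neighbourhood of (row, col),
    excluding the site (row, col) itself."""
    count = 0
    for r in range(max(row - radius, 0), min(row + radius + 1, len(grid))):
        for c in range(max(col - radius, 0), min(col + radius + 1, len(grid[r]))):
            if (r, c) != (row, col) and grid[r][c] == "C":
                count += 1
    return count


def death_probability(n_neighbors, p_basal=0.01, p_crowded=0.05, threshold=4):
    """Return the per-step death probability of a CD8+ T cell.

    The basal probability applies to sparsely surrounded cells; the higher
    probability (the role played by p_(dC+) in the text) applies once the
    number of neighbouring T cells reaches the hypothetical threshold."""
    return p_crowded if n_neighbors >= threshold else p_basal


# The central cell has five T-cell neighbours, the bottom-right site has two.
crowded = death_probability(count_tcell_neighbors(example_grid, 1, 1))
sparse = death_probability(count_tcell_neighbors(example_grid, 2, 2))
```

      In this toy setting the central cell falls above the threshold and is assigned the higher probability, whereas a cell at the sparsely populated corner would keep the basal one.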

      6) Following the modeling in Figure 6, curiosity came to mind, which is how long it would take for the pathology to disappear if a drug would be applied to the patient. How much should the interactions be reduced and how long would it take to reach clinical benefit? Could such predictions be made? I understand that this may be outside the main message of the manuscript but perhaps this could be included in the discussion.

      This is a very interesting question, which we have addressed by performing additional simulations to investigate the outcomes of possible therapeutic interventions. First, we applied COPD dynamics for 20 years to generate the COPD state, which provides the basis for treatment implementation. Then, we applied COPD dynamics for 7 years, mimicking the placebo condition (see new Figure 9A in the revised manuscript, and below), which we compared to control dynamics (“Total inhibition”), mimicking an ideal treatment able to restore all cellular processes. As expected, the populations of fibrocytes and CD8+ T cells, as well as the density of mixed clusters, decreased. These numbers reached levels similar to those of healthy subjects after approximately 2.5 years, and this time point can therefore be considered as the steady state (Figure 9B-E).

      Monitoring of the different processes revealed that these effects were mainly due to a reduction in fibrocyte-induced CD8+ T duplication, and a transient or more prolonged increase in basal fibrocyte and CD8+ T death (Figure 9C-D).

      Then, three possible realistic treatments were considered (Figure 9A). We tested the effect of directly inhibiting the interaction between fibrocytes and CD8+ T cells by blocking CD54. This was implemented in the model by altering the increased probability that a CD8+ T cell divides when a fibrocyte is in its neighbourhood, as shown by the co-culture results (Figure 4). We also chose to reflect the effect of a dual CXCR1/2 inhibition by setting the fibrocyte displacement function to that of the control dynamics, in agreement with the in vitro experiments (Figure 2E). Blocking CD54 only slightly reduced the density of CD8+ T cells compared to the placebo condition, and had no effect on fibrocyte and mixed cluster densities (Figure 9B). CXCR1/2 inhibition was slightly more potent than CD54 inhibition in reducing CD8+ T cells, and it also significantly decreased the density of mixed clusters (Figure 9B). As expected, this occurred through a reduction of fibrocyte-induced duplication, which was affected more strongly by CXCR1/2 blockade than by CD54 blockade (Figure 9C-E). Combining both therapies (CD54 and CXCR1/2 inhibition) did not strongly enhance the effects (Figure 9B-E). In all the conditions tested, the size of the fibrocyte population remained unchanged, suggesting that other processes such as fibrocyte death or infiltration should be targeted to expect broader effects.

      The results section has been altered accordingly.
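
      As a rough illustration of how the treatment scenarios described above map onto model parameters, the sketch below encodes each scenario as the set of parameters that are reset from their COPD value to their control value. The parameter names and numerical values are hypothetical and do not come from the manuscript, and resetting a parameter fully to its control value is a simplifying assumption; only the correspondence between treatment and targeted process follows the description in the text.

```python
# Hypothetical parameter sets; names and values are illustrative only.
CONTROL_PARAMS = {
    "p_division_near_fibrocyte": 0.02,    # CD8+ T cell division when a fibrocyte is nearby
    "fibrocyte_displacement": "control",  # displacement (chemotaxis) rule for fibrocytes
    "p_death_basal": 0.03,                # stands in for all remaining processes
}
COPD_PARAMS = {
    "p_division_near_fibrocyte": 0.10,
    "fibrocyte_displacement": "copd",
    "p_death_basal": 0.01,
}

# Each scenario lists the parameters assumed to be restored to control values.
TREATMENT_TARGETS = {
    "placebo": [],
    "anti_CD54": ["p_division_near_fibrocyte"],
    "anti_CXCR1_2": ["fibrocyte_displacement"],
    "combined": ["p_division_near_fibrocyte", "fibrocyte_displacement"],
    "total_inhibition": list(CONTROL_PARAMS),
}


def apply_treatment(treatment, copd=COPD_PARAMS, control=CONTROL_PARAMS):
    """Return the parameter set used after treatment onset.

    Starts from a copy of the COPD parameters and overwrites only the
    entries targeted by the chosen treatment with their control values."""
    params = dict(copd)
    for key in TREATMENT_TARGETS[treatment]:
        params[key] = control[key]
    return params
```

      Under these assumptions the placebo scenario leaves the COPD parameters untouched, the "total inhibition" scenario reproduces the control dynamics, and the realistic treatments each modify a single process, which is consistent with their limited effect on the final state.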

      Using the simulations, we were also able to estimate the characteristic time to reach a stationary state reminiscent of a resolution of the COPD condition. This time of approximately 2.5 years could not have been predicted from in vitro experiments, and indicates that a treatment aiming at restoring these cellular processes should be continued for several years to obtain significant changes.

      We have also investigated the outcomes of more realistic treatments, modifying specifically processes such as chemotaxis or targeting directly the intercellular interactions. The modification of parameters controlling these processes only slightly affected the final state, suggesting that such treatments may be more effective when used in combination with other drugs e.g. those affecting fibrocyte infiltration and/or death.

      The discussion section has been altered accordingly.

      Reviewer #3 (Recommendations For The Authors):

      1) Broader assessment of cell types in the lung: Staining for other cell types such as dendritic cells, CD4 cells, and interstitial macrophages, and comparing their proximity to fibrocytes with that of CD8 cells would better justify the CD8 focus.

      We agree with the reviewer that multiple stainings would have better justified the focus on CD8+ T cells. However, it is difficult to distinguish fibrocytes, dendritic cells and interstitial macrophages on the basis of immunohistochemistry, as we and others previously showed (Dupin et al., 2019; Mitsuhashi et al., 2015; Pilling et al., 2009). On the other hand, the study of Afroj et al. indicated the possible interaction between fibrocytes and CD8+ T cells in a cancer context, with the induction of CD8+ T cell proliferation (Afroj et al., 2021). This T cell-costimulatory function of fibrocytes towards CD8+ T cells was further confirmed in a very recent study, together with the antitumor effects of PD-L1 and VEGF blockade (Mitsuhashi et al., 2023). These data, along with the specific implication of CD8+ T cells in COPD, relying mainly on their abundance in COPD bronchi (O’Shaughnessy et al., 1997), their overactivation state (Roos-Engstrand et al., 2009), their cytotoxic phenotype (Freeman et al., 2010; Wang et al., 2020) and the protection against lung inflammation and emphysema induced by their depletion (Maeno et al., 2007), justified the CD8 focus.

      To further justify this focus, we have now performed co-culture between fibrocytes and CD4+ T cells, indicating that the massive fibrocyte-mediated proliferation was specific to CD8+ T cells (see answer to comment 3 below). This is in agreement with the results obtained with the simulations, showing that considering fibrocytes and CD8+ T cells only was sufficient to reproduce the spatial patterns in the bronchi of healthy and COPD patients. Altogether, we think that focusing on the CD8+ T cell-fibrocyte interplay was pertinent in the context of COPD. This obviously does not exclude the possibility of other interactions, which could be the focus of other studies.

      2) Transcriptomic analysis: Using n=2 and only showing the chemokines as well as selected adhesion receptor data narrows the focus but does not provide broader insights into the interactions. Using a more robust sample size and performing a comprehensive pathway analysis would represent an unbiased analysis to determine the most dysregulated pathways. Importantly, the authors could use a single-cell RNA-seq dataset to broadly assess the transcriptomes of several cell types in the lung (such as the data from Sauler et al, Characterization of the COPD alveolar niche using single-cell RNA sequencing).

      This very pertinent suggestion has also been raised by reviewer 2, see our answer to comment 1 of reviewer 2, and below:

      We agree with the reviewer that the rationale for the selection of chemokines of interest could be reinforced by the analysis of supplementary single-cell resources. We used data from the COPD cell atlas (Gene Expression Omnibus GSE136831 (Sauler et al., 2022)) to perform such an analysis of chemokine expression by CD8+ CD103+ and CD8+ CD103- T cells. However, the expression level of all chemokines was globally very low, and was not different between control and COPD patients (see Author response image 1, in the answer to comment 1 of reviewer 2).

      These latter results are at odds with those resulting from transcriptomic analysis of microarray data obtained on purified lung CD8+ CD103+ and CD8+ CD103- T cells, showing a significant level of chemokine expression (Hombrink et al., 2016), and a differential expression of CCL2, CCL26, CXCL2, CXCL8 and CCL3L1 between CD8+ T lymphocytes of control and COPD patients (Figure 2A in the revised manuscript). The reason for these differences is unclear, and could be attributed to biological differences (samples obtained from different patients) or, more likely, to differences in sample processing (cell sorting by flow cytometry for microarray analysis, which could minimally activate CD8+ cells) and/or methodological differences (differences of sensitivity between microarray and scRNA-seq).

      Nevertheless, microarray data regarding CXCL8 expression are in good agreement with our in vitro experiments, showing an enhanced CXCL8 expression by CD8+ T cells purified from COPD lungs, in comparison with that of control subjects. In addition, the CXCL8 blocking antibody fully abrogates the increase of migration induced by secretion of COPD CD8+ T cells, to the same extent as the blocking of CXCR1/2 by reparixin. This suggests that this supplementary chemotaxis is mainly due to CXCL8 and not other CXCR1/2 binding CXCL chemokines, and links the CXCL8 measurements to the functional experiments. This clarification has now been added in the text of the revised version.

      3) Inclusion of control/comparison cell types in co-culture studies would help establish that CD8 cells are more relevant for interactions with fibrocytes than for example CD4 cells.

      We have now performed co-cultures between fibrocytes and CD4+ T cells, with the same settings as for CD8+ T cells. The results from these experiments show that fibrocytes did not have any significant effect on CD4+ T cell death, regardless of their activation state (see new Figure 3-figure supplement 2A-C in the revised manuscript, and below). Fibrocytes were able to promote CD4+ T cell proliferation in the activated condition but not in the non-activated condition (see new Figure 3-figure supplement 2A-D in the revised manuscript). Altogether, this indicates that although the fibrocyte-mediated effect on proliferation is not specific to CD8+ T cells, the amplitude of the effect is much larger on CD8+ T cells than on CD4+ T cells.

      These new data have been added in the results section.

      4) In vitro analysis of cells from non-COPD patients would also help assess whether the circulating cells from COPD patients have a level of baseline activation which promotes the vicious cycle but may not exist in healthy cells.

      Regarding circulating cells, the present study relies on the COBRA cohort (COhort of BRonchial obstruction and Asthma), which includes only asthma and COPD patients, and therefore does not grant access to healthy subjects’ blood samples (Pretolani et al., 2017). Unfortunately, we have no other ongoing study with healthy subjects that would allow us to retrieve blood for research, and fibrocytes can only be grown from freshly drawn blood samples. We agree with the reviewer that it is a limitation of our study, which is now acknowledged at the end of the discussion section.  

      References

      Afroj, T., Mitsuhashi, A., Ogino, H., Saijo, A., Otsuka, K., Yoneda, H., Tobiume, M., Nguyen, N. T., Goto, H., Koyama, K., Sugimoto, M., Kondoh, O., Nokihara, H., & Nishioka, Y. (2021). Blockade of PD-1/PD-L1 Pathway Enhances the Antigen-Presenting Capacity of Fibrocytes. The Journal of Immunology, 206(6), 1204‑1214. https://doi.org/10.4049/jimmunol.2000909

      Araki, K., Youngblood, B., & Ahmed, R. (2010). The role of mTOR in memory CD8+ T-cell differentiation. Immunological reviews, 235(1), 234‑243. https://doi.org/10.1111/j.0105-2896.2010.00898.x

      Bucala, R. J. (2022). Targeting fibrocytes in autoimmunity. Proceedings of the National Academy of Sciences, 119(5), e2121739119. https://doi.org/10.1073/pnas.2121739119

      Douglas, R. S., Kahaly, G. J., Patel, A., Sile, S., Thompson, E. H. Z., Perdok, R., Fleming, J. C., Fowler, B. T., Marcocci, C., Marinò, M., Antonelli, A., Dailey, R., Harris, G. J., Eckstein, A., Schiffman, J., Tang, R., Nelson, C., Salvi, M., Wester, S., … Smith, T. J. (2020). Teprotumumab for the Treatment of Active Thyroid Eye Disease. The New England Journal of Medicine, 382(4), 341‑352. https://doi.org/10.1056/NEJMoa1910434

      Dupin, I., Henrot, P., Maurat, E., Abohalaka, R., Chaigne, S., Hamrani, D. E., Eyraud, E., Prevel, R., Esteves, P., Campagnac, M., Dubreuil, M., Cardouat, G., Bouchet, C., Ousova, O., Dupuy, J.-W., Trian, T., Thumerel, M., Begueret, H., Girodet, P.-O., … Berger, P. (2023). CXCR4 blockade alleviates pulmonary and cardiac outcomes in early COPD (p. 2023.03.10.529743). bioRxiv. https://doi.org/10.1101/2023.03.10.529743

      Dupin, I., Thumerel, M., Maurat, E., Coste, F., Eyraud, E., Begueret, H., Trian, T., Montaudon, M., Marthan, R., Girodet, P.-O., & Berger, P. (2019). Fibrocyte accumulation in the airway walls of COPD patients. The European Respiratory Journal, 54(3), Article 3. https://doi.org/10.1183/13993003.02173-2018

      Fernando, R., Caldera, O., & Smith, T. J. (2021). Therapeutic IGF-I receptor inhibition alters fibrocyte immune phenotype in thyroid-associated ophthalmopathy. Proceedings of the National Academy of Sciences, 118(52), e2114244118. https://doi.org/10.1073/pnas.2114244118

      Freeman, C. M., Han, M. K., Martinez, F. J., Murray, S., Liu, L. X., Chensue, S. W., Polak, T. J., Sonstein, J., Todt, J. C., Ames, T. M., Arenberg, D. A., Meldrum, C. A., Getty, C., McCloskey, L., & Curtis, J. L. (2010). Cytotoxic potential of lung CD8+ T cells increases with COPD severity and with in vitro stimulation by IL-18 or IL-15. Journal of Immunology (Baltimore, Md.: 1950), 184(11), 6504‑6513. https://doi.org/10.4049/jimmunol.1000006

      Gillen, J. R., Zhao, Y., Harris, D. A., LaPar, D. J., Stone, M. L., Fernandez, L. G., Kron, I. L., & Lau, C. L. (2013). Rapamycin Blocks Fibrocyte Migration and Attenuates Bronchiolitis Obliterans in a Murine Model. The Annals of Thoracic Surgery, 95(5), 1768‑1775. https://doi.org/10.1016/j.athoracsur.2013.02.021

      Hombrink, P., Helbig, C., Backer, R. A., Piet, B., Oja, A. E., Stark, R., Brasser, G., Jongejan, A., Jonkers, R. E., Nota, B., Basak, O., Clevers, H. C., Moerland, P. D., Amsen, D., & van Lier, R. A. W. (2016). Programs for the persistence, vigilance and control of human CD8+ lung-resident memory T cells. Nature Immunology, 17(12), Article 12. https://doi.org/10.1038/ni.3589

      Maeno, T., Houghton, A. M., Quintero, P. A., Grumelli, S., Owen, C. A., & Shapiro, S. D. (2007). CD8+ T Cells are required for inflammation and destruction in cigarette smoke-induced emphysema in mice. Journal of Immunology (Baltimore, Md.: 1950), 178(12), 8090‑8096. https://doi.org/10.4049/jimmunol.178.12.8090

      Manjarres, D. C. G., Axell-House, D. B., Patel, D. C., Odackal, J., Yu, V., Burdick, M. D., & Mehrad, B. (2023). Sirolimus suppresses circulating fibrocytes in idiopathic pulmonary fibrosis in a randomized controlled crossover trial. JCI Insight. https://doi.org/10.1172/jci.insight.166901

      Mehrad, B., Burdick, M. D., & Strieter, R. M. (2009). Fibrocyte CXCR4 regulation as a therapeutic target in pulmonary fibrosis. The International Journal of Biochemistry & Cell Biology, 41(8‑9), 1708‑1718. https://doi.org/10.1016/j.biocel.2009.02.020

      Mitsuhashi, A., Goto, H., Saijo, A., Trung, V. T., Aono, Y., Ogino, H., Kuramoto, T., Tabata, S., Uehara, H., Izumi, K., Yoshida, M., Kobayashi, H., Takahashi, H., Gotoh, M., Kakiuchi, S., Hanibuchi, M., Yano, S., Yokomise, H., Sakiyama, S., & Nishioka, Y. (2015). Fibrocyte-like cells mediate acquired resistance to anti-angiogenic therapy with bevacizumab. Nature Communications, 6(1), Article 1. https://doi.org/10.1038/ncomms9792

      Mitsuhashi, A., Koyama, K., Ogino, H., Afroj, T., Nguyen, N. T., Yoneda, H., Otsuka, K., Sugimoto, M., Kondoh, O., Nokihara, H., Hanibuchi, M., Takizawa, H., Shinohara, T., & Nishioka, Y. (2023). Identification of fibrocyte cluster in tumors reveals the role in antitumor immunity by PD-L1 blockade. Cell Reports, 112162. https://doi.org/10.1016/j.celrep.2023.112162

      Nemzek, J. A., Fry, C., & Moore, B. B. (2013). Adoptive transfer of fibrocytes enhances splenic T-cell numbers and survival in septic peritonitis. Shock (Augusta, Ga.), 40(2), 106‑114. https://doi.org/10.1097/SHK.0b013e31829c3c68

      O’Shaughnessy, T. C., Ansari, T. W., Barnes, N. C., & Jeffery, P. K. (1997). Inflammation in bronchial biopsies of subjects with chronic bronchitis: Inverse relationship of CD8+ T lymphocytes with FEV1. American Journal of Respiratory and Critical Care Medicine, 155(3), 852‑857. https://doi.org/10.1164/ajrccm.155.3.9117016

      Pilling, D., Fan, T., Huang, D., Kaul, B., & Gomer, R. H. (2009). Identification of markers that distinguish monocyte-derived fibrocytes from monocytes, macrophages, and fibroblasts. PloS One, 4(10), e7475. https://doi.org/10.1371/journal.pone.0007475

      Pombo-Suarez, M., & Gomez-Reino, J. J. (2019). Abatacept for the treatment of rheumatoid arthritis. Expert Review of Clinical Immunology, 15(4), 319‑326. https://doi.org/10.1080/1744666X.2019.1579642

      Pretolani, M., Soussan, D., Poirier, I., Thabut, G., Aubier, M., COBRA Study Group, & COBRA cohort Study Group. (2017). Clinical and biological characteristics of the French COBRA cohort of adult subjects with asthma. The European Respiratory Journal, 50(2), 1700019. https://doi.org/10.1183/13993003.00019-2017

      Roos-Engstrand, E., Ekstrand-Hammarström, B., Pourazar, J., Behndig, A. F., Bucht, A., & Blomberg, A. (2009). Influence of smoking cessation on airway T lymphocyte subsets in COPD. COPD, 6(2), 112‑120. https://doi.org/10.1080/15412550902755358

      Rozelle, A. L., & Genovese, M. C. (2007). Efficacy results from pivotal clinical trials with abatacept. Clinical and Experimental Rheumatology, 25(5 Suppl 46), S30-34.

      Sauler, M., McDonough, J. E., Adams, T. S., Kothapalli, N., Barnthaler, T., Werder, R. B., Schupp, J. C., Nouws, J., Robertson, M. J., Coarfa, C., Yang, T., Chioccioli, M., Omote, N., Cosme, C., Poli, S., Ayaub, E. A., Chu, S. G., Jensen, K. H., Gomez, J. L., … Rosas, I. O. (2022). Characterization of the COPD alveolar niche using single-cell RNA sequencing. Nature Communications, 13(1), Article 1. https://doi.org/10.1038/s41467-022-28062-9

      Siena, L., Gjomarkaj, M., Elliot, J., Pace, E., Bruno, A., Baraldo, S., Saetta, M., Bonsignore, M. R., & James, A. (2011). Reduced apoptosis of CD8+ T-lymphocytes in the airways of smokers with mild/moderate COPD. Respiratory Medicine, 105(10), 1491‑1500. https://doi.org/10.1016/j.rmed.2011.04.014

      Smith, T. J., Kahaly, G. J., Ezra, D. G., Fleming, J. C., Dailey, R. A., Tang, R. A., Harris, G. J., Antonelli, A., Salvi, M., Goldberg, R. A., Gigantelli, J. W., Couch, S. M., Shriver, E. M., Hayek, B. R., Hink, E. M., Woodward, R. M., Gabriel, K., Magni, G., & Douglas, R. S. (2017). Teprotumumab for Thyroid-Associated Ophthalmopathy. The New England Journal of Medicine, 376(18), 1748‑1761. https://doi.org/10.1056/NEJMoa1614949

      Vincenti, F., Rostaing, L., Grinyo, J., Rice, K., Steinberg, S., Gaite, L., Moal, M.-C., Mondragon-Ramirez, G. A., Kothari, J., Polinsky, M. S., Meier-Kriesche, H.-U., Munier, S., & Larsen, C. P. (2016). Belatacept and Long-Term Outcomes in Kidney Transplantation. The New England Journal of Medicine, 374(4), 333‑343. https://doi.org/10.1056/NEJMoa1506027

      Wang, X., Zhang, D., Higham, A., Wolosianka, S., Gai, X., Zhou, L., Petersen, H., Pinto-Plata, V., Divo, M., Silverman, E. K., Celli, B., Singh, D., Sun, Y., & Owen, C. A. (2020). ADAM15 expression is increased in lung CD8+ T cells, macrophages, and bronchial epithelial cells in patients with COPD and is inversely related to airflow obstruction. Respiratory Research, 21(1), 188. https://doi.org/10.1186/s12931-020-01446-5

      Zenke, S., Palm, M. M., Braun, J., Gavrilov, A., Meiser, P., Böttcher, J. P., Beyersdorf, N., Ehl, S., Gerard, A., Lämmermann, T., Schumacher, T. N., Beltman, J. B., & Rohr, J. C. (2020). Quorum Regulation via Nested Antagonistic Feedback Circuits Mediated by the Receptors CD28 and CTLA-4 Confers Robustness to T Cell Population Dynamics. Immunity, 52(2), 313-327.e7. https://doi.org/10.1016/j.immuni.2020.01.018

    1. Author Response:

      We are grateful to the reviewers for their insightful comments, suggestions, and criticism, all of which will be reflected in the updated version of the manuscript. Here we briefly address the main points raised:

      Reviewer #1:

      1.1. Patient selection and tumor area selection are crucial for this study but not very carefully defined. Why are some core and others not? Figure referral is an issue here (sup figure 6 where all core and non-core samples are supposed to be according to the legend of Fig 4 is likely sup fig 7 but this is then a complete copy paste of Figure 4). In the methods it is stated that the core samples are based on limited contamination of additional morphotypes (<20%) but Fig 4 suggests that all tumours listed have multiple morphotypes.

      The tissue samples were obtained from a hospital cohort of patients with stage II-IV colorectal cancer (at the time of diagnosis), with no particular selection criteria imposed, as this was an exploratory study.

      Tumor regions were marked for macro-dissection by an experienced pathologist, following standard practice for whole-tumor transcriptomics studies. The subregions (morphological regions) were marked by the same pathologist for macro-dissection (in an adjacent section) and later reassessed for their “morphological purity”. It is impossible to macro-dissect regions containing a single morphological pattern. Hence, regions containing a significant amount (≥20%) of other morphologies were considered “non-core”, while the rest were called “core” regions. This distinction applies solely to morphological regions and not to whole-tumor samples.
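
      The core/non-core rule described above can be written as a one-line threshold test. The sketch below is only an illustration under stated assumptions: the function name `classify_region`, the representation of contamination as a fraction between 0 and 1, and the example values are hypothetical and are not taken from the manuscript.

```python
def classify_region(other_morphology_fraction: float, threshold: float = 0.20) -> str:
    """Label a macro-dissected morphological region as 'core' or 'non-core'.

    Hypothetical helper: `other_morphology_fraction` is the share of the
    region occupied by morphologies other than the one it was marked for.
    """
    if not 0.0 <= other_morphology_fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    # Regions with >= 20% of other morphologies are treated as non-core.
    return "non-core" if other_morphology_fraction >= threshold else "core"


# Made-up contamination estimates for three regions.
labels = [classify_region(f) for f in (0.05, 0.20, 0.35)]
```

      Under this reading, a region sitting exactly at the 20% boundary is labelled non-core, in line with the "≥20%" wording of the response.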

      Indeed, the reference in the caption of Figure 4 should point to Supp. Fig. 7 (which needs to be updated).

      1.2. CMS subtype should be performed with single sample predictor rather than CMScaller.

      We agree that a single-sample predictor for CMS is needed; however, CMScaller is the de facto classifier for CMS (>130 citations), so we used it to illustrate the practical implications.

      1.3. A couple of surprising observations need specification. MUC2 is a strong CMS3 reporter gene yet Mucinous tumours appear to end up in CMS4 rather than 3. Can the authors show that indeed stroma cells are very evident in these samples?

      We do not have a direct estimate of the amount of stromal cells, but the high scores of the various fibroblast-related signatures in mucinous regions (Fig. 2B, D) indicate that there is indeed an enrichment in stroma. In a follow-up study, we plan to perform specific staining as well as spatial transcriptomics of these regions to further investigate our findings.

      1.4. The SE PP and CT are assigned to CMS2, but in Figure 4 this appears a lot more variable than the authors would make the reader believe. The full data are not completely clear (see point 1).

      In the paper, we transparently state that PP, SE, and CT were assigned to CMS2 in 62.5%, 41.7% and 41.9% of cases, respectively. These proportions refer to all samples for which CMScaller made a prediction. In Fig. 4, we also show the proportion of cases in which CMScaller did not predict any subtype.

      1.5. The tumor response rates are rather weird as this is likely dependent on the complete tumour and not so much the subareas. It is not very well described what we see in this analysis.

      We did not compute response rates; rather, we computed simple prognostic scores as means (weighted, if weights were provided) of the genes in the specific signatures (see Methods). The question addressed was whether these scores were comparable between the whole tumor and the corresponding tumor regions (within the same tumor). Given the observed (relative) variability, the more important follow-up question, which we cannot answer with our limited survival data, is whether a higher score in a region than in the whole tumor is indeed indicative of a higher risk of relapse.
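
      The scoring described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `signature_score`, the dictionary-based expression profiles, the gene names, the example values, and the normalization by the sum of weights (which assumes positive weights) are all assumptions made for the example.

```python
from typing import Mapping, Optional, Sequence


def signature_score(
    expression: Mapping[str, float],
    genes: Sequence[str],
    weights: Optional[Mapping[str, float]] = None,
) -> float:
    """Mean (optionally weighted) expression of the signature genes in one sample.

    Hypothetical helper: genes missing from the profile are skipped, and
    weights, when given, are assumed positive so the result is a weighted mean.
    """
    present = [g for g in genes if g in expression]
    if not present:
        raise ValueError("none of the signature genes are in the expression profile")
    if weights is None:
        return sum(expression[g] for g in present) / len(present)
    total_weight = sum(weights[g] for g in present)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(weights[g] * expression[g] for g in present) / total_weight


# Made-up profiles for a whole-tumor sample and one region of the same tumor.
whole_tumor = {"GENE_A": 2.0, "GENE_B": 4.0, "GENE_C": 6.0}
region = {"GENE_A": 3.0, "GENE_B": 5.0, "GENE_C": 6.0}
signature = ["GENE_A", "GENE_B"]
signature_weights = {"GENE_A": 1.0, "GENE_B": 3.0}

whole_score = signature_score(whole_tumor, signature, signature_weights)
region_score = signature_score(region, signature, signature_weights)
score_difference = region_score - whole_score  # region relative to whole tumor
```

      Comparing `region_score` with `whole_score` within the same tumor mirrors the question posed in the response; whether a positive difference carries prognostic meaning is exactly what the limited survival data cannot settle.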

      1.6. Serrated adenomas have previously been aligned with CMS4. Is this different from serrated areas in cancers?

      We do not have data from adenomas to compare with the serrated carcinoma regions. But a comparison of (regions of) both traditional serrated and sessile serrated adenomas to serrated carcinoma would be interesting.

      1.7. The fact that iCMS2 and iCMS3 align rather well with the current analysis of the distinct regions suggests that the analysis that was reported last year is the proper way to view tumor intrinsic signatures. The authors now propose a rather similar outcome to this issue which does take away a lot of the novelty of the findings of this study.

      Our goal was not to propose another stratification paradigm for colorectal cancer, but rather to study the associations between morphology and transcriptome and their implications in practice. As such, our analyses are not limited to molecular subtypes, and the respective observations are but a small part of our findings. Indeed, the intrinsic subtypes (iCMS 2/3) are stable and robust, as they are based on the genes expressed in epithelial cells, and they may well prove to be of clinical importance too. However, they do not cover all aspects (e.g. fibroblast subtypes) and, as stated in Joanito et al. Nat Gen 54, pages 963–975 (2022), “iCMS, MSI status and CMS jointly inform the molecular classification of CRC”. Last, in our opinion, the molecular classification of CRC, while a useful point of view in tumour classification, does not cover all the necessary perspectives on tumour heterogeneity.

      Reviewer #2:

      2.1. Overall, the manuscript provides an interesting histological/morphological framework through which we can consider heterogeneity in colorectal carcinoma and an approach by which we might improve the performance of gene expression-based classifiers in predicting clinical behaviour and/or responses to therapy. Exploration of CRC morphotypes and their differences was quite interesting. However, more work is needed to support the claims made by the authors. While I appreciate that the authors themselves identify limitations of their study within the manuscript, I believe awareness of these limitations is not reflected in some of the claims made in the abstract and at points in the main text when discussing the use of expression-based classifiers.

      We will improve the manuscript to stress the exploratory nature of our analyses and their limitations.

    1. Author Response

      The following is the authors’ response to the original reviews.

      This important work reports the identification of a list of proteins that may participate in the clearance of paternal mitochondria during fertilization, which is known as essential for normal fertilization and embryonic and fetal development. While the main method used is state of the art and the supporting data are solid, the vigor of the biochemical assays and function validation is inadequate. This work will be of interest to developmental and reproductive biologists working on fertilization. Key revisions (for the authors) include 1) Use a mitochondria-enriched fraction instead of whole sperm for the assays, and add more control samples to monitor what got lost during sperm and oocyte treatments before the coincubation step. 2) Functional validation of the key proteins identified.

      We thank the Editors of eLife, as well as the Special Issue Guest Editors and Reviewers, for a favorable assessment and helpful recommendations for key revisions. Provisional revisions included in our revised article are detailed below. We agree with the Editors’ comment about the use of mitochondrion-enriched fractions and additional functional validation of key proteins. In fact, we are developing experimental protocols for oocyte extract coincubation with isolated sperm heads and tails, and eventually with purified mitochondrial sheaths, to separate the ooplasmic sperm nucleus remodeling factors from the mitophagic ones. Such experiments, as well as functional validations using porcine zygotes, are contingent upon the anticipated post-pandemic rebound in the availability of porcine oocytes. These are obtained from ovaries harvested on slaughterhouse floors, which requires a workforce that is currently unavailable; this has hampered our access to this necessary resource.

      Reviewer #1 (Peer Review):

      Could the authors make clear how much the presented pictures reflect the described localisation? There is no information on the number of spermatozoa and embryos observed nor the fraction of these embryos showing the presented pattern of localisation. This must be included.

      Two hundred spermatozoa were counted per replicate of the cell-free system co-incubation and 20 zygotes per replicate, with 3 replicates of immunolabelling for each phase/picture; these were examined to establish the typical localization patterns. The displayed patterns were observed in 65 to 88% of examined spermatozoa/zygotes, depending on the protein, replicate, and phase of immunolabelling. In all cases, the signal displayed is the typical pattern seen in most cells. This information has been added to the Materials and Methods section for clarification.

      It is not clear if the authors also examined the localization of other proteins and obtained a different pattern than anticipated from the proteomic approach or if they only tested these 6 proteins and got a 100% of correlation.

      These are the 6 proteins which were selected based on extensive literature review into known functions of all identified proteins, as well as extensive research into available and reliable antibodies to detect such proteins within our porcine systems. Even so, no particular localization patterns were anticipated; instead, we presented the patterns actually observed and even some patterns which defied our expectations (i.e., the localization of BAG5 in the sperm acrosome).

      The authors use "MS" in the text to indicate "mitochondrial Sheath" and "Mass spectrometry". this is confusing.

      The authors agree and the usage of MS as an acronym for either has been removed entirely to avoid confusion.

      In the introduction the author refers to Ankel-Simons and Cummins, 1996 as a reference for the number of sperm mitochondria in mammalian species, this is incorrect since the quoted paper is about the number of mtDNA molecules and mentioned an earlier publication.

      This has been revised and the appropriate citation has been used.

      Reviewer #2 (Peer Review):

      Major:

      1) It has been proved from the earlier studies from this group that the porcine cell-free system is useful to observe spermatozoa interacting with ooplasmic proteins in a single trial and could recapitulate fertilization sperm mitophagy events that take place in a zygote without affecting later cell-division process. However, the post-fertilization sperm mitophagy process is a complex time-associated event that many processes that occur sequentially and interactively, which means ooplasmic proteins might be involved in this process but may not directly interact with sperm or may associate with sperm-ooplasmic protein complex at different time points. It is certainly a great advance already in knowledge to identify "the candidate players" from the list of 185 proteins; however, with the time-resolution (4 and 24hr) in the current study and without functional validation experiments at this stage, it is still difficult to postulate the importance of these identified proteins. The functional validation experimental designs, in my opinion, is critically important for better interpretation of the data.

      The authors agree with the reviewer and plan to conduct further functional analysis. This project generated a list of candidate sperm-mitophagy-promoting proteins, and we further showed that many of these proteins were detectable by both mass spectrometry and immunocytochemistry in spermatozoa exposed to our cell-free system. Furthermore, similar localization patterns were found in spermatozoa detected within newly fertilized zygotes. These results boost our confidence in our cell-free system and show that our list of candidate proteins is a useful basis for future localization and functional analyses. We are aware that we have not captured every protein that may play a role in post-fertilization sperm mitophagy and that the proteins captured are just candidates until proven otherwise. Likewise, we have almost certainly captured some candidate proteins that will not prove to play a role in post-fertilization sperm mitophagy. It remains plausible that at least some of these candidates do play a role in mitophagy, and some likely participate, perhaps in roles yet to be described, in other fertilization events, which would also be of great interest to us.

      2) As shown in Figure 1, whole sperm was used in the co-incubation and the later MS analysis; thus, proteins identified in the current study might be relevant in fertilization processes other than postfertilization sperm mitophagy, as proteins identified in the current study may be associated with other parts of the sperm (e.g. sticky sperm head, e.g. PSMG2 associated with sperm midpieces, tail at 4hr coincubation, but then only associate with sperm head at 24hr co-incubation) rather than sperm midpiece, despite the fact that authors applied immunohistochemistry to show the localization of this protein, but the evidence is indirect, so how authors functionally differentiate these 6 identified proteins from sperm mitophagy process with other processes and to confirm (or to associate) the relevance of these proteins with sperm mitophagy process?

      The authors agree that the 6 proteins which were further studied by immunocytochemistry may play roles in other processes, such as pronuclear formation. We discussed some potential roles, including and beyond post-fertilization mitophagy, in the Supplemental Discussion. Following the reviewers' comments, we moved the Supplemental Discussion back into the main Discussion section. This section now considers additional putative pathways in which the said 6 proteins could participate, though we concede that thorough functional studies must still be performed.

      3) Class 3 proteins were present in both the gametes or only the primed control spermatozoa, but are decreased in the spermatozoa after co-incubation, which authors interpreted as sperm-borne mitophagy determinants and/or sperm-borne proteolytic substrates of the oocyte autophagic system, this data categorization may need to be revised as sperm-borne proteolytic substrates of the oocyte autophagic system only, not for sperm borne mitophagy determinants. The argument for this disagreement is due to the fact that if the protein is a sperm-borne mitophagy determinant, after coincubation, to execute the mitophagy process, this protein should still be associated with the sperm at least at the early stage (of 4hr) (constant under MS detection when comparing control with 4hr treated) rather than being released from the sperm. Or alternatively, they could result in class 3 proteins (but not all those 6 were in class 3). Nevertheless, if these proteins serve as substrates, they can be used (consumed) and show decreased under MS detection.

      This argument for redefining the Class 3 proteins more accurately is understood and we agree. The definition is revised in the paper.

      4) Of particular interest among the 6 proteins that were further investigated. Unlike other proteins, MVP was highly significant (p<0.001) after 4hr incubation, but the significance became less after 24hr (p=0.19). Interpretation of this dynamic change in the relevance of the mitophagy process would facilitate the readers to understand the relevance and the role of MVP.

      The differences in significance are likely influenced by the abundance of MVP detectable by mass spectrometry. As the time of cell-free system incubation increased, the variability between replicates also seemed to increase, likely due to the sustained proteolytic activity taking place in our system. This work was based on three replicates of mass spectrometry for each time point; additional replicates would likely have reduced the p-value for the 24hr cell-free data set, for MVP and potentially for other proteins as well. At both time points, MVP was detectable in spermatozoa only after they had been exposed to the cell-free system treatment. This criterion interested us more than the actual differences in content between the time points, and it is why MVP was added to our list of candidate proteins.

      5) In figure 3, the association of ooplasmic MVP to sperm midpiece is not convincing enough as sperm midpiece and tail often show some levels of non-specific signals under fluorescent microscopy. And the dynamic association of ooplasmic MVP to sperm midpiece in Fig. 3F-G is difficult to reach a conclusion solely based on data presented in the manuscript. Additional negative control of sperm MVP staining from the primed and treated sperm would be helpful. Additionally, a quantitative comparison (15 vs 25hr) of sperm-associated MVP signals from the fertilized embryo or a stack image from different angles would clarify the doubts raised here.

      For all images and all replicates, serum controls were also generated. These controls were viewed under a fluorescence microscope, and the light intensity and exposure thresholds for each fluorescent channel were set based on the background intensity of these non-immune serum-treated control samples. We set our light intensity/acquisition time below the threshold at which non-specific signal began to appear. All the presented patterns were acquired below this peak intensity threshold, so the signal we see should be the true signal. Furthermore, 200 spermatozoa were counted per treatment per replicate of the cell-free system co-incubation and 20 zygotes per replicate, with 3 replicates of immunolabelling for each protein and data point; these counts were used to establish the typical localization patterns. The displayed patterns were observed in 65-88% of examined spermatozoa/zygotes. Invariably, the signal displayed in the manuscript is the typical pattern seen in a majority of cells. This information has now been added to the Materials & Methods section for clarification.

      6) Same concerns for the other 5 proteins (PSMG2, PSMA3, FUNDC2, SAMM50, BAG5) as indicated above.

      See response to Question 5.

      7) The patterns of these 6 proteins under the immunofluorescent study are confusing as the pattern varies after co-incubation (treated), and mostly, the signal of these proteins observed from the fertilized embryos is not really associated with sperm midpieces. Therefore, the evidence of these proteins involving in post-fertilization sperm mitophagy is, at this moment, weak based on the data presented. But the relevance of these proteins in events post-fertilization or early embryo development is certainly (evidence did not strong enough to support "sperm mitophagy," in my opinion).

      The authors agree that some of these proteins seem to play roles beyond post-fertilization sperm mitophagy and that true functional studies are needed before the authors can state with certainty that these proteins play a role in any of the discussed fertilization events. We state this in the discussion: “Considering the dynamic proteomic remodeling of both the oocyte and spermatozoa which takes place during early fertilization, these 185 proteins which have been identified likely play roles in processes beyond sperm mitophagy.” It should be noted that the authors went into greater detail about potential alternative protein functions, based on the present data and literature review, in the Supplemental Discussion. Based on this and other reviewer comments, we have now included the Supplemental Discussion as part of the main Discussion section, which will hopefully help clarify some of the authors’ thoughts about the 6 candidate proteins that were further analyzed during this study.

      Minor:

      1) To my understanding, statistical significance (relevance) is normally set at a p-value of either <0.1 or 0.05. The reason for loosening the p-value of 0.2 in the current study needs to be justified as this was not a common statistical criterium, and the interpretation of those candidates from this loosened criterium should also be careful.

      The loosening of the statistical threshold to 0.2 in this study applied only to our Class 1 proteins. To fall into Class 1, a protein had to be present only in samples after they were exposed to the cell-free system. For these Class 1 proteins, this happened in all 3 replicates at each stated time point. We considered this pattern of detection important whether the p-value fell under 0.1 or 0.2, and therefore loosened our statistical threshold for Class 1 proteins. Any protein added to our candidate list will be subject to further investigation before definitive conclusions can be drawn. We therefore think that capturing more proteins was more important for the goals of this study than limiting the number of proteins captured, especially for the Class 1 proteins. An explanation of this has been added to the Materials & Methods section, Mass Spectrometry Data Statistical Analysis.
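
      The Class 1 rule described above can be expressed as a simple filter. The sketch below is illustrative only: the function name `is_class1_candidate`, the convention that an abundance of 0 means "not detected" in a replicate, and the abundance values are assumptions for the example and do not come from the manuscript's analysis code. The p-value of 0.19 is the one the reviewer cites for MVP at 24hr.

```python
from typing import Sequence


def is_class1_candidate(
    control_abundances: Sequence[float],
    treated_abundances: Sequence[float],
    p_value: float,
    alpha: float = 0.2,
) -> bool:
    """Return True if a protein meets the illustrative Class 1 rule.

    Assumption: an abundance of 0 encodes "not detected" in that replicate.
    """
    # Class 1: absent from every control replicate ...
    absent_in_controls = all(a == 0 for a in control_abundances)
    # ... and detected in every replicate exposed to the cell-free system.
    detected_in_all_treated = all(a > 0 for a in treated_abundances)
    return absent_in_controls and detected_in_all_treated and p_value < alpha


# Hypothetical abundances; p = 0.19 is the 24hr MVP value quoted by the reviewer.
kept_at_0_2 = is_class1_candidate([0, 0, 0], [5.1, 3.2, 4.0], p_value=0.19)
kept_at_0_1 = is_class1_candidate([0, 0, 0], [5.1, 3.2, 4.0], p_value=0.19, alpha=0.1)
```

      With these inputs the protein is retained at the 0.2 threshold but would be dropped at 0.1, which is the trade-off the response describes.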

      2) First cell cleavage of porcine embryo normally occurs within 48hr post-insemination or activation; therefore, the 4 and the 24hr time points used in the current study require justification included in the discussion or methods and material section.

      First cleavage of porcine embryos normally occurs around 24-28 hours post-insemination. Thus, for both the cell-free system and the embryo studies, our 24-hour and 25-hour time points captured an advanced 1-cell-stage zygote or zygote-like system.

      3) In figure 2, colors used in different time points and in two different classes represent (sometimes) different protein categories, would be easier for the readers for quick comparisons if the same color could be used to represent the same protein category throughout the graph. (E.g, proteins for early zygote development are shown in red in "A", but blue in "B")

      This has been corrected and the color scheme for Figure 2 has been revised for easier comparisons.

      Reviewer #3 (Peer Review):

      I am not used to seeing a supplementary discussion in a manuscript. I also believe it should be incorporated into normal discussion.

      The Supplemental Discussion has been incorporated into the main Discussion now.

      It would be very helpful to make an additional figure in which the proposed interactome of identified factors with the sperm mitochondria before and after incubation are drawn schematically and also which factors are not IDed in both cases (when comparing to somatic mito- or autophagy). This eases to get through the discussion and will beautifully summarize and illustrate the importance and progress that the authors have made with this assay.

      We made a diagram that depicts the changes in protein localization patterns over time within our cell-free system. This diagram has been added to the manuscript as Figure 9.

      Reviewer #1 (Public Review):

      In this manuscript, the authors used an unbiased method to identify proteins from porcine oocyte extracts associated with permeabilised boar spermatozoa in vitro. The identification of the proteins is done by mass spectrometry. A previous publication of this lab validated the cell-free extract purification methods as recapitulating early events after sperm entry in the oocyte. This novel method with mammalian gametes has the advantage that it can be done with many spermatozoa at a time and allows the identification of proteins associated with many permeabilised boar spermatozoa at once. This allowed the authors to establish a list of proteins either enriched or depleted after incubation with the oocyte extract, or even only associated with spermatozoa after incubation for 4h or 24h. The total number of proteins identified in their test is around two hundred, with very few present in the sample only when spermatozoa were incubated with the extracts. The proteins identified using this approach and these criteria are likely associated with spermatozoa remnants after their entry and either removed or recruited for the transformation of spermatozoa-derived structures. Using WB and histochemistry labelling of spermatozoa and early embryos with specific antibodies, the authors confirmed the association/dissociation of 6 proteins suspected to be involved in autophagy.

      While this unique approach provides a list of potential proteins involved in sperm mitochondria clearance it's (only) a starting point for many future studies and does not provide the demonstration that any of these proteins has indeed a role in the processes leading to sperm mitochondria clearance since the protein identified may also be involved in other processes going-on in the oocyte at this time of early development.

      We thank Reviewer 1 for the positive comments. We added a sentence to the Discussion addressing the obvious shortcoming of the present study, as further functional validations of candidate mitophagy factors are planned.

      Concerning the localisation of the 6 proteins further analysed, the authors must add how much the presented picture represents the observed patterns. They must include the details on the fraction of spermatozoa and embryos displaying the presented pattern.

      We now specify that the patterns depicted in the manuscript are typical and representative of data from at least three replicates of immunolabeling in spermatozoa and zygotes. For each replicate, 200 spermatozoa from the cell-free system co-incubation or 20 zygotes were examined. The displayed patterns were observed in 65-88% of the examined spermatozoa/zygotes. Invariably, the signal displayed in the manuscript is the typical pattern that was seen in a majority of cells. This information has now been added to the Materials & Methods section for clarification.

      Reviewer #2 (Public Review):

      Mitochondria are essential cellular organelles that generate ATP as the energy source for maintaining regular cellular functions. However, the degradation of sperm-borne mitochondria after fertilization is a conserved event, known as mitophagy, that ensures the exclusively maternal inheritance of the mitochondrial DNA genome. Defects in post-fertilization sperm mitophagy will lead to fatal consequences in patients. Therefore, understanding the cellular and molecular regulation of the post-fertilization sperm mitophagy process is critically important. In this study, Zuidema et al. applied mass spectrometry in conjunction with a porcine cell-free system to identify potential autophagic cofactors involved in post-fertilization sperm mitophagy. They identified a list of 185 proteins that might be candidates for mitophagy determinants (or their co-factors). Although 6 (out of 185) proteins were further studied, based on their known functions, using a porcine cell-free system in conjunction with immunocytochemistry and Western blotting to characterize the localization and modification changes of these proteins, no further functional validation experiments were performed. Nevertheless, the data presented in the current study are of great interest and could be important for future studies in this field.

      We thank Reviewer 2 for the positive comments. As we explain in our response to the Editors and Reviewer 1, further validation studies will be resumed once the availability of slaughterhouse ovaries for such studies improves. Examples of such functional validation of the pro-mitophagic proteins SQSTM1 and VCP are included in our previous studies (DOI: 10.1073/pnas.1605844113 and DOI: 10.3390/cells10092450), which led to the development of the cell-free system reported here and are cited in the present study.

      Reviewer #3 (Public Review):

      In this manuscript, a cytosolic extract of porcine oocytes is prepared. To this end, the authors aspirated follicles from ovaries obtained from the slaughterhouse and first matured the oocytes to the metaphase II stage (one polar body). Cumulus cells (hyaluronidase treatment) and the zona pellucida (pronase treatment) were removed, and the resulting naked mature oocytes (1000 per portion) were extracted in a buffer containing a divalent cation chelator, beta-mercaptoethanol, protease inhibitors, and a creatine kinase/phosphocreatine cocktail for energy regeneration. The preparation was subsequently frozen and thawed three times in liquid nitrogen and crushed by 16 kG centrifugation. The supernatant (1.5 mL) was harvested, and 10 microliters of it was used for interaction with 10,000 permeabilized boar sperm (10 microliters of extract thus represents the cytosol fraction of 6.67 oocytes). The sperm were treated in this assay with DTT and lysoPC to prime the sperm's mitochondrial sheath. After incubation and washing, these preps were used for Western blot (see point 2), for fluorescence microscopy, and for proteomic identification of proteins.
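      The figure of 6.67 oocytes per assay quoted in the summary above can be reproduced with simple proportional arithmetic. The sketch below is illustrative only: it assumes that the 1.5 mL of supernatant derives from one portion of 1000 oocytes and that the extract is homogeneous, and the function and variable names are hypothetical rather than taken from the manuscript.

```python
def oocyte_equivalents(n_oocytes: int, extract_volume_ul: float, aliquot_ul: float) -> float:
    """Oocytes whose cytosol is represented in an aliquot of extract.

    Assumes a homogeneous extract, so the aliquot holds a proportional
    share of the material from all extracted oocytes.
    """
    return n_oocytes * aliquot_ul / extract_volume_ul


# Values quoted in the reviewer's summary: 1000 oocytes, 1.5 mL supernatant,
# 10 microliters of extract and 10,000 permeabilized sperm per assay.
per_assay = oocyte_equivalents(n_oocytes=1000, extract_volume_ul=1500.0, aliquot_ul=10.0)
sperm_per_oocyte_equivalent = 10_000 / per_assay

print(f"Oocyte equivalents per assay: {per_assay:.2f}")
print(f"Sperm per oocyte equivalent: {sperm_per_oocyte_equivalent:.0f}")
```

      Under these assumptions each assay exposes roughly 1500 spermatozoa to the cytosol of a single oocyte, which is one way to frame the reviewer's concern about protein yield per assay.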

      Points for consideration:

      1) The treatment of sperm cells with DTT and lysoPC will permeabilize sperm cells but will also cause the liberation of soluble proteins as well as proteins that may interact with sperm structures via oxidized cysteine groups (disulfide bridges between proteins that will be reduced by DTT).

      This is certainly a possibility. The lysoPC and DTT permeabilization steps were designed to mimic the natural processing (plasma membrane removal and sperm protein disulfide bond reduction) that spermatozoa undergo during fertilization. We do realize that this processing is chemically induced and thus is not a perfect recapitulation of fertilization processes. However, in this study and in previous studies with this system, we were able to show alignment between proteomic interactions taking place in the cell-free system and within the zygotes.

      2) Figure 3: Did the authors really make Western blots with the amount of sperm cells and oocyte extracts described? The description in the figures is not clear. This point relates to point 1. The proteins should also be detected in the following preparations: (1) the oocyte extract only (done); (2) unextracted nude oocytes, to see which potentially relevant proteins are lost by the extraction procedure (not done); (3) the permeabilized (LPC- and DTT-treated and washed) sperm only (not done); (4) sperm that were intact (done); (5) the material obtained after 10,000 permeabilized sperm and the equivalent of 6.67 oocyte extracts were incubated and washed 3 times (or higher amounts after this incubation; not done). Note that the amount of sperm from one assay (10,000) will likely give insufficient protein for proper Western blotting and/or Coomassie staining. In the materials and methods, I cannot find how the permeabilized sperm were subjected to Western blotting after incubation. I only see how 50 oocyte extracts and 100 million sperm were processed separately for Western blot.

      The authors did make Western blots with the number of spermatozoa and oocytes stated in the materials and methods, a total protein equivalent of 10 to 20 million spermatozoa (equivalent to ~20-40 µg of total protein load) and 100 MII oocytes (equivalent to ~20 µg of total protein load). These numbers have been corrected in the Materials & Methods. Also, we did find in the Materials & Methods section that the Co-Incubation of Permeabilized Mammalian Spermatozoa with Porcine Oocyte Extracts section refers to using cell-free exposed spermatozoa for electrophoresis; however, for none of the presented Western blot work was this true. Rather, all of the presented Western blots as per their descriptions are utilizing ejaculated or capacitated sperm or oocytes. This line has been removed from the Materials & Methods to reduce confusion.

      Regarding preparation (2), we have previously assessed the difference between oocyte extract and intact oocytes in this manner internally and we are certainly losing proteins due to the oocyte extraction process. We make caveats in this vein throughout the article such as: “Furthermore, this cell-free system while useful does not perfectly capture all the events which take place during in vivo fertilization. The cell-free system is intended to mimic early fertilization events but is presumably not the exact same as in vitro fertilization.”

      3) Figures 4, 5, 6, 7, and 8 see point 2. I do miss beyond these conditions also condition 1 despite the fact that the imaged ooplasm does show positive staining.

      For all the presented Western blots, the tissue type is stated in the image description and the protocol which was used to prepare these samples is stated in the Materials & Methods.

      4) These points 1-3 are all required for understanding what is lost in the sperm and oocyte treatments prior to the incubation step as well as the putative origin of proteins that were shown to interact with the mitochondrial sheath of the oocyte extract incubated permeabilized sperm cells after triple washing. Is the origin from sperm only (Figs 5-8) or also from the oocyte? Is the sperm treatment prior to incubation losing factors of interest (denaturation by DTT or dissolving of interacting proteins preincubation Figs 3-8)?

      The authors understand that there are proteins and interactions lost on both sides of the cell-free system equation, and we have added a sentence to the Discussion to acknowledge this limitation of the system.

      5) Mass spectrometry of the permeabilized sperm incubated with oocyte extracts and subsequent washing has been chosen to identify proteins involved in the autophagy (or cofactors thereof). The interaction of a number of such factors with the mitochondrial sheath of sperm has been shown in some cases from sperm and others for an oocyte origin. Therefore, it is surprising that the authors have not sub-fractionated the sperm after this incubation to work with a mitochondrial-enriched subfraction. I am very positive about the porcine cell-free assay approach and the results presented here. However, I feel that the shortcomings of the assay are not well discussed (see points 1-5) and some of these points could easily be experimentally implemented in a revised version of this manuscript while others should at least be discussed.

      We agree that the use of a mitochondrial-enriched subfraction for further analysis would be interesting and useful. We are actively developing experimental protocols for oocyte extract co-incubation with isolated sperm heads and tails, and eventually with purified mitochondrial sheaths. However, such experiments are contingent upon our access to porcine oocytes, which has continued to be a struggle since the COVID-19 pandemic compromised our ability to obtain oocytes in large, cheap, and reliable quantities. This was a continuous problem in preparing materials for this very paper and has continued to be an issue for our laboratory as well as many others at our university and across the country. We continue to make the most of oocytes every time we can get access to them, but the unfortunate reality is that this access has become sparse and unreliable over the past three years.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      The expression and localization of Foxc2 strongly suggest that its role is mainly confined to As undifferentiated spermatogonia (uSPGs). Lineage tracing demonstrated that all germ cells were derived from the FOXC2+ uSPGs. Specific ablation of the FOXC2+ uSPGs led to the depletion of all uSPG populations. Full spermatogenesis can be achieved through the transplantation of Foxc2+ uSPGs. Male germ cell-specific ablation of Foxc2 caused Sertoli-only testes in mice. CUT&Tag sequencing revealed that FOXC2 regulates the factors that inhibit the mitotic cell cycle, consistent with its potential role in maintaining a quiescent state in As spermatogonia. These data made the authors conclude that the FOXC2+ uSPG may be the true SSCs, essential for maintaining spermatogenesis. The conclusion is largely supported by the data presented, but two concerns should be addressed: 1) terminology used is confusing: primitive SSCs, primitive uSPGs, transit amplifying SSCs... 2) the GFP+ cells used for germ cell transplantation should be better controlled using THY1+ cells.

      Thanks for your good comments. According to your suggestions, we have addressed your two concerns as follows:

      1> Overall, our work suggests that FOXC2+ SSCs are a subpopulation of SSCs in a quiescent state; thus, we have replaced the term ‘primitive’ with ‘quiescent’ in the revised manuscript. In general, ‘transient amplifying SSCs’ are considered to be ‘progenitors’; thus, we have replaced ‘transient amplifying SSCs’ with ‘progenitors’ in the revised manuscript.

      2> The transplantation experiment was conducted using MACS-sorted THY1+, FACS-sorted THY1+, and FACS-sorted GFP+ (FOXC2+) uSPGs simultaneously. To be consistent with the single-cell RNA-seq, which used the MACS-sorted THY1+ uSPGs, we only presented the results from MACS-sorted THY1+ and FACS-sorted GFP+ (FOXC2+) uSPGs in the previous manuscript. Following the reviewer’s suggestion, we have included the results derived from FACS-sorted THY1+ uSPGs as the control. The overall conclusion is still fully supported by the more comprehensive dataset, i.e. FOXC2+ cells generated significantly higher numbers of colonies than THY1+ cells after transplantation (Figure 2D, E).

      Reviewer #2 (Public Review):

      The authors found FOXC2 is mainly expressed in As of mouse undifferentiated spermatogonia (uSPG). About 60% of As uSPG were FOXC2+ MKI67-, indicating that FOXC2 uSPG were quiescent. Similar spermatogonia (ZBTB16+ FOXC2+ MKI67-) were also found in human testis.

      The lineage tracing experiment using Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G mice demonstrated that all germ cells were derived from the FOXC2+ uSPG. Furthermore, specific ablation of the FOXC2+ uSPGs using Foxc2iCreERT2/+;Rosa26LSL-DTA/+ mice resulted in the depletion of the entire uSPG population. In the regenerative condition created by busulfan injection, all FOXC2+ uSPG survived and began to proliferate at around 30 days after busulfan injection. The surviving FOXC2+ uSPGs eventually generated all germ cells. To examine the role of FOXC2 in the adult testis, spermatogenesis of Foxc2f/-;Ddx4Cre/+ mice was analyzed. From 2 months of age, the number of degenerative seminiferous tubules increased and they became Sertoli cell-only seminiferous tubules, indicating that FOXC2 is required to maintain normal spermatogenesis in adult testes. To get insight into the role of FOXC2 in the uSPG, CUT&Tag sequencing was performed in sorted FOXC2+ uSPG from Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G mice 3 days after TAM diet feeding. The results showed that some unique biological processes, including negative regulation of the mitotic cell cycle, were enriched, suggesting that FOXC2 maintains a quiescent state in spermatogonia.

      Lineage tracing experiments using transgenic mice of the TAM-inducing system was well-designed and demonstrated interesting results. Based on all data presented, the authors concluded that the FOXC2+ uSPG are primitive SSCs, an indispensable subpopulation to maintain adult spermatogenesis.

      The conclusion of the mouse study is mostly supported by the data presented, but accepting some of the authors' claims requires additional information and explanation. Several terms used to define cell populations in the paper may mislead readers.

      1) "primitive spermatogonial stem cell (SSC)" is confusing. SSCs are considered the most immature subpopulation of uSPG. Thus, primitive uSPGs are likely SSCs. The naming, primitive SSCs, and transit-amplifying SSCs (Figure 7K) are weird. In general, the transit-amplifying cell is progenitor, not stem cell. In human and even mouse, there are several models for the classification of uSPG and SSCs, such as reserved stem cells and active stem cells. The area is highly controversial. The authors' definition of stem cells and progenitor cells should be clarified rigorously and should compare to existing models.

      Thanks for your good comments. Considering that our results showed that FOXC2+ SSCs are in a quiescent state and that, mechanistically, FOXC2 maintained the quiescent state of SSCs by promoting the expression of negative regulators of the cell cycle, we have replaced ‘primitive SSCs’ with ‘quiescent SSCs’ in the revised manuscript. We agree with the reviewer that ‘transient amplifying SSCs’ are considered to be ‘progenitors’; thus, we have replaced ‘transient amplifying SSCs’ with ‘progenitors’ in the revised manuscript. Further, from our point of view, the FOXC2+Ki67+ SSCs could be regarded as active stem cells, and the FOXC2+Ki67- SSCs could be regarded as reserved stem cells, although further evidence is still needed to confirm this.

      2) scRNA seq data analysis and an image of FOXC2+ ZBTB16+ MKI67- cells by fluorescent immunohistochemistry are not sufficient to conclude that they are human primitive SSCs as described in the Abstract. The identity of human SSCs is controversial. Although Adark spermatogonia are a candidate population of human SSCs, the molecular profile of the Adark spermatogonia seems to be heterogeneous. None of the molecular profiles was defined by a specific cell cycle phase. Thus, more rigorous analysis is required to demonstrate the identity of FOXC2+ ZBTB16+ MKI67- cells and Adark spermatogonia.

      We agree with the reviewer that the identity of human SSCs remains elusive, even though the Adark population demonstrates certain characteristics of SSCs. To acknowledge this, we have revised our conclusion so that it only suggests that FOXC2+ZBTB16+MKI67- cells represent a quiescent state of human SSCs.

      3) FACS-sorted GFP+ cells and MACS-THY1 cells were used for functional transplantation assay to evaluate SSC activity. In general, the purity of MACS is significantly lower than that of FACS. Therefore, FACS-sorted THY1 cells must be used for the comparative analysis. As uSPGs in adult testes express THY1, the percentage of GFP+ cells in THY1+ cells determined by flow cytometry is important information to support the transplantation data.

      Thanks for your good comments. According to your suggestions, we have addressed your concerns as follows:

      1> The transplantation experiment was conducted using MACS-sorted THY1+, FACS-sorted THY1+, and FACS-sorted GFP+ (FOXC2+) uSPGs simultaneously. To be consistent with the single-cell RNA-seq, which used the MACS-sorted THY1+ uSPGs, we only presented the results from MACS-sorted THY1+ and FACS-sorted GFP+ (FOXC2+) uSPGs in the previous manuscript. Following the reviewer’s suggestion, we have included the results derived from FACS-sorted THY1+ uSPGs as the control. The overall conclusion is still fully supported by the more comprehensive dataset, i.e. FOXC2+ cells generated significantly higher numbers of colonies than THY1+ cells after transplantation (Figure 2D, E).

      2> We performed FACS analysis to determine the proportion of GFP+ cells in FACS-sorted THY1+ cells from Rosa26LSL-T/G/LSL-T/G or Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G mice at day 3 post TAM induction, and the result showed that GFP+ cells account for approximately 20.9±0.21% of THY1+ cells (see Author response image 1).

      Author response image 1.

      4) The lineage tracing experiments of FOXC2+-SSCs in Foxc2iCreERT2/+;Rosa26LSL-T/G/LSL-T/G showed ~95% of spermatogenic cells and 100% progeny were derived from the FOXC2+ (GFP+) spermatogonia (Figure 2I, J) at month 4 post-TAM induction, although FOXC2+ uSPG were quiescent and a very small subpopulation (~ 60% of As, ~0.03% in all cells). This means that 40% of As spermatogonia and most of Apr/Aal spermatogonia, which were FOXC2 negative, did not contribute to spermatogenesis at all eventually. This is a striking result. There is a possibility that FOXC2CRE expresses more widely in the uSPG population although immunohistochemistry could not detect them.

      Thanks for your good comments. In our lineage tracing results, over 95% of the spermatogenic cells in the testes of 4-month-old mice are derived from the FOXC2+ SSCs, which means that FOXC2+ SSCs maintain stable long-term spermatogenesis. In addition, previous studies have shown that only a portion of As spermatogonia are SSCs with complete self-renewal ability (PMID: 28087628, PMID: 25133429), which is consistent with our findings. Therefore, we speculate that the 40% of As spermatogonia and most of the Apr/Aal spermatogonia that were FOXC2 negative did contribute to spermatogenesis but cannot maintain long-term spermatogenesis due to limited self-renewal ability.

      5) The CUT&Tag_FOXC2 analysis on the FACS-sorted FOXC2+ showed functional enrichment in biological processes such as DNA repair and mitotic cell cycle regulation (Figure 7D). The cells sorted were induced Cre recombinase expression by TAM diet and cut the tdTomato cassette out. DNA repair process and negative regulation of the mitotic cell cycle could be induced by the Cre/lox recombination process. The cells analyzed were not FOXC2+ uSPG in a normal physiological state.

      We appreciate the reviewer’s concern that the enriched functions identified in the analysis might be derived from Cre/lox recombination. However, we think it is unlikely that the Cre/lox recombination process, which is expected to be rather local and specific, can trigger such a systemic and robust response by the DNA damage and cell cycle regulatory pathways. The reasons are as follows. First, as far as we are aware, there are insufficient data to support this suggested scenario. Second, we did not observe any alteration in either the SSC behaviors or spermatogenesis in general upon the TAM-induced genomic changes, suggesting that the impact of the Cre/lox recombination on DNA damage or the cell cycle was not significant. Third, no factors associated with the DNA repair process were revealed in the differential analysis of single-cell transcriptomes of FOXC2-WT and FOXC2-KO.

      6) Wei et al (Stem Cells Dev 27, 624-636) have published that FOXC2 is expressed predominantly in As and Apr spermatogonia and is required for self-renewal of mouse SSCs; however, the authors did not mention this study in the Introduction, but referred to it only briefly at the end of the Discussion. Their finding should be referred to and evaluated up front in the Introduction.

      Thanks for your good comments. According to your suggestion, we have revised the Introduction to refer to this latest parallel work on FOXC2. We are happy to see that our discoveries converge on the important role of FOXC2 in regulating SSCs in adult mammals.

      Reviewer #3 (Public Review):

      By popular single-cell RNA-seq, the authors identified FOXC2 as an undifferentiated spermatogonia-specific expressed gene. The FOXC2+ SSCs can sufficiently initiate and sustain spermatogenesis; the ablation of this subgroup results in the depletion of the uSPG pool. The authors provide further evidence to show that this gene is essential for SSC maintenance by negatively regulating the cell cycle in adult mice, thus establishing FOXC2 as a key regulator of the SSC quiescent state.

      The experiments are well-designed and conducted, the overall conclusions are convincing. This work will be of interest to stem cell and reproductive biologists.

      Thanks for the positive feedback.  

      Reviewer #1 (Recommendations for the Authors):

      The authors should address the following concerns:

      1) The most primitive uSPGs should be the true SSCs. The term "primitive SSCs" is very confusing.

      2) In addition to FACS-sorted GFP+ cells, FACS-sorted THY1+ cells should also be used for transplantation.

      Thanks for your good comments. According to your suggestions, we have addressed your two concerns as follows:

      1) Overall, our work suggests that FOXC2+ SSCs are a subpopulation of SSCs in a quiescent state; thus, we have replaced the term ‘primitive’ with ‘quiescent’ in the revised manuscript.

      2) The transplantation experiment was conducted using MACS-sorted THY1+, FACS-sorted THY1+, and FACS-sorted GFP+ (FOXC2+) uSPGs simultaneously. To be consistent with the single-cell RNA-seq, which used the MACS-sorted THY1+ uSPGs, we only presented the results from MACS-sorted THY1+ and FACS-sorted GFP+ (FOXC2+) uSPGs in the previous manuscript. Following the reviewer’s suggestion, we have included the results derived from FACS-sorted THY1+ uSPGs as the control. The overall conclusion is still fully supported by the more comprehensive dataset, i.e. FOXC2+ cells generated significantly higher numbers of colonies than THY1+ cells after transplantation (Figure 2D, E).

      Reviewer #3 (Recommendations for the Authors):

      The experiments are well-designed and conducted, and the overall conclusions are convincing. The only concern is the writing, especially the Introduction, which was not well-rationalized. It seems that the three subtypes and three models for SSC self-renewal are irrelevant to the major points of this manuscript. I don't think you need to talk so much about the markers of SSCs. Instead, I suggest you provide more background about the quiescent or activation states of the SSCs. In addition, as FOXC2 is a nuclear-localized protein, it cannot be used for flow cytometric sorting, so I don't think it should be emphasized as a marker. You identified a key transcription factor for maintaining the quiescent state of the primitive SSCs, and that's quite important!

      We appreciate the positive feedback and constructive suggestions on the writing. We have substantially revised our manuscript to include the relevant advances and understanding from the field, as well as to highlight the importance of FOXC2 in regulating the quiescent state of SSCs.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer 1 (Recommendations For The Authors):

      1) The strikingly different conclusion from the previous Bourane study seems to stem from the experimental approaches. Rather than using genetic crosses that target all neurons from the hindbrain and spinal cord that express Npy at any point in development, Boyle et al target their manipulations specifically to the lumbar region of the superficial dorsal horn in adult mice using direct viral injections. Thus, Boyle et al are almost certainly manipulating far fewer neurons than the original study. How then are their behavioral effects so much greater? At a minimum, the authors need to discuss this discrepancy head on. Better would be a direct molecular/anatomical comparison of the neurons targeted by each approach. This could be done using Npy-Cre mice crossed to a Rosa-LSL-reporter strain and quantifying the overlap with the same markers used here. Perhaps the intersectional approach with Lbx1 resulted in labeling of a different population of neurons than the adult AAV injections? Although likely outside the scope, given that this work directly questions the main conclusion of the Bourane paper, it will be important to see a replication of the original finding of selectivity to mechanical itch.

      We agree that our approach should be manipulating a smaller population of neurons, and that it is therefore surprising that we see greater behavioural effects. Please see our response to "Weakness 1" of Reviewer 2 for consideration of this point. We have already provided a direct molecular comparison as requested by the reviewer, and this appears in Figure 1 supplement 1. Here we used tissue from NPY::Cre mice that had been crossed with Ai9 mice (i.e. a Rosa-LSL-reporter) and had received intraspinal injections of AAV.flex.GFP. We then characterised the neurochemistry of tdTomato+ cells that were GFP+ or GFP-negative.

      2) The authors state that, "91.6% ± 0.3% of cells classed as Cre-positive cells were also Npy-positive, and these accounted for 62.1% ± 0.6% of Npy-positive cells" If I am reading this correctly, does that mean that 40% of the Npy+ cells are Cre negative? If so, how is this possible?

      This interpretation is correct. For quantification of RNAscope data we used a cut-off level of 4 transcripts, and cells with fewer than 4 transcripts were classed as negative. It is likely that some of the NPY cells classified as negative for Cre would have had some Cre mRNA (sufficient to cause recombination), but at a level below this threshold. It is also possible that some NPY+ cells would fail to express Cre, since this is a BAC transgenic mouse, rather than a knock-in.
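      The effect of a transcript cut-off on the reported overlap can be illustrated with a small sketch. Only the cut-off of 4 transcripts comes from the response above; the per-cell counts and all names are hypothetical and chosen purely to show how a cell with low but non-zero Cre mRNA ends up classed as Cre-negative.

```python
CUTOFF = 4  # cells with fewer than 4 transcripts are classed as negative


def is_positive(transcript_count: int, cutoff: int = CUTOFF) -> bool:
    """Classify a cell as positive for a probe if it reaches the cut-off."""
    return transcript_count >= cutoff


# Hypothetical per-cell transcript counts (not data from the study).
cells = [
    {"npy": 12, "cre": 9},  # Npy+ / Cre+
    {"npy": 8, "cre": 2},   # Npy+ but below the Cre cut-off, so classed Cre-negative
    {"npy": 6, "cre": 5},   # Npy+ / Cre+
    {"npy": 0, "cre": 0},   # double negative
]

npy_pos = [c for c in cells if is_positive(c["npy"])]
cre_pos = [c for c in cells if is_positive(c["cre"])]
both = [c for c in cells if is_positive(c["npy"]) and is_positive(c["cre"])]

# Share of Cre+ cells that are also Npy+ (specificity of Cre expression).
cre_also_npy = len(both) / len(cre_pos)
# Share of Npy+ cells that are also Cre+ (coverage of the Npy population).
npy_also_cre = len(both) / len(npy_pos)

print(f"Cre+ cells that are Npy+: {cre_also_npy:.1%}")
print(f"Npy+ cells that are Cre+: {npy_also_cre:.1%}")
```

      In this toy example the second cell carries some Cre mRNA, which could in principle be enough for recombination, yet it lowers the apparent coverage of the Npy population because it falls below the threshold.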

      3) Similarly, the authors state that "great majority of FP-expressing neurons in laminae I-III were immunoreactive (IR) for NPY (78.5% ± 3.6%), and these accounted for 74.6% ± 1.9% of the NPY-IR neurons in this area". So does this mean 20% of the recombination is non-specific/in other cell types that could be involved in pain/itch sensation?

      Our finding that 91.6% of cells with Cre mRNA were also positive for Npy mRNA (see above) indicates that Cre expression was largely restricted to NPY cells. The failure to detect NPY peptide in some of these cells probably results from the relatively low level of peptide seen in the cell bodies of peptidergic neurons, which results from the rapid transport of peptides into their axons.

      4) Comparing Fig 3B and Fig4B it seems the control baseline von Frey responses are different. In fact, baseline response in Fig4b is quite like the CNO effect in Fig 3B. Unless I'm misunderstanding something, this seems quite odd?

      We agree that there is a difference between the baseline responses. We are not aware of any particular reason for this, and we think that it reflects a degree of variability that is seen with the von Frey test. Interestingly, the baseline values for the SNI cohort (Fig 4E) lie between the values in Fig 3B and Fig 4B.

      5) In Fig 4E, the behavior of the CNO treated mice is quite variable. Can the authors comment as to how this might be happening? Does the effect correlate with viral transduction?

      We did not see any obvious correlation between the extent of viral transduction and the behaviour of individual mice.

      6) Fig6, the PDyn-Cre experiment, is a bit of a non sequitur?

      Please see our response to "Weakness 2" of Reviewer 2 for consideration of this point.

      7) The conclusion is unusually long. I recommend trimming it to make it more concise.

      We presume that this refers to the Discussion. However, this was ~1550 words, and we do not feel that that is unusually long.

      Reviewer 2 (Public Review):

      Weaknesses

      1) There is inadequate discussion about previous studies of NPY interneurons. Specifically, the authors should address why a more restricted subset of these neurons (this study) have broader effects than seen previously.

      We have expanded the discussion on the discrepancies between our findings and those reported previously. We state at the outset that we are targeting a more restricted population (lines 509-10), and we now go into more detail concerning both similarities and differences between our findings and the reasons that we think may underlie any discrepancies (various changes between lines 522-575).

      2) I cannot see the reason for including results from manipulation of Dyn+ interneurons in this paper. First, the title does not reflect roles of spinal Dyn+ population. In addition, without further experiments characterizing relationships between NPY and Dyn interneurons in modulating itch and/or nociception, Dyn datasets seem to deviate from the main theme.

      We had previously shown that activating Dyn-INs suppressed pruritogen-evoked itch (Huang et al 2018), but it was important to test whether silencing these cells would have the opposite effect. Our finding of overlap in function (i.e. both NPY-INs and Dyn-INs suppress itch, and both innervate GRPR cells) provides strong evidence against the idea that neurochemically-defined interneuron populations have highly specific functions, and we now state this in the Discussion. The anatomical experiments (which follow on from the functional studies) provide important new information concerning synaptic circuitry of the dorsal horn, by showing that NPY-INs preferentially innervate GRPR cells, and provide around twice as many synapses on these cells, compared to the Dyn-INs. Interestingly, this correlates with the relatively large optogenetically-evoked IPSCs that we saw when NPY-INs were activated, compared to those reported by Liu et al (2019) when galanin-expressing cells (which largely correspond to Dyn-INs) were activated. By including these findings in the paper, we are able to make comparisons between these two populations.

      3) While the authors provided convincing evidence that GRPR+ neurons serve as a downstream effector of NPY+ neuron evoked itch, the relationship between GRPR and NPY neurons in modulating pain is not examined. Therefore, Fig. 7B is pure speculation and should be removed.

      We feel that our recent findings that GRPR neurons correspond to vertical cells, that they respond to noxious stimuli, and that activating them results in pain-related behaviours, make it reasonable to speculate that the NPY/GRPR circuit may also be involved in the anti-nociceptive action of NPY cells. The legend for Fig 7B already refers to this as a "potential circuit", and we have toned down the corresponding part of the discussion to say that our findings "raise the possibility" that this is the case (lines 605-7). We feel that this part of the figure is important, as otherwise our summary diagram ignores some of the main findings of the paper, and we hope that this is now acceptable.

      Recommendations For The Authors

      1) Fig. 1G: the "misexpression" of tdTomato neurons was much more prominent in deep dorsal horn laminae but not in the superficial ones. Was this representative? Can the authors perform a laminae specific characterization?

      We did test for this possibility in 2 NPY::Cre;Ai9 mice that had received intraspinal injections of AAV.flex.GFP, and found that there was a modest difference - 62% of tdTomato+ cells in laminae I-II, but only 39% of those in lamina III, were GFP+. This suggests that "misexpression" may have differed slightly between these regions. However, since the difference was quite modest, and we were only able to analyse tissue from two mice in this way, we did not include these findings in the paper.

      2) I have a lot of problems interpreting the c-Fos data in Fig. 2 E and F. For the mCherry- population, how was the quantification performed? From the image, it does not look like 20-30% of cells express c-Fos; at a minimum a clear stain of neurons would be needed. Similarly, the identification of NPY cells is not particularly convincing (e.g., middle arrowhead lower 2 panels in C).

      We have provided further details on how the analysis was performed (changes made to lines 1016-29). NeuN staining was used to reveal all neurons, and a modified optical disector method was performed from somatotopically appropriate regions of the dorsal horn. As noted by the Reviewer, NeuN staining was required to allow identification of mCherry-negative cells. However, we have not included the NeuN immunoreactivity in the image, as this would add considerably to the complexity. These images are from single optical sections, and therefore the overall numbers of cells are low (in comparison to what would be seen in a projected image). The intensity of mCherry staining varied between cells. However, for all mCherry-positive cells (including the example referred to by the Reviewer), there was clear staining in the membrane, which could be followed in serial sections.

      3) Please add individual data points for all quantifications.

      These have been added.

      Reviewer 3 Recommendations For The Authors:

      1) It is somewhat surprising that there is no effect on CPP after activating spinal NPY neurons in neuropathic mice, given the almost complete rescue of hypersensitivity to baseline values in the nociceptive tests. Based on the methods, it appears that conditioning was carried out already 5 min after CNO injection. Yet, suppression of c-fos activity in excitatory spinal dh neurons was observed 30min after CNO injection. Also, it is not clear to me when CNO was injected prior to the nociceptive or CQ testing?

      Have the authors considered that conditioning from 5-35 min after CNO injection might be too short after CNO injection to achieve a profound analgetic effect?

      In a previous study (Polgár et al 2023), we had observed the timecourse of CNO-evoked itch and pain behaviours in mice in which GRPR cells expressed hM3Dq. We found that these started within 5 minutes of i.p. CNO injection (e.g. Fig S2 in that paper). In addition, the timecourses of action of gabapentin and CNO (both given i.p.) are likely to be similar, and there was a preference for the chamber paired with gabapentin. We are therefore confident that the conditioning period with CNO was adequate. We now explain this in the Methods section (lines 846-52). The timing of CNO injections for the nociceptive and CQ tests is now described (lines 749-55).

      2) The authors claim that tonic pain was not affected based on the conditioned place preference test. Efficacy in withdrawal response tests and in the CPP differ by more than duration of the stimulus. I'd suggest using more cautious wording here.

      We agree that caution is needed in interpreting the results of the CPP experiments. We have therefore replaced "does" with "may" in the Results section (line 336) and "did" with "may" in the Discussion (line 620).

      3) On page 9 the authors state "...suggesting that they suppress the transmission of pain- and itch-related information in the dorsal horn." However, pain is not affected in the loss of function experiments suggesting some qualitative differences in the role of the NPY neurons in itch and pain. This should also be reflected more clearly in this statement and in the discussion e.g. "suppress itch" and "can suppress pain".

      We accept the point made by the Reviewer. We have slightly altered the wording in lines 249-51 and 610 to reflect this.

    1. Author Response

      The following is the authors’ response to the original reviews.

      Reviewer #1 (Public Review):

      [...] Weaknesses

      Showing that A-2 and especially A-3 are outliers in the PCA analysis is useful, but it may be hiding other interesting signals in the data. The other strains are remarkably colinear on these plots, hinting that if the outliers were removed, one main component would emerge along which they are situated. It also seems possible that this additional analysis step would allow the second dimension to better differentiate them in a way that is interesting with respect to their mutator status or mutations in key metabolic or regulatory genes.

      We thank the reviewer for their positive comments and their constructive feedback on the manuscript. Following the reviewer’s recommendation, we performed the PCA analysis on metabolism data after removing A-2 and A-3 data. We have detailed those results below. Consistent with a similar analysis performed on RNA-seq datasets in our previous publication, we find that removing these outliers has only a modest effect on separating mutators from non-mutators. We find that, while the new PC2 separates most mutators from the non-mutators, the separation is rather weak. Moreover, we do not see a similar distinction when looking at metabolic data in the stationary phase. In the interest of improving the readability of the manuscript, we recommend not including these analyses in the final manuscript. We have presented the data for the reviewer’s benefit in Author response images 1, 2 and 3.

      Author response image 1.

      Author response image 2.

      Author response image 3.
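
      The re-analysis described above (repeating the PCA after dropping the A-2 and A-3 samples) can be illustrated with a minimal sketch. This is not the authors' pipeline: the data matrix below is random and purely hypothetical, the helper name `pca_scores` is an assumption introduced for illustration, and PCA is computed with a plain SVD on column-centred data.

```python
import numpy as np

def pca_scores(X, n_components=2):
    """Return PC scores and explained-variance fractions via SVD (illustrative helper)."""
    Xc = X - X.mean(axis=0)                      # centre each metabolite (column)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)        # fraction of variance per component
    return scores, explained[:n_components]

# Hypothetical data: 12 populations x 50 metabolite abundances (random, not real measurements)
rng = np.random.default_rng(0)
labels = [f"A-{i}" for i in range(1, 7)] + [f"A+{i}" for i in range(1, 7)]
X = rng.normal(size=(12, 50))
X[labels.index("A-2")] += 3.0                    # make A-2 an artificial outlier
X[labels.index("A-3")] += 6.0                    # make A-3 a stronger artificial outlier

# PCA on all populations, then again with the two outliers removed
keep = np.array([lab not in {"A-2", "A-3"} for lab in labels])
scores_all, var_all = pca_scores(X)
scores_sub, var_sub = pca_scores(X[keep])
```

      Comparing `scores_all` with `scores_sub` (for example by plotting PC1 against PC2 and colouring points by mutator status) shows whether the remaining populations separate along the new components once the outliers no longer dominate the variance.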

      There is a missed opportunity to connect some key results to what is known about LTEE mutations that reduce the activity of pykF (pyruvate kinase I). This gene is mutated in all 12 LTEE populations, and often these mutations are frameshifts or transposon insertions that should completely knock out its activity. At first glance, inactivating an enzyme for a step in glycolysis does not make sense when the nutrient source in the growth medium is glucose, even though PykF is only one of two isozymes E. coli encodes for this reaction. There has been speculation that inactivating pykF increases the concentration of phosphoenolpyruvate (PEP) in cells and that this can lead to increased rates of glucose import because PEP is used by the phosphotransferase system of E. coli to import glucose (see https://doi.org/10.1002/bies.20629). The current study has confirmed the higher PEP levels, which is consistent with this model.

      We thank the reviewer for pointing out this missed opportunity. We have expanded the discussion around the role of pykF mutations and the elevated concentrations of PEP observed in our data in section 3.4.

      In the introduction, the papers cited to show the importance of changes in metabolism for adaptation do not seem to fit the focus of this study very well. They stress production of toxins and secondary metabolites, which do not seem to be mechanisms that are at work in the LTEE. I can think of two areas of background that would be more relevant: (1) studies of how bacterial metabolism evolves in adaptive laboratory evolution (ALE) experiments to optimize metabolic fluxes toward biomass production (for example, https://doi.org/10.1038/nature01149), and (2) discussions of how cross-feeding, metabolic niche specialization, and metabolic interdependence evolve in microbial communities, including in other evolution experiments (for example, https://doi.org/10.1073/pnas.0708504105 and https://doi.org/10.1128/mBio.00036-12).

      We thank the reviewer for pointing out missed citations in our introduction. We agree that these papers are relevant to the topic and have added their citations. Additionally, following the suggestion of another reviewer, we have reorganized the introduction so that the concept of the role of metabolism in evolution is presented first and the LTEE second.

      Reviewer #2 (Public Review):

      [...] Overall, this is a significant and well-executed research study. It offers new insights into the complex relationship between genetic changes and observable traits in evolving populations and utilizes metabolomics in the LTEE, a novel approach in combination with RNA-seq and mutation datasets.

      However, the paper's overall clarity is lacking. It is spread too thin and covers many topics without a clear focus. I strongly recommend a substantial rewrite of the manuscript, emphasizing structure and readability. The science is well executed, but the current writing does not do it justice.

      We thank the reviewer for their positive comments and their constructive feedback on the lack of clarity in writing. Following the reviewer’s suggestions, we have rewritten parts of the manuscript and reorganized a few sections to improve readability. We hope the revised manuscript is significantly improved.

      Recommendations for the authors

      Reviewer #1 (Recommendations For The Authors):

      1) Title and Abstract: Add the study organism to the abstract, and probably also the title. Currently, E. coli is not mentioned in either! I'm also not sure that the LTEE is a sufficiently well-known acronym to abbreviate this in the title.

      We have revised the title of the manuscript and now spell out LTEE and included E. coli in the title and the abstract.

      2) Abstract: I would switch the usage of metabolome to metabolism in a few more places. For example, "changes in its metabolism", "networked and convoluted nature of metabolism". The metabolome, the concentrations of all metabolites, is what is being measured, but I think of this as a phenotypic readout of how metabolism is evolving.

      We have changed “metabolome” to “metabolism” in cases where we refer to what is evolving and use “metabolome” when we refer to what is being measured.

      3) Line 16: Technically, the 12 LTEE populations were not initially identical. The Ara- differed from the Ara+ ancestors by one intentional mutation and one unintentional mutation that was not discovered until whole genomes were sequenced. I would rephrase this to "where 12 replicate populations of E. coli are propagated" or something similar so that it can be correct without needing to describe this unnecessary detail.

      The line has been rephrased as suggested.

      4) General Note: The text refers to populations as Ara-3 but the figures use A-3. I'd suggest going with A-3 and similar throughout for consistency.

      Instances of Ara have been changed to A+/-, and a sentence noting this convention has been added to the Introduction.

      5) Lines 43-44, 97-98. My understanding is that both S and L ecotypes in A-2 can use both glucose and acetate, but that the differentiation is related to their specialization that leads to each one being better on one or the other nutrient. The descriptions make it sound like each grows at a different time. Also, by definition, cells are not growing during "stationary phase". The change from glucose utilization (and acetate secretion) to acetate utilization during one cycle of growth is better described as a diauxic shift.

      We have reworded this part to remove mention of “growth” during stationary phase and changed the wording such that it no longer sounds like they grow at different times.

      6) Line 54: The statement "provide the ability to test hypotheses from previous data" is vague. Either provide an example or delete.

      We have removed this sentence as suggested.

      7) Lines 71-72: The terms "interphase" and "intraphase" sound too much like parts of the cell cycle. I'd suggest describing the comparisons as between and within growth phases.

      The use of intra and interphase have been changed as suggested.

      8) Line 79: The citrate is presumably still a chelating agent, so change phrasing to "Citrate is present in the medium because it was originally added as a chelating agent" or something similar.

      This sentence has been rewritten as suggested.

      9) Line 83: Write out "mutation accumulations" so it is easier to understand as "the number of mutations that have accumulated".

      The phrase has been changed as suggested.

      10) Line 116: It's unclear whether the abundances of metabolites are "strategies of survival" in stationary phase. An equally valid explanation is that there is less selection on the metabolome to have a specific composition during stationary phase to have high fitness.

      We have added a line about the possibility for alternative hypotheses.

      11) Figure 1: There seems to be some information missing from the legend. What are R06 and R07 in Panels A and B? Is panel D exponential phase and panel E stationary phase?

      This information was inadvertently missing from the caption and has been added.

      12) Figures 2 and 3: Gene names should be in italics. To me, the gray for deleted genes is hard to tell apart from the blue/red. Perhaps you could put a little X in these boxes instead? I think that having a little triangle pointing from each gene or metabolite name to its corresponding abundance panel would help the reader track which information goes with which features. In Fig. 3 the placement of L-aspartate is a bit awkward. I'd suggest moving it down so the dashed line does not have to go through the abundance panel.

      These figures have been edited to include small triangles that link a gene or metabolite and its heatmap. Additionally, an X has been added where genes have suffered inactivating mutations and the placement of some elements has been moved to improve overall clarity.

      13) Lines 183-185: It would be easier to see and judge the consistency of these argR related relationships if a correlation graph of some kind was shown, probably as a supplemental figure. This plot could, for example, have genes/metabolites across the x-axis and fold-change on the y-axis with lines connecting points corresponding to each of the twelve populations across these categories (like Fig S8 but with lines added). Alternatively, it could be a heat map with the populations across one axis and the genes/metabolites across the other axis (like Fig S3).

      We have added a supplementary figure consisting of heatmaps showing the consistency of these changes within an evolved line. It is now figure S9.

      14) Line 195: I think adding a sentence elaborating on what exactly mutation accumulation means in this context would be helpful to readers.

      We have attempted to clarify the meaning of this by specifically stating that it is due to the accumulation of deleterious mutations.

      15) Line 293: Is standard LTEE medium DM25? These omics experiments with the LTEE sometimes use similar media with different glucose concentrations, and this is a very important detail to precisely specify.

      We reference “standard” LTEE medium in the methods section and have additionally specified the amount of sugar to make it clear that we are not supplementing the media with additional sugar.

      16) Figure S8B. Is "cystine" used instead of "cysteine" on purpose here since the compound is oxidized in the metabolomics treatment?

      The use of cystine is intentional; we detect the oxidized compound.

      Reviewer #2 (Recommendations For The Authors):

      Title:

      The abbreviation "LTEE" should not be in the title. Most readers will not recognize what it means. Instead, either the full name of the experiment, "Long-Term Evolution Experiment with E. coli," should be used, or the title should be rephrased to "Linking genotypic and phenotypic changes during a long-term evolution experiment using metabolomics."

      We have spelled out LTEE and included E. coli in the title.

      Abstract:

      Sentence 1: Consider softening the statement: "Do changes in an organism's environment, genome, or gene expression patterns often lead to changes in its metabolome?"

      We have rephrased this sentence to “Changes in an organism's environment, genome, or gene expression patterns can lead to changes in its metabolism”.

      Sentence 4: Use a hyphen for "Long-Term."

      This addition has been made.

      Sentence 4: Replace "transduce" with a more appropriate term: "...how the effects of mutations can be distributed through a cellular network to eventually affect metabolism and fitness."

      We have rewritten this sentence as “to understand how mutations can eventually affect metabolism and perhaps fitness”.

      Sentence 5: Clarify the use of "both" to refer to the ancestor of the LTEE and its descendant populations as two classes.

      We have reworded this sentence so it’s clear that the ancestors and evolved lines are two separate classes “We used mass-spectrometry to broadly survey the metabolomes of the ancestral strains and all 12 evolved lines…”.

      Sentence 6: Reverse the order for better emphasis: "Our work provides a better understanding of how mutations might affect fitness through the metabolome in the LTEE, and thus provides a major step in developing a complete genotype-phenotype map for this experimental system."

      We have rearranged this sentence per the reviewer’s suggestion.

      Introduction:

      Revise the introduction for clarity, readability, and logical narrative progression. Start with the second paragraph to set up the basic scientific principles being studied and then transition to describing the LTEE as a model system to examine those principles.

      The introduction has been rearranged and reworded in parts to increase clarity.

      Sentence 1: Revise for clarity: "The Long-Term Evolution Experiment (LTEE) has studied 12 initially identical populations of Escherichia coli as they have evolved in a carbon-limited, minimal glucose medium under a daily serial transfer regime."

      Sentence 2: Suggestion: "Begun in 1988, the LTEE populations have evolved for more than 75,000 generations, making it the longest-running experiment of its kind."

      Paragraph 2, sentence 2: Italicize "Drosophila."

      Paragraph 3, sentence 2: Make an important distinction: "Ara-3 is unique in that it evolved the ability to grow aerobically on citrate."

      Paragraph 3, sentence 4: Introduce the IS-mediated loss of the rbs operon in the LTEE as if it has not been described elsewhere.

      These suggestions have been incorporated into the manuscript.

      Results:

      Section 3.1: The use of samples from hours 2 and 24 to represent exponential and stationary phase may present some issues. For instance, capturing Ara-3 during its exponential growth on glucose, but not citrate, at hour 2. Furthermore, except for Ara-3, the LTEE populations reach stationary phase after approximately 4 hours, and there could be significant differences between early, mid, and late stationary phase. This possibility should be acknowledged, and future follow-up work should consider exploring these differences.

      We have added sentences in the first paragraph of the results section to include these details. We have also added a short paragraph to the conclusions suggesting additional studies of stationary phase, citing work on evolution of E. coli during long-term stationary phase.

      Paragraph 3: While Turner et al. 2017 is an essential reference regarding resource use differences between Ara-3 and other LTEE populations, it would be more suitable to reference Blount et al. 2012 for the mutations that enabled access to citrate. Also, it is important to note that the difference lies in the ability to grow aerobically on citrate, rather than the ability to metabolize it.

      This citation has been added.

      Paragraph 4: As mentioned elsewhere, most LTEE populations exhibit balanced polymorphisms. Therefore, it is more appropriate to state that Ara-2 is the best-understood example of long-term diversity. It is likely that there are important metabolic differences between co-existing lineages in other LTEE populations.

      We now refer to Ara-2 as being the best-understood example of long-term diversity.

      Paragraph 5: The first sentence of this paragraph should likely end with "levels."

      The word “levels” was added to the end of this sentence.

      Figure 3: It is preferable to refer to the "Superpathway of arginine and polyamine biosynthesis," citing EcoCyc as a reference, rather than a descriptor.

      This has been changed to a reference.

      Section 3.3, Paragraph 3: While higher intracellular amino acid abundances may facilitate higher translation rates and faster growth, the higher abundances themselves do not evaluate the hypothesis. To evaluate the hypothesis, it is necessary to demonstrate that higher abundances are associated with higher translation or growth rates. Therefore, the final sentence of this paragraph is not meaningful.

      We have reworded this sentence to say that it’s not possible to tell what the additional amino acids are being used for given only this data and that additional experiments are needed to confirm this hypothesis.

      Section 3.4: The first paragraph of this section misstates how evolution works. The low level of glucose in the LTEE does not drive innovation; instead, innovation occurs at random through the introduction of variation by mutation. Although the existence of the citrate resource acts as a reward that selects for variation that provides access to it, it is essential to remember that evolution is blind to such a reward. Moreover, regarding the evolution of the Cit+ trait, it is incorrect to assert that low glucose contributed to its evolution. As shown by Quandt et al. (2015), it seems probable that Cit+ evolution was potentiated by adaptation to specialization on acetate, which is produced by overflow metabolism resulting from rapid growth on glucose. This rapid growth only occurs when glucose is relatively abundant. The level of glucose seems low to us because it is low relative to traditional levels in bacteriological media, but not to the bacteria.

      We agree that this is a semantic, but important, distinction. We have reworded this part so that it does not suggest that evolution has any forward-looking properties, and to make clear that it is blind to any rewards that might result from adaptation.

      In general, all instances of "utilize" and its cognates should be replaced with "use" and its cognates.

      Instances of “utilize” have been changed to “use” and its cognates.

      There is some uncertainty about the expectation of ramping up the TCA cycle in the LTEE. Overflow metabolism and acetate production appear to be prevalent in the LTEE, suggesting that many lineages only partially oxidize carbon derived from glucose, thereby bypassing the TCA cycle. While it is possible that this interpretation is incorrect, it would be helpful to see it addressed in the manuscript.

      We agree that this is a plausible hypothesis, and we have added a paragraph at the end of this section that discusses the implications of overflow metabolism as an alternative hypothesis.

    1. Author Response

      Reviewer #1 (Public Review):

      In this study, the authors study the effect of dynactin disruption on kinetochore fiber (k-fiber) length in spindles of dividing cultured mammalian cells. Dynactin disruption is known to interfere with dynein function and hence spindle pole formation. The main findings are that poles are not required for correct average k-fiber length and that severed k-fibers can regrow to their correct length both in the presence and absence of poles by modulating their dynamic properties at both k-fiber ends. In the presence of poles, regrowth is faster and the variation between k-fiber lengths is smaller. This is a very interesting study with high-quality quantitative imaging data that provides important new insight into potential mechanisms of spindle scaling, extending in an original manner previous work on this topic in cultured cells and in Xenopus egg extract. The Discussion is interesting to read as several possible mechanisms for k-fiber length control are discussed. The technical quality of the study is very high, the experiments are very original, and most conclusions are well supported by the data. Especially, the experiments observing the regrowth of k-fibers after severing and the study of the dynamic properties of these k-fibers provide very novel insight. Addressing the following concerns could potentially improve the manuscript:

      We thank the reviewer for their fair, rigorous, and conceptually engaging remarks.

      (1) The phenotype generated here by disrupting dynactin via overexpressing p50 appears to be different from that caused by knocking down NuMA or dynein - as previously reported by the Dumont lab (Hueschen et al., 2019). In this study here, unfocused spindles are observed whereas earlier turbulent spindles were observed. This raises the question of whether dynein activity that contributes to pole focusing is really completely inhibited here. These discrepancies in phenotypes seem to deserve an explanation. Is k-fiber length in cultured mammalian cells only maintained in the case of this specific type of inhibition?

      We thank the reviewer for the important point about the different phenotypes observed in different dynein inhibition conditions and we refer them to our response to Essential Revision #1. In summary, we believe that different dynein inhibition phenotypes are similar. Unfocused spindles appear turbulent on longer timescales and appear to reach a steady-state on shorter timescales. The amount of pole-unfocusing also seems to correspond to the severity of dynein inhibition (Figure 1—figure supplement 1). We have chosen to study inhibited spindles that were steady-state and unfocused. We have added this discussion in line 129 as well as better characterized our system of dynein inhibition by adding two new figures (Figure 1—figure supplement 1, Figure 1—figure supplement 3).

      Furthermore, we address the question of whether dynein might still be responsible for length regulation despite poles being unfocused in line 433 of the Discussion: “recent work has revealed that mammalian spindles can achieve similar architecture whether or not dynein (or its recruiter NuMA) is knocked out (Neahring et al., 2021). This suggests that the severe defects in spindle coordination (Figure 1, Figure 5) and maintenance (Figure 2) observed in p50-unfocused spindles are more likely due to the loss of spindle poles than due to the loss of dynein activity per se.”

      We have additionally overexpressed p50 in human RPE1 cells and observed qualitatively similarly unfocused yet generally bi-oriented spindles as in rat kangaroo PtK2 cells, showing that the formation of unfocused spindles in PtK2 is not an artifact unique to that cell line (see newly added Figure 1—figure supplement 3). However, these unfocused RPE1 spindles did not have clear, resolvable k-fibers as in PtK2, so length was not quantified. The only method we are aware of that robustly unfocuses poles in PtK2 spindles is p50 overexpression.

      (2) p50 addition and also p150-cc1 addition was often used in Xenopus egg extract in order to inhibit dynein function. Considerably larger concentrations of p50 than p150-cc1 needed to be used. Can the authors estimate the level of overexpression of p50 in the cells they study? It seems that could be possible given that a mCherry fusion protein can be overexpressed. Was it necessary to select cells with a particular level of mCherry-p50 overexpression to observe the reported phenotypes?

      We thank the reviewer for the suggestion to quantify p50 expression and have added Figure 1—figure supplement 1. Due to gradual red laser power loss over months, data from a single day were plotted for proper comparison, but trends were always consistent within any given day. As discussed above, we observed that higher levels of mean p50 intensity corresponded to unfocused spindles. We have clarified that we chose to study these highly overexpressing unfocused spindles in the text and methods, and we speculate that the level of p50 overexpression correlates with the amount of dynein inhibition and subsequent pole-unfocusing. This is also consistent with the higher concentrations of p50 needed to inhibit dynein in Xenopus.

      (3) Some comparison to previous experiments using p50 and p150-cc1 addition to Xenopus egg extract spindles could put this study better into the context of the available literature. It seems from previous publications that the p50 addition produced short, unfocused, barrel-shaped spindles, indicating that spindle length is maintained without poles, whereas the p150-cc1 addition produced elongating spindles (e.g. Gaetz & Kapoor, 2004).

      We appreciate the reviewer’s discussion of dynein inhibition in the Xenopus context.

      While Xenopus has been used to study spindle size regulation, it has not been as useful to study k-fiber length regulation, which we focus on. Xenopus spindles have a different architecture, with k-fibers that are not discrete and continuous as in mammalian spindles. Indeed, while p50 and p150-CC1 overexpression alter spindle length in Xenopus, they do not have the same effect in mammalian spindles. Additionally, p150-CC1 does not robustly unfocus poles in mammalian spindles as it does in Xenopus; instead, it leads to an inconsistent variety of spindle disorganization phenotypes with frequently focused poles in PtK2 (data not shown). We speculate this variety of spindle phenotypes arises from a different mechanism of dynein inhibition that does not fully target pole-focusing.

      However, we agree that referencing prior Xenopus work establishes important context and precedent. In line 95 of the Introduction, we state “…inhibiting dynein unfocuses poles but spindles still form albeit with altered lengths in Drosophila (Goshima et al., 2005) and Xenopus (Gaetz and Kapoor, 2004; Heald et al., 1996; Merdes et al., 1996), and without a clear effect on mammalian spindle length (Guild et al., 2017; Howell et al., 2001),” addressing the different effects of dynein inhibition in Xenopus compared to mammalian spindles. We have also added direct mentions of p50 in Xenopus in line 129 (see Essential Revision #1 response).

      Finally, we have added a figure showing overexpression of p50 in human RPE1 cells to show reproducibility of pole unfocusing across other mammalian cell types (see newly added Figure 1—figure supplement 3).

      (4) In this context, it seems that some more explanation is required for the observations presented in Fig. 1D and 1E. It appears that spindle length and k-fiber length have been measured quite differently. Not much information is provided for how spindle length was defined and measured (please expand this part of the Methods). Could the two different methods of measurement be the reason for the mean k-fiber length remaining unaltered in dynactin-disrupted spindles, whereas the spindle length increases in these cells? If not, do non-k-fiber microtubules contribute to unfocused spindles being longer or are chromosomes not aligned in the metaphase plate causing the increase in spindle length by misalignment of k-fiber sister pairs?

      We thank the reviewers for pointing out the lack of clarity in Figures 1D and 1E. We have expanded and clarified the Methods section describing how spindle axes were measured and how k-fiber lengths were measured, as well as included examples and cartoons to illustrate them (see newly added Figure 1—figure supplement 4).

      To clarify, we did not intend to directly measure spindle length, but we did approximate the size of each spindle’s “footprint” in Figure 1D as well as measure individual k-fiber length in Figure 1E. It is now clarified in the Methods line 898 as “Spindle minor and major axes lengths were determined by cropping, rotating, then thresholding spindle images with the Otsu filter using SciKit. Ellipses were fitted to thresholded spindles to approximate the length of their major and minor axes using SciKit’s region properties measurement (Figure 1—figure supplement 4A). In control spindles, the major axis corresponded to spindle length along the pole-to-pole axis, and the minor axis corresponded to spindle width along the metaphase plate axis. However, unfocused spindles were disorganized along both axes to the extent where the minor axis did not always correspond to the metaphase plate axis. Thus, Figure 1D reports “spindle minor axis length” and “spindle major axis length” rather than “spindle width” and “spindle length”. Furthermore, it is worth noting that in unfocused spindles, spindle length is decoupled from k-fiber length because of k-fiber disorganization along both axes. Thus, spindle length was not measured in unfocused spindles...”
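
      The thresholding and ellipse-fitting step quoted above can be illustrated with a small, dependency-free sketch. It is not the authors' pipeline: it assumes 8-bit integer intensities, uses a synthetic rectangular "spindle", and re-implements in plain Python the two ideas the Methods text names (Otsu thresholding, then major and minor axes of the ellipse with the same second moments as the thresholded region). In scikit-image these roles are played by `threshold_otsu` and `regionprops`; the helper names `otsu_threshold` and `ellipse_axes` below are hypothetical.

```python
import math

def otsu_threshold(pixels, levels=256):
    """Otsu threshold for integer intensities in [0, levels); foreground is pixel > t."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0      # number of pixels at or below the candidate threshold
    sum0 = 0    # intensity sum of those pixels
    for t in range(levels):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0:
            continue
        if w1 == 0:
            break
        m0 = sum0 / w0
        m1 = (sum_all - sum0) / w1
        between = w0 * w1 * (m0 - m1) ** 2  # between-class variance (unnormalised)
        if between > best_var:
            best_var, best_t = between, t
    return best_t

def ellipse_axes(mask):
    """Major/minor axis lengths of the ellipse matching the mask's second central moments."""
    coords = [(r, c) for r, row in enumerate(mask) for c, on in enumerate(row) if on]
    n = len(coords)
    r_mean = sum(r for r, _ in coords) / n
    c_mean = sum(c for _, c in coords) / n
    var_r = sum((r - r_mean) ** 2 for r, _ in coords) / n
    var_c = sum((c - c_mean) ** 2 for _, c in coords) / n
    cov = sum((r - r_mean) * (c - c_mean) for r, c in coords) / n
    # Eigenvalues of the 2x2 covariance matrix give the spread along each principal axis.
    half_trace = (var_r + var_c) / 2
    root = math.sqrt(((var_r - var_c) / 2) ** 2 + cov ** 2)
    major = 4 * math.sqrt(half_trace + root)
    minor = 4 * math.sqrt(max(half_trace - root, 0.0))
    return major, minor

# Synthetic image (assumption): a bright 10 x 30 pixel block on a dim background.
image = [[200 if 5 <= r < 15 and 5 <= c < 35 else 10 for c in range(40)]
         for r in range(20)]
threshold = otsu_threshold([p for row in image for p in row])
mask = [[p > threshold for p in row] for row in image]
major, minor = ellipse_axes(mask)
print(threshold, round(major, 2), round(minor, 2))
```

      Because the axes come from second moments, they describe the overall footprint of the thresholded region rather than any pole-to-pole distance, which matches the authors' choice to report "major" and "minor" axis lengths instead of spindle length and width.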

      We additionally removed the potentially confusing terminology of “wider” and “longer” in the Results section to make clear that we are approximating spindle size, not spindle length and width, and we now state in line 168, “k-fibers were more spread out in the cell, with spindles covering a larger area compared to control along both its major and minor axes (Figure 1D).”

      We believe our clarification and expansion of the Methods section, as well as inclusion of a new supplementary figure and cartoon address the reviewer’s points, and we thank them for pointing out the lack of clarity.

      (5) It seems that in the Discussion it is implied that k-fibers can respond to severing in both focused and unfocused spindles by modulating their dynamics at both ends of the k-fibers, but in the Results section the wording is more cautious because the difference in 'flux' in severed and unsevered unfocused spindles is not significant (Fig. 4D, blue data). It appears indeed that there is also a difference in flux between severed and unsevered unfocused spindles, but the number of data points is too small. Depending on how difficult these experiments are, it could be worth increasing the size of the data set to come to a clear conclusion, given that the data shown in Figs. 3 and 4 are quite remarkable and form the core of the study.

      We appreciate the reviewer’s close reading and pertinent suggestions.

      As detailed in our response to Essential Revision #3, we did not increase the sample size for unfocused spindles since it would not be reasonably feasible to show significant differences in flux. However, we performed more ablations and photomarking in control spindles as detailed in our response to this reviewer’s point 6 below, a different but related point.

      (6) Can the authors exclude that the stopping of 'flux' at minus ends after severing is due to some sort of permanent damage induced by ablation? In other words, do severed spindles begin to flux again once they have regrown to their original length?

      We thank the reviewer for their important points.

      We have addressed this question in the newly added Figure 4—figure supplement 1 as described in our response to Essential Revision #3 to show that flux resumes after length recovery. In summary, we observed no adverse effects of ablation on k-fiber minus-ends. Severed k-fibers have restored lengths and minus-end dynamics several minutes after ablation.

      (7) To this reader, the conceptualization of distinguishing between 'global' and 'local' effects/behavior was a little confusing, both in the title and also later in the text. The concept of 'local' regulation of k-fiber length appears to contradict the observation that k-fiber length can be regained after severing by changes in the dynamics at both ends (so at two very different locations) which is a rather remarkable finding. Maybe distinguishing between 'individual' and 'collective' k-fiber behavior could be clearer.

      We appreciate the reviewer’s consideration of terminology. We have addressed this by clearly defining our use of ‘local’ to refer to individual k-fibers as a unit where appropriate in the text (lines 271, 449). We chose these terms since they can help describe individual versus collective properties, while simultaneously emphasizing the aspects of global architecture and spatial organization in the spindle.

      (8) Can the authors exclude that some of the differences between unfocused and focused spindles could be due to altered dynein activity at kinetochores? Or due to the dynein-dependent accumulation of certain spindle proteins along microtubules towards the minus ends of k-fibers or other spindle microtubules, instead of being due to only the presence versus absence of poles? Could this be tested by ablating both poles? If this is too challenging, a discussion of these possibilities could be justified.

      We appreciate the reviewer’s consideration of kinetochore activity as well as other methods of removing poles. However, p50 overexpression is currently the only method to robustly unfocus spindles in PtK2 cells – ablating poles or removing pole-associated structures such as centrosomes does not abolish pole-focusing in this system (Khodjakov et al., 2000). Furthermore, we now discuss the possibility that altered dynein activity (such as activity at kinetochores) may give rise to the phenotypes we describe in our work in line 433: “…recent work has revealed that mammalian spindles can achieve similar architecture whether or not dynein (or its recruiter NuMA) is knocked out (Neahring et al., 2021). This suggests that the severe defects in spindle coordination (Figure 1, Figure 5) and maintenance (Figure 2) observed in p50-unfocused spindles are more likely due to the loss of spindle poles than due to the loss of dynein activity per se. Though we cannot exclude it, this also suggests that the findings we make in unfocused spindles are not due to changes in activity of the dynein population at kinetochores.”

      Reviewer #2 (Public Review):

      The mitotic spindle of eukaryotic cells is a microtubule-based assembly responsible for chromosome segregation during cell division. For a given cell type, the steady-state size and shape of this structure are remarkably consistent. How this morphologic consistency is achieved, particularly when one considers the complex interplay between dynamic microtubules, spatial and temporal regulation of microtubule nucleation, and the activities of several microtubule-based motor proteins, remains a fundamental unanswered question in cell biology. In this work by Richter et al., the authors use biochemical and biophysical perturbations to explore the feedback between mitotic spindle shape and the dynamics of one of its main structural elements, kinetochore fibers (k-fibers) - bundles of microtubules that extend from kinetochores to spindle poles. Overexpression of the p50 dynactin subunit in mammalian tissue culture cells (PtK2) was used to inhibit the microtubule motor cytoplasmic dynein resulting in misshapen spindles with unfocused poles. Measurements of k-fiber lengths in control and unfocused conditions showed that although mean k-fiber length was not statistically different, the variation of length was significantly higher in unfocused spindles, suggesting that k-fiber length is set locally, occurring in the absence of focused poles. With a clever combination of live-cell imaging with photoablation and/or photobleaching of fluorescently-labeled k-fibers, the authors went on to explore the mechanistic bases of this length regulation. K-fiber regrowth following ablation occurred in both conditions, albeit more slowly in unfocused spindles. Paired ablation and localized photobleaching on the same k-fiber revealed that microtubule dynamics, specifically those at the plus-end, can be tuned at the level of individual k-fibers. Lastly, the authors show that chromosome segregation is severely impaired when cells with unfocused spindles are forced to enter mitosis.
      The work's biggest strength is the application of an innovative experimental approach to address thoughtful and well-articulated hypotheses and predictions. Conclusions stemming from the experiments are generally well-supported, though the experiments addressing the "tuning" of k-fiber dynamics could be bolstered by additional data points and perhaps better presented. The manuscript would also benefit from the inclusion of some investigation of spatial differences in the observed effects as well as the molecular and biophysical basis of the observed feedback between k-fiber length and focused poles.

      We appreciate the reviewer providing pertinent, rigorous, and intellectually astute suggestions.

      Comments/Concerns/Questions:

      1) In the discussion, the authors acknowledge that the changes in spindle morphology resulting from p50 overexpression are likely also causing changes in the well-characterized RanGTP/SAF gradients that radiate from chromosome surfaces. Why did the authors not include an analysis of k-fiber length as a function of positioning within the spindle? The inclusion of this data would not require more experimentation and could be added as a plot showing K-fiber length versus distance from the geometric center of the spindle (defined by the intersection of the major and minor axes perhaps?).

      We thank the reviewer for this pertinent suggestion and refer them to our response to Essential Revision #2. Briefly, we have added the recommended analyses to Figure 1—figure supplement 6 by correlating k-fiber length to position along the spindle’s longitudinal and latitudinal axes.

      2) The authors also acknowledge the established relationship between MT length and MT end dynamics, yet in their ablation studies, the average initial k-fiber length at ablation in control spindles was higher than that for k-fibers in unfocused spindles. It seems that this difference makes the interpretation of the data, particularly the conclusion that fiber growth rates differ due to the absence of focused poles, a bit tenuous. To address this, the authors should consider including plots of grow-back rates versus k-fiber length (again, this should not require additional experiments, just more analysis).

      We thank the reviewer for their critical thinking about experiments. We would like to clarify to the reviewer that initial k-fiber lengths within unfocused spindles preceding ablation were not actually shorter on average compared to the average length of control k-fibers from Figure 1E (Figure 2—figure supplement 1). We apologize that this unexpected artifact was not clear in the text and have now reworded line 232 to be more straightforward: “Mean k-fiber lengths in unfocused spindles before ablation appeared to be shorter (Figure 2D); however, this was due to not capturing the full length of k-fibers in a single z-plane while imaging ablated k-fibers. Indeed, length analysis of full z-stacks from unfocused spindles before ablation yielded an indistinguishable mean k-fiber length compared to control k-fibers in Figure 1E (Figure 2—figure supplement 1). Thus, ablated k-fibers were compared to their unablated neighbors as internal controls.”

      We believe that this language clearly calls out the perceived inconsistency, and that our use of internal controls overcomes this confounding factor to make meaningful conclusions. We address the relationship of k-fiber length and growth rate in our response to Essential Revision #2. We are not including it in the manuscript based on our inability to make any meaningful conclusion to either support or exclude the possibility of length-dependent growth rates.

      3) As presented, the data shown in Figure 4 is confusing and does not seem very compelling. The relationship between the kymographs and time series is unclear as is the relationship between the dashed lines in the kymographs and the triangles and the plots in the 4B time series and 4C, respectively. Furthermore, it's not always clear what the triangles are pointing to (e.g. in the unfocused condition time series). The authors might want to consider reworking this figure and providing more measurements of flux following ablation in both the control and unfocused conditions. Lastly, the authors should clarify what negative displacement means.

      We apologize for the unclear figure annotations and thank the reviewers for their suggestions. As discussed in our response to Essential Revision #3, we believe we have improved the clarity and presentation of figures and kymographs. More measurements of flux after ablation in unfocused spindles were not feasible as discussed; however, we have performed these measurements in control spindles and added Figure 4—figure supplement 1 to strengthen conclusions about turning flux off/on after ablation.

      We have additionally clarified axis titles by replacing “negative displacement” with the more intuitive descriptor “photomark position relative to minus-end” and clearly defining it in the figure legends in line 565 as follows: “Figure 3 […] (D) Minus-end dynamics, where photomark position over time describes how the mark approaches the k-fiber’s minus-end over time in control and unfocused k-fibers.”

      We thank the reviewers for their suggestions to improve clarity and bolster our conclusions.

    1. Author Response

      We thank the Editor for his assessment. We agree that the data we present in this manuscript can be a starting point for more in-depth analysis. We are currently developing a mathematical model of HIV transmission dynamics; we plan to use the data that we present in this paper as parameter values.

      Reviewer #1 (Public Review):

      One aim of this paper was to study historical migration from Botswana during the time of the development of the HIV epidemic. The second aim was to test whether the migration networks impacted the development of the epidemic. The first aim was achieved: this paper used historical census data in a clear way, to describe the qualities of characteristics of migration in the country at four points in time, from 1981 to 2011. Very detailed data are presented in clear ways, using network chord diagrams, sharing age- and sex-specific migration rates, and urban-rural classifications. However, data was not presented to achieve the second aim. The authors reviewed some important literature about migration and HIV. They suggested that the migration patterns, such as from specific mining towns and mostly between districts, could have been important in supporting the generalized spread of HIV. But without evidence linking HIV prevalence over time in the linked districts in Botswana, this aim was not supported.

      We have now made it clear that we are not testing whether the migration networks impacted the development of Botswana’s HIV epidemic: this is what the Reviewer describes as the second aim of our paper. We have only one aim: to test the hypothesis that, during the development of Botswana’s HIV epidemic, the population was extremely mobile and highly connected through migratory flows and counter-flows. This is based on the fact that these conditions are necessary for the development of a generalized HIV epidemic. However, prior to our analysis, these conditions had not been shown to occur during the development of a generalized HIV epidemic. Given that our results support our mobility hypothesis (i.e., that the population was very mobile and essentially all the districts were connected throughout the country), in the discussion (lines 338-362) we describe how the migration networks that we have identified may have impacted the development of the generalized hyperendemic HIV epidemic in Botswana. We have also clarified that our study has only one hypothesis that we are testing by referring to this single hypothesis as the mobility hypothesis (Abstract: lines 25-29).

      One other limitation of the paper was that very little context, outside of migration rates, was provided. Is there any additional information about economic growth, or political event for example, that could clarify or add context to these migration flows? As it stands now, these analyses are quite basic and don't take into account underlying demographic, economic, or political trends.

      In response to this concern, we have expanded the text in the introduction to provide more context regarding political, demographic and economic factors (Introduction: lines 66-75). We have also expanded our discussion of the implications of our results (and of additional results that we have included: lines 263-283) for understanding the role of internal migration on urbanization in Botswana (Discussion: lines 379-420); urbanization occurred simultaneously with the development of Botswana’s generalized hyperendemic HIV epidemic.

      The data presented in this paper has potential impact. As the paper stands now, it could be quite useful for future work when linked to additional data sources on HIV prevalence over time (or other questions that could have been influenced by migration patterns).

      We thank this Reviewer for their helpful comments.

      Reviewer #2 (Public Review):

      To provide context into the HIV epidemic in Botswana over the latter half of the 20th century and the beginning of the 21st, the authors have analyzed micro census data to examine patterns of migration. They use this dataset to show how patterns between urban and rural areas have changed over several decades, and the demographic characteristics of migrants. The dataset used for this study is a very reliable source, and the insights in terms of migration patterns are interesting. The primary weakness of the analyses regards the link to HIV transmission: micro-census data only examine mobility that leads to individuals changing residence for longer periods of time, without accounting for shorter-term trips that may also lead to HIV transmission, such as seasonal migration or short trips. This is likely less of an issue with HIV than other diseases, however, due to its transmission often involving new sexual partners, which will generally be less likely to occur during short trips. Broadly, however, this is an interesting report on the migration patterns during a critical period for HIV transmission nationwide.

      We thank the Reviewer for their comments.

      In our current manuscript, we have discussed the potential impact of mobility on Botswana’s HIV epidemic, and focused on migration, i.e., one-directional movement in terms of a permanent relocation of residency. This type of migration, by changing an individual’s sexual network and social environment, has been shown to increase the risk of acquiring HIV for both women and men. Short-term mobility (e.g., short-term circular migration, where the trip can range in duration from overnight to an entire season) can also affect HIV transmission dynamics. Circular migrants have been shown to have an increased risk of both acquiring and transmitting HIV. The greater the number of trips and/or the duration of the trip, the greater the risk. We note that both migration and short-term mobility are important, and their relative importance to each other is likely to evolve over time as a generalized HIV epidemic diffuses through the population. Their relative importance is also likely to vary amongst countries in sub-Saharan Africa.

      We have added all of the previous paragraph, with citations, to the text (Discussion: lines 364-377).

    1. Author Response

      Reviewer #1 (Public Review):

      1) Although I found the introduction well written, I think it lacks some information or needs to develop more on some ideas (e.g., differences between the cerebellum and cerebral cortex, and folding patterns of both structures). For example, after stating that "Many aspects of the organization of the cerebellum and cerebrum are, however, very different" (1st paragraph), I think the authors need to develop more on what these differences are. Perhaps just rearranging some of the text/paragraphs will help make it better for a broad audience (e.g., authors could move the next paragraph up, i.e., "While the cx is unique to mammals (...)").

      We have added additional context to the introduction and developed the differences between cerebral and cerebellar cortex, also re-arranging the text as suggested.

      2) Given that the authors compare the folding patterns between the cerebrum and cerebellum, another point that could be mentioned in the introduction is the fact that the cerebellum is convoluted in every mammalian species (and non-mammalian spp as well) while the cerebrum tends to be convoluted in species with larger brains. Why is that so? Do we know about it (check Van Essen et al., 2018)? I think this is an important point to raise in the introduction and to bring it back into the discussion with the results.

      We now mention in the introduction the fact that the cerebellum is folded in mammals, birds and some fishes, and provide references to the relevant literature. We have also expanded our discussion about the reasons for cortical folding in the discussion, which now contains a subsection addressing the subject (this includes references to the work of Van Essen).

      3) In the results, first paragraph, what do the authors mean by the volume of the medial cerebellum? This needs clarification.

      We have modified the relevant section in the results, and made the definition of the medial cerebellum clearer by indicating that we refer to the vermal region of the cerebellum.

      4) In the results: When the authors mention 'frequency of cerebellar folding', do they mean the degree of folding in the cerebellum? At least in non-mammalian species, many studies have tried to compare the 'degree or frequency of folding' in the cerebellum by different proxies/measurements (see Iwaniuk et al., 2006; Yopak et al., 2007; Lisney et al., 2007; Yopak et al., 2016; Cunha et al., 2022). Perhaps change the phrase in the second paragraph of the result to: "There are no comparative analyses of the frequency of cerebellar folding in mammals, to our knowledge".

      We have modified the subsection in the methods referring to the measurement of folial width and folial perimeter to make the difference clearer. The folding indices that have been used previously (which we cite) are based on Zilles’s gyrification index. This index provides only a global idea of the degree of folding, but it is unable to distinguish a cortex with profuse shallow folds from one with a few deep ones. An example of this is now illustrated in Fig. 3d, where we also show how that problem is solved by the use of our two measurements (folial width and perimeter). The problem is also discussed in the section about the measurement of folding in the discussion section:

      “Previous studies of cerebellar folding have relied either on a qualitative visual score (Yopak et al. 2007, Lisney et al. 2008) or a “gyrification index” based on the method introduced by Zilles et al. (1988, 1989) for the study of cerebral folding (Iwaniuk et al. 2006, Cunha et al. 2020, 2021). Zilles’s gyrification index is the ratio between the length of the outer contour of the cortex and the length of an idealised envelope meant to reflect the length of the cortex if it were not folded. For instance, a completely lissencephalic cortex would have a gyrification index close to 1, while a human cerebral cortex typically has a gyrification index of ~2.5 (Zilles et al. 1988). This method has certain limitations, as highlighted by various researchers (Germanaud et al. 2012, 2014, Rabiei et al. 2018, Schaer et al. 2008, Toro et al. 2008, Heuer et al. 2019). One important drawback is that the gyrification index produces the same value for contours with wide variations in folding frequency and amplitude, as illustrated in Fig. 3d. In reality, folding frequency (inverse of folding wavelength) and folding amplitude represent two distinct dimensions of folding that cannot be adequately captured by a single number confusing both dimensions. To address this issue we introduced 2 measurements of folding: folial width and folial perimeter. These measurements can be directly linked to folding frequency and amplitude, and are comparable to the folding depth and folding wavelength we introduced previously for cerebral 3D meshes (Heuer et al. 2019). By using these measurements, we can differentiate folding patterns that could be confused when using a single value such as the gyrification index (Fig. 3d). Additionally, these two dimensions of folding are important, because they can be related to the predictions made by biomechanical models of cortical folding, as we will discuss now.”
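
      The limitation described in the quoted passage can be made concrete with a toy calculation. The sketch below is only an illustration under stated assumptions: the "cortex" is a triangular zigzag, the idealised envelope is taken to be a straight line spanning it, and the helper names (`polyline_length`, `zigzag`, `gyrification_index`) are hypothetical. It shows two contours with very different fold counts and amplitudes that nonetheless share the same contour-to-envelope length ratio.

```python
import math

def polyline_length(points):
    """Total length of a polyline given as a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def zigzag(n_folds, amplitude, span=24.0):
    """Toy folded contour: n_folds triangular folds of the given amplitude across span."""
    step = span / (2 * n_folds)
    return [(i * step, amplitude if i % 2 else 0.0) for i in range(2 * n_folds + 1)]

def gyrification_index(contour, envelope):
    """Ratio of folded contour length to the length of its unfolded envelope."""
    return polyline_length(contour) / polyline_length(envelope)

envelope = [(0.0, 0.0), (24.0, 0.0)]          # assumed envelope: a straight line
few_deep = zigzag(n_folds=4, amplitude=3.0)   # low folding frequency, high amplitude
many_shallow = zigzag(n_folds=12, amplitude=1.0)  # high frequency, low amplitude

gi_deep = gyrification_index(few_deep, envelope)
gi_shallow = gyrification_index(many_shallow, envelope)
print(round(gi_deep, 4), round(gi_shallow, 4))
```

      In this toy setting the ratio depends only on the product of fold count and amplitude, so a single index cannot separate the two contours, whereas a per-fold width and a per-fold perimeter would.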

      5) Sultan and Braitenberg (1993) measured cerebella that were sagittally sectioned (instead of coronal), right? Do you think this difference in the plane of the section could be one of the reasons explaining different results on folial width between studies? Why does the foliation index calculated by Sultan and Braitenberg (1993) not provide information about folding frequency?

      The measurement of foliation should be similar as long as enough folds are sectioned perpendicular to their main axis. This will be the case for folds in the medial cerebellum (vermis) sectioned sagittally, and for folds in the lateral cerebellum sectioned coronally. The foliation index of Sultan and Braitenberg does not provide a similar account of folding frequency as we do because they only measure groups of folia (what some called lamellae), whereas we measure individual folia. It is not easy to understand exactly how Sultan and Braitenberg proceeded from their paper. We contacted Prof. Fahad Sultan (we acknowledge his help in our manuscript). Author response image 1 provides a clearer description of their procedure:

      Author response image 1.

      As Author response image 1 shows, each of the structures that they call a fold is composed of several folia, and so their measurements are not comparable with ours which measure individual folia (a). The flattened representation (b) is made by stacking the lengths of the fold axes (dashed lines), separating them by the total length of each fold (the solid lines), each of which may contain several folia.

      6) Another point that needs to be clarified is the log transformation of the data. Did the authors use log-transformed data for all types of analyses done in the study? Write this information in the material and methods.

      Yes, we used the log10 transformation for all our measurements. This is now mentioned in the methods section, and again in the section concerning allometry. We are including a link to all our code to facilitate exact replication of our entire method, including this transformation.
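
      As a rough illustration of why the log10 transformation matters for allometry, the sketch below fits a straight line to log10-transformed values, so that the slope estimates the allometric exponent. It is not the authors' code (which they link separately): the data are synthetic, the fit is ordinary least squares, and the function name `allometric_fit` is hypothetical. The published analysis may use a different regression method.

```python
import math

def allometric_fit(x, y):
    """Least-squares fit of log10(y) = intercept + slope * log10(x)."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((a - mean_x) ** 2 for a in lx)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic measurements (assumption): y = 2 * x ** 0.5, i.e. a power law.
sizes = [1.0, 10.0, 100.0, 1000.0]
widths = [2.0 * s ** 0.5 for s in sizes]
slope, intercept = allometric_fit(sizes, widths)
print(round(slope, 3), round(10 ** intercept, 3))
```

      On the log-log scale a power law becomes a line, so the fitted slope recovers the exponent and ten raised to the intercept recovers the multiplicative constant.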

      7) The discussion needs to be expanded. The focus of the paper is on the folding pattern of the cerebellum (among different mammalian species) and its relationship with the anatomy of the cerebrum. Therefore, the discussion on this topic needs to be better developed, in my opinion (especially given the interesting results of this paper). For example, with the findings of this study, what can we say about how the folding of the cerebellum is determined across mammals? The authors found that the folial width, folial perimeter, and thickness of the molecular layer increase at a relatively slow rate across the species studied. Does this mean that these parameters have little influence on the cerebellar folding pattern? What mostly defines the folding patterns of the cerebellum given the results? Is it the interaction between section length and area? Can the authors explain why size does not seem to be a "limiting factor" for the folding of the cerebellum (for example, even relatively small cerebella are folded)? Is that because the 'white matter' core of the cerebellum is relatively small (thus more stress on it)?

      We have expanded the discussion as suggested, with subsections detailing the measuring of folding, the modelling of folding for the cerebrum and the cerebellum, and the role that cerebellar folding may play in its function. We refer to the literature on cortical folding modelling, and we discuss our results in terms of the factors that this research has highlighted as critical for folding. From the discussion subsection on models of cortical folding:

      “The folding of the cerebral cortex has been the focus of intense research, both from the perspective of neurobiology (Borrell 2018, Fernández and Borrell 2023) and physics (Toro and Burnod 2005, Tallinen et al. 2014, Kroenke and Bayly 2018). Current biomechanical models suggest that cortical folding should result from a buckling instability triggered by the growth of the cortical grey matter on top of the white matter core. In such systems, the growing layer should first expand without folding, increasing the stress in the core. But this configuration is unstable, and if growth continues stress is released through cortical folding. The wavelength of folding depends on cortical thickness, and folding models such as the one by Tallinen et al. (2014) predict a neocortical folding wavelength which corresponds well with the one observed in real cortices. Tallinen et al. (2014) provided a prediction for the relationship between folding wavelength $\lambda$ and the mean thickness ($t$) of the cortical layer: $\lambda = 2\pi t \left(\mu / (3\mu_s)\right)^{1/3}$. (...)”
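
      The scaling in the quoted formula can be illustrated with a minimal sketch. The excerpt does not define $\mu$ and $\mu_s$; the code below assumes they are the stiffness (shear modulus) of the cortical layer and of the underlying substrate, and the function and parameter names are hypothetical. It is not the authors' code.

```python
import math


def folding_wavelength(thickness, mu_layer, mu_substrate):
    """Predicted folding wavelength: lambda = 2*pi*t * (mu / (3*mu_s))**(1/3).

    thickness    -- mean thickness t of the folding layer (any length unit)
    mu_layer     -- assumed meaning of mu: stiffness of the layer
    mu_substrate -- assumed meaning of mu_s: stiffness of the substrate
    """
    if thickness <= 0 or mu_layer <= 0 or mu_substrate <= 0:
        raise ValueError("thickness and stiffness values must be positive")
    return 2 * math.pi * thickness * (mu_layer / (3 * mu_substrate)) ** (1 / 3)


# With a fixed stiffness ratio, the wavelength is proportional to thickness:
# halving the thickness halves the predicted wavelength.
thick = folding_wavelength(thickness=2.0, mu_layer=1.0, mu_substrate=1.0)
thin = folding_wavelength(thickness=1.0, mu_layer=1.0, mu_substrate=1.0)
print(thick / thin)
```

      Because the wavelength is linear in thickness, a thinner layer is predicted to fold at a proportionally shorter wavelength for the same stiffness ratio, which is the intuition behind the answers that follow.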

      From this biomechanical framework, our answers to the questions of the Reviewer would be:

      • How is the folding of the cerebellum determined across mammals? By the expansion of a layer of reduced thickness on top of an elastic layer (the white matter)

      • Folial width, folial perimeter, and thickness of the molecular layer increase at a relatively slow rate across the species studied. Does this mean that these parameters have little influence on the cerebellar folding pattern? On the contrary, that indicates that the shape of individual folia is stable, providing the smallest level of granularity of a folding pattern. In the extreme case where all folia had exactly the same size, a small cerebellum would have enough space to accommodate only a few folia, whereas a large cerebellum would accommodate many more.

      • What mostly defines the folding patterns of the cerebellum given the results? Is it the interaction between section length and area? It’s the mostly 2D expansion of the cerebellar cortical layer and its thickness.

      • Can the authors explain why size does not seem to be a "limiting factor" for the folding of the cerebellum? Because even a cerebellum of very small volume would fold if its cortex were thin enough and expanded sufficiently. That’s why the cerebellum folds even while being smaller than the cerebrum: because its cortex is much thinner.

      8) One caveat or point to be raised is the fact that the authors use the median of the variables measured for the whole cerebellum (e.g., median width and median perimeter across all folia). Although the cerebellum is highly uniform in its gross internal morphology and circuitry's organization across most vertebrates, there is evidence showing that the cerebellum may be organized in different functional modules. In that way, different regions or folia of the cerebellum would have different olivo-cortico-nuclear circuitries, forming, each one, a single cerebellar zone. Although it is not completely clear how these modules/zones are organized within the cerebellum, I think the authors could acknowledge this at the end of their discussion, and raise potential ideas for future studies (e.g., analyse folding of the cerebellum within the brain structure - vermis vs lateral cerebellum, for example). I think this would be a good way to emphasize the importance of the results of this study and what are the main questions remaining to be answered. For example, the expansion of the lateral cerebellum in mammals is suggested to be linked with the evolution of vocal learning in different clades (see Smaers et al., 2018). An interesting question would be to understand how foliation within the lateral cerebellum varies across mammalian clades and whether this has something to do with the cellular composition or any other aspect of the microanatomy as well as the evolution of different cognitive skills in mammals.

      We now address this point in a subsection of the discussion which details the implications of our methodological decisions and the limitations of our approach. It is true that the cerebellum is regionally variable. Our measurements of folial width, folial perimeter and molecular layer thickness are local, and we should be able to use them in the future to study regional variation. However, this comes with a number of difficulties. First, it would require sampling the whole cerebellum (and the cerebrum) and not just one section. But even if that were possible, it would increase the number of phenotypes beyond the current scope of this study. Our central question about brain folding in the cerebellum compared to the cerebrum is addressed by providing data for a substantial number of mammalian species. As indicated by Reviewer #3, adding more variables makes phylogenetic comparative analyses very difficult because the models to fit become too large.

      Reviewer #2 (Public Review):

      1) The methods section does not address all the numerical methods used to make sense of the different brain metrics.

      We now provide more detailed descriptions of our measurements of foliation, phylogenetic models, analysis of partial correlations, phylogenetic principal components, and allometry. We have added illustrations (to Figs. 3 and 5), examples and references to the relevant literature.

      2) In the results section, it sometimes makes it difficult for the reader to understand the reason for a sub-analysis and the interpretation of the numerical findings.

      The revised version of our manuscript includes motivations for the different types of analyses, and we have also added a paragraph providing a guide to the structure of our results.

      3) The originality of the article is not sufficiently brought forward:

      a) the novel method to detect the depth of the molecular layer is not contextualized in order to understand the shortcomings of previously-established methods. This prevents the reader from understanding its added value and hinders its potential re-use in further studies.

      The revised version of the manuscript provides additional context which highlights the novelty of our approach, in particular concerning the measurement of folding and the use of phylogenetic comparative models. The limitations of the previous approaches are stated more clearly, and illustrated in Figs. 3 and 5.

      b) The numerous results reported are not sufficiently addressed in the discussion for the reader to get a full grasp of their implications, hindering the clarity of the overall conclusion of the article.

      Following the Reviewer’s advice, we have thoroughly restructured our results and discussion section.

      Reviewer #3 (Public Review):

      1) The first problem relates to their use of the Ornstein-Uhlenbeck (OU) model: they try fitting three evolutionary models, and conclude that the Ornstein-Uhlenbeck model provides the best fit. However, it has been known for a while that OU models are prone to bias and that the apparent superiority of OU models over Brownian Motion is often an artefact, a problem that increases with smaller sample sizes. (Cooper et al (2016) Biological Journal of the Linnean Society, 2016, 118, 64-77).

      Cooper et al.’s (2016) article “A Cautionary Note on the Use of Ornstein Uhlenbeck Models in Macroevolutionary Studies” suggests that comparing evolutionary models by their likelihood often leads to incorrectly selecting OU over BM, even for data generated from a BM process. However, Grabowski et al. (2023), in their article ‘A Cautionary Note on “A Cautionary Note on the Use of Ornstein Uhlenbeck Models in Macroevolutionary Studies”’, suggest that Cooper et al.’s (2016) claim may be misleading. The work of Clavel et al. (2019) and Clavel and Morlon (2017) shows that the penalised framework implemented in mvMORPH can successfully recover the parameters of a multivariate OU process. To address the Reviewer’s concern more directly, we used simulations to evaluate the chances that we would select an OU model when the correct model was BM, a procedure similar to the one used by Cooper et al. (2016). However, instead of directly comparing the likelihoods of the fitted models as Cooper et al. (2016) did, which does not control for the number of parameters in the model, we used the Akaike Information Criterion corrected for small sample sizes (AICc). The standard Akaike Information Criterion takes the number of parameters of the model into account, but this is not sufficient when the sample size is small. AICc provides a score which takes both aspects into account: model complexity and sample size. This information has been added to the manuscript:

      “We selected the best fitting model using the Akaike Information Criterion (AIC), corrected for small sample sizes (AICc). AIC takes into account the number of parameters $p$ in the model: $AIC = -2 \log(\mathrm{likelihood}) + 2p$. This approximation is insufficient when the sample size is small, in which case an additional correction is required, leading to the corrected AIC: $AICc = AIC + (2p^2 + 2p)/(n - p - 1)$, where $n$ is the sample size.”
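
      The two criteria in the quoted passage can be written out as a short sketch. The function names and the log-likelihood values are hypothetical and chosen only for illustration; they are not results from the study, and the parameter counts are placeholders rather than those of the fitted multivariate models.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = -2 log(likelihood) + 2p."""
    return -2.0 * log_likelihood + 2.0 * n_params


def aicc(log_likelihood, n_params, n_samples):
    """Small-sample corrected AIC: AICc = AIC + (2p^2 + 2p) / (n - p - 1)."""
    denominator = n_samples - n_params - 1
    if denominator <= 0:
        raise ValueError("AICc requires n_samples > n_params + 1")
    correction = (2.0 * n_params ** 2 + 2.0 * n_params) / denominator
    return aic(log_likelihood, n_params) + correction


# Hypothetical comparison of two fitted models on n = 56 observations.
# The model with the lowest AICc is preferred.
candidates = {
    "BM": aicc(log_likelihood=-100.0, n_params=2, n_samples=56),
    "OU": aicc(log_likelihood=-99.5, n_params=3, n_samples=56),
}
best_model = min(candidates, key=candidates.get)
print(candidates, best_model)
```

      In this made-up case the extra parameter of the second model is not repaid by its slightly higher likelihood, so the simpler model is retained. The correction term grows as the number of parameters approaches the sample size.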

      In 1000 simulations of 9 correlated multivariate traits for 56 species (i.e., 56*9 data points) using our phylogenetic tree, we would have selected OU when the true model was BM in only 0.7% of cases.

      2) Second, for the partial correlations (e.g. fig 7) and Principal Components (fig 8) there is a concern about over-fitting: there are 9 variables and only 56 data points (violating the minimal rule of thumb that there should be >10 observations per parameter). Added to this, the inclusion of variables lacks a clear theoretical rationale. The high correlations between most variables will be in part because they are to some extent measuring the same things, e.g. the five different measures of cerebellar anatomy which include two measures of folial size. This makes it difficult to separate their effects. I get that the authors are trying to tease apart different aspects of size, but in practice, I think these results (e.g. the presence of negative coefficients in Fig 7) are really hard or impossible to interpret. The partial correlation network looks like a "correlational salad" rather than a theoretically motivated hypothesis test. It isn't clear to me that the PC analyses solve this problem, but it partly depends on the aims of these analyses, which are not made very clear.

      PCA is simply a rigid rotation of the data: distances among multivariate data points are all conserved. Neither our PCA nor our partial correlation analysis involves model fitting, so the concept of overfitting does not apply. PCA and partial correlations are also not used here for hypothesis testing, but as exploratory methods which provide a transformation of the data aimed at capturing the main trends of multivariate change. The aim of our analysis of correlation structure is precisely to avoid the “correlational salad” that the Reviewer mentions. The Reviewer is correct: all our variables are correlated to a varying degree (note that there are 56 data points per variable = 56*9 data points, not just 56 data points). Partial correlations and PCA aim at providing a principled way in which correlated measurements can be explored. In the revised version of the manuscript we include a more detailed description of partial correlations and (phylogenetic) PCA. Whenever variables measure the same thing, they will be combined into the same principal component (these are the combinations shown in Fig. 8 b and d). Additionally, two variables may be correlated because of their correlation with a third variable (or more). Partial correlations address this possibility by looking at the correlations between the residuals of each pair of variables after all other variables have been covaried out. We provide a simple example which should make this clear, providing in particular an intuition for the meaning of negative correlations:

      “All our phenotypes were strongly correlated. We used partial correlations to better understand pairwise relationships. The partial correlation between 2 vectors of measurements a and b is the correlation between their residuals after the influence of all other measurements has been covaried out. Even if the correlation between a and b is strong and positive, their partial correlation could be 0 or even negative. Consider, for example, 3 vectors of measurements a, b, c, which result from the combination of uncorrelated random vectors x, y, z. Suppose that a = 0.5 x + 0.2 y + 0.1 z, b = 0.5 x - 0.2 y + 0.1 z, and c = x. The measurements a and b will be positively correlated because of the effect of x and z. However, if we compute the residuals of a and b after covarying the effect of c (i.e., x), their partial correlation will be negative because of the opposite effect of y on a and b. The statistical significance of each partial correlation being different than 0 was estimated using the edge exclusion test introduced by Whittaker (1990).”
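
      The example in the quoted passage can be checked numerically with a minimal sketch. It assumes that x, y and z are independent standard normal vectors (the text only says they are uncorrelated), uses ordinary rather than phylogenetic correlations, and conditions on the single variable c; the helper names are hypothetical. Under these assumptions the expected correlation between a and b is 0.22/0.30 ≈ 0.73, and their expected partial correlation given c is −0.03/0.05 = −0.6.

```python
import random


def pearson(u, v):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))
    var_u = sum((ui - mean_u) ** 2 for ui in u)
    var_v = sum((vi - mean_v) ** 2 for vi in v)
    return cov / (var_u * var_v) ** 0.5


def residuals(u, c):
    """Residuals of u after a simple least-squares regression on c."""
    n = len(u)
    mean_u, mean_c = sum(u) / n, sum(c) / n
    slope = (sum((ui - mean_u) * (ci - mean_c) for ui, ci in zip(u, c))
             / sum((ci - mean_c) ** 2 for ci in c))
    return [(ui - mean_u) - slope * (ci - mean_c) for ui, ci in zip(u, c)]


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [rng.gauss(0, 1) for _ in range(n)]
z = [rng.gauss(0, 1) for _ in range(n)]

# Measurements built as in the text: a and b share x and z, but y enters
# with opposite signs; c is x itself.
a = [0.5 * xi + 0.2 * yi + 0.1 * zi for xi, yi, zi in zip(x, y, z)]
b = [0.5 * xi - 0.2 * yi + 0.1 * zi for xi, yi, zi in zip(x, y, z)]
c = list(x)

r_ab = pearson(a, b)                                      # raw correlation
r_ab_given_c = pearson(residuals(a, c), residuals(b, c))  # partial correlation
print(round(r_ab, 2), round(r_ab_given_c, 2))
```

      Removing the shared contribution of x leaves only the opposing effect of y and the small shared effect of z, which is why a strong positive correlation can coexist with a negative partial correlation.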

      The rationale for our analyses has been made more clear in the revised version of the manuscript, aided by the more detailed description of our methods. In particular, we describe better the reason for our 2 measurements of folial shape – width and perimeter – which measure independent dimensions of folding (this is illustrated in Fig. 3d).

      3) The claim of concerted evolution between cortical and cerebellar values (P 11-12) seems to be based on analyses that exclude body size and brain size. It, therefore, seems possible - or even likely - that all these analyses reveal overall size effects that similarly influence the cortex and cerebellum. When the authors state that they performed a second PC analysis with body and brain size removed "to better understand the patterns of neuroanatomical evolution" it isn't clear to me that is what this achieves. A test would be a model something like [cerebellar measure ~ cortical measure + rest of the brain measure], and this would deal with the problem of 'correlation salad' noted below.

      The answer to this question is in the partial correlation diagram in Fig. 7c. This analysis does not exclude body weight or brain weight. It shows that the strong correlation between cerebellar area and length is supported by a strong positive partial correlation, as is the link between cerebral area and length. There is a significant positive partial correlation between cerebellar section area and cerebral section length. That is, even after covarying everything else, there is still a correlation between cerebellar section area and cerebral section length (this partial correlation is equivalent to the suggestion of the Reviewer). Additionally, there is a positive partial correlation between body weight and cerebellar section area, but no significant partial correlation between body weight and cerebral section area or length. Our approach aims at obtaining a general view of all the relationships in the data. Testing an individual model would certainly decrease the number of correlations; however, it would provide only a partial view of the problem.

      4) It is not quite clear from fig 6a that the result does indeed support isometry between the data sets (predicted 2/3 slope), and no coefficient confidence intervals are provided.

      We have now added the numerical values of the CIs to all our plots in addition to the graphical representations (grey regions) in the previous version of the manuscript. The isometry slope (0.67) is either within the CIs (both for the linear and orthogonal regressions) or at the margin, indicating that if the relationships are not isometric, they are very close to it.

      Referencing/discussion/attribution of previous findings

      5) With respect to the discussion of the relationship between cerebellar architecture and function, and given the emphasis here on correlated evolution with cortex, Ramnani's excellent review paper goes into the issues in considerable detail, which may also help the authors develop their own discussion: Ramnani (2006) The primate cortico-cerebellar system: anatomy and function. Nature Reviews Neuroscience 7, 511-522 (2006)

      We have added references to the work of Ramnani.

      6) The result that humans are outliers with a more folded cerebellum than expected is interesting and adds to recent findings highlighting evolutionary changes in the hominin human cerebellum, cerebellar genes, and epigenetics. Whilst Sereno et al (2020) are cited, it would be good to explain that they found that the human cerebellum has 80% of the surface area of the cortex.

      We have added this information to the introduction:

      “In humans, the cerebellum has ~80% of the surface area of the cerebral cortex (Sereno et al. 2020), and contains ~80% of all brain neurons, although it represents only ~10% of the brain mass (Azevedo et al. 2009)”

      7) It would surely also be relevant to highlight some of the molecular work here, such as Harrison & Montgomery (2017). Genetics of Cerebellar and Neocortical Expansion in Anthropoid Primates: A Comparative Approach. Brain Behav Evol. 2017;89(4):274-285. doi: 10.1159/000477432. Epub 2017 (especially since this paper looks at both cerebellar and cortical genes); also Guevara et al (2021) Comparative analysis reveals distinctive epigenetic features of the human cerebellum. PLoS Genet 17(5): e1009506. https://doi.org/10.1371/journal.pgen.1009506. Also relevant here is the complex folding anatomy of the dentate nucleus, which is the largest structure linking cerebellum to cortex: see Sultan et al (2010) The human dentate nucleus: a complex shape untangled. Neuroscience. 2010 Jun 2;167(4):965-8. doi: 10.1016/j.neuroscience.2010.03.007.

      The information is certainly important, and could have provided a wider perspective on cerebellar evolution, but we would prefer to keep a focus on cerebellar anatomy and address genetics only indirectly through phylogeny.

      8) The authors state that results confirm previous findings of a strong relationship between cerebellum and cortex (P 3 and p 16): the earliest reference given is Herculano-Houzel (2010), but this pattern was discovered ten years earlier (Barton & Harvey 2000 Nature 405, 1055-1058. https://doi.org/10.1038/35016580; Fig 1 in Barton 2002 Nature 415, 134-135 (2002). https://doi.org/10.1038/415134a) and elaborated by Whiting & Barton (2003) whose study explored in more detail the relationship between anatomical connections and correlated evolution within the cortico-cerebellar system (this paper is cited later, but only with reference to suggestions about the importance of functions of the cerebellum in the context of conservative structure, which is not its main point). In fact, Herculano-Houzel's analysis, whilst being the first to examine the question in terms of numbers of neurons, was inconclusive on that issue as it did not control for overall size or rest of the brain (A subsequent analysis using her data did, and confirmed the partially correlated evolution - Barton 2012, Philos Trans R Soc Lond B Biol Sci. 367:2097-107. doi: 10.1098/rstb.2012.0112.)

      We apologise for this oversight; these references are now included.

    1. Author Response

      Reviewer #2 (Public Review):

      Root growth is driven by cell elongation, and its local control allows roots to navigate the complex soil environment. Cell growth is driven by the relaxation of the cell wall, a process requiring a drop in pH. Auxin is a key regulator of root development that inhibits root growth. Auxin effects on proton dynamics are complex, it can promote both acidification and alkalinization of the extracellular space through different signaling modules, some only recently uncovered. Serre et al. report on using a new dye to monitor extracellular pH in the region surrounding the Arabidopsis thaliana root. Their manuscript aims to clarify the relationships between pH around the root, proton flux, auxin, cell elongation, and root growth with this tool. They show a typical zonation of pH values along the root: a more acidic domain corresponding to the transit-amplifying compartment, followed by a more alkaline one at the transition and early elongation zones and a more acidic one in the late elongation/root hair zone. This zonation is in agreement with previous reports obtained by other methods. A particularly puzzling aspect is the origin of the more alkaline domain. Serre et al. present evidence supporting the involvement of the AUX1-AFB1-CNGC14 module for the emergence of this more alkaline domain and how it can contribute to the ability of the root to navigate its environment.

      Serre et al. show that the more alkaline domain in the transition zone is not directly determined by the activity or localization of the AHA proton pumps but rather by the auxin influx carrier AUX1. They show that the components of the rapid auxin response pathway, in particular, the auxin co-receptor AFB1 and the calcium channel CNGC14, contribute to the emergence of this more alkaline domain. Finally, they show that mutants in these two genes, impaired in the rapid auxin response pathway, show less efficient navigation of the root tip.

      The manuscript is clear and well-written. The logic is sound, and the conclusions are supported by the data.

      The new dye appears as a promising tool for monitoring the pH in the rhizosphere with advantages over the previous ones. Yet, as pointed out by the authors in the discussion, it reports on pH at the organ scale in the region around the root, not in the apoplast or the cell wall, which can eventually complexify the elaboration of a mechanistic model joining auxin, proton efflux, cell wall properties, cell elongation, and root growth. Although several of the findings confirm previous reports, the manuscript brings novelty by demonstrating the involvement of the rapid auxin response. I am overall supportive of the manuscript. Yet, several points should be addressed:

      • The presentation of the more acidic and alkaline domains could be easier to visualize.

      • The authors refer to acidic and alkaline domains but do not report on absolute pH values; they monitor the emission ratio of the dye. They justify why to use relative pH value in the discussion and refer there to internal controls that are not clearly defined. In my opinion, the wording should be more consistent across the text and figures and refer to more acidic and more alkaline domains rather than acidic (pH<7) and alkaline (pH>7) domains.

      • The data related to the unaltered distribution of AHA using antibody staining should be backed up.

      • The presentation of the pH profile and the statistical analyses should be improved.

      • The authors should test the effect of extracellular auxin perception (tmk, abp) mutants on pH zonation.

      • Conclusion could be strengthened by moving several pieces of data currently in supplemental material to the main text.

      We agree with the comment on the definition of ‘acidic’ and ‘alkaline’ domains. We have altered the text and now explain, in the first part of the results, that we observe ‘relatively alkaline’ and ‘relatively acidic’ domains in comparison to the pH of the medium.

      We defined the ‘internal controls’ in the text – by this we mean mock treated or wild type plants imaged together with the treated or mutant plants.

      To address the role of the apoplastic auxin pathway in the root surface pH, we analyzed the tmk1, tmk4 and abp1 mutants. Surprisingly, all three mutants appear indistinguishable from the controls, showing the crucial importance of the cytoplasmic AFB1 auxin perception pathway. We have included the data as Fig. S4-1.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper studies color vision in anemonefish. The central conclusion of the paper is that anemonefish use signals from their UV cones to discriminate colors that would not otherwise be distinguishable; this differs from other fish in which UV cones extend the range of wavelengths of sensitivity but do not add a dimension to color vision. The work fits into a rich history of studies investigating how color vision fits into an animal's ecological niche. My primary concerns regard the microspectrophotometry data from single cones and some aspects of the presentation of the behavioral data.

      Microspectrophotometry

      The spectral properties of the cone types are a key issue for interpreting the results. These were measured using MSP, and fits are shown in Figure 2. The raw data shown in Fig. S1 appears more complicated than indicated in the main text. The templates miss the measurements across broad wavelength bands in each cone type. Particularly concerning is the high UV absorbance across cone types and the long-wavelength absorbance in the UV cone. It is not clear how this picture supports the relatively simple description of cone types and spectral sensitivities given in the main text and which forms the basis of the modeling.

      Microspectrophotometry is an inherently noise-prone measurement technique, particularly for very small photoreceptor outer segments such as those of single cones, which are also difficult to detect as intact, isolated (non-overlapping) cells. As such, the absorbance curve fitting and derived lambda max (λmax) values should be treated as estimates. The accuracy of these estimates is adequate for this type of study, and visual modelling results have been shown to be robust against small errors (±10 nm λmax) in photoreceptor sensitivity for multiple species [see Lind, O. & Kelber, A. (2009). Vis Res. 49(15), 1939-1947; and Bitton, PP. et al. (2017). PLOS ONE, 12: e0169810]. We consider it highly unlikely that small shifts in cone λmax from measurement error would make a meaningful difference to the colour discrimination thresholds.

      It should be noted that the raw data shown in the original Supplementary Figure 1 included all scans overlain with an average absorbance curve for presentation purposes; however, the actual lambda max values for the different cone types were measured from individual scans fitted with photopigment absorbance curve templates and then averaged. For clarity and transparency, we have now provided three multi-panel plots (see Figure 1 – figure supplements 1-3) showing the individual pre- and post-bleach scans of absorbance spectra, fitted absorbance curve templates, and R² values from the best visual pigment template fit.

      It is worth noting that most of the cone absorbance spectra found in our study closely resemble, in λmax and quality, those measured in another anemonefish species (Amphiprion akindynos) [see Supplementary Figure 1 in Stieb S. et al. (2019). Sci Rep. 9, 16459]. These cone λmax values can also be reconciled with previous estimates of opsin λmax based on amino acid sequences and cone opsin expression in the A. ocellaris retina characterised in Mitchell LJ et al. (2021). GBE, 13: evab184.

      Evidence that the unusual long-wavelength absorbance detected in a couple of the single cone (pre-bleach) measurements did not originate from visual pigment comes from post-bleach scans, in which it persisted (i.e., it did not show a photobleaching response); it was instead likely caused by contaminants (e.g., blood, RPE pigment). UV absorbance in some of the double cone measurements (above that expected of the pre-bleach beta peak from chromophore spectral absorption) can be attributed to noise in the scans, as is quite typical of MSP, and/or to partial (accidental) bleaching from stray light sources. Although utmost care was taken to minimise contamination and unintended bleaching, these are sometimes unavoidable.

      We refer the Reviewer to multiple published studies for further examples of typical MSP measurements that share similar levels of noise to ours e.g., see Figure 1 in Knott B. et al. (2013). JEB, 216:4454-4461; Figure 3 in Schott, RK et al. (2015). PNAS, 113(2): 356-361; Figure 2 in Dalton BE et al. (2014). Proc R Soc B. 281; Figure 5 in Tosetto, JE et al. (2021). Brain Behav Evol. 96: 103-123.

      Presentation

      The results are not presented in a straightforward way - at least for this reviewer. What is missing for me is a clear link between the psychometric curves in Figure 3A and the discrimination thresholds indicated in Figure 3B and Figure 4. Figure 3A is only discussed in the text on line 289 - after Figure 4 has been introduced and discussed. It would have been very helpful for me if the psychometric curves were first introduced and described, then the relation to Figure 3B was clearly indicated (perhaps with a single psychometric curve as an example). Similarly for Figure 4 the relationship between specific psychometric curves and the threshold plotted would be quite helpful. Currently it takes a careful reading to understand why being below the dashed line in Figure 4 is important.

      We have made the following changes: we now introduce the psychometric curves earlier in the results (lines 236-249) and have moved the psychometric function comparison before the first mention of Figure 4. Additionally, to make the association between the plotted colour loci and psychometric curves clearer, we have added a smaller psychometric curve plot adjacent to the colour space (in Figure 3B), using red as an example, in which an averaged psychometric curve overlies the individual fish curves. The figure caption (lines 250-274) explains that the plotted colour loci and given thresholds are mean values calculated from the individual fish behavioural data.

      We have also added a brief reminder that the theoretical limit of colour discrimination is predicted by the RNL model as 1∆S, where in our task fish should be just able to distinguish targets from grey distractors (see lines 222-224). To clarify, the plotted values in Figure 4B are both the individual fish thresholds (points) and average threshold (black bar) per colour set. The individual threshold values are taken at a correct choice probability of 50% from fitted psychometric curves of fish behavioural performance (shown in Figure 3A).

      RNL model

      The data is fit and interpreted in the context of the receptor noise limited model. The paragraph in the discussion about complementary color pairs suggests that this model is incorrect (text around line 332). Consideration of how the results depend on the RNL model is important, especially given the interpretation here.

      The inability of the RNL model to account for the observed asymmetry between color discrimination thresholds implies that they cannot be solely attributed to photoreceptor noise. We can therefore infer from the asymmetry that thresholds are set by a higher-level process, whether that involves post-receptor processes within the inner retina or in the brain remains to be investigated. As explained in lines 396-397 one possibility is that activation of the UV receptor suppresses noise in the visual pathway or enhances the saliency of colors for anemonefish. The high sensitivity to violet-green, which was found in all six of the fish tested, is consistent with the heightened saliency of this color (lines 397-399).

      Figure 3B

      This is the key figure in the paper. But several issues make seeing the data in this figure difficult. First, the important part of the figure is buried near the origin and hard to see. Can you show a surface that connects the thresholds in the different chromatic directions, or otherwise highlight the regions of discriminable and not discriminable colors?

      See previous comment. In short, we have taken the advice of the Reviewer and added highlighted areas around the regions of discriminable colors in Figure 3B to help visually separate them from the non-discriminable regions of colors (from grey). Additionally, we have added an inset showing an enlarged image of the area surrounding the centre of colour space.

      Reviewer #2 (Public Review):

      Mitchell and colleagues examined the contribution of a UV-sensitive cone photoreceptor to chromatic detection in Amphiprion ocellaris, a type of anemonefish. First, they used biophysical measurements to characterize the response properties of the retinal receptors, which come in four spectrally-distinct subtypes: UV, M1, M2, and L. They then used these spectral sensitivities to construct a 4-dimensional (tetrahedral) color space in which stimuli with known spectral power distributions can be represented according to the responses they elicit in the four cone types. A novel five-LED display was used to test the fish's ability to detect "chromatic" modulations in this color space against a background of random-intensity, "achromatic" distractors that produce roughly equal relative responses in the four cone types. A subset of stimuli, defined by their high positive UV contrast, were more readily detected than other colors that contained less UV information. A well-established model was used to link calculated receptor responses to behavioral thresholds. This framework also enabled statistical comparisons between models with varying number of cone types contributing to discrimination performance, allowing inferences to be drawn about the dimensionality of color vision in anemonefish.

      The authors make a compelling case for how UV light in the anemonefish habitat is likely an important ecological source of information for guiding their behavior. The authors are to be commended for developing an elegant behavioral paradigm to assess visual performance and for incorporating a novel display device especially suited to addressing hypotheses about the role of UV light in color perception. While the data are suggestive of behavioral tetrachromacy in anemonefish, there are some aspects of the study that warrant additional consideration:

      1) One challenge faced by many biological imaging systems is longitudinal chromatic aberration (LCA) - that is, the focal power of the system depends on wavelength. In general, focal power increases with decreasing wavelength, such that shorter wavelengths tend to focus in front of longer wavelengths. In the human eye, at least, this focal power changes nonlinearly with wavelength, with the steepest changes occurring in the shorter part of the visible spectrum (Atchison & Smith, 2005). In the fish eye, where the visible spectrum extends to even shorter wavelengths, it seems plausible that a considerable amount of LCA may exist, which could in turn cause UV-enriched stimuli to be more salient (relative to the distractor pixels) due to differences in perceived focus rather than due solely to differences in their respective spectral compositions. Such a mechanism has been proposed by Stubbs & Stubbs (2016) as a means for supporting "color vision" in monochromatic cephalopods (but see Gagnon et al. 2016). It would be worth discussing what is known about the dispersive properties of the crystalline lens in A. ocellaris (or similar species), and whether optical factors could produce sufficient cues in the retinal image that might explain aspects of the behavioral data presented in the current study.

      This is an interesting point, and we appreciate the reviewer’s thoughtful comment regarding this topic, especially as LCA increases exponentially in the UV. Although we certainly cannot disprove such a mechanism in the present study, we are highly sceptical that LCA could be used by reef fish and is involved in the heightened saliency of UV stimuli. Previous work has found that LCA is mostly corrected for in the teleost retina of both marine and freshwater species by graded, multifocal lenses that focus different wavelengths at the same depth as their maximally sensitive cone photoreceptors [e.g., for evidence in African cichlids see Kröger, R. H. H. et al. (1999). J Comp Physiol. A, 184, 361-369; Malkki, P. E. & Kröger, R. H. H. (2005). J Opt. A, 7, 691-700; and for various reef fishes see Karpestam, B. et al. (2007). J Exp Biol., 210, 16: 2923-2931]. In essence, LCA is corrected in the eyes of many teleosts by accurately tuning longitudinal spherical aberration through having a graded density lens. We draw particular attention to the latter reference, which comparatively examined the optical properties of reef fish lenses, including diurnal, planktivorous damselfishes (from the same family as anemonefishes, Pomacentridae). They found that not only were the lenses of these species highly UV-transmissive (as we show in anemonefish), but all were multifocal and capable of focusing both visible (non-UV) and UV wavelengths. Considering the coastal cephalopod species examined thus far, all of them contain only one type of visual pigment which is packed in their long photoreceptor (150-450µm long outer segment) across an entire retina (Chung and Marshall 2016, Proceedings B). Theoretically, given these long photoreceptors, LCA and the resulting differences in focal length across different patches of photoreceptors or different depths of the outer segment might provide cues for colour discrimination, although no behavioural evidence yet supports this hypothesis. Unlike the cephalopod case, the four spectrally distinct cone types arranged in a mosaic pattern, along with their very short outer segments (5-10µm), likely make LCA less effective in the anemonefish retina.

      We have added a short paragraph (Lines 400-412) discussing the possibility of an optical mechanism contributing to heightened UV saliency with a particular focus on LCA and our thoughts on why we consider it an unlikely mechanism in anemonefish.

      2) The authors provide a quantitative description of anemonefish visual performance within the context of a well-developed receptor-based framework. However, it was less clear to me what inferences (if any) can be drawn from these data about the post-receptoral mechanisms that support tetrachromatic color vision in these organisms. Would specific cone-opponent processes account for instances where behavioral data diverged from predictions generated with the "receptor noise limited" model described in the text? The general reader may benefit from more discussion centered on what is known (or unknown) about the organization of cone-opponent processing in anemonefish and related species.

      In short, we do not know the specific opponent interactions of anemonefish cones. The RNL model assumes all possible opponent interactions in its calculations. From our results, very little can be said about the post-receptor mechanisms involved in their putative tetrachromatic vision. We would like to avoid overreaching beyond what our data can show. A future directions section has now been added to the discussion (lines 467-497), which briefly mentions the known UV opponency in larval zebrafish and that future investigation in anemonefish should attempt to disentangle the specific opponent (chromatic) and non-opponent (achromatic) circuits in the anemonefish retina.

      Reviewer #3 (Public Review):

      The comments below focus mainly on ways that the data and analysis as currently presented do not, to this reviewer, compel the conclusions the authors wish to draw. It is possible that further analysis and/or clarification in the presentation would more persuasively bolster the authors' position. It also seems possible that a presentation with more limited conclusions but clarity on exactly what has been demonstrated and where additional future work is needed would make a strong contribution to the literature.

      • Fig 3A. It might be worth emphasizing a bit more explicitly that the x-axis (delta S) is the result of a model fit to the data being shown, since this then means that if RNL model fit the data perfectly, all of the thresholds would fall at deltaS = 1. They don't, so I would like to see some evaluation from the authors' experience with this model as to whether they think the deviations (looks like the delta S range is ~0.4 to ~1.6 in Figure 4B) represent important deviations of the data from the model, the non-significant ANOVA notwithstanding. For example, Figure 4B suggests that the sign of the fit deviations is driven by the sign of the UV contrast and that this is systematic, something that would not be picked up by the ANOVA. Quite a bit is made of the deviations below, but that the model doesn't fully account for the data should be brought out here I think. As the authors note elsewhere, deviations of the data from the RNL model indicate that factors other than receptor noise are at play, and reminding the reader of this here at the first point it becomes clear would be helpful.

      We have now stated more explicitly in the figure caption for Figure 3A, that the delta S values presented were calculated by fitting fish behavioral data to the RNL model. To test the overall effect that the sign of the UV contrast had on the discrimination threshold, we have now included ‘contrast’ (positive or negative) as another fixed effect in the linear mixed effects model. We have now included details of this test in the results which shows the systematic effect (lines 338-340). Additionally, as suggested we now briefly introduce in the results the idea that factors other than receptor noise are causing the observed deviations in data from the RNL model.

      • Line 217 ff (Figure 4, Supplemental Figure 4). If I'm understanding what the ANOVA is telling us, it is that, across color directions and fish (I think these are the two factors based on line 649), the predictions deviate significantly from the data (relative to the inter-fish variability) for the trichromatic models but not the tetrachromatic model. If that's not correct, please interpret this comment to mean that more explanation of the logic of the test would be helpful.

      The interpretation of the ANOVA by the Reviewer is mostly correct. We had the variables color set and Fish ID, with threshold delta S as the dependent variable. This showed that deviations from the predicted threshold were significant relative to the inter-fish variability for the trichromatic models. Missing details describing the ANOVA have now been added to the methods (lines 789-798).

      Assuming that the above is right about the nature of the test, then I don't think the fact that the tetrachromatic model has an additional parameter (noise level for the added receptor type) is being taken into account in the model comparison. That is, the trichromatic models are all subsets of the tetrachromatic model, and must necessarily fit the data worse. What we want to know is whether the tetrachromatic model is fitting better because its extra parameter is allowing it to account for measurement noise (overfitting), or whether it is really doing a better job accounting for systematic features of the data. This comparison requires some method of taking the different number of parameters into account, and I don't think the ANOVA is doing that work. If the models being compared were nested linear models, then an F-ratio test could be deployed, but even this doesn't seem like what is being done. And the RNL model is not linear in its parameters, so I don't think that would be the right model comparison test in any case.

      Typical model comparison approaches would include a likelihood ratio test, AIC/BIC sorts of comparisons, or a cross-validation approach.

      If the authors feel their current method does persuasively handle the model comparison, how it does so needs to be brought out more carefully in the manuscript, since one of the central conclusions of the work hinges at least in part on the appropriateness of such a statistical comparison.

      Our visual model comparisons were aimed at assessing whether a trichromatic or tetrachromatic model best fit the colour discrimination data. The trichromatic and tetrachromatic models assume two and three opponency pathways, respectively. If the fish were trichromatic rather than tetrachromatic, we would expect the RNL model to fit the data better with two opponency mechanisms than with three. We made this assessment because not all of the cones necessarily contribute to colour vision; some could be used exclusively for achromatic tasks (e.g., luminance vision or motion detection). However, given our finding that the data best fit the tetrachromatic model (i.e., the behavioural discrimination thresholds more closely matched the theoretical prediction of 1∆S), it is likely that anemonefish used all four cones for colour vision.

      We have also now repeated our analysis using unweighted delta S values, which are calculated using general n-dimensional models of colour vision (using the PAVO2 package). These models essentially follow the same initial steps as the RNL model (and many others) but omit the receptor noise correction stage. After comparing the predicted thresholds with the data in this non-RNL space (using ANOVA, see lines 303-311), we again found that the tetrachromatic model predictions did not deviate significantly from the data relative to individual fish performance; however, we also found that the trichromatic model without M2 cone input no longer differed from the predicted values. In this case, it seems that the extra noise parameter did contribute to the difference in fit. Whether this is a biologically meaningful comparison (as all photoreceptors contain noise) is an open question. We have added a short statement explicitly framing our interpretation that anemonefish have a 3-D colour space as being in accordance with the closeness of RNL model predictions (lines 370-371, 506-508).
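
      For readers unfamiliar with parameter-penalised comparisons of the kind the Reviewer mentions, the following is a minimal sketch of an AIC comparison under a least-squares (Gaussian error) assumption. It is illustrative only and is not an analysis reported in the manuscript; the function names are hypothetical, and `residuals` stands for the deviations of measured thresholds from each model's predictions.

```python
import math


def gaussian_aic(residuals, n_params):
    """AIC for a least-squares fit with Gaussian errors.

    Uses AIC = n * ln(RSS / n) + 2k, dropping constants shared by all models.
    The error variance is not counted in n_params; counting it would add the
    same constant to every model and leave differences unchanged.
    """
    n = len(residuals)
    rss = sum(r * r for r in residuals)
    return n * math.log(rss / n) + 2 * n_params


def delta_aic(res_simple, k_simple, res_complex, k_complex):
    """AIC(complex) - AIC(simple); negative values favour the complex model."""
    return gaussian_aic(res_complex, k_complex) - gaussian_aic(res_simple, k_simple)
```

      If the two models fit equally well, the model with one extra noise parameter is penalised by exactly 2 AIC units, so it is preferred only when its improvement in fit exceeds that penalty.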

      • Also on the general point on conclusions drawn from the model fits, it seems important to note that rejecting a trichromatic version of the RNL model is not the same as rejecting all trichromatic models. For example, a trichromatic model that postulates limiting noise added after a set of opponent transformations will make predictions that are not nested within those of RNL trichromatic models. This point seems particularly important given the systematic failures of even the tetrachromatic version of the RNL model.

      This is a good point. We have limited our conclusions to specifically address trichromatic models generated within the framework of the RNL model by adding in the conclusion section that fish psychophysical thresholds were best explained by the RNL model when all four cone types contributed to colour vision (see lines 370-371, 506-508). In this same sentence, we have also added in parentheses that “suggesting (but not proving) tetrachromacy” (line 508). We have also edited the abstract to state that our results were “…best described by a tetrachromatic model using all four cone types…”, rather than stating we have shown tetrachromacy (lines 36-37).

      • More generally, attempts to decide whether some human observers exhibit tetrachromacy have taught us how hard this is to do. Two issues, beyond the above, are the following. 1) If the properties of a trichromatic visual system vary across the retina, then by imaging stimuli on different parts of the visual field an observer can in principle make tetrachromatic discriminations even though visual system is locally trichromatic at each retinal location. 2) When trying to show that there is no direction in a tetrachromatic receptor space to which the observer is blind, a lot of color directions need to be sampled. Here, 9 directions are studied. Is that enough? How would we know? The following paper may be of interest in this regard: Horiguchi, Hiroshi, Jonathan Winawer, Robert F. Dougherty, and Brian A. Wandell. "Human trichromacy revisited." Proceedings of the National Academy of Sciences 110, no. 3 (2013): E260-E269. Although I'm not suggesting that the authors conduct additional experiments to try to address these points, I do think they need to be discussed.

      We agree with the reviewer, that colour discriminability achieved by tetrachromatic vision could in theory be achieved by the combined effect of localised, distinct forms of trichromacy. Evidence in other fishes suggests that such multiple forms of trichromacy across the retina likely exist in many species. However, the behavioural effects of this retinal setup remain to be studied likely due to its extremely difficult nature. We have added a new section titled “future directions” (Lines 474-489), in which we discuss the possibility that distinct forms of trichromacy in the anemonefish retina could in theory achieve colour discrimination on par with tetrachromatic vision. We also give suggestions on how this could be investigated.

      Although we tried to include as many colour directions as practically possible in our experiment, we have certainly not provided an exhaustive range that completely encompasses anemonefish colour space. Whether 9 colour directions are adequate to assess the dimensionality of their color vision is difficult to say. As addressed in the previous comment, we now acknowledge this limitation by refining our conclusion, saying that our results do not prove tetrachromacy.

      • Line 277 ff. After reading through the paper several times, I remain unsure about what the authors regard as their compelling evidence that the UV cone has a higher sensitivity or makes an omnibus higher contribution to sensitivity than other cones (as stated in various forms in the title, Lines 37-41, 56-57, 125, 313, 352 and perhaps elsewhere).

      At first, I thought the key point was that the receptor noise inferred via the RNL model was slightly lower (0.11) for the UV cone than for the double cones (0.14). And this is the argument made explicitly at line 326 of the discussion. But if this is the argument, what needs to be shown is that the data reject a tetrachromatic version of the RNL model where the noise value of all the cones is locked to be the same (or something similar), with the analysis taking into account the fewer parametric degrees of freedom where the noise parameters are so constrained. That is, a careful model comparison analysis would be needed. Such an analysis is not presented that I see, and I need more convincing that the difference between 0.11 and 0.14 is a real effect driven by the data. Also, I am not sanguine that the parameters of a model that in some systematic ways fails to fit the data should be taken as characterizing properties of the receptors themselves (as sometimes seems to be stated as the conclusion we should draw).

      We have performed various modelling scenarios where receptor noise was adjusted for each channel; however, the UV channel was consistently found to be more sensitive than the other channels. In (the original) Supplementary Figure 6 (now Figure 4 – figure supplements 1 and 2), we show predicted dS values calculated using receptor noise levels in the exact manner that the Reviewer suggests, ranging from 0.05 to 0.15 and, most importantly, including scenarios where receptor noise was held equal across cone types and others where it was varied between single cones and double cones. None of the models adjusted the data so that sensitivity was equal across all four channels, which means that, by an unknown mechanism, the UV channel is more sensitive, but this is unrelated to noise levels. Our best-fit receptor noise values of 0.11 (for single cones) and 0.14 (for double cones) are estimates and should be treated as such until actual receptor noise measurements are made.

      Then, I thought maybe the argument is not that the noise levels differ, but rather that the failures of the model are in the direction of thresholds being under predicted for discriminations that involve UV cone signals. That's what seems to be being argued here at lines 277 ff, and then again at lines 328 ff of the discussion. But then the argument, as I read it in more detail in both places, switches from being about the UV cones per se to being about positive versus negative UV contrast. That's fine, but it's distinct from an argument that favors omnibus enhanced UV sensitivity, since both the UV increments and decrements are conveyed by the UV cone; it's an argument for differential sensitivity for increments versus decrements in UV mediated discriminations. The authors get to this on line 334 of the discussion, but if the point is an increment/decrement asymmetry the title and many of the terser earlier assertions should be reworked to be consistent with what is shown.

      To clarify our argument, we found that the colour discrimination thresholds were systematically lower than predicted by the RNL model for colours which elicited higher UV cone stimulation relative to other cone types. We refer to these colours as UV positive, based on the sign of their contrast against grey distractors produced by the higher UV/V LED channel (i.e., in a positive direction), whereas colours with UV negative chromatic contrast had lower UV cone stimulation relative to the other cone types. Therefore, our interpretation of the importance of UV cone signals for colour discrimination is congruent with the results. In the discussion, we suggest a possibility that activation of the UV receptor suppresses noise downstream in the visual pathway or enhances the saliency of colours (see lines 397-398). This activation of the UV receptor would, of course, be at its highest for colours with positive UV chromatic contrast.

      Note that we have added to the discussion the possibility that colour preferences or a difference in attentiveness might have contributed to differences in discrimination thresholds (see discussion lines 412-413, 427-428, 433-435, 456-466, and 469-473). However, we consider it a less likely explanation due to a couple of reasons, including 1) a lack of difference in responsiveness across colour sets in their timing to peck the target, and 2) any non-learnt bias would have likely been overridden or at least weakened by training prior to the experiment where colours were rewarded equally (see lines 462-466).

      We have edited the results (lines 334-352) to make our point clearer and by changing the subtitle to be more explicit: “Lower discrimination thresholds induced by positive UV contrast”. The subsection begins by explaining the different types of UV chromatic contrast by elevation angle and, finally, how this division among colour sets was a major determinant of colour discrimination thresholds.

      Perhaps the argument with respect to model deviations and UV contrast independent of sign could be elaborated to show more systematically that the way the covariation with the contrasts of the other cone stimulations in the stimulus set goes, the data do favor deviations from the RNL in the direction of enhanced sensitivity to UV cone signals, but if this is the intent I think the authors need to think more about how to present the data in a manner that makes it more compelling than currently, and walk the reader carefully through the argument.

      We have added to the results the linear mixed-effects model output with ‘contrast’ (positive/negative) added as a fixed effect. This analysis shows that the sign direction of UV contrast was a strong predictor of threshold (see address to previous comments and lines 399-401, 790-799).

      • On this point, if the authors decide to stick with the enhanced UV sensitivity argument in the revision, what is meant by "the UV cone has a comparatively high sensitivity" (line 313 and throughout) needs more unpacking. If it is that these cones have lower inferred noise (in the context of a model that doesn't account for at least some aspects of the data), is this because of properties of the UV cones, or because of the way that post-receptoral processing handles the signals from these cones, mimicking a cone effect in the model? And if it is thought that it is because of properties of the cones, some discussion of what those properties might be would be helpful. As I understand the RNL model, relative numbers of cones of each type are taken into account, so it isn't that. But could it be something as simple as higher photopigment density or larger entrance aperture (thus more quantum catches and higher SNR)?

      It is unknown what aspect of the cone morphology or physiology sets the activation or inactivation threshold. Electrophysiological data collected from the UV cones of other fish species e.g., in goldfish and zebrafish [see Hawryshyn & Beauchamp (1985). 25, Vis Res.; and Yoshimatsu et al. (2020). 107, Neuron.] show that they have exceptionally high sensitivity. What has not been shown is that having a UV cone can improve colour discrimination.

      Previous quantitative cone opsin gene expression analysis showed that the single cone opsins (SWS1 and SWS2B) are expressed at lower levels than all double cone opsin genes. This difference in expression, combined with the smaller size of single cone outer segments relative to those of the double cones, makes it unlikely that a larger photoreceptor size, higher volume or packing density of visual pigment is responsible. Contrary to our findings, these aspects of the different cone types (if they had an effect) would instead predict that double cones have a higher SNR, and non-UV colours would be more discriminable. We have now added these details to the discussion (see lines 391-397).

      • Line 288 ff. The fact that the slopes of the psychometric functions differed across color directions is, I think, a failure of the RNL model to describe this aspect of the data, and tells us that a simple summary of what happens for thresholds at delta S = 1 does not generalize across color directions for other performance levels. Since one of the directions where the slope is shallower is the UV direction, this fact would seem to place serious limits on the claim that discrimination in the UV direction is enhanced relative to other directions, but it goes by here without comment along those lines. Some comment here, both about implications for fit of RNL model and about implications for generalizations about efficacy of UV receptor mediated discrimination and UV increment/decrement asymmetries, seems important.

      The variation in the psychometric functions is difficult to interpret and cannot be explained by the RNL model. What the RNL model predicts is delta S based on low level factors (namely receptor noise). In the discussion, we completely agree with the notion that the asymmetry in thresholds from predicted values, and the variation in psychometric slopes cannot be explained by the RNL model, e.g., this is heavily implied by “colour discrimination thresholds cannot be directly attributed to noise in the early stages of the visual pathway…” (lines 388-390). To clarify the inability of the RNL model to account for this aspect of the data, we have included a statement (see line 390).

      It is a good point that this could be an indication of heterogeneity in colour space. Heterogeneity in discrimination thresholds across animal colour space (both surrounding the threshold area and for more saturated regions) has been explored in detail using trichromatic triggerfish by Green N. F. et al. (2022). JEB, 7(225):jeb243533. We have added this idea to the discussion (see lines 490-498). For UV, it seems that two of the five fish (#34 and 20) had noticeably shallower curves than the others tested for UV (fish #19, 33, 36). Both also varied more in their ability to distinguish targets, as shown by their wider confidence intervals. One of these two fish (#34) was retested for UV at the end of the experiment, and in the secondary assessment had a steeper psychometric curve more in line with the other fish in the experiment (see Figure 3 – figure supplement 1 and added lines 247-250). Based on this discrepancy in performance between assessments, it is also possible that individual learning effects had a role in impacting the shape of the psychometric curve. Note, this had minimal effect on colour discrimination thresholds and any differences were in the direction of change observed across colour sets in the experiment (i.e., lower dS for UV positive directions).

      • Line 357 ff. Up until this point, all of the discussion of differences in threshold across stimulus sets has been in terms of sensitivity. Here the authors (correctly) raise the possibility that a difference in "preference" across stimulus sets could drive the difference in thresholds as measured. Although the discussion is interesting and germane, it does to some extent further undercut the security of conclusions about differential sensitivity across color directions relative to the RNL model predictions, and that should be brought out for the reader here. The authors might also discuss how a future experiment might differentiate between a preference explanation and a sensitivity explanation of threshold differences.

      We have now added a paragraph (see lines 469-473) discussing that future work should test for color preferences and suggest how this could be done using a similar foraging task. We also include our thoughts immediately prior on why it is unlikely that a colour preference was a major contribution towards the results. In short, we consider it unlikely as fish showed no evidence of reduced latency for pecking at targets across the colour sets and because the training regime prior to the experiment equally rewarded fish for all colours and would likely have overridden a strong preference (at least in this specific foraging context).

      • RNL model. The paper cites a lot of earlier work that used the RNL model, but I think many readers will not be familiar with it. A bit more descriptive prose would be helpful, and particularly noting that in the full dimensional receptor space, if the limiting noise at the photoreceptors is Gaussian, then the isothreshold contour will be a hyper-ellipsoid with its axes aligned with the receptor directions.

      We have now added an explanation of the RNL model (see lines 141-151), particularly of its assumptions that it only receives chromatic input and that discrimination is limited by noise arising in the photoreceptors and not by any specific opponent mechanisms. We also mention the expected hyper-ellipsoid shape of isothreshold contours if receptor noise is Gaussian. Note, while we appreciate the importance of the reader understanding the basic functionality of the model, we wanted to avoid overloading the introduction with details on the RNL model, which is not the focus of the paper. The RNL model has been well established in the field of visual ecology and animal vision research for well over a decade and has been thoroughly dissected by previous methodological reviews. We refer to one of these more recent reviews by Olsson et al. (2018) Behav Ecol. 29(2):273-282, and direct the reader to the methods section for further details on the RNL model.
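
      To make the model's structure concrete, the sketch below computes ∆S between two stimuli for a four-receptor eye using the standard tetrachromatic form of the receptor noise limited equation (Vorobyev & Osorio, 1998). It assumes the log-ratio receptor contrast ∆f_i = ln(q_i^A / q_i^B); the function name, the receptor ordering, and any noise values passed in are illustrative assumptions, not the values or code used in the study.

```python
import math
from itertools import combinations


def rnl_delta_s(q_a, q_b, noise):
    """Chromatic distance (delta S) between stimuli A and B for a tetrachromat.

    q_a, q_b: quantum catches of the four cone types for each stimulus.
    noise:    receptor noise (Weber fraction) e_i for each cone type.
    """
    if not (len(q_a) == len(q_b) == len(noise) == 4):
        raise ValueError("expected four receptor channels")
    # Log-ratio receptor contrasts (an assumption of this sketch).
    df = [math.log(a / b) for a, b in zip(q_a, q_b)]
    numerator = 0.0
    for i, j in combinations(range(4), 2):
        # Each noise pair (i, j) weights the contrast difference of the
        # remaining two channels (k, l).
        k, l = [m for m in range(4) if m not in (i, j)]
        numerator += (noise[i] * noise[j]) ** 2 * (df[k] - df[l]) ** 2
    denominator = sum(
        (noise[i] * noise[j] * noise[k]) ** 2
        for i, j, k in combinations(range(4), 3)
    )
    return math.sqrt(numerator / denominator)
```

      Because only differences between receptor contrasts enter the numerator, scaling all four quantum catches by the same factor gives ∆S = 0, which reflects the model's assumption that it receives only chromatic input.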

      • Use of cone isolating stimuli? For showing that all four cone classes contribute to what the authors call color discrimination, a more direct approach would seem to be to use stimuli that target stimulation of only one class of cone at a time. This might require a modified design in which the distractors and target were shown against a uniform background and approximately matched in their estimated effect on a putative achromatic mechanism. Did the authors consider this approach, and more generally could they discuss what they see as its advantages and disadvantages for future work.

      The Reviewer is correct that targeted stimulation of isolated cones would be the optimal approach to demonstrating tetrachromatic colour vision. However, the extreme spectral overlap in the absorption curves of anemonefish cones, particularly in the mid-wavelength region, makes this problematic with the current LED display. We added to the discussion ways that this could be studied in the future (see lines 474-489). This might be possible (but still challenging) using a monochromator, but such technology severely limits the diversity of stimuli that can be created and usually restricts experiments to a simple paired choice design (or grey card experiment). The traditional paired choice experiment requires animals to be trained to distinguish a specific colour, while the Ishihara-like task trains animals to distinguish targets using an odd-one-out approach. This latter approach is highly efficient, as it does not require retraining when testing a new colour (i.e., fish learnt the task, not a specific colour). Here, we wanted to assess colour discrimination in multiple directions to compare performance, and the flexible LED display combined with a generalisable task was important.

      The above assumes that anemonefish do not use multiple trichromatic systems. If they do, standard experimental stimuli (e.g., a monochromator, an LED display) would be unsuitable, as they illuminate the whole retina. To definitively test the range of opponent interactions, it would be necessary to make electrophysiological measurements targeting the transmitting neurons using a retinal multielectrode array (MEA) approach or in-vivo calcium imaging (lines 484-486).

      We understand that our results are not a direct test of the dimensionality of anemonefish colour vision and should not be interpreted as such, as we do not have direct evidence of tetrachromacy. To recognize this limitation of our data, we have toned down the statements that claimed to have demonstrated tetrachromacy.

    1. Author Response

      Reviewer #1 (Public Review):

      Precise regulation of gamete fusion ensures that offspring will have the same ploidy as the parents. However, breaking this regulation can be useful for plant breeding. Haploid induction followed by chemical-induced genome doubling can be used to fix desirable genotypes, while triparental hybrids where two sperm cells with two different genotypes fertilize an egg cell can be advantageous for bypassing hybridization barriers to create interspecies hybrids with increased fitness. This manuscript follows up on a previous study from the same research group that used a clever high throughput polyspermy detection assay (HIPOD) to show that wild-type Arabidopsis naturally forms triparental hybrids at very low frequencies (less than 0.05% of progeny) and that these triparental hybrids can bypass dosage barriers in the endosperm (Nakel, et al., 2017). Mao and co-authors hypothesized that mutants that conferred polytubey, the attraction of multiple pollen tubes by mutant female gametophytes, would also increase the rate of triparental hybrids. They used a double mutant in the endopeptidase genes ECS1 and ECS2 which had previously been reported to induce supernumerary pollen tube attraction to test this hypothesis with their two-component HIPOD system in which one pollen donor constitutively expresses the mGAL4-VP16 transcription factor while the second pollen donor carries an herbicide resistance gene regulated by the GAL4-responsive UAS promoter. Triparental hybrids are detected as herbicide-resistant progeny from wild-type Arabidopsis flowers that have been pollinated by the two paternal genotypes. The authors convincingly show that the ecs1 ecs2-1 double mutant more than doubled the frequency of triparental, triploid hybrids in HIPOD crosses. 
They next tested the hypothesis that this increase in triparental hybrids was due to a gametophytic effect by using an ecs1-/- ecs2-1/ECS2 maternal parent in the HIPOD assay and testing whether the ecs2-1 mutant allele was preferentially inherited in triparental hybrids. The mutant allele was inherited at a much higher rate than expected, confirming their hypothesis.

      The triparental hybrid results with the ecs1 ecs2 mutant were not that surprising since the presence of extra sperm cells gives more opportunities for triparental hybrids to form, especially if gamete fusion is misregulated. However, an unexpected result came when the authors used aniline blue staining to analyze the ecs1 ecs2 polytubey phenotype. They confirmed that the double mutant had increased levels of polytubey compared to wild-type ovules, but they also noticed that 13% of seeds were not developing normally. This phenotype was confirmed with a second ecs2 allele and was complemented with both ECS1 and ECS2 transgenes under their native promoters. Microscopic analysis revealed normal gametophyte morphology before fertilization, but 8% of pollinated ovules failed to develop an embryo and 7% failed to develop endosperm, suggesting single fertilization events. In a logical set of experiments, they followed up on this result by crossing ecs1 ecs2 with pollen carrying a fluorescent reporter that would be expressed in developing embryos and endosperm. In this experiment, they were again surprised. Some of the wild-type-looking seeds lacked a paternal contribution (i.e. no fluorescent signal from the paternal reporter construct) in the embryo. This prompted them to look more closely at the progeny, upon which they detected small plants that were haploid. They confirmed the haploid nature by chromosome spreads. Finally, they used interaccession crosses between ecs1 ecs2 (Col-0) and Landsberg to verify that haploid progeny only carried maternal alleles of markers on all five chromosomes, indicating that the ecs1 ecs2 genotype can induce maternal haploids.

      This interesting study highlights the importance of following up on unexpected results. The conclusions are well-supported by the data and quite exciting. Paternal haploid inducers have been discovered in several species, but this is one of only two examples of maternal haploid induction. While the percentage of maternal haploids is very low, this phenomenon could be useful for plant breeding.

      Weaknesses

      The data in the manuscript is intriguing, but the question of how the same mutant combination promotes the formation of both triploid and haploid progeny remains unanswered and is not thoroughly discussed, nor is any model suggested for how the ECS1/2 peptidases could play a role in regulating gamete fusion and/or repressing parthenogenesis. A second unanswered question is whether the maternal haploids are a result of failed plasmogamy or karyogamy between the egg and sperm leading to parthenogenesis or a result of paternal genome elimination after plasmogamy. In figure 3B, the authors attempted to test whether plasmogamy occurs between the male and female gametes in ecs1 ecs2 ovules by crosses with pollen that expresses a mitochondrial marker under control of the pRPS5a promoter which is active in sperm cells as well as embryos and endosperm of fertilized ovules. This experiment allowed them to detect sperm cells that had not fused with the egg and central cell at 2 days after pollination. They also counted the percentage of seeds that expressed the mitochondrial marker in both embryo and endosperm at 2 DAP and found that ecs1 ecs2 mutants had a 20% reduction of visible mitochondria in embryo sacs compared to wildtype. They conclude that the result indicates a potential plasmogamy defect. However, the dependability of this marker is questionable since only ~55% of wild-type seeds had detectable signal in the embryo and endosperm. The authors imply that this experiment could be used to test plasmogamy, but it is not clear how any conclusions related to the abnormal seed phenotype could be drawn from examining the rate of signal in both the embryo and endosperm. Since the mitochondrial marker was not expressed from a sperm-specific promoter, the fluorescent signal at 2DAP is likely due to new gene expression from pRPS5a in the fertilized embryo and endosperm, not an indication of the presence of sperm-derived mitochondria. 
Perhaps an earlier timepoint could be used as well as a sperm-specific promoter instead of pRPS5a to answer the question of whether plasmogamy is happening in the ecs1 ecs2 ovules.

      Thanks for the suggestion. We now provide two additional data sets as evidence that ecs1 ecs2 mutant plants indeed exhibit single fertilization events that lead to fertilization recovery.

      We determined fertilization failure by checking the decondensation of HTR10-RFP-labelled sperm nuclei at 8-10 HAP (Figure 3B), and the frequency of heterofertilization through a dual pollination experiment (Figure 3C-E) (see above).

      Reviewer #2 (Public Review):

      The manuscript reports the triploid and haploid productions using an ecs1ecs2 mutant as the maternal donor, in addition to the evaluation of the sexual process observed in the mutant. The indicated data show exquisite quality. To improve the content, I recommend carefully reconsidering the descriptions because some of the insights would cause a stir in the controversy regarding ECS1&2 functions in plant reproduction.

      Strengths

      Triploid production by a combination of ecs1ecs2 mutant and HIPOD system has potential as a future plant breeding tool. Moreover, it's intriguing that both triploid and haploid productions were achieved using the same mutant as a maternal donor. I think authors can claim the value of their results more by adding descriptions about the usefulness of the aneuploid plants in plant breeding history.

      The evidence of the persistent synergid nucleus (Figure 3A) is critical insight reported by this study. As Maruyama et al. (2013) reported by live cell imaging, synergid-endosperm fusion had occurred at the two endosperm nuclei stage. It would be valuable to claim the observed fact by citing Maruyama's previous observation.

      Weakness

      As the authors suggested, the higher triploid frequency observed in ecs1ecs2 than WT was likely caused by the increased polyspermy. However, it also could be that reduction of normal seed number in ecs1ecs2 (whichever is due to failure of fertilization or embryo development arrest) accounts for the increased frequency of the triploid compared to WT.

      The results in Figure 3C-E suggested single fertilization of both egg and central cells at similar frequencies. This is an exciting result, but it is still possible that the fertilized egg or central cell degenerated after fertilization, resulting in the disappearance of paternally inherited fluorescence. Evaluation of fertilization patterns at 7-10 HAP in the ecs1ecs2 mutant may provide more confident insight, although unfused sperm cells were evaluated at 1 DAP (Figure 3-figure supplement 1B). The fertilization states can be distinguished by the morphology and positions of the HTR10-RFP sperm nuclei, as reported by Takahashi et al. (2018).

      Thank you for your suggestion. We added the requested experiment (see Figure 3B in the revised manuscript). In addition, we conducted a dual pollination experiment, which provides evidence for the activation of the fertilization recovery machinery (Figure 3C-E) (see above).

      Several recent studies have reported exciting insights on ECS1&2 functions; however, various results from different laboratories have raised controversy. Though, the commonly found feature is the repression of polytubey. For readers, it would be helpful to organize the explanation about which insights are concordant or different.

      Thank you for your suggestion. We now use terms such as "in line with" or "in contrast to" to indicate where our data confirm or contradict previous data.

      In addition, a drawing that explains the time course in the process from pollination to seed development (up to 6DAP) based on WT would help to understand which point is evaluated in each data.

      Thank you for your suggestion. We added a model figure (Figure 4E) at the end of the manuscript that brings the concepts together and facilitates understanding.

      Reviewer #3 (Public Review):

      In this manuscript, Mao et al. reported that the two proteases ECS1 and ECS2 participate in both polyspermy block and gamete fusion in Arabidopsis thaliana. The authors could observe polytubey phenotype which has been reported previously and obtain both triparental plants and haploids in ecs1 ecs2 mutants. Therefore, they proposed that the triparental plants resulted from the polytubey block defect, whereas the haploids were caused by the gamete fusion defect. Together with two other previous reports, I think it is very interesting to see these two proteases participating in so many different but connected processes. Although they did not provide the molecular mechanism of how ECS participated in polyspermy block and gamete fusion, their findings provide more options for and thus promote plant breeding. The work may have a wide application in the future and will be of broad interest to cell biologists working on gamete fusion and plant breeders.

      We thank the reviewer for their positive comments.

      Although most of the conclusions in this paper are well supported by the data, it could be improved with a minor revision including providing clearer data analysis and descriptions, images with higher resolution, and more discussions.

    1. Author Response

      Reviewer #2 (Public Review):

      In the discussion, the authors suggest that the binding of CHAPS could be an inspiration to develop compounds, targeting, for instance, mammalian receptors, that would bind to both the orthosteric site and a potential groove underneath loop C (where the sterol moiety of CHAPS binds in Alpo4). A figure (SI4) shows a few homologues in surface representation, giving an idea of whether this groove is generally present in the family.

      Seeing this figure, I wondered if it would be relevant to compare several conformations of one or a few chosen homologues. Given that gating always impacts the quaternary assembly, is this groove more pronounced in say the inhibited state of a given homologue than in its agonist-bound state?

      The width of the groove in 7 does change as the channel transition from apo to open state. This is now demonstrated with an additional Figure 3 – figure supplement 1b and the discussion was adjusted accordingly p 18, line 379:

      “The sterol group connected by a linker binds in between subunits and induces conformational changes which also change the width of the groove in Alpo4 (Figure 3f, g), therefore it likely plays an active role in the observed quaternary twist. The changes in the groove shape are not specific to Alpo4 but are also observed for example in nicotinic 7 receptor (Figure 3 – supplement 1b) suggesting that the groove can be targeted for allosteric modulation of the channel. ”

      A related thought was that some of the protein binders affecting pLGIC function (toxins, VHH) contact two subunits and wrap around/below loop C. Do these have binding sites that overlap with the groove?

      We inspected the structures of pLGIC homologs with bound α-bungarotoxin (6UWZ, 4HQP, 7Z14, and 7KOO) and two with bound VHHs (6SSI and 6HJY). The toxins were bound in similar conformations but not the VHHs. The examples of the complexes are now shown as Supplementary Figure 13a (see above). In the case of ELIC, the nanobody Nb72 was bound on top of the sterol-binding cavity, but it did not interact with the interior of the cavity. This is now explained on p 17 from line 374:

      “When binding sites of larger known binders, including VHH47,48 and α-bungarotoxin10,49 were examined (Figure 3 – supplement 1a), a nanobody bound to ELIC in the site covering the sterol-binding groove was identified, however, its interactions with ELIC did not overlap significantly with the interior of the sterol-binding groove. This suggests that the latter is a novel target location for binders.”

      Very interestingly, the binding of CHAPS stabilizes a conformation that differs from the apo one. It includes a twist of the ECDs but does not lead to a significant opening of the M2 bundle. The authors note that the direction of the twist is reversed to that often associated with the binding of ligands in homologues. This reversion is quite a feature, which deserves to be shown in a supplementary movie (e.g overlay of the Alpo apo>CHAPs transition with the nico>apo transition of a7).

      We have re-examined the rotation and compared it to the conformational changes in α7 nACh and 5-HT3 receptors. Upon closer examination, it became clear that relative rotation of the ECD and the TMD provides a very simplistic view of the quaternary conformational changes, which are more complex 3D rearrangements than a simple relative domain rotation. Careful alignment of the structures to the extracellular side of the trans-membrane pore showed that in both channels the resting -> open state transition is associated with clockwise rotation, but the resting -> desensitized state transition in 5-HT3 involves a counterclockwise rotation. Thus, 1) the direction of rotation is not a ‘universal’ feature of pLGICs and 2) the clockwise rotation is the direction of channel activation for the α7 nACh and 5-HT3 receptors and shares similarities with rearrangements observed in Alpo4. However, the relative movement of the ECDs differs between Alpo4 upon CHAPS binding and the α7 nACh and 5-HT3 receptors upon activation. To demonstrate this, we added Video 2, which shows quaternary changes for all 3 channels, and the text has been modified as follows on page 11, line 208:

      “Quaternary changes in Alpo4 induced upon CHAPS binding and those associated with the activation of related α7 nACh and 5-HT3 receptors induced rotation of ECD relative to TMD in the same direction, however, the shifts of principal relative to complementary subunits were different (Video 2). In Alpo4, the complementary subunit slides upward whereas in the two other channels it consistently shifts towards the principal subunit and tilts relative to the TMD. The tilt is less pronounced in Alpo4 which is probably why it does not lead to the pore dilation.”

      We are grateful to the reviewer for drawing our attention to this point, which permitted us to correct initially inaccurate statements.

    1. Author Response

      Reviewer #2 (Public Review):

      Here, a simple model of cerebellar computation is used to study the dependence of task performance on input type: it is demonstrated that task performance and optimal representations are highly dependent on task and stimulus type. This challenges many standard models which use simple random stimuli and concludes that the granular layer is required to provide a sparse representation. This is a useful contribution to our understanding of cerebellar circuits, though, in common with many models of this type, the neural dynamics and circuit architecture are not very specific to the cerebellum, the model includes the feedforward structure and the high dimension of the granule layer, but little else. This paper has the virtue of including tasks that are more realistic, but by the paper’s own admission, the same model can be applied to the electrosensory lateral line lobe and it could, though it is not mentioned in the paper, be applied to the dentate gyrus and large pyramidal cells of CA3. The discussion does not include specific elements related to, for example, the dynamics of the Purkinje cells or the role of Golgi cells, and, in a way, the demonstration that the model can encompass different tasks and stimuli types is an indication of how abstract the model is. Nonetheless, it is useful and interesting to see a generalization of what has become a standard paradigm for discussing cerebellar function.

      We appreciate the Reviewer’s positive comments. Regarding the simplifications of our model, we agree that we have taken a modeling approach that abstracts away certain details to permit comparisons across systems. We now include an in-depth discussion of our simplifying assumptions (Assumptions & Extensions section in the Discussion) and have further noted the possibility that other biophysical mechanisms we have not accounted for may also underlie differences across systems.

      Our results predict that qualitative differences in the coding levels of cerebellum-like systems, across brain regions or across species, reflect an optimization to distinct tasks (Figure 7). However, it is also possible that differences in coding level arise from other physiological differences between systems.

      Reviewer #3 (Public Review):

      1) The paper by Xie et al is a modelling study of the mossy fiber-to-granule cell-to-Purkinje cell network, reporting that the optimal type of representations in the cerebellar granule cell layer depends on the type of task. The paper stresses that the findings indicate a higher overall bias towards dense representations than stated in the literature, but it appears the authors have missed parts of the literature that already reported on this. While the modelling and analysis appear mathematically solid, the model is lacking many known constraints of the cerebellar circuitry, which makes the applicability of the findings to the biological counterpart somewhat limited.

      We thank the Reviewer for suggesting additional references to include in our manuscript, and for encouraging us to extend our model toward greater biological plausibility and more critically discuss simplifying assumptions we have made. We respond to both the comment about previous literature and about applicability to cerebellar circuitry in detail below.

      2) I have some concerns with the novelty of the main conclusion, here from the abstract: ‘Here, we generalize theories of cerebellar learning to determine the optimal granule cell representation for tasks beyond random stimulus discrimination, including continuous input-output transformations as required for smooth motor control. We show that for such tasks, the optimal granule cell representation is substantially denser than predicted by classic theories.’ Stated like this, this has in principle already been shown, i.e. for example: Spanne and Jörntell (2013) Processing of multi-dimensional sensorimotor information in the spinal and cerebellar neuronal circuitry: a new hypothesis. PLoS Comput Biol. 9(3):e1002979. Indeed, even the 2 DoF arm movement control that is used in the present paper as an application, was used in this previous paper, with similar conclusions with respect to the advantage of continuous input-output transformations and dense coding. Thus, already from the beginning of this paper, the novelty aspect of this paper is questionable. Even the conclusion in the last paragraph of the Introduction: ‘We show that, when learning input-output mappings for motor control tasks, the optimal granule cell representation is much denser than predicted by previous analyses.’ was in principle already shown by this previous paper.

      We thank the Reviewer for drawing our attention to Spanne and Jörntell (2013). Our study shares certain similarities with this work, including the consideration of tasks with smooth input-output mappings, such as learning the dynamics of a two-joint arm. However, our study differs substantially, most notably in that we focus on parametrically varying the degree of sparsity in the granule cell layer to determine the circumstances under which dense versus sparse coding is optimal. To the best of our ability, we can find no result in Spanne and Jörntell (2013) that indicates the performance of a network as a function of average coding level. Instead, Spanne and Jörntell (2013) propose that inhibition from Golgi cells produces heterogeneity in coding level which can improve performance, which is an interesting but complementary finding to ours. We therefore do not believe that the quantitative computations of optimal coding level that we present are redundant with the results of this previous study. We also note that a key contribution of our study is a mathematical analysis of the inductive bias of networks with different coding levels, which supports our conclusions.

      We have included a discussion of Spanne and Jörntell (2013, 2015) in the revised version of our manuscript:

      "Other studies have considered tasks with smooth input-output mappings and low-dimensional inputs, finding that heterogeneous Golgi cell inhibition can improve performance by diversifying individual granule cell thresholds (Spanne and Jörntell, 2013). Extending our model to include heterogeneous thresholds is an interesting direction for future work. Another proposal states that dense coding may improve generalization (Spanne and Jörntell, 2015). Our theory reveals that whether or not dense coding is beneficial depends on the task."

      3) However, the present paper does add several more specific investigations/characterizations that were not previously explored. Many of the main figures report interesting new model results. However, the model is implemented in a highly generic fashion. Consequently, the model relates better to general neural network theory than to specific interpretations of the function of the cerebellar neuronal circuitry. One good example is the findings reported in Figure 2. These represent an interesting extension to the main conclusion, but they are also partly based on arbitrariness as the type of mossy fiber input described in the random categorization task has not been observed in the mammalian cerebellum under behavior in vivo, whereas in contrast, the type of input for the motor control task does resemble mossy fiber input recorded under behavior (van Kan et al 1993).

      We agree that the tasks we consider in Figure 2 are simplified compared to those that we consider elsewhere in the paper. The choice of random mossy fiber input was made to provide a comparison to previous modeling studies that also use random input as a benchmark (Marr 1969, Albus 1971, Brunel 2004, Babadi and Sompolinsky 2014, Billings 2014, Litwin-Kumar et al., 2017). This baseline permits us to specifically evaluate the effects of low-dimensional inputs (Figure 2) and richer input-output mappings (Figure 2, Figure 7). We agree with the Reviewer that the random and uncorrelated mossy fiber activity that has been extensively used in previous studies is almost certainly an unrealistic idealization of in vivo neural activity; this is a motivating factor for our study, which relaxes this assumption and examines the consequences. To provide additional context, we have updated the following paragraph in the main text Results section:

      "A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). While this may be a reasonable simplification in some cases, many tasks, including cerebellum-dependent tasks, are likely best-described as being encoded by a low-dimensional set of variables. For example, the cerebellum is often hypothesized to learn a forward model for motor control (Wolpert et al., 1998), which uses sensory input and motor efference to predict an effector’s future state. Mossy fiber activity recorded in monkeys correlates with position and velocity during natural movement (van Kan et al., 1993). Sources of motor efference copies include motor cortex, whose population activity lies on a low-dimensional manifold (Wagner et al., 2019; Huang et al., 2013; Churchland et al., 2010; Yu et al., 2009). We begin by modeling the low dimensionality of inputs and later consider more specific tasks."

      4) The overall conclusion states: ‘Our results....suggest that optimal cerebellar representations are task-dependent.’ This is not a particularly strong or specific conclusion. One could interpret this statement as simply saying: ‘if I construct an arbitrary neural network, with arbitrary intrinsic properties in neurons and synapses, I can get outputs that depend on the intensity of the input that I provide to that network.’ Further, the last sentence of the Introduction states: ‘More broadly, we show that the sparsity of a neural code has a task-dependent influence on learning...’ This is very general and unspecific, and would likely not come as a surprise to anyone interested in the analysis of neural networks. It doesn’t pinpoint any specific biological problem but just says that if I change the density of the input to a [generic] network, then the learning will be impacted in one way or another.

      We agree with the Reviewer that our conclusions are quite general, and we have removed the final sentence as we agree it was unspecific. However, we disagree with the Reviewer’s paraphrasing of our results.

      First, we do not select arbitrary intrinsic properties of neurons and synapses. Rather, we construct a simplified model with a key quantity, the neuronal threshold, that we vary parametrically in order to assess the effect of the resulting changes in the representation on performance. Second, we do not vary the intensity/density of inputs provided to the network – this is fixed throughout our study for all key comparisons we perform. Instead, we vary the density (coding level) of the expansion layer representation and quantify its effect on inductive bias and generalization. Finally, our study’s key contribution is an explanation of the heterogeneity in average coding level observed across behaviors and cerebellum-like systems. We go beyond the empirical statement that there is a dependence of performance on the parameter that we vary by developing an analytical theory. Our theory describes the performance of the class of networks that we study and the properties of learning tasks that determine the optimal expansion layer representation.
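
      To make the distinction concrete, the following is a minimal sketch of how a single threshold can set the coding level of an expansion layer while the inputs stay fixed. It is an illustration under our own simplifying assumptions (Gaussian random weights, rectified-linear units, a threshold picked as an empirical quantile), not the implementation used in the manuscript; all function names and parameter values are hypothetical.

```python
import random


def granule_layer(inputs, weights, coding_level):
    """Rectify a random expansion so that roughly `coding_level` of units are active.

    inputs:       list of input patterns (each a list of floats), held fixed.
    weights:      one weight vector per expansion-layer unit.
    coding_level: target fraction of active units, between 0 and 1.
    """
    # Pre-activation of every unit for every input pattern.
    pre = [
        [sum(w * x for w, x in zip(row, pattern)) for row in weights]
        for pattern in inputs
    ]
    # Pick one shared threshold at the (1 - coding_level) empirical quantile.
    flat = sorted(value for pattern in pre for value in pattern)
    index = int(round((1.0 - coding_level) * len(flat)))
    index = min(max(index, 0), len(flat) - 1)
    threshold = flat[index]
    # Units below threshold are silent; units above respond linearly.
    active = [[max(value - threshold, 0.0) for value in pattern] for pattern in pre]
    return active, threshold


def measured_coding_level(active):
    """Fraction of (pattern, unit) pairs with non-zero activity."""
    total = sum(len(pattern) for pattern in active)
    return sum(1 for pattern in active for value in pattern if value > 0.0) / total


if __name__ == "__main__":
    rng = random.Random(0)
    # The same inputs and weights are reused for every coding level.
    inputs = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(50)]
    weights = [[rng.gauss(0.0, 1.0) for _ in range(10)] for _ in range(200)]
    for target in (0.05, 0.3, 0.8):
        active, threshold = granule_layer(inputs, weights, target)
        print(target, round(threshold, 3), round(measured_coding_level(active), 3))
```

      Only the threshold changes between runs in this sketch: the input patterns and weights are identical, and a higher threshold gives a sparser expansion-layer representation.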

      To clarify our main contributions, we have updated the final paragraph of the Introduction. We have also removed the sentence that the Reviewer objects to, as it was less specific than the other points we make here.

      "We propose that these differences can be explained by the capacity of representations with different levels of sparsity to support learning of different tasks. We show that the optimal level of sparsity depends on the structure of the input-output relationship of a task. When learning input-output mappings for motor control tasks, the optimal granule cell representation is much denser than predicted by previous analyses. To explain this result, we develop an analytic theory that predicts the performance of cerebellum-like circuits for arbitrary learning tasks. The theory describes how properties of cerebellar architecture and activity control these networks’ inductive bias: the tendency of a network toward learning particular types of input-output mappings (Sollich, 1998; Jacot et al., 2018; Bordelon et al., 2020; Canatar et al., 2021; Simon et al., 2021). The theory shows that inductive bias, rather than the dimension of the representation alone, is necessary to explain learning performance across tasks. It also suggests that cerebellar regions specialized for different functions may adjust the sparsity of their granule cell representations depending on the task."

      5) The interpretation of the distribution of the mossy fiber inputs to the granule cells, which would have a crucial impact on the results of a study like this, is likely incorrect. First, unlike the papers that the authors cite, there are many studies indicating that there is a topographic organization in the mossy fiber termination, such that mossy fibers from the same inputs, representing similar types of information, are regionally co-localized in the granule cell layer. Hence, there is no support for the model assumption that there is a predominantly random termination of mossy fibers of different origins. This risks invalidating the comparisons that the authors are making, i.e. such as in Figure 3. This is a list of example papers, there are more: van Kan, Gibson and Houk (1993) Movement-related inputs to intermediate cerebellum of the monkey. Journal of Neurophysiology. Garwicz et al (1998) Cutaneous receptive fields and topography of mossy fibres and climbing fibres projecting to cat cerebellar C3 zone. The Journal of Physiology. Brown and Bower (2001) Congruence of mossy fiber and climbing fiber tactile projections in the lateral hemispheres of the rat cerebellum. The Journal of Comparative Neurology. Na, Sugihara, Shinoda (2019) The entire trajectories of single pontocerebellar axons and their lobular and longitudinal terminal distribution patterns in multiple aldolase C-positive compartments of the rat cerebellar cortex. The Journal of Comparative Neurology.

      6) The nature of the mossy fiber-granule cell recording is also reviewed here: Gilbert and Miall (2022) How and Why the Cerebellum Recodes Input Signals: An Alternative to Machine Learning. The Neuroscientist. Further, considering the re-coding idea, the following paper shows that detailed information, as it is provided by mossy fibers, is transmitted through the granule cells without any evidence of re-coding: Jörntell and Ekerot (2006) Journal of Neuroscience; and this paper shows that these granule inputs are powerfully transmitted to the molecular layer even in a decerebrated animal (i.e. where only the ascending sensory pathways remain): Jörntell and Ekerot (2002) Neuron.

      We agree that there is strong evidence for a topographic organization in mossy fiber to granule cell connectivity at the microzonal level. We thank the Reviewer for pointing us to specific examples. We acknowledge that our simplified model does not capture the structure of connectivity observed in these studies.

      However, the focus of our model is on cerebellar neurons presynaptic to a single Purkinje cell. Random or disordered distribution of inputs at this local scale is compatible with topographic organization at the microzonal scale. Furthermore, while there is evidence of structured connections at the local scale, models with random connectivity are able to reproduce the dimensionality of granule cell activity within a small margin of error (Nguyen et al., 2022). Finally, our finding that dense codes are optimal for learning slowly varying tasks is consistent with evidence for the lack of re-coding: for such tasks, re-coding may be absent because it is not required.

      We have dedicated a section to this issue in the Assumptions and Extensions portion of our Discussion:

      "Another key assumption concerning the granule cells is that they sample mossy fiber inputs randomly, as is typically assumed in Marr-Albus models (Marr, 1969; Albus, 1971; Litwin-Kumar et al., 2017; Cayco-Gajic et al., 2017). Other studies instead argue that granule cells sample from mossy fibers with highly similar receptive fields (Garwicz et al., 1998; Brown and Bower, 2001; Jörntell and Ekerot, 2006) defined by the tuning of mossy fiber and climbing fiber inputs to cerebellar microzones (Apps et al., 2018). This has led to an alternative hypothesis that granule cells serve to relay similarly tuned mossy fiber inputs and enhance their signal-to-noise ratio (Jörntell and Ekerot, 2006; Gilbert and Chris Miall, 2022) rather than to re-encode inputs. Another hypothesis is that granule cells enable Purkinje cells to learn piece-wise linear approximations of nonlinear functions (Spanne and Jörntell, 2013). However, several recent studies support the existence of heterogeneous connectivity and selectivity of granule cells to multiple distinct inputs at the local scale (Huang et al., 2013; Ishikawa et al., 2015). Furthermore, the deviation of the predicted dimension in models constrained by electron-microscopy data as compared to randomly wired models is modest (Nguyen et al., 2022). Thus, topographically organized connectivity at the macroscopic scale may coexist with disordered connectivity at the local scale, allowing granule cells presynaptic to an individual Purkinje cell to sample heterogeneous combinations of the subset of sensorimotor signals relevant to the tasks that Purkinje cell participates in. Finally, we note that the optimality of dense codes for learning slowly varying tasks in our theory suggests that observations of a lack of mixing (Jörntell and Ekerot, 2002) for such tasks are compatible with Marr-Albus models, as in this case nonlinear mixing is not required."

      7) I could not find any description of the neuron model used in this paper, so I assume that the neurons are just modelled as linear summators with a threshold (in fact, Figure 5 mentions inhibition, but this appears to be just one big lump inhibition, which basically is an incorrect implementation). In reality, granule cells of course do have specific properties that can impact the input-output transformation, PARTICULARLY with respect to the comparison of sparse versus dense coding, because the low-pass filtering of input that occurs in granule cells (and other neurons) as well as their spike firing stochasticity (Saarinen et al (2008). Stochastic differential equation model for cerebellar granule cell excitability. PLoS Comput. Biol. 4:e1000004) will profoundly complicate these comparisons and make them less straightforward than what is portrayed in this paper. There are also several other factors that would be present in the biological setting but are lacking here, which makes it doubtful how much information in relation to the biological performance that this modelling study provides: What are the types of activity patterns of the inputs? What are the learning rules? What is the topography? What is the impact of Purkinje cell outputs downstream, as the Purkinje cell output does not have any direct action, it acts on the deep cerebellar nuclear neurons, which in turn act on a complex sensorimotor circuitry to exert their effect, hence predictive coding could only become interpretable after the PC output has been added to the activity in those circuits. Where is the differentiated Golgi cell inhibition?

      Thank you for these critiques. We have made numerous edits to improve the presentation of the details of our model in the main text of the manuscript. Indeed, granule cells in the main text are modeled as linear sums of mossy fiber inputs with a threshold-linear activation function. A more detailed description of the model for granule cells can now be found in Equation 1 in the Results section:

      "The activity of neurons in the expansion layer is given by: $$h = \phi(J^{\mathrm{eff}} x - \theta) \qquad (1)$$ where $\phi$ is a rectified linear activation function $\phi(u) = \max(u, 0)$ applied element-wise. Our results also hold for other threshold-polynomial activation functions. The scalar threshold $\theta$ is shared across neurons and controls the coding level, which we denote by $f$, defined as the average fraction of neurons in the expansion layer that are active."

      Most of our analyses use the firing rate model we describe above, but several Supplemental Figures show extensions to this model. As we mention in the Discussion, our results do not depend on the specific choice of nonlinearity (Figure 2-figure supplement 2). We have also considered the possibility that the stochastic nature of granule cell spikes could impact our measures of coding level. In Figure 7-figure supplement 1 we test the robustness of our main conclusion using a spiking model where we model granule cell spikes with Poisson statistics. When measuring coding level in a population of spiking neurons, a key question is at what time window the Purkinje cell integrates spikes. For several choices of integration time windows, we show that dense coding remains optimal for learning smooth tasks. However, we agree with the Reviewer that there are other biological details our model does not address. For example, our spiking model does not capture some of the properties the Saarinen et al. (2008) model captures, including random sub-threshold oscillations and clusters of spikes. Modeling biophysical phenomena at this scale is beyond the scope of our study. We have added this reference to the relevant section of the Discussion:

      "We also note that coding level is most easily defined when neurons are modeled as rate, rather than spiking units. To investigate the consistency of our results under a spiking code, we implemented a model in which granule cell spiking exhibits Poisson variability and quantify coding level as the fraction of neurons that have nonzero spike counts (Figure 7-figure supplement 1; Figure 7C). In general, increased spike count leads to improved performance as noise associated with spiking variability is reduced. Granule cells have been shown to exhibit reliable burst responses to mossy fiber stimulation (Chadderton et al., 2004), motivating models using deterministic responses or sub-Poisson spiking variability. However, further work is needed to quantitatively compare variability in model and experiment and to account for more complex biophysical properties of granule cells (Saarinen et al., 2008)."

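To illustrate how a spike-count definition of coding level depends on the integration window, here is a minimal sketch. The rate distribution, the two window lengths, and all names are assumptions chosen for the example; this is not the procedure used for Figure 7-figure supplement 1.

```python
import numpy as np

def spiking_coding_level(rates, window, rng):
    """Fraction of (neuron, pattern) pairs with at least one spike.

    rates  : array of nonnegative firing rates in Hz, shape (M, P)
    window : integration window in seconds
    """
    counts = rng.poisson(rates * window)   # Poisson spike counts
    return float(np.mean(counts > 0))

rng = np.random.default_rng(0)

# Hypothetical rectified rates: roughly half of the entries are silent.
rates = np.maximum(rng.standard_normal((500, 200)), 0.0) * 20.0

short_window = spiking_coding_level(rates, 0.01, rng)  # 10 ms window
long_window = spiking_coding_level(rates, 0.5, rng)    # 500 ms window
rate_based = float(np.mean(rates > 0))                 # rate-model coding level
```

With a short window many neurons with nonzero rates emit no spikes, so the measured coding level falls below the rate-based value; a longer window brings the two closer together.
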
      A second concern the Reviewer raises is our implementation of Golgi cell inhibition as a homogeneous rather than heterogeneous input onto granule cells. In simplified models, adding heterogeneous inhibition does not dramatically change the qualitative properties of the expansion layer representation, in particular the dimensionality of the representation (Billings et al., 2014, Cayco-Gajic et al., 2017, Litwin-Kumar et al., 2017). We have added a section about inhibition to our Discussion:

      "We also have not explicitly modeled inhibitory input provided by Golgi cells, instead assuming such input can be modeled as a change in effective threshold, as in previous studies (Billings et al., 2014; Cayco-Gajic et al., 2017; Litwin-Kumar et al., 2017). This is appropriate when considering the dimension of the granule cell representation (Litwin-Kumar et al., 2017), but more work is needed to extend our model to the case of heterogeneous inhibition."

      Regarding the mossy fiber inputs, as we state in response to paragraph 3, we agree with the Reviewer that the random and uncorrelated mossy fiber activity that has been used in previous studies is an unrealistic idealization of in vivo neural activity. One of the motivations for our model was to relax this assumption and examine the consequences: we introduce correlations in the mossy fiber activity by projecting low-dimensional patterns into the mossy fiber layer (Figure 1B):

      "A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). While this may be a reasonable simplification in some cases, many tasks, including cerebellum-dependent tasks, are likely best-described as being encoded by a low-dimensional set of variables. For example, the cerebellum is often hypothesized to learn a forward model for motor control (Wolpert et al., 1998), which uses sensory input and motor efference to predict an effector’s future state. Mossy fiber activity recorded in monkeys correlates with position and velocity during natural movement (van Kan et al., 1993). Sources of motor efference copies include motor cortex, whose population activity lies on a low-dimensional manifold (Wagner et al., 2019; Huang et al., 2013; Churchland et al., 2010; Yu et al., 2009). We begin by modeling the low dimensionality of inputs and later consider more specific tasks.

      We therefore assume that the inputs to our model lie on a D-dimensional subspace embedded in the N-dimensional input space, where D is typically much smaller than N (Figure 1B). We refer to this subspace as the “task subspace” (Figure 1C)."
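
The idea of inputs confined to a D-dimensional task subspace can be sketched as follows. The sizes, the random orthonormal embedding, and the Gaussian task variables are assumptions made for this illustration, not the settings used in the manuscript.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D, P = 100, 3, 1000   # hypothetical: input units, task dimension, patterns

# Orthonormal basis for a random D-dimensional subspace of the N-dimensional
# input space (reduced QR of a Gaussian matrix gives an N x D basis).
A, _ = np.linalg.qr(rng.standard_normal((N, D)))

# Low-dimensional task variables, one column per pattern.
task_variables = rng.standard_normal((D, P))

# Input-layer activity: every pattern lies in the task subspace.
X = A @ task_variables

rank = int(np.linalg.matrix_rank(X))
```

Although each pattern has N components, the matrix of patterns has rank D, which is what distinguishes this setting from inputs scattered randomly through the full N-dimensional space.
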

      The Reviewer also mentions the learning rule at granule cell to Purkinje cell synapses. We agree that considering online, climbing-fiber-dependent learning is an important generalization. We therefore added a new supplemental figure investigating whether we would still see a difference in optimal coding levels across tasks if online learning were used instead of the least squares solution (Figure 7-figure supplement 2). Indeed, we observed a similar task dependence as we saw in Figure 2F. We have added a new paragraph in the Discussion under Assumptions and Extensions describing our rationale and approach in detail:

      "For the Purkinje cells, our model assumes that their responses to granule cell input can be modeled as an optimal linear readout. Our model therefore provides an upper bound to linear readout performance, a standard benchmark for the quality of a neural representation that does not require assumptions on the nature of climbing fiber-mediated plasticity, which is still debated. Electrophysiological studies have argued in favor of a linear approximation (Brunel et al., 2004). To improve the biological applicability of our model, we implemented an online climbing fiber-mediated learning rule and found that optimal coding levels are still task-dependent (Figure 7-figure supplement 2). We also note that although we model several timing-dependent tasks (Figure 7), our learning rule does not exploit temporal information, and we assume that temporal dynamics of granule cell responses are largely inherited from mossy fibers. Integrating temporal information into our model is an interesting direction for future investigation."

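The relationship between an optimal linear readout and an online rule can be sketched as below. The online rule here is a plain error-driven delta rule, used as a stand-in; it is an assumption of this example and not necessarily the climbing fiber-mediated rule of Figure 7-figure supplement 2. The sizes, noise level, and learning rate are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
M, P = 20, 200   # hypothetical numbers of granule cells and patterns

# Stand-in rectified granule cell activity and a noisy linear target.
H = np.maximum(rng.standard_normal((M, P)), 0.0)
w_target = rng.standard_normal(M)
y = w_target @ H + 0.1 * rng.standard_normal(P)

def mse(w):
    """Mean squared training error of the linear readout w."""
    return float(np.mean((y - w @ H) ** 2))

# Optimal linear readout: least squares solution over all patterns.
w_ls, *_ = np.linalg.lstsq(H.T, y, rcond=None)

# Online readout: one error-driven update per presented pattern.
w_online = np.zeros(M)
eta = 0.01   # learning rate (assumed)
for _ in range(200):
    for p in rng.permutation(P):
        error = y[p] - w_online @ H[:, p]
        w_online += eta * error * H[:, p]

mse_ls, mse_online = mse(w_ls), mse(w_online)
```

On the training patterns the least squares readout can never do worse than the online readout, which is the sense in which it bounds linear readout performance; with a small learning rate the online error settles close to that bound.
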
      Finally, regarding the function of the Purkinje cell, our model defines a learning task as a mapping from inputs to target activity in the Purkinje cell and is thus agnostic to the cell’s downstream effects. We clarify this point when introducing the definition of a learning task:

      "In our model, a learning task is defined by a mapping from task variables x to an output f(x), representing a target change in activity of a readout neuron, for example a Purkinje cell. The limited scope of this definition implies our results should not strongly depend on the influence of the readout neuron on downstream circuits."

      8) The problem of these, in my impression, generic, arbitrary settings of the neurons and the network in the model becomes obvious here: ‘In contrast to the dense activity in cerebellar granule cells, odor responses in Kenyon cells, the analogs of granule cells in the Drosophila mushroom body, are sparse...’ How can this system be interpreted as an analogy to granule cells in the mammalian cerebellum when the model does not address the specifics lined up above? I.e. the ‘inductive bias’ that the authors speak of, defined as ‘the tendency of a network toward learning particular types of input-output mappings’, would be highly dependent on the specifics of the network model.

      We agree with the Reviewer that our model makes several simplifying assumptions for mathematical tractability. However, we note that our study is not the first to draw analogies between cerebellum-like systems, including the mushroom body (Bell et al., 2008; Farris, 2011). All the systems we study feature a sparsely connected, expanded granule-like layer that sends parallel fiber axons onto densely connected downstream neurons known to exhibit powerful synaptic plasticity, thus motivating the key architectural assumptions of our model. We have constrained anatomical parameters of the model using data as available (Table 1). However, we agree with the Reviewer that when making comparisons across species there is always a possibility that differences are due to physiological mechanisms we have not fully understood or captured with a model. As such, we can only present a hypothesis for these differences. We have modified our Discussion section on this topic to clearly state this.

      "Our results predict that qualitative differences in the coding levels of cerebellum-like systems, across brain regions or across species, reflect an optimization to distinct tasks (Figure 7). However, it is also possible that differences in coding level arise from other physiological differences between systems."

      9) More detailed comments: Abstract: ‘In these models [Marr-Albus], granule cells form a sparse, combinatorial encoding of diverse sensorimotor inputs. Such sparse representations are optimal for learning to discriminate random stimuli.’ Yes, I would agree with the first part, but I contest the second part of this statement. I think what is true for sparse coding is that the learning of random stimuli will be faster, as in a perceptron, but not necessarily better. As the sparsification essentially removes information, it could be argued that the quality of the learning is poorer. So from that perspective, it is not optimal. The authors need to specify from what perspective they consider sparse representations optimal for learning.

      This is an important point that we would like to clarify. It is not the case that sparse coding simply speeds up learning. In our study and many related works (Barak et al. 2013; Babadi and Sompolinsky 2014; Litwin-Kumar et al. 2017), learning performance is measured based on the generalization ability of the network – the ability to predict correct labels for previously unseen inputs. As our study and previous studies show, sparse codes are optimal in the sense that they minimize generalization error, independent of any effect on learning speed. To communicate this more effectively, we have added the following sentence to the first paragraph of the Introduction:

      "Sparsity affects both learning speed (Cayco-Gajic et al., 2017), and generalization, the ability to predict correct labels for previously unseen inputs (Barak et al., 2013; Babadi and Sompolinsky, 2014; Litwin-Kumar et al., 2017)."

      10) Introduction: ‘Indeed, several recent studies have reported dense activity in cerebellar granule cells in response to sensory stimulation or during motor control tasks (Knogler et al., 2017; Wagner et al., 2017; Giovannucci et al., 2017; Badura and De Zeeuw, 2017; Wagner et al., 2019), at odds with classic theories (Marr, 1969; Albus, 1971).’ In fact, this was precisely the issue that was addressed already by Jörntell and Ekerot (2006) Journal of Neuroscience. The conclusion was that these actual recordings of granule cells in vivo provided essentially no support for the assumptions in the Marr-Albus theories.

      In our reading, the main finding of Jörntell and Ekerot (2006) is that individual granule cells are activated by mossy fibers with overlapping receptive fields driven by a single type of somatosensory input. However, there is also evidence of nonlinear mixed selectivity in granule cells in support of the re-coding hypothesis (Huang et al., 2013; Ishikawa et al., 2015). Jörntell and Ekerot (2006) also suggest that the granule cell layer shares a topographic organization similar to that of the mossy fibers, organized into microzones. The existence of topographic organization does not invalidate Marr-Albus theories. As we have suggested earlier, a local combinatorial expansion can coexist with a global topographic organization.

      We have described these considerations in the Assumptions and Extensions portion of the Discussion:

      "Another key assumption concerning the granule cells is that they sample mossy fiber inputs randomly, as is typically assumed in Marr-Albus models (Marr, 1969; Albus, 1971; Litwin-Kumar et al., 2017; Cayco-Gajic et al., 2017). Other studies instead argue that granule cells sample from mossy fibers with highly similar receptive fields (Garwicz et al., 1998; Brown and Bower, 2001; Jörntell and Ekerot, 2006) defined by the tuning of mossy fiber and climbing fiber inputs to cerebellar microzones (Apps et al., 2018). This has led to an alternative hypothesis that granule cells serve to relay similarly tuned mossy fiber inputs and enhance their signal-to-noise ratio (Jörntell and Ekerot, 2006; Gilbert and Chris Miall, 2022) rather than to re-encode inputs. Another hypothesis is that granule cells enable Purkinje cells to learn piece-wise linear approximations of nonlinear functions (Spanne and Jörntell, 2013). However, several recent studies support the existence of heterogeneous connectivity and selectivity of granule cells to multiple distinct inputs at the local scale (Huang et al., 2013; Ishikawa et al., 2015). Furthermore, the deviation of the predicted dimension in models constrained by electron-microscopy data as compared to randomly wired models is modest (Nguyen et al., 2022). Thus, topographically organized connectivity at the macroscopic scale may coexist with disordered connectivity at the local scale, allowing granule cells presynaptic to an individual Purkinje cell to sample heterogeneous combinations of the subset of sensorimotor signals relevant to the tasks that Purkinje cell participates in. Finally, we note that the optimality of dense codes for learning slowly varying tasks in our theory suggests that observations of a lack of mixing (Jörntell and Ekerot, 2002) for such tasks are compatible with Marr-Albus models, as in this case nonlinear mixing is not required."

      We have also included the Jörntell and Ekerot (2006) study as a citation in the Introduction:

      "Indeed, several recent studies have reported dense activity in cerebellar granule cells in response to sensory stimulation or during motor control tasks (Jörntell and Ekerot, 2006; Knogler et al., 2017; Wagner et al., 2017; Giovannucci et al., 2017; Badura and De Zeeuw, 2017; Wagner et al., 2019), at odds with classic theories (Marr, 1969; Albus, 1971)."

      11) Results: 1st para: There is no information about how the granule cells are modelled.

      We agree that this information should have been more readily available. We now more completely describe the model in the main text. Our model for granule cells can be found in Equation 1 in the Results section and also the Methods (Network Model):

      "The activity of neurons in the expansion layer is given by: $$h = \phi(J^{\mathrm{eff}} x - \theta) \qquad (2)$$

      where $\phi$ is a rectified linear activation function $\phi(u) = \max(u, 0)$ applied element-wise. Our results also hold for other threshold-polynomial activation functions. The scalar threshold $\theta$ is shared across neurons and controls the coding level, which we denote by $f$, defined as the average fraction of neurons in the expansion layer that are active."
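
As a rough illustration of the expansion-layer equation above and of how a shared threshold sets the coding level, the following NumPy sketch may help. The layer sizes, the Gaussian weights and inputs, and the quantile-based choice of the threshold are assumptions made for the example, not the parameters or procedure used in the manuscript.

```python
import numpy as np

def expansion_layer(x, J_eff, theta):
    """h = phi(J_eff x - theta), with phi(u) = max(u, 0) applied element-wise."""
    return np.maximum(J_eff @ x - theta, 0.0)

def coding_level(H):
    """Average fraction of expansion-layer units that are active (nonzero)."""
    return float(np.mean(H > 0))

def threshold_for_coding_level(pre_activation, f):
    """Shared scalar threshold leaving a fraction f of pre-activations above it."""
    return float(np.quantile(pre_activation, 1.0 - f))

rng = np.random.default_rng(0)
N, M, P = 50, 500, 200   # hypothetical: inputs, expansion units, patterns

J_eff = rng.standard_normal((M, N)) / np.sqrt(N)   # assumed random effective weights
X = rng.standard_normal((N, P))                    # assumed input patterns (columns)

pre_activation = J_eff @ X
theta = threshold_for_coding_level(pre_activation, f=0.1)
H = expansion_layer(X, J_eff, theta)
measured_f = coding_level(H)
```

Raising the threshold makes the representation sparser and lowering it makes the representation denser, so sweeping the target fraction in `threshold_for_coding_level` is one simple way to compare coding levels in a toy setting.
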

      12) 2nd para: ‘A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space.’ Yes, I agree, and this is in fact in conflict with the known topographical organization in the cerebellar cortex (see broader comment above). Mossy fiber inputs coding for closely related inputs are co-localized in the cerebellar cortex. I think for this model to be of interest from the point of view of the mammalian cerebellar cortex, it would need to pay more attention to this organizational feature.

      As we discuss in our response to paragraphs 5 and 6, we see the random distribution assumption at the local scale (inputs presynaptic to a single Purkinje cell) as being compatible with topographic organization occurring at the microzone scale. Furthermore, as discussed earlier, we specifically model low-dimensional input as opposed to the random and high-dimensional inputs typically studied in prior models.

      "A typical assumption in computational theories of the cerebellar cortex is that inputs are randomly distributed in a high-dimensional space (Marr, 1969; Albus, 1971; Brunel et al., 2004; Babadi and Sompolinsky, 2014; Billings et al., 2014; Litwin-Kumar et al., 2017). While this may be a reasonable simplification in some cases, many tasks, including cerebellum-dependent tasks, are likely best-described as being encoded by a low-dimensional set of variables. For example, the cerebellum is often hypothesized to learn a forward model for motor control (Wolpert et al., 1998), which uses sensory input and motor efference to predict an effector’s future state. Mossy fiber activity recorded in monkeys correlates with position and velocity during natural movement (van Kan et al., 1993). Sources of motor efference copies include motor cortex, whose population activity lies on a low-dimensional manifold (Wagner et al., 2019; Huang et al., 2013; Churchland et al., 2010; Yu et al., 2009). We begin by modeling the low dimensionality of inputs and later consider more specific tasks. We therefore assume that the inputs to our model lie on a D-dimensional subspace embedded in the N-dimensional input space, where D is typically much smaller than N (Figure 1B). We refer to this subspace as the “task subspace” (Figure 1C)."

      References

      Albus, J.S. (1971). A theory of cerebellar function. Mathematical Biosciences 10, 25–61.

      Apps, R., et al. (2018). Cerebellar Modules and Their Role as Operational Cerebellar Processing Units. Cerebellum 17, 654–682.

      Babadi, B. and Sompolinsky, H. (2014). Sparseness and expansion in sensory representations. Neuron 83, 1213–1226.

      Badura, A. and De Zeeuw, C.I. (2017). Cerebellar granule cells: dense, rich and evolving representations. Current Biology 27, R415–R418.

      Barak, O., Rigotti, M., and Fusi, S. (2013). The sparseness of mixed selectivity neurons controls the generalization–discrimination trade-off. Journal of Neuroscience 33, 3844–3856.

      Bell, C.C., Han, V., and Sawtell, N.B. (2008). Cerebellum-like structures and their implications for cerebellar function. Annual Review of Neuroscience 31, 1–24.

      Billings, G., Piasini, E., Lőrincz, A., Nusser, Z., and Silver, R.A. (2014). Network structure within the cerebellar input layer enables lossless sparse encoding. Neuron 83, 960–974.

      Bordelon, B., Canatar, A., and Pehlevan, C. (2020). Spectrum dependent learning curves in kernel regression and wide neural networks. International Conference on Machine Learning 1024–1034.

      Brown, I.E. and Bower, J.M. (2001). Congruence of mossy fiber and climbing fiber tactile projections in the lateral hemispheres of the rat cerebellum. Journal of Comparative Neurology 429, 59–70.

      Brunel, N., Hakim, V., Isope, P., Nadal, J.P., and Barbour, B. (2004). Optimal information storage and the distribution of synaptic weights: perceptron versus Purkinje cell. Neuron 43, 745–757.

      Canatar, A., Bordelon, B., and Pehlevan, C. (2021). Spectral bias and task-model alignment explain generalization in kernel regression and infinitely wide neural networks. Nature Communications 12, 1–12.

      Cayco-Gajic, N.A., Clopath, C., and Silver, R.A. (2017). Sparse synaptic connectivity is required for decorrelation and pattern separation in feedforward networks. Nature Communications 8, 1–11.

      Chadderton, P., Margrie, T.W., and Häusser, M. (2004). Integration of quanta in cerebellar granule cells during sensory processing. Nature 428, 856–860.

      Churchland, M.M., et al. (2010). Stimulus onset quenches neural variability: a widespread cortical phenomenon. Nature Neuroscience 13, 369–378.

      Farris, S.M. (2011). Are mushroom bodies cerebellum-like structures? Arthropod structure & development 40, 368–379.

      Garwicz, M., Jörntell, H., and Ekerot, C.F. (1998). Cutaneous receptive fields and topography of mossy fibres and climbing fibres projecting to cat cerebellar C3 zone. The Journal of Physiology 512 (Pt 1), 277–293.

      Gilbert, M. and Chris Miall, R. (2022). How and Why the Cerebellum Recodes Input Signals: An Alternative to Machine Learning. The Neuroscientist 28, 206–221.

      Giovannucci, A., et al. (2017). Cerebellar granule cells acquire a widespread predictive feedback signal during motor learning. Nature Neuroscience 20, 727–734.

      Huang, C.C., et al. (2013). Convergence of pontine and proprioceptive streams onto multimodal cerebellar granule cells. eLife 2, e00400.

      Ishikawa, T., Shimuta, M., and Häusser, M. (2015). Multimodal sensory integration in single cerebellar granule cells in vivo. eLife 4, e12916.

      Jacot, A., Gabriel, F., and Hongler, C. (2018). Neural tangent kernel: Convergence and generalization in neural networks. Advances in Neural Information Processing Systems 31.

      Jörntell, H. and Ekerot, C.F. (2002). Reciprocal Bidirectional Plasticity of Parallel Fiber Receptive Fields in Cerebellar Purkinje Cells and Their Afferent Interneurons. Neuron 34, 797–806.

      Jörntell, H. and Ekerot, C.F. (2006). Properties of Somatosensory Synaptic Integration in Cerebellar Granule Cells In Vivo. Journal of Neuroscience 26, 11786–11797.

      Knogler, L.D., Markov, D.A., Dragomir, E.I., Štih, V., and Portugues, R. (2017). Sensorimotor representations in cerebellar granule cells in larval zebrafish are dense, spatially organized, and non-temporally patterned. Current Biology 27, 1288–1302.

      Litwin-Kumar, A., Harris, K.D., Axel, R., Sompolinsky, H., and Abbott, L.F. (2017). Optimal degrees of synaptic connectivity. Neuron 93, 1153–1164.

      Marr, D. (1969). A theory of cerebellar cortex. Journal of Physiology 202, 437–470.

      Nguyen, T.M., et al. (2022). Structured cerebellar connectivity supports resilient pattern separation. Nature 1–7.

      Saarinen, A., Linne, M.L., and Yli-Harja, O. (2008). Stochastic Differential Equation Model for Cerebellar Granule Cell Excitability. PLOS Computational Biology 4, e1000004.

      Simon, J.B., Dickens, M., and DeWeese, M.R. (2021). A theory of the inductive bias and generalization of kernel regression and wide neural networks. arXiv: 2110.03922.

      Sollich, P. (1998). Learning curves for Gaussian processes. Advances in Neural Information Processing Systems 11.

      Spanne, A. and Jörntell, H. (2013). Processing of Multi-dimensional Sensorimotor Information in the Spinal and Cerebellar Neuronal Circuitry: A New Hypothesis. PLOS Computational Biology 9, e1002979.

      Spanne, A. and Jörntell, H. (2015). Questioning the role of sparse coding in the brain. Trends in Neurosciences 38, 417–427.

      van Kan, P.L., Gibson, A.R., and Houk, J.C. (1993). Movement-related inputs to intermediate cerebellum of the monkey. Journal of Neurophysiology 69, 74–94.

      Wagner, M.J., Kim, T.H., Savall, J., Schnitzer, M.J., and Luo, L. (2017). Cerebellar granule cells encode the expectation of reward. Nature 544, 96–100.

      Wagner, M.J., et al. (2019). Shared cortex-cerebellum dynamics in the execution and learning of a motor task. Cell 177, 669–682.e24.

      Wolpert, D.M., Miall, R.C., and Kawato, M. (1998). Internal models in the cerebellum. Trends in Cognitive Sciences 2, 338–347.

      Yu, B.M., et al. (2009). Gaussian-process factor analysis for low-dimensional single-trial analysis of neural population activity. Journal of Neurophysiology 102, 614–635.

    1. Author Response

      Reviewer #1 (Public Review):

      This study combines in vitro somatic and dendritic recordings and computational modeling to study how cholinergic agonists modulate the response of CA1 pyramidal neurons to triangular current injections. The authors have previously used a similar approach (Upchurch, 2022, JNeuroscience) to show that CA1 neurons exhibit asymmetric AP firing (more firing on the upward ramp) in response to such current injections and that this effect is due to Na channel inactivation. The present work builds on these results by showing that cholinergic modulation changes this response, i.e., there is more firing on the downward part of the ramp. This change appears to require an intracellular Ca2+ concentration increase (mediated via IP3 and voltage-gated Ca2+ channels), which activates TRPM4 channels. In this scheme, cholinergic activity increases IP3, and the depolarizing current injection opens voltage-gated Ca2+ channels. This study will be of some interest to cellular neurophysiology experts working on the hippocampus.

      1) This study claims that the triangular current injections recapitulate hippocampal place cell activity. However, it has been shown recently that the asymmetric firing of CA1 place cells is due to synaptic weight changes resulting from synaptic plasticity (e.g., Bittner et al., 2017). This suggests that the asymmetric firing of place cells is primarily the result of asymmetric synaptic input. Therefore, the authors should test whether carbachol similarly affects a synaptically driven membrane potential ramp. If this is not the case, the strong claim that this work has implications for place cell firing is not justified, in my opinion.

      We have added the results showing the effects of cholinergic modulation on a synaptically-driven membrane potential ramp, obtained by electrically stimulating the Schaffer collaterals with a stimulation frequency that was adjusted according to a linear, symmetric ramp (see also Hsu et al., Neuron 99, 147-162, 2018). These results have been added to the manuscript in the Results section for new Figure 2 (lines 169-197) and in the Methods section (lines 716-726).

      2) Along the same lines, it has been shown before that the precision of spike timing depends on the stimulation pattern in vitro (Mainen and Sejnowski, 1995). Constant stimuli led to imprecise AP firing trains, whereas current injections that included fluctuations resembling synaptic input generated spike trains that were more reliable and reproducible in terms of timing. This study concluded that a low intrinsic noise level in spike generation was essential in generating informative spike sequences. Following this pivotal work, the authors could add noise to their current stimulus and observe the effect on the AP firing patterns. If this is not possible, the authors should at least report the sweep-to-sweep variability for the data shown, e.g., in panels 1A2, 1B2, 1D2, and 1E2.

      We thank the reviewer for this suggestion to acknowledge the variability in the data across trials and we have added the Mainen and Sejnowski, 1995 citation to the manuscript (see Results lines 128-134). We addressed sweep-to-sweep variability among the various trials.

      3) In most of the data presented in this manuscript, Carbachol appears to induce a 3 mV hyperpolarization and increase input resistance. As a result, the amount of current injected during Carbachol is drastically lower than during the controls. This should be emphasized more, and the input resistance should be quantified for each experimental condition. It should also be discussed whether this change in input resistance can account for the changes in the firing pattern observed. Finally, it should be clearly stated how the amount of the current injected was chosen for each cell, and data from a range of injected current ramps should be shown for each cell.

      We thank the reviewers for this comment, which made us realize that our initial presentation was not clear, in particular with regard to the traces that were chosen as examples in the initial submission of the paper. We now clarify on page 5 (lines 113-125) of the manuscript as follows:

      “In some trials, under control conditions, we applied a baseline depolarization prior to the ramp, in order to capture the variability observed in vivo (Harvey et al Nature 461:941–946, 2009; Epsztein et al. Neuron 70:109–120, 2011). Application of the cholinergic agonist carbachol (CCh, 2 µM) caused a depolarization of 2-6 mV. We compensated for this depolarization by injecting tonic hyperpolarizing current to reestablish the original membrane potential (see also Losonczy, et al., Nature 452, 436-442, 2008), as indicated by an offset from the 0 pA current level in the traces of the injected current ramps. The amplitude of background fluctuations in the resting membrane potential increased from a few tenths of a mV in control to 2-4 mV in CCh. Moreover, the threshold for action potential generation became more hyperpolarized. For all these reasons, we were not able to consistently vary the membrane potential using baseline depolarizations in the presence of CCh, because baseline depolarization alone frequently evoked spiking.”

      For this reason, many of the carbachol example traces in the initial submission had more hyperpolarized Vm than their control counterparts. Acetylcholine also caused a depolarization in a dose-dependent manner, that was compensated for in the same way. In this new version of the manuscript, we systematically report the effects of cholinergic agonists on membrane potential and neuronal excitability. Further, we show example traces with resting membrane potentials within 1 mV for each pharmacological comparison, therefore removing this variable and hopefully making results clearer. We also now state how the amount of injected current was chosen for each condition, and that the amount of injected current was generally lower in the presence of cholinergic agonists. Both the tonic hyperpolarizing current and the amplitude of the injected ramp for each example can now be appreciated in each figure.

      Finally, the reviewers’ comment also made us realize that, in principle, the center of mass of firing could be systematically skewed by the initial membrane potential, the amplitude of the current ramp injection and/or the input resistance. For this reason, we added a supplementary figure (1-2) where the adaptation index was plotted as a function of each of these variables. In all cases, it is apparent that the main factor determining whether the center of mass of firing is shifted earlier or later in the ramp is the presence or absence of carbachol rather than initial membrane potential, current injection amplitude, or input resistance.
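
      For readers unfamiliar with these two measures, the following is a minimal sketch of how an adaptation index and a center of mass of firing could be computed from spike times on a symmetric ramp. The exact definitions used in the manuscript are not given in this response; the normalized up-ramp versus down-ramp spike-count difference, the function names, and the spike times below are all assumptions made for illustration.

```python
def adaptation_index(spike_times, ramp_duration):
    """Assumed definition: (spikes on up-ramp - spikes on down-ramp) / total.

    Positive values mean more firing before the ramp peak, negative values
    mean more firing after it. The peak is taken as ramp_duration / 2.
    """
    peak = ramp_duration / 2.0
    up = sum(1 for t in spike_times if 0.0 <= t < peak)
    down = sum(1 for t in spike_times if peak <= t <= ramp_duration)
    total = up + down
    if total == 0:
        raise ValueError("no spikes fall within the ramp")
    return (up - down) / total


def center_of_mass(spike_times, ramp_duration):
    """Mean spike time as a fraction of the ramp (0.5 corresponds to the peak)."""
    inside = [t for t in spike_times if 0.0 <= t <= ramp_duration]
    if not inside:
        raise ValueError("no spikes fall within the ramp")
    return sum(inside) / len(inside) / ramp_duration


# Hypothetical spike times (s) on a 2 s ramp, not data from the study.
control_like = [0.4, 0.6, 0.8, 0.9, 1.1]   # firing skewed to the up-ramp
carbachol_like = [0.9, 1.1, 1.2, 1.4, 1.6]  # firing skewed to the down-ramp

ai_control = adaptation_index(control_like, 2.0)
ai_carbachol = adaptation_index(carbachol_like, 2.0)
com_control = center_of_mass(control_like, 2.0)
com_carbachol = center_of_mass(carbachol_like, 2.0)
```

      Under this assumed convention, a positive index with a center of mass below 0.5 corresponds to firing shifted earlier in the ramp, and a negative index with a center of mass above 0.5 corresponds to firing shifted later.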

      4) It remains unclear how the current result that TRPM4 channels can mediate the firing pattern change relates to the previous finding that the current injection evoked CA1 neuronal firing pattern is due to long-term Na channel inactivation.

      We thank the reviewers for this suggestion, which helps to clarify our initial results. New Figure 8 addresses the connection between long-term inactivation of Na+ channels and the activation of TRPM4 channels, as characterized by the model (see Results lines 375-391). Furthermore, the model was instrumental in assessing how the Ca2+ and voltage-dependence of TRPM4 channels synergize to contribute to the shift in the center of mass of firing (Figure 9). Figure 9 illustrates the positive feedback loop between Ca2+ entry and the additional depolarization produced by Ca2+ activation of TRPM4 channels that can potentially accelerate firing (see Results lines 392-427).

      5) Figure 8: Panel C is supposed to confirm the prediction from the model that the carbachol-mediated change of firing activity is related to intracellular Ca2+ domains. However, the example cell shown is depolarized to -52 mV, and there is no hyperpolarization following Carbachol. Is this an effect of the high concentration of BAPTA? Again, what was the current injected under this experimental condition?

      Again, we thank the reviewer for pointing out the lack of clarity in the presentation of our results. We have now rewritten the results section for former Figure 8 (now Figure 10) to more clearly present these findings. The reviewer is correct that with the combination of 30 mM BAPTA + 10 nM free Ca2+ added to the intracellular solution (panel C of current Figure 10) the addition of carbachol did not change the membrane potential, as there were no changes in the holding current. Also, the amplitude of the ramp is comparable in control conditions and in the presence of carbachol under these conditions.

      We have now added all these details in the Results section for figure 10C.

      Reviewer #2 (Public Review):

      The manuscript focuses on the cholinergic modulation of TRPM4 channels in the CA1 pyramidal neurons. The authors presented solid convincing evidence that TRPM4 but not TRPC channels are the Ca2+-activated nonselective cation channel in CA1 pyramidal neurons being modulated by activation of muscarinic receptors. Using bi-directional ramp protocol, the authors revealed that ACh modulation could lead to forward shifts in place field center of mass, whereas decreased ACh modulation could contribute to backward shifts. This represents a significant molecular/cellular finding that links neuromodulation of intrinsic properties to place field shifts, a phenomenon seen in vivo. The authors used a computational approach to model this CA1 neuron spiking to further reveal the mechanism.

      To further improve the manuscript, I have the following suggestions/questions:

      1) The triangular ramp stimulation (introduced by the same group; Upchurch et al., 2022) makes it possible to emulate the hill-shaped depolarization during place field firing. However, one concern is the time scale/duration of the ramp (2 sec) compared to the physiological pattern (100ms~200ms in the in vivo recording in freely moving rat, Epsztein et al., 2011). Using a longer ramp to generate more spikes for calculating the adaptation index is understandable. However, considering the Ca entry/accumulation during prolonged depolarization, repeating one set of experiments with a shorter ramp is crucial to verify the major findings.

      When determining the duration of the current injections for our ramps, we relied on the data recorded in vivo in freely moving rats (Epsztein et al. Neuron 70:109–120, 2011) or in head-fixed mice running on a spherical treadmill immersed in virtual reality (Harvey et al Nature 461:941–946, 2009). In those papers, the voltage deflections are shown as a function of time, and gray bars or boxes represent the time the animals spend traversing the place field. We interpret those figures as showing that the hill-shaped depolarizations have variable durations, on the order of 1-20 s; we therefore think that our experiments with 2 and 10 second-long ramps cover a fair range of these durations. The place fields in Epsztein et al., 2011 were 4 cm long, and the authors give an example in Figure 3, in which the 2 meter track is traversed 1.5 times in 3 minutes. At that rate, the rat spent on average 2.4 seconds in each place field. We interpret the numerous shorter epochs of firing on the order of 100-200 ms shown in Figure 2 of Epsztein et al. as the result of ongoing theta modulation within one overall depolarization during a single place field traversal. The following quote from that paper supports our interpretation: “Some (Figure 2E, trace 1), but not all (trace 2), passes revealed spiking associated with a series of large (to ~-25 mV), long-lasting (~100 ms) depolarizations (Kandel and Spencer, 1961; Wong and Prince, 1978; Traub and Llinás, 1979; Takahashi and Magee, 2009) occurring rhythmically at ~4–5 Hz (theta frequency).” We thank the reviewer for pointing out these traces; our results are more directly applicable to the traces without theta modulation. Adding theta modulation is beyond the scope of this study but will be considered in future studies. Our average results in Figure 1 show that carbachol similarly affects 2 s and 10 s ramps; we therefore decided to present only the data on 2 second ramps for all the subsequent figures (see Results lines 156-157).
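
      The 2.4 second estimate above follows from simple arithmetic on the figures quoted from Epsztein et al. (2011). The short sketch below reproduces it; the function name is hypothetical, and the calculation assumes a constant running speed over the whole 3 minutes.

```python
def mean_time_in_field(track_length_m, traversals, elapsed_s, field_length_m):
    """Average time spent in a place field, assuming constant running speed."""
    mean_speed = track_length_m * traversals / elapsed_s  # metres per second
    return field_length_m / mean_speed


# Values quoted in the response: 2 m track, 1.5 traversals in 3 minutes,
# place fields 4 cm long.
time_in_field_s = mean_time_in_field(
    track_length_m=2.0,
    traversals=1.5,
    elapsed_s=3 * 60,
    field_length_m=0.04,
)
print(f"{time_in_field_s:.1f} s per place field")
```

      The mean speed works out to 3 m in 180 s, or about 1.7 cm/s, so a 4 cm field takes about 2.4 s to cross, which is close to the 2 s ramps used in the experiments.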

      2) Strictly speaking, the term "Ca2+-induced Ca2+ release (CICR)" is only used in ER Ca2+ release via ryanodine receptors (RyR) rather than IP3Rs. The author should be careful since it is used in the abstract (Line 36). In addition, pharmacology inhibition experiments should be incorporated to further dissect the role of RyR-induced CICR.

      We thank the reviewer for pointing out the possible confusion regarding the use of the term Ca2+-induced Ca2+ release (CICR) and we removed it from the text. Further, for this resubmission, we have pharmacologically dissected the role of IP3 vs ryanodine receptors in the cholinergic shift in the center of mass of firing due to the activation of TRPM4 channels, as suggested by the reviewer (see new Figure 6). To our surprise, neither the IP3R antagonist, Xestospongin C (1-2 µM), nor the RyR antagonist ryanodine (40 µM) was effective in preventing the cholinergic shift of the center of mass of firing when added to the intracellular solution (see Results lines 310-340).

      3) Applying strong buffering BAPTA not only removed the IP3R-TRPM nanodomain but also hindered Ca entry via VGCC. To validate the role of ER Ca2+ release in regulating TRPM, depletion of ER Ca2+ pool with SERCA inhibitor (e.g. thapsigargin) would be a more direct way to test the model (also make sure to add TRPC inhibitor to avoid the store-operated Ca2+ entry).

      We agree with the reviewer that 30 mM BAPTA also disrupts intracellular Ca2+ elevation via voltage-dependent Ca2+ channels on the neuronal membrane. Given that our experiments excluded a role of Ca2+ release from the intracellular stores (see below), our new model includes a nanodomain where, during cholinergic activation, the Ca2+ entry through VGCC is amplified to reach micromolar concentrations, through a currently unknown mechanism. As pointed out by the reviewer, the experimental results with 30 mM BAPTA support the existence of a nanodomain for the activation of TRPM4 channels, regardless of the nature of the calcium source.

      We have also addressed the role of ER Ca2+ release in our experiments.

      4) How does the TRPM current overcome the long-term inactivation of Nav? A channel state model should be added to the manuscript to make it easier to understand.

      Figure 11C now shows the Markov model of the NaV channel and new Figure 8 is devoted to explaining the mechanism by which current through the TRPM4 channels overcomes the long-term inactivation of the NaV channel.

      Reviewer #3 (Public Review):

      Combining slice physiology and simulation, Combe and colleagues discovered that TRPM4 channels activated by Ca2+ in nanodomains mediate ICAN currents in CA1 pyramidal neurons that drive the cholinergic modulation of firing rate. The finding is novel and interesting.

      Strengths:

      1) Identification of TRPM4 channels as the carrier of ICAN currents with independent pharmacological inhibitors and other supporting evidence.

      2) Physiological and simulational verification of physically closely located Ca2+ source and TRPM4 channels required for ICAN activation.

      Weaknesses:

      1) The conclusion of the cholinergic role in down-ramp or backward firing shifts is not convincing.

      We agree with the reviewer that our interpretation is somewhat speculative, and we have now included disclaimers throughout the manuscript as well as placed most of these interpretations in a portion of the discussion titled “Ideas and speculations: Implications of our results for place fields in intact rodents”. In addition, we added the word “potential” in the title.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Masschelin et al. describes how Vitamin B2 deficiency affects body composition, energy expenditure, and glucose metabolism. B2 deficient mice have lower O2 consumption, and locomotor activity, with no difference in food intake. These mice also have lower liver FAD levels, which is expected given that B2 is a necessary cofactor for this coenzyme. Additionally, these mice have lower blood glucose levels following pyruvate injection, implying a lower capacity for gluconeogenesis. Using PPAR KO mice, they show that this effect on pyruvate tolerance is due to PPARα activation, though there is still a minor difference between wildtype and KO mice. Importantly, they show that fenofibrate PPAR agonism can improve glucose output following pyruvate injection in the absence of B2. The authors also perform robust metabolomics in each experimental condition and phenotype of the mouse well.

      Thank you for the positive input.

      1) The authors have yet to explore other explanations of differences in glucose metabolism under B2D +/Fenofibrate. The canonical targets of PPARα are involved in fatty acid oxidation, ketogenesis, and VLDL/HDL metabolism, in addition to gluconeogenesis (Bougarne et al. 2018). Gluconeogenesis is more of a fasting response due to CREB, FOXO1/PGC1a activation rather than PPAR. In response to B2D, the PPARα KO mice have increased plasma TGs, which may suggest a difference in VLDL TG secretion (Suppl. S3). Perhaps lipid metabolism is more directly affected, and changes in glucose metabolism are secondary to that of triglyceride metabolism. Regarding ketogenesis, the fenofibrate+ B2D fed mice have decreased plasma beta-hydroxybutyrate, suggesting decreased ketogenesis, which is a more canonical PPARα pathway (Suppl. S3). Testing each of these processes would help control that this mechanism is specific to gluconeogenesis and not secondary to something else.

      We value this reviewer’s comment. To address this point, we considered other mechanisms in our revised Discussion. In future studies, we plan to further explore these metabolic effects and to use ATAC-Seq to understand the transcription factors responsive to B2D. We anticipate these studies will take additional years to complete. Nonetheless, the present studies set the foundation for future work to investigate how FAD influences transcriptional regulation of metabolism.

      2) Is the effect on ISR dependent on PPARα? Is the mechanism of Fenofibrate on the liver, or on another cell type? In Figure 1, the authors state that Riboflavin deficiency alters body composition and energy expenditure, and then focuses on the liver. However, FAD levels are also increased in the heart and kidneys in addition to the liver. These tissues also respond to PPARα agonism, in addition to the muscle which plays a role in regulating glucose metabolism (B2D mice also have a higher lean mass (Fig 1e)). Additionally, the authors haven't shown specifically if the effects of Fenofibrate on electron transport and the ISR are dependent on the presence of PPARα (Figure 5, 6).

      We agree that knowing whether the effects of Fenofibrate on the ISR require liver PPARA is a critical issue, which will require dedicated studies for a thorough and meaningful conclusion. In new experiments, we knocked down Ppara in the liver using AAV8-Cre administration to Pparaflox/flox mice. Our data show liver-specific Ppara knockdown recapitulates whole-body B2D effects on pyruvate tolerance and hepatic steatosis (Figure 3I). These results agree with findings in whole-body Ppara knockout mice (Supplemental Figure 4), reinforcing the idea that the direct impact of B2D mainly occurs via PPARA activity in the liver. We acknowledge in the discussion ATF4 and ISR activation may contribute to PPARA-independent responses to B2D (Biochem J 443:165–71, 2012; Gut 65:1202-1214, 2016).

      An assessment of genetic requirements will require a large, rigorous set of experiments to identify the rate-limiting responses for fenofibrate activities during B2D, which we plan to do in the future. For this report, we decided to focus exclusively on tissue-specific knockout of Ppara. We will establish evidence for ISR responses to B2D in a separate study based on the feedback received here.

      Reviewer #2 (Public Review):

      The objective of this work by Masschelin et al. is to investigate the physiological relevance of flavin adenine dinucleotide (FAD). In particular, FAD supports the activity of flavoproteins involved in the production of cellular energy. Mutations in genes encoding flavoproteins often are associated with inborn errors of metabolism (IEMs), thus the clinical interest in investigating in more depth the physiological role of FAD. In this study, the authors first subjected male mice to a vitamin B2 deficient diet (B2D), demonstrating that loss of B2 replicates the phenotypes often observed with IEMs, including loss of body weight, hypoglycemia, and fatty liver. Using a combination of metabolomic phenotyping, transcriptomic analyses, and pharmacology (treatment with Fenofibrate, a PPARa agonist), the authors then reach the general conclusion that activation of the nuclear receptor PPARa can rescue the B2D phenotypes, thus revealing that PPARa directly controls the metabolic responses to FAD availability. Although the phenotypic analysis of the mice subjected to B2D increases our knowledge of the physiological impact of depleting the FAD pools on global energy metabolism, not all conclusions and statements made by the authors are totally supported by the data. In particular, the study is overall too descriptive and lacks mechanistic insights. While PPARa is likely an important player in the metabolic response to FAD availability, the molecular details on how FAD controls the activity of PPARa either directly or indirectly are entirely missing. Therefore, the authors are encouraged to directly assess whether B2D directly influences PPARa activity on the genes identified in the study, perform rescue experiments in the liver of PPARa KO mice and explore the possibility that other factors (including nuclear receptors) also participate in the response to B2 deficiency and diminished FAD pools.

      We appreciate the input from Reviewer 2. The direct and indirect effects of B2D on PPARA activity are likely not trivial. However, we performed experiments to determine how FAD depletion affects PPARA transcriptional activity using the riboflavin analog and competitive inhibitor lumiflavin (Figure 3L). We found lumiflavin reduced PPRE-luciferase activity in the presence of PPARA agonist. Although the assay is a synthetic reporter expressed in vitro, the experiment provides evidence of how B2D influences PPARA transcriptional activity. And, yes, we agree that our manuscript does not completely reconcile the factor(s) explaining the effects of B2D on gene expression, and expanded the discussion to comment on this point. In future studies, we intend to identify which transcription factor(s) regulate the liver responses to B2D, and further elucidation of the molecular mechanisms will be a central objective of future work.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Scagliotti and colleagues investigate the role of Dlk1 in regulating pituitary size in multiple mouse models with different Dlk1 gene dosages in order to understand the mechanisms of organ size control. They find that overexpression of Dlk1 leads to pituitary overgrowth and loss of Dlk1 causes undergrowth. Authors find two compartments of Dlk1 expression in the pituitary, in the marginal zone stem cell compartment and the parenchymal differentiated cell compartment, and by combining genetic mouse models show that a specific interaction of Dlk1 expression in both regions is necessary to affect pituitary organ size. They present data to suggest that Dlk1 may repress Wnt signaling during development to control a shift from progenitor proliferation to differentiation. The data are meticulous, high quality, and clear.

      I have some questions about the interpretation of their data regarding the mechanism of Dlk1 regulation of pituitary organ size, as I believe there could be potential alternative explanations for their observations:

      I was wondering about the cause of the enlargement of the pituitary gland in Fig 1E, and whether it is caused by an increased number of cells (hyperplasia), an increased cell size (hypertrophy), or both. Line 104 states it is hyperplasia, and that cell size was not affected in WT-TG ('not shown', line 121). However, line 444 says the TG is hypertrophic. It would be good if the authors could elaborate on this and show or state how cell size was determined. Figs 5/6 show that WT-Tg proliferation is generally similar to WT, which suggests the increased size is not hyperplasia. It would be good to know whether this is correct. Some previous studies have shown that in pregnancy, lactotroph hypertrophy can be responsible for pituitary enlargement without hyperplasia (Castrique 2010, Hodson 2012).

      We have now clarified this point throughout the manuscript. We had previously counted cells per field in the analysis shown in Figure 1D as a proxy for cell number (these did not significantly differ by genotype). We have now performed a more robust examination. Cell number was determined using a well-established stereological technique. For each animal, the maximal cross-sectional area (CSA) was determined from the volumetric analysis. At this level, 3 independent sections were used to measure anterior pituitary CSA and count haematoxylin-stained nuclei, giving a mean cells/CSA measurement per individual. This number was multiplied by the AP volume to give an estimate of cell number.
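
      The estimate described above can be written out as a short calculation. This is only a sketch of the stated procedure: the function name, the units, and the numbers are hypothetical, and because a density per unit area is multiplied by a volume, the result is best read as a relative index for comparing genotypes unless section thickness is also accounted for, which the response does not describe.

```python
def estimate_cell_number(nuclei_counts, section_areas, ap_volume):
    """Mean nuclei per unit cross-sectional area, scaled by pituitary volume.

    nuclei_counts: nuclei counted in each section at the maximal CSA level
    section_areas: anterior pituitary CSA of each of those sections
    ap_volume:     anterior pituitary volume from the volumetric analysis
    """
    if len(nuclei_counts) != len(section_areas) or not nuclei_counts:
        raise ValueError("need one area per nuclei count")
    densities = [n / a for n, a in zip(nuclei_counts, section_areas)]
    mean_density = sum(densities) / len(densities)  # mean cells/CSA
    return mean_density * ap_volume


# Hypothetical values for one animal: three sections, areas in mm^2,
# volume in mm^3. These are not measurements from the study.
estimate = estimate_cell_number(
    nuclei_counts=[1200, 1260, 1140],
    section_areas=[0.60, 0.63, 0.57],
    ap_volume=1.5,
)
```

      Averaging the per-section densities before scaling, rather than pooling counts and areas, matches the description of a mean cells/CSA value per individual.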

      This analysis was performed on mice from the new cohort of animals containing litter matched adults of all 4 genotypes, and shown in Figure 4E. WT-TG animals had a significant increase in cell number compared to WT littermates (p = 0.0443), therefore pituitary expansion occurs by hyperplasia.

      Related to the organ size question above, I had a question about the cell number and proportions in Fig 1D/E/F, which shows the maintenance of endocrine cell proportions and an increase in the volume of ~30% in WT-Tg. For the cell proportions to be maintained, I thought the increase in volume per cell type (Fig 1G) would therefore have to also increase proportionally in every cell type, while 1G appears to show an increase in GH (sig) and PRL/TSH cells (ns). It would be good if the authors could discuss this briefly.

      We agree, and indeed we see this trend across all cell types. When the data in Figure 1G are compared by 2-Way ANOVA we see a significant effect by cell type (p < 0.0001) and by genotype (p = 0.0009). However, for other hormone-producing cells the effect size does not overcome the variation in a smaller cell population, so the difference between genotypes does not pass multiple significance testing with the relatively small sample size used. We have modified the legend to Figure 1G to make the ANOVA result clearer.

      This study is impactful and will be of interest to several research communities, including those interested in pituitary development and function, organ size control, and gene imprinting mechanisms.

      Reviewer #2 (Public Review):

      Scagliotti et al address how organ size is regulated by imprinted genes. Using a series of mouse models to modulate the dosage of the paternally expressed gene, Dlk1, the authors demonstrate that DLK1 is important for the maintenance of the stem cell compartment leading to the growth of the pituitary gland and the expansion of growth hormone-producing cells. The authors show that overexpression of Dlk1 leads to pituitary hyperplasia while deletion of the paternal allele leads to reduced pituitary size. Reduced pituitary size is accompanied by reduced cell proliferation in the cleft at e13.5 and an increase in the number of POU1F1+ cells, suggesting that loss of Dlk1 alters the balance between the number of cells remaining in the replicating stem cell pool and those differentiating into the POU1F1 lineage. An elegant caveat of this paper is the rescue of Dlk1 expression in the population of cells expressing Pou1f1 but not in SOX2+ stem cells. Expression of Dlk1 only in POU1F1+ cells is not sufficient to rescue pituitary size. The authors suggest that this is because DLK1 must be present in stem cells which then activate paracrine WNT signaling to promote cell proliferation in POU1F1+ cells.

      Strengths:

      This is an important study that provides a mechanistic understanding of how the imprinted gene, Dlk1, regulates organ size. The study employs an elegant experimental design to address the dosage requirement for Dlk1 in regulating pituitary gland size. Rescuing Dlk1 in the POU1F1+ cells, but not the marginal zone SOX2+ cells provides intriguing results about a possible role for DLK1 in paracrine signaling between these different pituitary cell types. The study uses publicly available scRNAseq and ChIPseq data to further support their findings and identify Dlk1 as a likely target of POU1F1.

      Weaknesses:

      The study only analyzes females for the adult time point. For embryonic and postnatal time points sexes are pooled. Gender differences in pituitary gene expression embryonically or postnatally could potentially affect experimental outcomes.

      We have now added adult data for both sexes.

      The authors employ a mouse model that rescues Dlk1 expression starting at e15.5 in POU1F1+ parenchymal cells but not in marginal zone stem cells. Rescuing Dlk1 expression in a specific population of cells is one of the strengths of this study. Based on this information and the fact that overexpression of Dlk1 leads to increased pituitary size, the authors suggest that DLK1+ marginal zone stem cells and DLK1+ parenchymal cells may interact to promote postnatal proliferation. However, the ability to more carefully parse out the complex spatial and temporal contributions of DLK1 to pituitary size would be enhanced by the addition of a mouse model that rescues Dlk1 expression only in SOX2+ cells and a model that rescues expression in both stem cells and POU1F1+ cells.

      We agree that the addition of a model where Dlk1 is only expressed in SOX2+ cells would add significant mechanistic insight. To our knowledge an inducible gain-of-function Dlk1 model does not yet exist. Moreover, use of a SOX2-Cre driver would also increase Dlk1 expression in the hypothalamus as well as Rathke’s pouch, further complicating the analysis.

    1. Author Response

      Reviewer #1 (Public Review):

      In this manuscript, Huang et al., assess cognitive flexibility in rats trained on an animal model of anorexia nervosa known as activity-based anorexia (ABA). For the first time, they do this in a way that is fully automated and free from experimenter interference, as apparently experimenter interference can affect both the development of ABA as well as the effect on behaviour. They show that animals that are more cognitively flexible (i.e. animals that had received reversal training) were better able to resist weight loss upon exposure to ABA, whereas animals exposed to ABA first show poorer cognitive flexibility (reversal performance).

      Strengths:

      • The development of a fully-automated, experimenter-free behavioural assessment paradigm that is capable of identifying individual rats and therefore tracking their performance.

      • The bidirectional nature of the study - i.e. the fact that animals were tested for cognitive flexibility both before and after exposure to ABA, so that direction of causality could be established.

      • The analyses are rigorous and the sample sizes sufficient.

      • The use of touchscreens increases the translational potential of the findings.

      Weaknesses

      • Some descriptions of methods and results are confusing or insufficiently detailed.

      We have been through all methods and results to include additional details as requested by this reviewer below.

      It seems to me that performance on the pairwise discrimination task cannot be directly (statistically) compared to performance on reversal (as in Figure 4E), as these are tapping into fundamentally different cognitive processes (discrimination versus reversal learning). I think comparing groups on each assessment is valid, however.

      We agree that discrimination and reversal are different cognitive processes, and statistical comparisons between these two components of the task were only made when examining the speed of learning in the validation of the novel testing system. Moreover, our inclusion of the pink and purple bars on graphs such as Figure 4C & 4E represent “main effects of ABA exposure”, regardless of learning phase (PD or reversal) rather than, as you describe, comparing PD to R1. Perhaps this comparison wasn’t clear, so we have amended the text to say ‘main effect of ABA exposure p=.0017’ rather than just “exposure”.

      Not necessarily a 'weakness' but I would have loved to see some assessment of the alterations in neural mechanisms underlying these effects, and/or some different behavioural assessments in addition to those used here. In particular, the authors mention in the discussion that this manipulation can affect cholinergic functioning in the dorsal striatum. We (Bradfield et al., Neuron, 2013) and a number of others have now demonstrated that cholinergic dysfunction in the dorsomedial striatum impairs a different kind of reversal learning that is based on alterations in outcome identity and thus relies on a different cognitive process (i.e. 'state' rather than 'reward' prediction error). It would be interesting perhaps in the future to see if the ABA manipulation also alters performance on this alternative 'cognitive flexibility' task.

      This is an excellent suggestion and we have already begun exploring this in other ongoing work in the laboratory. Due to ‘compulsive’ wheel running being a hallmark of ABA, we are interested in determining if this also translates to a goal-directed action impairment using the well-established outcome-specific devaluation task. Perhaps with ABA it may be more relevant to investigate outcome-reversals rather than stimulus-reversals, and if this is the case, it would further support the use of the ABA model for investigating cognitive dysfunction relevant to AN. We have included an additional section in the discussion text relating to our hypotheses regarding outcome-specific reversal learning in the ABA model.

      Nevertheless, I certainly think the manuscript provides a solid appraisal of cognitive flexibility using more traditional tasks, and that the authors have achieved their aims. I think the work here will be of importance, certainly to other researchers using the ABA model, but perhaps also of translational importance in the future, as the causal relationship between ABA and cognitive inflexibility is near impossible to establish using human studies, but here evidence points strongly towards this being the case.

      Reviewer #2 (Public Review):

      Huang and colleagues present data from experiments assessing the role of cognitive inflexibility in the vulnerability to weight loss in the activity-based anorexia paradigm in rats. The experiments employ a novel in-home cage touchscreen system. The home cage touch screen system allows reduced testing time and increased throughput compared with the more widely used systems resulting in the ability to assess ABA following testing cognitive flexibility in relatively young female rats. The data demonstrate that, contrary to expectations, cognitive inflexibility does not predispose to greater ABA weight loss, but instead, rats that performed better in the reversal learning task lost more weight in the ABA paradigm. Prior ABA exposure resulted in poorer learning of the task and reversal. An additional experiment demonstrated that rats that had been trained in reversal learning resisted weight loss in the ABA paradigm. The findings are important and are clearly presented. They have implications for anorexia nervosa both in terms of potentially identifying those at risk also in understanding the high rates of relapse.

      Thanks for a great summary of the manuscript.

      Reviewer #3 (Public Review):

      Activity-based anorexia (ABA), which combines access to a running wheel and restricted access to food, is a most common paradigm used to study anorexic behavior in rodents. And yet, the field has been plagued by persistent questions about its validity as a model of anorexia nervosa (AN) in humans. This group's previous studies supported the idea that the ABA paradigm captures cognitive inflexibility seen in AN. Here they describe a fully automated touchscreen cognitive testing system for rats that makes it possible to ask whether cognitive inflexibility predisposes individuals to severe weight loss in the ABA paradigm. They observed that cognitive inflexibility was predictive of resistance to weight loss in the ABA, the opposite of what was predicted. They also reported reciprocal effects of ABA and cognitive testing on subsequent performance in the other paradigm. Prior exposure to the ABA decreased subsequent cognitive performance, while prior exposure to the cognitive task promoted resistance to the ABA. Based on these findings, the authors argue that the ABA model can be used to identify novel therapeutic targets for AN.

      The strength of this manuscript is primarily as a methods paper describing a novel automated cognitive behavioral testing system that obviates the need for experimentalist handling and single housing, which can interfere with behavioral testing, and accelerate learning on the task. Together, these features make it feasible to perform longitudinal studies to ask whether cognitive performance is predictive of behavior in a second paradigm during adolescence, a peak period of vulnerability for many psychiatric disorders. The authors also used machine learning tools to identify specific behaviors during the cognitive task that predicted later susceptibility to the ABA paradigm. While the benefits of this system are clear, the rigor and reproducibility of experiments using this paradigm would be enhanced if the authors provided clear guidelines about which parameters and analyses are most useful. In their absence, the large amount of data generated can promote p-hacking.

      The authors use their automated behavioral testing paradigm to ask whether cognitive inflexibility is a cause or consequence of susceptibility to ABA, an issue that cannot be addressed in AN. They provide compelling evidence that there are reciprocal effects of the two behavioral paradigms, but do not perform the controls needed to evaluate the significance of these observations. For example, the learning task involves sucrose consumption and food restriction, conditions that can independently affect susceptibility to the ABA. Similarly, the ABA paradigm involves exercise and restricted access to food, which can both affect learning.

      In the Discussion, the authors hypothesize that the ABA paradigm produces cognitive inflexibility and argue that uncovering the underlying mechanism can be used to identify new therapeutic targets for AN. The rationale for their claim of translational relevance is undermined by the fact that the biggest effect of the ABA paradigm is seen in the pair discrimination task, and not reversal learning. This pattern does not fit clinical observations in AN.

      In summary, the significance of this manuscript lies in the development of a new system to test cognitive function in rats that can be combined with other paradigms to explore questions of causality. While the authors clearly demonstrate that cognitive flexibility does not promote susceptibility to ABA, the experiments presented do not provide a compelling case that their model captures important features of the pathophysiology of AN.

      We thank the reviewer for this detailed review and note that we have now both explicitly defined the most useful parameters for analyses from the novel touchscreen system as well as removed some comparisons that could be considered superfluous. We argue that the additional information provided by the machine learning analyses are, at this stage, exploratory, and rather than reveal independent descriptions of behavioural change in ABA exposed versus naïve rats this information will aid in the generation of hypotheses to be tested in future studies. Therefore, the figures pertaining to these analyses have now been provided as supplements to Figures 3 & 4 (Figure 3-figure supplement 3; Figure 4-figure supplements 3&4). We have also clarified our intention to explore possible behavioural differences using this technique in the methods and discussion.

      We have also completed the essential control experiment, defined in the “essential revisions” section of this review, whereby we show only moderate impairments in reversal learning following a matched period of food restriction without rapid weight loss, suggesting that the substantial impairment seen following ABA exposure was not due to food restriction alone (see updated Figure 4 and supplements).

      However, we do not agree with this reviewer “that the biggest effect of the ABA paradigm is seen in the pair discrimination task” and point to the outcomes of both reciprocal experiments.

      In the first experiment, rats that went on to be susceptible or resistant to ABA did not differ on pairwise discrimination learning but specifically on performance at the reversal of reward contingencies (Figure 3B & E). Although this result was not in the hypothesised direction, this suggests that reversal learning specifically, and not pairwise discrimination, can differentiate those rats that go on to be susceptible to weight loss. We have included additional discussion in the text related to this finding (see lines 490-497).

      In the second experiment, it is clear from the number of ABA exposed rats that were unable to learn the reversal component, even after being able to learn pairwise discrimination, that flexible learning is more impaired by ABA. While it is true that ABA exposed rats that were successful in learning the reversal task were slower to learn the pairwise discrimination component than naïve rats (Figure 4E), this was not related to their ability to learn the reversal task overall, with equivalent learning rates in pairwise discrimination to ABA exposed rats that failed to learn the reversal component (Figure 4G-I). The absence of significant differences between ABA exposed and naïve animals in Figure 4F relates to the fact that a large proportion of ABA exposed animals never reached performance criterion in the reversal phase of the task and therefore data from these animals could not be included in the figure. This is where the number of trials completed within each session becomes important for interpretation (i.e. Figure 4-figure supplement 1M-O), whereby ABA exposure caused impaired responding specifically within the reversal phase of the task. The results text has been updated to better reflect this critical point.

      Overall, this suggests that the impairment in cognitive flexibility caused by ABA exposure was related both to an associative learning impairment (slower to learn PD than naïve animals) and an impairment in the integration of new and existing learning (failure to learn R1 in a large proportion of animals).

    1. Author Response

      Reviewer #1 (Public Review):

      Weaknesses

      1) I was curious as to how novel this setup is. Although I do not do head-fixed research myself, I thought there were already some open-source, relatively cheap systems available. I'm not sure how the current setup differs from those already available. Personally, even if this system involves only the wheel turning, as this is a truly operant response, that is novel enough for my liking.

      The novelty of the system stems from the synergistic combination of functionality, the low-cost open source nature of the design, and the breadth of behavioral procedures the system is able to support. The use of a wheel as an operant response was adapted from the International Brain Laboratory rig which has been used extensively for visual discrimination tasks. We adapted this wheel design to make the response closer to lever pressing through the use of the wheel brake, which ensures that subjects have to rotate the wheel in discrete rotational bouts rather than continuously spinning the wheel and potentially disengaging and allowing the wheel to rotate independently. There are no examples of systems capable of delivering 5+ solutions within a behavioral session or conducting valence testing with a modification of real-time place preference without the cost and complexity associated with virtual reality. We believe that the combination of factors, the flexibility and scalability of the system makes OHRBETS a novel and useful system for diverse motivation and consumption behaviors in head-fixed mice.

      2) It would be useful to have a bit more detail in the manuscript (not just on the GitHub link - in supplemental material perhaps?) on how to build such a system, just to get a sense of how difficult building such a system might be and how many components it has.

      With this submission we have included detailed assembly instructions as a supplement to the main manuscript and added reference to the file within the methods section. We have also added details, including time estimates, to the methods section.

      3) I wasn't sure how to feel about the comparisons across experimental set-ups in Figures 2 and 3. Usually, these sorts of comparisons are not considered statistically valid due to the many variables that differ between set-ups. However, I do see that the intent here is a bit different - i.e. is to show that despite all these alterations in variables the behavioural outputs are still highly correlated. However, without commenting on this intent, I did find these comparisons a little jarring to read.

      Thank you for highlighting this. We have added in a justification for why we measured the consistency in behavior measured with each head-fixed system.

      4) The only dataset I was not wholly convinced by was that in Figure 3 (real-time place preference and aversion). I think the authors have done the best job that they can of replicating such a procedure in a head-fixed mouse, but the head-fixed version is going to necessarily differ from the freely moving version in a fundamental way when the contextual cues and spatial navigation form part of the RTPT task. Giving a discrete cue, such as a tone, just is not a sufficient substitute for contextual cues, and I think the two types of task would engage fundamentally different brain cells and circuits (e.g. only the free-moving version is likely to engage place cells in the hippocampus).

      To avoid confusion regarding the place component of the real-time place preference assay name, we have renamed the head-fixed assay for assessing valence to Wheel-Time Preference (WTP). We have also added a full paragraph to the discussion where we outline the differences in the task requirements and relevant neuronal circuits between the freely-moving RTPP and head-fixed WTP. We understand that the head-fixed task is not a perfect analog of the RTPP task; however, based on the similarity in the resulting time spent in the stimulation chamber/zone, we believe that the WTP is able to replicate the valence assessment that many in the field use RTPP to measure. We believe that the WTP with OHRBETS opens up new possibilities for assessing preference in head-fixed mice and this justifies keeping the figure within the main manuscript.

      To thoroughly address the potential confound of spatial information during the multi-spout experiment, we have added an additional supplemental figure (Figure 4- figure supplement 5) that depicts the proportion of trials with licking and added a paragraph to the discussion centered on the potential confound associated with learning the solution identity.

      5) Personally, I found having the statistics in a separate file confusing.

      Thank you for raising this concern. With our initial submission, we were concerned that including all of the statistics within the main text would make the paper difficult to read due to the extensive amount of statistics. With this submission, in addition to the statistics table, we have included statistics within the figure legends and main text where applicable.

      6) Line 589-594. Suggesting the medial/lateral shell recording results mean that the medial shell 'tracks value, and the range of values during the multi-spout consumption of gradients of NaCl is greater than the range of values during multi-spout consumption of gradients of sucrose" seems to engage in circular logic to me. That is, the authors should use behavioural data to infer what the animal is experiencing and whether it is a change in value, and/or a greater change in value during NaCl vs. sucrose consumption, and only then should they make an inference about what the larger medial shell response means.

      Thank you for identifying this potential site of confusion. To address this concern we have modified the language to better communicate our interpretation of the data.

      “If we assume that the range of values is greater during multi-spout consumption of gradients of NaCl compared to gradients of sucrose, as indicated by a greater range in licking behavior (Figure 8- Figure Supplement 4), then the greater range of dopamine release in the NacShM could imply that dopamine release in this structure tracks value.”

    1. Author Response

      Reviewer #1 (Public Review):

      Wang, Y. et al. investigated the role of TPL2 signaling in acute and chronic neuroinflammatory conditions using small molecule inhibitors and a TPL2 kinase-dead mutant mouse line. They find that TPL2 is upregulated by various brain-resident cells, including microglia, astrocytes, and endothelial cells, during neurodegenerative disease progression and following peripheral LPS injection. They show that upon pharmacological and genetic inhibition during acute LPS stimulation, pro-inflammatory cytokine concentration, microgliosis, and neuronal loss can be reversed. In chronic neuroinflammation, as seen in a tauopathy mouse model, the loss of TPL2 rescues reactive gliosis, immune cell infiltration, neurodegeneration, and cognitive health. Interestingly, TPL2 loss of function was not significantly beneficial in models of nerve injury and stroke. By analyzing their multiple sequencing datasets and those of other research teams, the authors find that TPL2 aids to upregulate transcripts for the DAM signature, immediate early genes, and astrocyte reactivity. These data build together to further emphasize the intricacy and importance of the immune component in neurodegeneration and other neuroinflammatory conditions.

      The conclusions of this paper are mostly well supported by their data, but further confirmation of sequencing results and microglia intrinsic mechanisms need to be expanded.

      1) In the discussion section, it will be important to highlight that TPL2 could also be directly contributing to tauopathy disease progression through its actions in brain-resident endothelial cells. They spend a lot of time characterizing the effects of TPL2 on in vitro microglial responses and do not adequately discuss the potential that their disease phenotypes in the tauopathy model have more to do with TPL2's ability to regulate BBB permeability or facets of endothelial biology. It will be important to highlight that there are various discrete cellular mechanisms (e.g. functions for TPL2 in microglia, endothelial cells, astrocytes, peripheral immune cells, etc.) that could be underlying the disease readouts seen in their global TPL2 kinase-dead mice. They should discuss this in the context of previous literature demonstrating roles for TPL2 in other non-microglial cell types (e.g. Nanou et al PMID: 34038728).

      Thank you for this comment. We agree that while TPL2 is most highly expressed in microglia in the brain, TPL2 expression in endothelial cells and other cell types could potentially contribute to the disease. We have added discussion of this to the manuscript including discussion of the Nanou et al paper which raises the possibility that the TPL2-dependent infiltration of peripheral immune cells in TauP301S mice could be due to regulation of the BBB by TPL2 activity in endothelial cells. We also discuss potential roles for TPL2 in the various other cell types. In addition, we have now added characterization of cell-autonomous TPL2-dependent phenotypes in cultured astrocytes and have provided additional analysis of TPL2-dependent changes in endothelial cells in the scRNAseq experiment in TauP301S mice.

      2) Hippocampal single-cell RNA sequencing led the authors to report that TPL2KD in the PS19 model of tauopathy reduced the number of T-cell and dendritic cell (DC) infiltrates into the brain. The authors should corroborate this finding with immunohistochemistry or flow cytometry to confirm the presence of changing CD4+, CD8+, and DC populations. Most notably, it is critical for them to enumerate the cell numbers in an effort to validate that there are indeed empirical, and not just proportional, reductions in these cell populations.

      Thank you for the suggestion. We have performed immunohistochemistry to examine T cells in fixed brain tissue sections. We have included the data for T cell staining in Figure 5-figure supplement 2. We focused the IHC analysis on staining for CD8+ T cells based on the substantially greater abundance of CD8+ T cells compared to CD4+ T cells or DC in the single cell data (Figure 5C, Figure 5-figure supplement 5) and the availability of an antibody that worked well in our hands. These results corroborate the single cell data by empirically showing significantly increased numbers of T cells in TauP301S mice and significantly reduced numbers in the TauP301S x TPL2KD mice (Figure 5-figure supplement 2).

      3) The authors concluded from Figure 3 that TPL2 plays a key role in in vivo microglia and astrocyte activation. Adding in an in vitro study, like those done in Figures 1, 2, and S4, that looks at a cell-autonomous role for TPL2 in astrocyte reactivity would strengthen this claim and rule out a microglial-independent pathway of TPL2 inflammation.

      Thank you for the suggestion. To investigate the potential cell-autonomous role of TPL2 in astrocytes, we cultured primary mouse astrocytes and stimulated them with either LPS or cytokines, in the absence or presence of TPL2 inhibitor, and measured stimulation-induced changes in cytokine release and gene expression. Data are included in Figure 3-figure supplement 1 and the results are discussed in the manuscript. In contrast to the broader TPL2-dependence of cytokine release by cultured microglia, only a much more restricted set of cytokines exhibited TPL2-dependence in cultured astrocytes. Furthermore, RT-qPCR analysis of TPL2-dependent activated astrocyte genes identified in the LPS in vivo study found much less TPL2-dependent activation in cultured astrocytes. We discuss that these results suggest that the TPL2-dependent astrocyte activation observed in vivo was probably largely an indirect consequence of TPL2 function in microglia, but that there was also potentially some contribution of cell-autonomous function of TPL2 in astrocytes.

      4) Although the TPL2KD mouse line is a valuable tool to impair TPL2's function while retaining its expression, the researchers failed to comment on the potential effects a global mutation in TPL2 could have in their model systems. Peripheral immunological challenges, like their IP injections of LPS, could behave differently and affect the nervous system in a microglia-independent pathway if monocyte/macrophage signaling is also impaired.

      We agree that during peripheral immunological challenges TPL2 could affect the nervous system in a microglia-independent manner. We have added this point to the discussion.

      5) Oligodendrocytes and OPCs have comparable numbers of DEGs to astrocytes (Figure S11a). What is changing within their transcriptional profile?

      In this manuscript we focused on TPL2-dependent DEGs in the tauopathy model, which were all in microglia. We agree the TPL2-independent changes in the TauP301S mice in other cell types are also interesting. This data set has been uploaded to a public data repository (GSE180041), and analysis of the changes in oligodendrocytes has been performed from this data set, as well as other disease models, in a recent publication: “Disease-associated oligodendrocyte responses across neurodegenerative diseases” (PMID: 36001972).

    1. Author Response

      Reviewer #1 (Public Review):

      Strengths

      This paper is well situated theoretically within the habit learning/OCD literature. Daily training in a motor-learning task, delivered via smartphone, was innovative, ecologically valid and more likely to assay habitual behaviors specifically. Daily training is also more similar to studies with non-humans, making a better link with that literature. The use of a sequential-learning task (cf. tasks that require a single response) is also more ecologically valid. The in-laboratory tests (after the 1 month of training) allowed the researchers to test if the OCD group preferred familiar, but more difficult, sequences over newer, simpler sequences.

      The authors achieved their aims in that two groups of participants (patients with OCD and controls) engaged with the task over the course of 30 days. The repeated nature of the task meant that 'overtraining' was almost certainly established, and automaticity was demonstrated. This allowed the authors to test their hypotheses about habit learning. The results are supportive of the authors' conclusions.

      We truly appreciate the positive assessment of referee 1, particularly the consideration that our study is theoretically strong and that ‘the results are supportive of the authors' conclusions’. This is an important external endorsement of our conclusions, contrasting somewhat with the views of referee 2.

      Weaknesses

      The sample size was relatively small. Some potentially interesting individual differences within the OCD group could have been examined more thoroughly with a bigger sample (e.g., preference for familiar sequences). A larger sample may have allowed the statistical testing of any effects due to medication status.

      The authors were not able to test one criterion of habits, namely resistance to devaluation, due to the nature of the task.

      We agree with the reviewer that the proof of principle established in our study opens new avenues for research into the psychological and behavioral determinants of the heterogeneity of this clinical population. However, considering the study timeline and the pandemic constraints, a bigger sample was not possible. Our sample can indeed be considered small if one compares it with current online studies, which do not require in-person/laboratory testing, thus being much easier to recruit and conduct. However, given the nature of our protocol (with 2 demanding test phases, 1-month engagement per participant and the inclusion of OCD patients without comorbidities only) and the fact that this study also involved laboratory testing, we consider our sample size reasonable and comparable to other laboratory studies (typically comprising on average between 30-50 participants in each group).

      This article is likely to be impactful -- the delivery of a task across 30 days to a patient group is innovative and represents a new approach for the study of habit learning that is superior to an in-laboratory approach.

      An interesting aspect of this manuscript is that it prompts a comparison with previous studies of goal-directed/habitual responding in OCD that used devaluation protocols, and which may have had their effects due to deficits in goal-directed behavior and not enhanced habit learning per se.

      Thank you for acknowledging the impact of our study, in particular the unique ability of our task to interrogate the habit system.

      Reviewer #2 (Public Review):

      In this study, the researchers employed a recently developed smartphone application to provide 30 days of training on action sequences to both OCD patients and healthy volunteers. The study tested learning and automaticity-related measures and investigated the effects of several factors on these measures. Upon training completion, the researchers conducted two preference tests comparing a learned and unlearned action sequences under different conditions. While the study provides some interesting findings, I have a few substantial concerns:

      1) Throughout the entire paper, the authors' interpretations and claims revolve around the domain of habits and goal-directed behavior, despite the methods and evidence clearly focusing on motor sequence learning/procedural learning/skill learning. There is no evidence to support this framing and interpretation and thus I find them overreaching and hyperbolic, and I think they should be avoided. Although skills and habits share many characteristics, they are meaningfully distinguishable and should not be conflated or mixed up. Furthermore, if anything, the evidence in this study suggests that participants attained procedural learning, but these actions did not become habitual, as they remained deliberate actions that were not chosen to be performed when they were not in line with participants' current goals.

      We acknowledge that the research on habit learning is a topic of current controversy, especially when it comes to how to induce and measure habits in humans. Therefore, within this context referee 2’s criticism could be expected. Across distinct fields of research, different methodologies have been used to measure habits, which represent relatively stereotyped and autonomous behavioral sequences enacted in response to a specific stimulus without consideration, at the time of initiation of the sequence, of the value of the outcome or any representation of the relationship that exists between the response and the outcome. Hence these are stimulus-bound responses which may or may not require the implementation of a skill during subsequent performance. Behavioral neuroscientists define habits similarly, as stimulus-response associations which are independent of reward or outcome, and use devaluation or contingency degradation strategies to probe habits (Dickinson and Weiskrantz, 1985; Tricomi et al., 2009). Others conceptualize habits as a form of procedural memory, along with skills, and use motor sequence learning paradigms to investigate and dissect different components of habit learning such as action selection, execution and consolidation (Abrahamse et al., 2013; Doyon et al., 2003; Squire et al., 1993). It is also generally agreed that the autonomous nature of habits and the fluid proficiency of skills are both usually achieved with many hours of training or practice, respectively (Haith and Krakauer, 2018).

      We consider that Balleine and Dezfouli (2019) made an excellent attempt to bring all these different criteria within a single framework, which we have followed. We also consider that our discussion in fact followed a rather cautious approach to interpretation solely in terms of goal-directed versus habitual control.

      Referee 2 does not actually specify criteria by which they define habits and skills, except for asserting that skilled behavior is goal-directed, without mentioning what the actual goal of the implementation of such skill is in the present study: the fulfillment of a habit? We assume that their definition of habit hinges on the effects of devaluation, as a single criterion of habit, but which according to Balleine and Dezfouli (2019) is only 1 of their 4 listed criteria. We carefully addressed this specific criterion in our manuscript: “We were not, however, able to test the fourth criterion, of resistance to devaluation. Therefore, we are unable to firmly conclude that the action sequences are habits rather than, for example, goal-directed skills. Regardless of whether the trained action sequences can be defined as habits or goal-directed motor skills, it has to be considered…”. Therefore, we took due care in our conclusions concerning habits and thus found the referee’s comment misleading and unfair.

      We note that our trained motor sequences did in fact fulfil the other 3 criteria listed by Balleine and Dezfouli (2019), unlike many studies employing only devaluation (e.g. Tricomi et al 2009; Gillan et al 2011). Moreover, we cited a recent study using very similar methodology where the devaluation test was applied and shown to support the habit hypothesis (Gera et al., 2022).

      Whether the initiation of the trained motor sequences in experiment 3 (arbitration) is underpinned by an action-outcome association (or not) has no bearing on whether those sequences were under stimulus-response control after training (experiment 1). Transitions between habitual and goal-directed control over behavior are quite well established in the experimental literature, especially when choice opportunities become available (Bouton et al., 2021; Frölich et al., 2023), or a new goal-directed schema is recruited to fulfill a habit (Fouyssac et al., 2022). This switching between habits and goal-directed responding may reflect the coordination of these systems in producing effective behavior in the real world.

      • Fouyssac M, Peña-Oliver Y, Puaud M, Lim NTY, Giuliano C, Everitt BJ, Belin D. (2021). Negative Urgency Exacerbates Relapse to Cocaine Seeking After Abstinence. Biological Psychiatry. doi: 10.1016/j.biopsych.2021.10.009

      • Frölich S, Esmeyer M, Endrass T, Smolka MN and Kiebel SJ (2023) Interaction between habits as action sequences and goal-directed behavior under time pressure. Front. Neurosci. 16:996957. doi: 10.3389/fnins.2022.996957

      • Bouton ME. 2021. Context, attention, and the switch between habit and goal-direction in behavior. Learn Behav 49:349– 362. doi:10.3758/s13420-021-00488-z

      2) Some methodological aspects need more detail and clarification.

      3) There are concerns regarding some of the analyses, which require addressing.

      We thank referee 2 for their detailed review of the methods and analyses of our study and for the helpful feedback, which clearly helps improve our manuscript. We will clarify the methodological aspects in detail and conduct the suggested analysis. Please see below our answers to the specific points raised.

      Introduction:

      4) It is stated that "extensive training of sequential actions would more rapidly engage the 'habit system' as compared to single-action instrumental learning". In an attempt to describe the rationale for this statement the authors describe the concept of action chunking, its benefits and relevance to habits but there is no explanation for why sequential actions would engage the habit system more rapidly than a single-action. Clarifying this would be helpful.

      We agree that there is no evidence that action sequences become habitual more readily than single actions, although action sequences clearly allow ‘chunking’ and thus likely engage neural networks, including the putamen, which are implicated in habit learning as well as skill. In our revised manuscript we will instead state: “we have recently postulated that extensive training of sequential actions could be a means for rapidly engaging the ‘habit system’ (Robbins et al., 2019)”.

      5) In the Hypothesis section the authors state: “we expected that OCD patients... show enhanced habit attainment through a greater preference for performing familiar app sequences when given the choice to select any other, easier sequence”. I find it particularly difficult to interpret preference for familiar sequences as enhanced habit attainment.

      We agree that choice of the familiar response sequence should not be a necessary criterion for habitual control although choice for a familiar sequence is, in fact, not inconsistent with this hypothesis. In a recent study, Zmigrod et al (2022) found that 'aversion to novelty' was a relevant factor in the subjective measurement of habitual tendencies. It should also be noted that this preference was present in patients with OCD. If one assumes instead, like the referee, that the familiar sequence is goal-directed, then it contravenes the well-known 'egodystonia' of OCD which suggests that such tendencies are not goal-directed.

      To clarify our hypothesis, we will amend the sentence to the following: “Finally, we expected that OCD patients would generally report greater habits, as well as attribute higher intrinsic value to the familiar app sequences manifested by a greater preference for performing them when given the choice to select any other, easier sequence”.

      A few notes on the task description and other task components:

      6) It would be useful to give more details on the task. This includes more details on the time/condition of the gradual removal of visual and auditory stimuli and also on the within practice dynamic structure (i.e., different levels appear in the video).

      These details will be included in the revised manuscript. Thank you for pointing out the need for further clarification of the task design.

      7) Some more information on engagement-related exclusion criteria would be useful (what happened if participants did not use the app for more than one day, how many times were allowed to skip a day etc.).

      This additional information will be added to the revised manuscript. If participants failed to train for more than 2 days, the researcher would send a reminder asking them to catch up. If the participant did not respond and skipped a third day, the researcher would call to understand the reasons for the lack of engagement and to gauge motivation. The participant would be excluded if more than 5 sequential days of training were missed. Only 2 participants were excluded for lack of engagement.

      8) According to the (very useful) video demonstrating the task and the paper describing the task in detail (Banca et al., 2020), the task seems to include other relevant components that were not mentioned in this paper. I refer to the daily speed test, the daily random switch test, and daily ratings of each sequence's enjoyment and confidence of knowledge.

      If these components were not included in this procedure, then the deviations from the procedure described in the video and Banca et al. (2020) should be explicitly mentioned. If these components were included, at least some of them may be relevant, at least in part, to automaticity, habitual action control, formulation of participants' enjoyment from the app etc. I think these components should be mentioned and analyzed (or at least provide an explanation for why it has been decided not to analyze them).

      This is also true for the reward removal (extinction) from the 21st day onwards which is potentially of particular relevance for the research questions.

      The task procedure was indeed the same as detailed in Banca et al., 2020. We did not include these extra components in this current manuscript for reasons of succinctness and because the manuscript was already rather longer than a common research article, given that we present three different, though highly inter-dependent, experiments in order to answer key interrelated questions in an optimal manner. However, since referee 2 considers this additional analysis to be important, we will be happy to include it in the supplementary material of the revised manuscript.

      Training engagement analysis:

      9) I find referring to the number of trials including successful and unsuccessful trials as representing participants "commitment to training" (e.g. in Figure legend 2b) potentially inadequate. Given that participants need at least 20 successful trials to complete each practice, more errors would lead to more trials. Therefore, I think this measure may mostly represent weaker performance (of the OCD patients as shown in Figure 2b). Therefore, I find the number of performed practice runs, as used in Figure 2a (which should be perfectly aligned with the number of successful trials), a "clean" and proper measure of engagement/commitment to training.

      We acknowledge the referee’s concern on this matter and agree to replace the y-axis variable of Figure 2b with the number of performed practices (thus aligning with Figure 2a). This amendment will remove any potential effect of weaker performance on the engagement measurement and will provide clearer results.

      10) Also, to provide stronger support for the claim about different diurnal training patterns (as presented in Figure 2c and the text) between patients and healthy individuals, it would be beneficial to conduct a statistical test comparing the two distributions. If the results of this test are not significant, I suggest emphasizing that this is a descriptive finding.

      We will conduct the statistical test and report accordingly.

      Learning results:

      11) When describing the Learning results (p10) I think it would be useful to provide the descriptive stats for the MT0 parameter (as done above for the other two parameters).

      Thank you for pointing this out. The descriptive stats for MT0 will be added to the revised version of the manuscript.

      12) Sensitivity of sequence duration and IKI consistency (C) to reward:

      I think it is important to add details on how incorrect trials were handled when calculating ∆MT (or C) and ∆R, specifically in cases where the trial preceding a successful trial was unsuccessful. If incorrect trials were simply ignored, this may not adequately represent trial-by-trial changes, particularly when testing the effect of a trial's outcome on performance change in the next trial.

      This is an important question. Our analysis protocol was designed to ensure that incorrect trials do not contaminate or confound the results. To estimate the trial-to-trial difference in ∆MT (or C) and ∆R, we exclusively included pairs of contiguous trials where participants achieved correct performance and received feedback scores for both trials. For example, if a participant made a performance error on trial 23, we did not include ∆R or ∆MT estimates for the pairs of trials 23-22 and 24-23. Instead of excluding incorrect trials from our analyses, we retained them in our time series but assigned them a NaN (not a number) value in Matlab. As a result, ∆R and ∆MT were not defined for those two pairs of trials. The same applies to C. This approach ensured that our analyses are not confounded by incremental or decremental feedback scores between noncontiguous trials. In the past, when assessing the timing of correct actions during skilled sequence performance, we also considered only events that were preceded and followed by correct actions. This prevented effects such as post-error slowing from contaminating our results (Herrojo Ruiz et al., 2009, 2019). Therefore, we do not believe that any further reanalysis is required.

      • Ruiz MH, Jabusch HC, Altenmüller E. Detecting wrong notes in advance: neuronal correlates of error monitoring in pianists. Cerebral cortex. 2009 Nov 1;19(11):2625-39.

      • Bury G, García-Huéscar M, Bhattacharya J, Ruiz MH. Cardiac afferent activity modulates early neural signature of error detection during skilled performance. NeuroImage. 2019 Oct 1;199:704-17.
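
      The NaN-based handling of incorrect trials described above can be illustrated with a minimal Python sketch. This is not the analysis code used in the study (which was run in Matlab); the function names and the example values are hypothetical, and the sketch only assumes that each trial has a movement time and a flag indicating whether it was performed correctly.

```python
import math


def mask_incorrect(values, correct):
    """Keep incorrect trials in the series but replace their value with NaN."""
    return [v if ok else math.nan for v, ok in zip(values, correct)]


def paired_differences(values, correct):
    """Difference between each trial and the previous one.

    The result is NaN whenever either trial of the pair was incorrect,
    because any arithmetic involving NaN yields NaN.
    """
    masked = mask_incorrect(values, correct)
    return [curr - prev for prev, curr in zip(masked, masked[1:])]


def defined_only(differences):
    """Drop undefined (NaN) pairs before fitting or summarising."""
    return [d for d in differences if not math.isnan(d)]


# Hypothetical movement times (s); the third trial contains a performance error.
mt = [5.0, 4.8, 4.9, 4.5]
ok = [True, True, False, True]
delta_mt = paired_differences(mt, ok)  # only the first pair is defined
```

      In this toy series the error on the third trial leaves both pairs that involve it undefined, mirroring the example of trial 23 given above.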

      13) I have a serious concern with respect to how the sensitivity of sequence duration to reward is framed and analyzed. Since reward is proportional to performance, a reduction in reward essentially indicates a trial with poor performance, and thus even regression to the mean (along with a floor effect in performance [asymptote]) could explain the observed effects. It is possible that even occasional poor performance could lead to a participant demonstrating this effect, potentially regardless of the reward. Accordingly, the reduced improvement in performance following a reward decrease as a function of training length described in Figure 5b legend may reflect training-induced increased performance that leaves less room for improvement after poor trials, which are no longer as poor as before. To address this concern, controlling for performance (e.g., by taking into consideration the baseline MT for the previous trial) may be helpful. If the authors can conduct such an analysis and still show the observed effect, it would establish the validity of their findings.

      Thank you for raising this point. Figure 5b illustrates two distinct effects of reward changes on behavioral adaptation, which are expected based on previous research.

      I. Practice effects: Firstly, we observe that as participants progress across bins of practice, the degree of improvement in behavior (reflected by faster movement time, MT) following a decrease in reward (∆R−) diminishes, consistent with our expectations based on previous work. Conversely, we found that ∆MT does not change across bins of practices following an increase in reward (∆R+). We appreciate the reviewer's suggestion regarding controlling for the reference movement time (MT) in the previous trial when examining the practice effect in the p(∆T|∆R−) and p(∆T|∆R+) distributions. In the revised manuscript, we will conduct the proposed control analysis to better understand whether the sensitivity of MT to score decrements changes across practice when normalising MT to the reference level on each trial. But see below for a preliminary control analysis.

      II. Asymmetry of the effect of ∆R− and ∆R+ on performance: Figure 5b also depicts the distinct impact of score increments and decrements on behavioural changes. When aggregating data across practice bins, we consistently observed that the centre of the p(∆T|∆R−) distribution was smaller (more negative) than that of p(∆T|∆R+). This suggests that participants exhibited a greater acceleration following a drop in scores compared to a relative score increase, and this effect persisted throughout the practice sessions. Importantly, this enhanced sensitivity to losses or negative feedback (or relative drops in scores) aligns with previous research findings (Galea et al., 2015; Pekny et al., 2014; van Mastrigt et al., 2020).

      We have conducted a preliminary control analysis to exclude the potential impact that reference movement time (MT) values could have on our analysis. We have assessed the asymmetry between behavioural responses to ∆R− and ∆R+ using the following analysis: We estimated the proportion of trials in which participants exhibited speed-up (∆T < 0) or slow-down (∆T > 0) behaviour following ∆R− and ∆R+ across different practice bins (bins 1 to 4). By discretising the series of behavioural changes (∆T) into binary values (+1 for slowing down, -1 for speeding up), we can assess the type of changes (speed-up, slow-down) without the absolute ∆T or T values contributing to our results. We obtained several key findings:

      • Consistent with expectations (sanity check), participants exhibited more instances of speeding up than slowing down across all reward conditions.

      • Participants demonstrated a higher frequency of speeding up following ∆R− compared to ∆R+, and this asymmetry persisted throughout the practice sessions (greater proportion of -1 events than +1 events). In the p(∆T|∆R+) distribution, 53% of events were speed-up events for the first bin of practices, and 55% for the last bin. In the p(∆T|∆R−) distribution, 63% of events were speed-up events in each bin of practices, with this proportion exhibiting no change over time.

      • Accordingly, the asymmetry of reward changes on behavioural adaptations, as revealed by this analysis, remained consistent across the practice bins.
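
      The binarised tally described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the code used for the preliminary analysis: the function name and the example values are hypothetical, the two input series are assumed to be already aligned so that `delta_r[i]` is the reward change preceding the behavioural change `delta_t[i]`, and trials with ∆T = 0 or with undefined (NaN) values are assumed to be skipped.

```python
import math


def speedup_proportion(delta_t, delta_r, reward_sign):
    """Proportion of speed-up events (∆T < 0) after a reward change.

    reward_sign is -1 to select events following ∆R− and +1 for ∆R+.
    Returns NaN when no event qualifies.
    """
    signs = []
    for dt, dr in zip(delta_t, delta_r):
        if math.isnan(dt) or math.isnan(dr):
            continue  # undefined pair (e.g. an incorrect trial)
        if dt == 0 or dr * reward_sign <= 0:
            continue  # assumption: ties and non-matching reward changes are skipped
        signs.append(-1 if dt < 0 else 1)  # -1 = speed-up, +1 = slow-down
    if not signs:
        return math.nan
    return signs.count(-1) / len(signs)


# Hypothetical aligned series for one practice bin.
dt = [-0.1, 0.2, -0.3, -0.05, 0.1, math.nan]
dr = [-1.0, -2.0, -0.5, 3.0, 2.0, -1.0]
after_decrease = speedup_proportion(dt, dr, -1)
after_increase = speedup_proportion(dt, dr, +1)
```

      Because only the sign of ∆T enters the tally, the absolute movement time on the reference trial cannot contribute to the resulting proportions.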

      Thus, these preliminary findings provide an initial response to referee 2 and offer valuable insights into the asymmetrical effects of positive/negative reward changes on behavioural adaptations. We plan to include these results in the revised manuscript, as well as the full control analysis suggested by the referee. We will further expand upon their interpretation and implications.

      14) Another way to support the claim of reward change directionality effects on performance (rather than performance on performance), at least to some extent, would be to analyze the data from the last 10 days of the training, during which no rewards were given (pretending for analysis purposes that the reward was calculated and presented to participants). If the effect persists, it is less likely that the effect in question can be attributed to the reward dynamics.

      The reviewer’s concern is addressed in the previous question. Also, this analysis would not be possible because our Gaussian fit analyses use the time series of continuous reward scores, in which ∆R− or ∆R+ are embedded. These events cannot be analyzed once reward feedback is removed because we do not have behavioral events following ∆R− or ∆R+ anymore.

      15) This concern is also relevant and should be considered with respect to the sensitivity of IKI consistency (C) to reward. While the relationship between previous reward/performance and future performance in terms of C is of a different structure, the similar potential confounding effects could still be present.

      We will conduct this analysis for the revised manuscript, similarly to the control analysis suggested by referee 2 on MT. Our preliminary control analysis, as explained above, suggests that the fundamental asymmetry in the effect of ∆R− and ∆R+ on behavioral changes persists when excluding the impact of reference performance values in our Gaussian fit analysis.

      16) Another related question (which is also of general interest) is whether the preferred app sequence (as indicated by the participants for Phase B) was consistently the one that yielded more reward? Was the continuous sequence the preferred one? This might tell something about the effectiveness of the reward in the task.

      We have now conducted this analysis. There is in fact no evidence to conclude that the continuously rewarded sequence was the preferred one. The result shows that 54.5% of HV and 29% of the OCD sample considered the continuous sequence to be their preferred one. Of note, this preference may not necessarily be linked to the trial-by-trial reward sensitivity analysis. The latter assesses how learning may be affected by reward. The overall preference may be influenced by many other factors, such as the aesthetic appeal of particular combinations of finger movements.

      Regarding both experiments 2 and 3:

      17) The change in context in experiment 2 and 3 is substantial and include many different components. These changes should be mentioned in more detail in the Results section before describing the results of experiments 2 and 3.

      Following the referee’s advice, we will move these details (currently written in the Methods section) to the Results section, where we introduce Phase B and before describing the results of experiments 2 and 3.

      Experiment 2:

      18) In Experiment 2, the authors sometimes refer to the "explicit preference task" as testing for habitual and goal-seeking sequences. However, I do not think there is any justification for interpreting it as such. The other framings used by the authors - testing whether trained action sequences gain intrinsic/rewarding properties or value, and preference for familiar versus novel action sequences - are more suitable and justified. In support of the point I raised here, assigning intrinsic rewarding properties to the learned sequences and thereby preferring these sequences can be conceptually aligned with goal-directed behavior just as much as it could be with habit.

      We clearly defined the theoretical framing of experiment 2 as a test of whether trained action sequences gain intrinsic value and we are pleased to hear that the referee agrees with this framing. If the referee is referring to the paragraph below (in the Discussion), we actually do acknowledge within this paragraph that a preference for the trained sequences can either be conceptually aligned with a habit OR a goal-directed behavior.

      “On the other hand, we are describing here two potential sources of evidence in favor of enhanced habit formation in OCD. First, OCD patients show a bias towards the previously trained, apparently disadvantageous, action sequences. In terms of the discussion above, this could possibly be reinterpreted as a narrowing of goals in OCD (Robbins et al., 2019) underlying compulsive behavior, in favor of its intrinsic outcomes”

      This narrowing of goals model of OCD refers to a hypothetically transitional stage of compulsion development driven by behavior having an abnormally strong, goal-directed nature, typically linked to specific values and concerns.

      If the referee is referring to the penultimate sentence of the Hypothesis section, this has been amended in response to Q5. We cannot find any other possible instances in this manuscript stating that experiment 2 is a test of habitual or goal-directed behavior.

      Experiment 3:

      19) Similar to Experiment 2, I find the framing of arbitration between goal-directed/habitual behavior in Experiment 3 inadequate and unjustified. The results of the experiment suggest that participants were primarily goal-directed and there is no evidence to support the idea that this reevaluation led participants to switch from habitual to goal-directed behavior.

      Also, given the explicit choice of the sequence to perform participants had to make prior to performing it, it is reasonable to assume that this experiment mainly tested bias towards familiar sequence/stimulus and/or towards intrinsic reward associated with the sequence in value-based decision making.

      This comment is aligned with (and follows) the referee’s criticism of experiment 1 not achieving automatic and habitual actions. We have addressed this matter above, in response 1 to Referee 2.

      Mobile-app performance effect on symptomatology: exploratory analyses:

      20) Maybe it would be worth testing if the patients with improved symptomatology (that contribute some of their symptom improvement to the app) also chose to play more during the training stage.

      We have conducted an analysis to address this relevant question. There is no correlation between the YBOCS score change and the total number of practices, meaning that the patients whose symptomatology improved post-training did not necessarily choose to play the app more during the training stage (rs = 0.25, p = 0.15). Additionally, we have statistically compared the improvers (patients with reduced YBOCS scores post-training) and the non-improvers (patients with unchanged or increased YBOCS scores post-training) in their number of completed app practices during the training phase, and no differences were observed (U = 169, p = 0.19).

      Discussion:

      21) Based on my earlier comments highlighting the inadequacy and mis-framing of the work in terms of habit and goal-directed behavior, I suggest that the discussion section be substantially revised to reflect these concerns.

      We do not agree that the work is either "inadequate or mis-framed" and will not therefore be substantially revising the Discussion. We will however clarify further the interpretation we have made and make explicit the alternative viewpoint of the referee. For example, we will retitle experiment 3 as “Re-evaluation of the learned action sequence: possible test of goal/habit arbitration” to acknowledge the referee’s viewpoint as well as our own interpretation.

      22) In the sentence "Nevertheless, OCD patients disadvantageously preferred the previously trained/familiar action sequence under certain conditions" the term "disadvantageously" is not necessarily accurate. While there was potentially more effort required, considering the possible presence of intrinsic reward and chunking, this preference may not necessarily be disadvantageous. Therefore, a more cautious and accurate phrasing that better reflects the associated results would be useful.

      We recognize that the term "disadvantageously" may be semantically ambiguous for some readers and therefore we will remove it.

      Materials and Methods:

      23) The authors mention: "The novel sequence (in condition 3) was a 6-move sequence of similar complexity and difficulty as the app sequences, but only learned on the day, before starting this task (therefore, not overtrained)." - for the sake of completeness, more details on the pre-training done on that day would be useful.

      Details of the learning procedure of the novel sequence (in condition 3, experiment 3) will be provided in the methods of the revised version of the manuscript.

      Minor comments:

      24) In the section discussing the sensitivity of sequence duration to reward, the authors state that they only analyzed continuous reward trials because "a larger number of trials in each subsample were available to fit the Gaussian distributions, due to feedback being provided on all trials." However, feedback was also provided on all trials in the variable reward condition, even though the reward was not necessarily aligned with participants' performance. Therefore, it may be beneficial to rephrase this statement for clarity.

      We will follow this referee’s advice and will rephrase the sentence for clarity.

      25) With regard to experiment 2 (Preference for familiar versus novel action sequences) in the following statement "A positive correlation between COHS and the app sequence choice (Pearson r = 0.36, p = 0.005) further showed that those participants with greater habitual tendencies had a greater propensity to prefer the trained app sequence under this condition." I find the use of the word "further" here potentially misleading.

      The word "further" will be removed.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Thank you for considering our manuscript “An Unexpected Role of Neutrophils in Clearing Apoptotic Hepatocytes In Vivo". We also thank the referees for their review. We have addressed their comments in detail and added new data to buttress our conclusions.

      Reviewer #1 (Public Review):

      This study by Cao et al. demonstrates a role for neutrophils in clearing apoptotic hepatocytes by directly burrowing into the apoptotic hepatocytes and ingesting the effete cells from inside without causing inflammation. The authors applied intravital microscopy, immunostaining and electron microscopy to visualize perforocytosis by neutrophils in hepatocytes. They also found that neutrophil depletion impairs the clearance of apoptotic hepatocytes, causing impaired liver function and generation of autoantibodies, implying a role of defective neutrophil-mediated clearance of apoptotic cells in autoimmune liver disease. The experiments were well designed and conducted, the results were reasonably interpreted, and the manuscript was clearly written with logical inputs.

      Thank you for your comments.

      One weak point is that the signals/mechanisms that determine why neutrophils specifically target apoptotic hepatocytes in the liver, and not other organs or cells, are not clearly understood.

      We are still studying why neutrophils selectively phagocytose hepatocytes but not HUVEC or 293 cells. We have some intriguing preliminary data so far showing that apoptotic 293 cells have no significant increase in IL-1β production as compared with their nonapoptotic controls, and that neither apoptotic 293 cells nor apoptotic HUVECs have increased surface selectin proteins (new Fig. S3C).

      Reviewer #2 (Public Review):

      […] By examination of HE-stained, noncancerous liver tissue sections from patients with hepatocellular carcinoma and hepatic hemangioma, the authors observed that cells with neutrophil nuclear morphology were inside apoptotic hepatocytes. The authors also further characterized this observation by staining the sections with neutrophil and apoptosis markers. In addition, the authors observed the same phenomena in mouse livers using intravital microscopy, which also recorded the time course of the disappearance of a neutrophil-associated apoptotic cell. The authors went on to further characterize neutrophil-mediated efferocytosis of cultured hepatic cells in vitro and demonstrated that the process was specific for apoptotic hepatic cells, but not HEK293 or endothelial cells. The in vitro system was then used to characterize the molecular bases for neutrophil-mediated efferocytosis of apoptotic hepatic cells. Evidence was provided to suggest that IL1b and IL-8 released from, and selectins upregulated in, apoptotic hepatic cells were important. Importantly, the authors used two methods to deplete the neutrophils and showed that the neutrophil depletion increased apoptotic cells in livers. Finally, the authors showed that neutrophil depletion caused defects in liver function parameters. At the end, the authors presented evidence to suggest that AIL disease may be due to defective neutrophils that fail to perform "perforocytosis."

      Thank you for your comments.

      Point #1. Although the evidence in its totality indicates that neutrophils burrow into apoptotic hepatocytes, the significance of this "perforocytosis" phenomenon and the circumstances under which it may occur remain to be better defined. In both neutrophil depletion models, the TUNEL-positive cells were not definitively identified; they were instead assumed to be hepatocytes.

      Anatomically, the apoptotic hepatocytes are randomly distributed in the hepatic plate from the central vein to the portal region (please refer to the image below: hematoxylin staining of liver tissues, black arrowhead indicates perforocytosis sites).

      Author response image 1.

      Histologically, the structure of the liver/hepatic lobe is well defined, and the cell types in the liver are easy to identify based on their location, morphology and relationship to the hepatic plate and sinusoid. In addition, hepatocytes are well known for their rich cytoplasmic components, cellular connections and prominent large round nuclei. Thus, hepatocytes are very easy to identify even without using specific molecular markers such as E-cadherin or albumin. Based on these characteristics, the TUNEL-positive cells that we displayed in Fig. 5A are apoptotic hepatocytes.

      Point #2. In addition, there are discrepancies in the number of neutrophils and apoptotic cells in mouse liver studies; Fig. 2a WT (many neutrophils; locations unclear) vs Fig. 5A Ctr (a few neutrophils that appear in or near a vessel), and Fig. 2a DTR (a few apoptotic cells) vs Fig. 5A Depletion (many apoptotic cells).

      In response, Fig. 2A shows a larger area of the mouse liver (bar, 100 µm), while Fig. 5A shows a relatively small area of the liver sample (bars, 20 µm for Ctrl and 15 µm for DTR). Similarly, the apoptotic cells in Fig. 2A DTR can only be quantified after zooming in. We apologize for the confusion; we did quantify the apoptotic cells in Fig. 2A WT vs DTR (see the bar graph next to the images in Fig. 2A).

      Point #3. Importantly, in Fig 5a Ctrl, which is presumably a section from a mouse without any surgical treatment or without inflammation, the sole TUNEL signal does not appear to be associated with neutrophils. Does this mean that "perforocytosis" primarily occurs in inflamed livers? (Of note, human liver samples in Fig 1 are from patients with tumors. There should be inflammation in the livers of these patients.)

      In Fig 5A Ctrl, the TUNEL signal indicates apoptotic hepatocytes. The neutrophils (stained with anti-NE antibody, red) are associated with the apoptotic hepatocyte (Fig. 5A). We observed that perforocytosis primarily occurs in normal noninflamed livers.

      Human liver samples in Fig 1 are from patients with tumors, hence it is possible that neutrophil burrowing is somehow associated with cancerous/inflammatory livers, as the reviewer pointed out. However, this possibility was ruled out based on our method of sample preparation and the experimental results themselves.

      1) Both noncancerous and cancerous liver samples were sliced based on the anatomical appearance of normal and cancer tissues (differences were rather easy to identify, and these samples were prepared by highly experienced pathologists from the Liver Cancer Center of Zhongshan Hospital, Shanghai). Furthermore, the results were confirmed by determining whether the surrounding tissue contained microlesions characteristic of metastatic tumors. We only counted apoptotic hepatocytes in noncancerous regions having normal liver lobes and morphologically normal hepatocytes, plates, sinusoid and Kupffer cells. We also excluded hepatoma, chronic inflammatory regions, and necrotic regions.

      2) We did not observe recruitment of neutrophils into apoptotic HCC cells, indicating that the clearance of apoptotic cancer cells was not mediated by neutrophils (unpublished observations).

      3) Normal human liver samples are hard for us to obtain. However, we did study samples from patients with liver hemangioma, which is characterized by aberrant vasculature but normal liver function; the hemangioma livers that we analyzed are nearly identical in histology to healthy liver (these samples contained no cancerous regions, and there was no apparent cirrhosis or inflammation). We obtained similar results in these samples (shown in Fig. 1B; a total of 40 apoptotic hepatocytes were examined).

      4) Our data from normal mouse livers, isolated primary cells (hepatocytes and neutrophils) and cell lines (NCTC and HL60) all confirmed the central findings in this paper (Fig. 2, 3).

      Point #4. The data on human AIL patient neutrophils raises more questions: how many AIL patients have been examined? Do these AIL neutrophils lack IL1, IL8 receptors, and/or selectin ligands? Are there increases in apoptotic hepatocytes in AIL patients?

      In response, we have analyzed 16 AIL patient samples (see table below).

      Author response table 1.

      We performed a microarray assay to screen for differential gene expression between neutrophils from normal individuals and neutrophils from AIL patients. We found that the IL-1β receptor IL1R1 and the selectin-binding protein P-selectin glycoprotein ligand 1 (PSGL-1) were decreased in neutrophils from the AIL patients (new Fig 7D). These findings are consistent with our observations using cells and mouse models.

      Point #5. Additionally, the overall numbers of apoptotic cells even in the absence of neutrophils are rare; thus, it is questionable that such rarity of apoptotic cells can cause significant AIL phenotypes.

      We quantified apoptotic liver cells as percentages rather than overall numbers (Fig. 5; we were not able to calculate the overall numbers precisely, and they could be large, since billions of cells undergo apoptosis daily). Depletion of neutrophils increased the percentage of apoptotic cells in livers about 5-6-fold, and we observed the generation of autoantibodies (Fig. 6).

      Reviewer #1 (Recommendations For The Authors):

      This study by Cao et al. was well designed and conducted, the results were reasonably interpreted, and the manuscript was clearly written with logical inputs.

      It would further gain the significance of this study if authors could address the following questions:

      1.  What are the mechanisms/ signals that prevents AIL Liver neutrophils from burrowing into hepatocytes?

      We found that the IL-1β receptor IL1R1 and the selectin-binding protein P-selectin glycoprotein ligand 1 (PSGL-1) were decreased in neutrophils from the AIL patients (new Fig 7D).

      2.  Have authors looked if autoantigens expressed on hepatocytes, which are often found in autoimmune liver disease trigger signaling events that activate neutrophils to burrow?

      Thank you for the comment. We have not examined autoantigens expressed in hepatocytes and plan to carry out this research as suggested.

      3.  Is perforocytosis observed in apoptotic hepatocytes induced by different agents like LPS, TNF-a , rapamycin, alcohol etc?

      We did not observe perforocytosis in LPS- or TNF-a-treated hepatocytes. One possible reason is that the LPS or TNF-a we used induced massive necrosis instead of apoptosis. However, we did observe neutrophil perforocytosis in FasL-induced apoptotic hepatocytes (unpublished observations).

      Reviewer #2 (Recommendations For The Authors):

      In addition to the questions raised in the "Public review" section, the authors are also recommended to address the following issues:

      1) Why is CD11b+ not associated with the apoptotic sites, as neutrophils express CD11b?

      We have co-immunostained human liver samples with CD11b antibody (from Abcam: ab133357) and MPO antibody (from R&D: AF3667) and observed that tissue-infiltrating neutrophils in livers have low to undetectable levels of CD11b expression (please refer to the image below; white arrowheads point to neutrophils). Few CD11b+ cells in liver tissues express MPO (the CD11b+ cells are mostly macrophages, unpublished observations).

      Based on these data, we conclude that CD11b is hardly expressed in neutrophils inside livers.

      Author response image 2.

      2) Can TUNEL signals in Fig. S1C be from apoptotic neutrophils?

      In response, fragmentation of the nucleus is a hallmark of apoptosis; hence, TUNEL staining uniformly labels all fragments of an apoptotic nucleus. The nuclei of NE+ neutrophils are not labelled by TUNEL staining in Fig. S1C. The TUNEL+ nuclear fragments seen inside neutrophils are nuclear debris of apoptotic hepatocytes phagocytosed by neutrophils (Fig. S1C).

      3) The Fig 2B experiment may be done with induced apoptosis so that neutrophil burrowing steps may be recorded from the very beginning and a better time course for the entire process can be assessed.

      Thank you for the suggestion. We have tried many times under various conditions but have not yet succeeded in capturing the very beginning of perforocytosis in vivo. We are continuing to work on this.

      4) In "we found that U937 cells exhibited much lower phagocytosis of apoptotic NCTC cells than did HL60 cells (Fig. S2B, C)," the citation should be only S2C.

      Thank you for pointing this out. We have corrected this in the manuscript.

      5) Both neutrophil depletion models cause neutrophil death, which may complicate the interpretation of the liver function and AIL disease phenotypes. A neutropenic model such as G-CSFR−/− or Cebpe-/- mice may be used to avoid the caveat of antibody/DTR-dependent depletion models.

      Thank you for this thoughtful suggestion. We have also induced AIL phenotypes in mice by using α-GalCer. α-GalCer did not cause neutrophil death but impaired neutrophil perforocytosis and further generated AIL phenotypes in mice (unpublished observations). We plan to perform similar experiments in G-CSFR−/− or Cebpe−/− mice as the reviewer suggested.

      6) RNAi silencing experiments need additional controls for off-target effects

      These RNAi silencing constructs were purchased from Santa Cruz Biotechnology, and their off-target effects have been tested by the company. No significant off-target effects were detected according to the manufacturer's report.

    1. Author Response

      Joint Public Review

      The molecular composition of synaptic vesicles (SVs) has been defined in substantial detail, but the functions of many SV-resident proteins are still unknown. The present study focused on one such protein, the 'orphan' SV-resident transporter SLC6A17. By utilizing sophisticated and extensive mouse genetics and behavioral experiments, the authors provide convincing support for the notion that certain SLC6A17 variants cause intellectual disability (ID) in humans carrying such genetic variations. This is an important and novel finding. Furthermore, the authors propose, based on LC-MS analyses of isolated SVs, that SLC6A17 is responsible for glutamine (Gln) transport into SVs, leading to the provocative idea that Gln functions as a neurotransmitter and that deficits in Gln transport into SVs by SLC6A17 represent a key pathogenetic mechanism in human ID patients carrying variants of the SLC6A17 gene.

      This latter aspect of the present paper is not adequately supported by the experimental evidence so that the main conceptual claims of the study appear insufficiently justified at this juncture. Key weaknesses are as follows:

      A) Detection of Gln, along with classical neurotransmitters such as glutamate, GABA, or ACh, in isolated SV fractions does not prove that Gln is transported into SVs by active transport. Gln is quite abundant in extracellular compartments. Its appearance in SV samples can therefore also be explained by trapping in SVs during endocytosis, presence in other - contaminating - organelles, binding to membrane surfaces, and other processes. Direct assays of Gln uptake into SVs, which have the potential to stringently test key postulates of the authors, are lacking.

      We have conducted multiple control experiments to exclude the possibility of contamination.

      1). Western blot analysis of SLC6A17-HA immunoisolation (Figure 4D and Figure 4—figure supplement 1) showed that this fraction contained few other organelles and membranes. These results are a strong argument that contamination in our isolated fraction was very low.

      2). We then examined the proportion of SLC6A17 localized to SVs by quantifying the co-localization of Syp and SLC6A17 after anti-Syp immunoisolation in Slc6a17-2A-HA-iCre mice. We found that SLC6A17 is predominantly localized on SVs (98.7% relative to the classical SV marker; Author response image 1A). This further showed that the immunoisolated SLC6A17 fraction was mainly composed of SVs.

      3). We also analyzed other SV marker proteins, such as Syt1 and Syb2, by IP-LC-MS; all results supported Gln enrichment (Author response image 1B).

      4). Importantly, immunoisolation of the SLC6A17P633R-HA protein, which caused SLC6A17 mislocalization away from the SVs (Figure 3B and Figure 3—figure supplement 1C, D), showed no Gln enrichment (Author response image 1C).

      5). Moreover, immunoisolation of AAV-PHP.eb overexpressed cytoplasmic membrane Gln transporter SLC38A1-HA did not show Gln enrichment (Author response image 1D).

      6). We also tested whether trafficking organelles such as the lysosome could enrich Gln. As shown in Author response image 1E, immunoisolation of AAV-PHP.eb overexpressed TMEM192-HA did not show Gln enrichment. To test for active transport, we examined the effects of the proton dissipator FCCP, the v-ATPase inhibitor NEM and the ΔpH dissipator nigericin. As shown in Author response image 1F and 1G, the Gln level was reduced by these inhibitors, supporting active transport of Gln.

      Author response image 1.

      Control experiments to test for contamination. A. Anti-Syp immunoisolation in Slc6a17-2A-HA-iCre mice. B. Quantification of Gln level in anti-Syt1 and anti-Syb2 immunoisolated fractions. C. Anti-HA immunoisolation in SLC6A17-2A-HA and anti-Slc6a17P633R mice. D. Anti-HA immunoisolation in AAV-PHP.eb-hSyn-SLC38A1-HA overexpression mice. E. Anti-HA immunoisolation in AAV-PHP.eb-hSyn-TMEM192-HA overexpression mice. F. Anti-HA immunoisolation in SLC6A17-2A-HA mice under FCCP (50 μM) and NEM (200 μM). G. Anti-Syp immunoisolation in wild type mice under FCCP (50 μM) and nigericin (20 μM).

      B) The authors generated multiple potentially very useful genetic tools and models. However, the validation of these models is incomplete. Most importantly, it remains unclear whether the different mutations affect SLC6A17 expression levels, subcellular localization, or the expression and trafficking of other SV and synapse components.

      The verification of the transgenic mouse lines is described in the Materials and Methods section of our manuscript. CRISPR-mediated gene editing in animals is extensively documented, and the off-target effects of the CRISPR-Cas9 system have been widely studied, with optimized design tools developed by many groups (Platt et al., 2014; Chu et al., 2015, 2016; Liu et al., 2017; Gemberling et al., 2021; Singh et al., 2022). The gRNAs used for animal generation were chosen carefully based on publicly available tools. In addition to genomic PCR sequencing of the target regions in all gene-edited mouse models, Southern blots were performed by Biocytogen for Slc6a17-HA-2A-iCre and Slc6a17P633R mice to rule out random insertions. Expression levels in Slc6a17-KO and Slc6a17P633R mice were not affected, as shown in Figure R2. HA-tagged proteins in Slc6a17-HA-2A-iCre and Slc6a17P633R mice were detected by immunoisolation, immunofluorescence, and fractionation (Figure 3, 4, Figure 3—figure supplement 1, Figure 4—figure supplement 1). Both showed localizations expected from previous reports ().

      C) Apart from the caveats mentioned above regarding Gln uptake into SVs, the data interpretation provided by the authors lacks stringency with respect to the biophysics of plasma membrane and SV transporters.

      The biophysics of SLC6A17 has been carefully studied (Parra et al., 2008; Zaia and Reimer, 2009). Our work focused on in vivo biochemical results, not biophysics.

      Author response image 2.

      Verification of genetic mouse models. A. q-PCR verification of Slc6a17-KO mice; B. q-PCR verification of Slc6a17P633R mice; C. Example of genomic primer design for the Slc6a17-HA-2A-iCre founder mouse screen; D. Example of genomic PCR for the Slc6a17-HA-2A-iCre founder mouse screen; E. Southern blot performed for Slc6a17-HA-2A-iCre mice.

      References

      Chu, Van Trung et al. “Increasing the efficiency of homology-directed repair for CRISPR-Cas9-induced precise gene editing in mammalian cells.” Nature biotechnology vol. 33,5 (2015): 543-8. doi:10.1038/nbt.3198

      Chu, Van Trung, et al. "Efficient generation of Rosa26 knock-in mice using CRISPR/Cas9 in C57BL/6 zygotes." BMC biotechnology 16.1 (2016): 1-15.

      Gemberling, Matthew P et al. “Transgenic mice for in vivo epigenome editing with CRISPR-based systems.” Nature methods vol. 18,8 (2021): 965-974. doi:10.1038/s41592-021-01207-2

      Liu, Edison T., et al. "Of mice and CRISPR: The post‐CRISPR future of the mouse as a model system for the human condition." EMBO reports 18.2 (2017): 187-193.

      Madisen, Linda, et al. "A robust and high-throughput Cre reporting and characterization system for the whole mouse brain." Nature neuroscience 13.1 (2010): 133-140.

      Parra, Leonardo A., et al. "The orphan transporter Rxt1/NTT4 (SLC6A17) functions as a synaptic vesicle amino acid transporter selective for proline, glycine, leucine, and alanine." Molecular pharmacology 74.6 (2008): 1521-1532.

      Platt, R.J., Chen, S., Zhou, Y., Yim, M.J., Swiech, L., Kempton, H.R., Dahlman, J.E., Parnas, O., Eisenhaure, T.M., Jovanovic, M., et al. (2014). CRISPR-Cas9 knockin mice for genome editing and cancer modeling. Cell 159, 440-455.

      Singh, Surender et al. “Opportunities and challenges with CRISPR-Cas mediated homologous recombination based precise editing in plants and animals.” Plant molecular biology, 31 Oct. 2022, doi:10.1007/s11103-022-01321-5

      Yang, Hui, Haoyi Wang, and Rudolf Jaenisch. "Generating genetically modified mice using CRISPR/Cas-mediated genome engineering." Nature protocols 9.8 (2014): 1956-1968.

      Zaia, K.A., and Reimer, R.J. (2009). Synaptic vesicle protein NTT4/XT1 (SLC6A17) catalyzes Na+-coupled neutral amino acid transport. J Biol Chem 284, 8439-8448.

    1. Author Response

      eLife assessment

      This study assesses homeostatic plasticity mechanisms driven by inhibitory GABAergic synapses in cultured cortical neurons. The authors report that up- or down-regulation of GABAergic synaptic strength, rather than excitatory glutamatergic synaptic strength, is critical for homeostatic regulation of neuronal firing rates. The reviewers noted that the findings are potentially important, but they also raised questions. In particular, the evidence supporting the findings is currently incomplete and demonstration of independent regulation of mEPSCs and mIPSCs is a necessary experiment to support the major claims of the study.

      We appreciate the detailed, thoughtful assessment of our paper by the reviewers and editors and will submit a revised version in the future that addresses the reviewers’ comments as detailed below in response to each concern. We will include a more open discussion of alternative possibilities. Further, we will repeat the optogenetic experiments assessing AMPAergic scaling in our mouse cortical cultures in order to demonstrate independent regulation of mEPSCs and mIPSCs as suggested.

      Reviewer #1 (Public Review):

      In the manuscript titled "GABAergic synaptic scaling is triggered by changes in spiking activity rather than transmitter receptor activation," the authors present an investigation of the role of GABAergic synaptic scaling in the maintenance of spike rates in networks of cultured neurons. Their main findings suggest that GABAergic scaling exhibits features consistent with a key homeostatic mechanism that contributes to the stability of neuronal firing rates. Their data demonstrate that GABAergic scaling is multiplicative and emerges when postsynaptic spike rates are altered. Finally, their data suggest that, in contrast to their prior data on glutamatergic scaling, GABAergic scaling is driven by spike rates. The authors set the paper up as an argument that GABAergic scaling, rather than glutamatergic scaling, serves as the critical homeostatic mechanism for spike rate regulation.

      While the paper is ambitious in its rhetorical scope and certainly presents intriguing findings, there are several serious concerns that need to be addressed to substantiate the interpretations of the data. For example, the CTZ data do not support the interpretations and conclusions drawn by the authors. Summarily, the authors argue that GABAergic scaling is measuring spiking (at the time scale of the homeostatic response, which they suggest is a key feature of a homeostat) yet their data in figure 5B show more convincingly that CTZ does not influence spiking levels - only one out of four time points is marginally significant (also, I suspect that the bootstrapping method mentioned in line 454-459 was conducted as a pairwise comparison of distributions. There is no mention of multiple comparisons corrections, and I have to assume that the significance at 3h would disappear with correction).

      We certainly understand the criticism here (similar to reviewer 2’s third point). In our resubmission we will do a better job discussing these complications, which we now summarize. First, we are presenting our entire dataset to be as transparent as possible. Unlike most synaptic scaling studies (including our own) that apply drugs to alter activity and assess mPSC amplitude at the final time point, here we are actually showing CTZ’s effect on spiking activity within the culture over time. This is critical because it has informed us of the drug’s true effect on spiking, the variability that is associated with these perturbations, and the ability and timing of the cultured network to homeostatically recover initial levels. This was important because it revealed that the drugs do not always influence activity in the way we assume, and this provides greater context to our results. Second, we are showing all of our data and presenting it using estimation statistics, which go beyond the dichotomy of a simple p value yes or no (Ho J, Tumkaya T, Aryal S, Choi H, Claridge-Chang A. 2019. Moving beyond P values: data analysis with estimation graphics. Nat Methods 16: 565-66). Estimation statistics has become a more standard statistical approach in the last 15 years and is the preferred method for the Society for Neuroscience’s eNeuro journal. This method shows the effect size and the confidence interval of the distribution. For the 3 hr time point in Fig. 5B, the CTZ/ethanol vs. ethanol data points exhibit very little overlap, the effect size demonstrates a near doubling of spike frequency, and the confidence interval shows a clear separation from 0. This was a pairwise comparison, as we compared values at each time point after the addition of ethanol or ethanol/CTZ. Third, the plots illustrate an upward trend in spike frequency at 1 and 6 hrs, but there is also clear variability.

      It is important to note that while these recordings help us to understand effects on spiking across the cultured network, they cannot directly speak to spiking activity in the principal neurons that we target. This complication, along with the variability inherent in these cultures, could make simple comparisons difficult to interpret. Regardless, we do see some increase in spiking with CTZ and we clearly see increases in mIPSC amplitude, thus providing some support for the idea that spiking could be a critical player in terms of GABAergic scaling, particularly when put in the context of our other findings. However, it is important to recognize that something other than total spike rate may contribute to GABAergic scaling, such as the pattern of spiking that produces a particular calcium transient, and this will be discussed in the resubmission.
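
      To make the estimation-statistics argument concrete, the following is a minimal sketch of how an effect size and its bootstrap confidence interval can be computed for two groups. It is not the analysis code used for Fig. 5B: the function name `bootstrap_mean_diff`, the simple percentile interval, and the spike-frequency values are all assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_mean_diff(control, treated, n_boot=5000, alpha=0.05, seed=0):
    """Return (effect, ci_low, ci_high) for mean(treated) - mean(control).

    Uses a plain percentile bootstrap; published estimation-statistics
    tools may use a bias-corrected interval instead (assumption).
    """
    rng = random.Random(seed)
    effect = statistics.fmean(treated) - statistics.fmean(control)
    diffs = []
    for _ in range(n_boot):
        # Resample each group with replacement, keeping its original size.
        c = rng.choices(control, k=len(control))
        t = rng.choices(treated, k=len(treated))
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    ci_low = diffs[int((alpha / 2) * n_boot)]
    ci_high = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return effect, ci_low, ci_high


# Hypothetical normalized spike frequencies (illustrative values only).
ethanol = [0.8, 1.1, 0.9, 1.3, 1.0, 0.7, 1.2, 0.95]
ctz_ethanol = [1.9, 2.3, 1.6, 2.4, 2.0, 1.7, 2.5, 2.1]

effect, ci_low, ci_high = bootstrap_mean_diff(ethanol, ctz_ethanol)
print(f"mean difference = {effect:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

      In this framing, a confidence interval that excludes 0 corresponds to the "clear separation from 0" described above, and the interval width conveys the variability that a single p value hides.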

      Then, the fact that TTX applied on top of CTZ drives an increase in mIPSC amplitude is interpreted as a conclusive demonstration that GABAergic scaling is sensing spiking. It is inevitable, however, that TTX will also severely reduce AMPAR activation - a very plausible alternative explanation is that the augmentation of AMPAR activation caused by CTZ is not sufficient to overcome the dramatic impact of TTX. All together, these data do not provide substantial evidence for the conclusion drawn by the authors.

      We understand this point when considering the CTZ/TTX experiments by themselves. However, spiking appears to be a more straightforward trigger when the CTZ/TTX results are coupled with the prevention of GABAergic downscaling by optogenetic restoration of spiking in the presence of AMPAR antagonists. Further, an important point here is that our results with TTX vs. TTX + CTZ are different for GABAergic scaling (no difference) and AMPAergic scaling (CTZ diminished upward scaling) suggesting different triggers for the two forms of scaling. We will make this more clear in our resubmission.

      Specific points:

      • The logic of the basis for the argument is somewhat flawed: A homeostat does not require a multiplicative mechanism, nor does it even need to be synaptic. Membrane excitability is a locus of homeostatic regulation of firing, for example. In addition, synapse-specific modulation can also be homeostatic. The only requirement of the homeostat is that its deployment subserves the stabilization of a biological parameter (e.g., firing rate).

      We agree with the reviewer and should not have suggested that this was a necessary requirement for a spike rate homeostat. What we should have said was that historically this definition has been attributed to AMPAergic scaling, which is thought to be a spike rate homeostat. We will correct this in the resubmission.

      • Line 63 parenthetically references an important, but contradictory study as a brief "however". Given the tone of the writing, it would be more balanced to give this study at least a full sentence of exposition.

      Agreed, we will do this.

      • The authors state (line 11) that expression of a hyperpolarizing conductance did not trigger scaling. More recent work ('Homeostatic synaptic scaling establishes the specificity of an associative memory') does this via expression of DREADDs and finds robust scaling.

      The purpose of citing this study was to argue that the spike rate homeostat hypothesis doesn’t make sense for AMPAergic scaling based on a study that hyperpolarized an individual cell while leaving the rest of the network unaltered and therefore leaving network activity and neurotransmission largely normal. In this case scaling was not triggered, suggesting reduced spike rate within an individual cell was insufficient to trigger scaling. The study that the reviewer refers to hyperpolarizes a majority of cells in the network and therefore will also alter neurotransmission throughout the network, which does not separate the importance of spiking and receptor activation as in the above-mentioned study. We will make this point more clearly in the resubmission.

      • Supplemental figure 1 looks largely linear to me? Out of curiosity, wouldn't you expect the left end to be aberrant because scaling up should theoretically increase the strength of some synapses that would have been previously below threshold for detection?

      We agree that the scaling ratio plot is largely linear. To be clear, the linearity of the ratio plot was interesting, but our main point here was that this line had a positive slope, meaning that the ratios (CNQX mPSC amplitudes/control mPSC amplitudes) got bigger for the larger CNQX-treated mPSCs. Alternatively, a multiplicative relationship in which mPSCs are all increased by a single factor (e.g. 2X) would be a flat line with 0 slope at the multiplicative value (e.g. 2). In terms of the left side of the plot, we do see values that rise abruptly from 1 - this is partially obstructed by the Y axis in this figure and we will adjust this. This left part of the plot is likely due to CNQX-induced increases in the amplitudes of minis that were below our detection threshold of 5 pA. Minis that were 4 pA could therefore be 5 pA after CNQX treatment, and these are then divided by the smallest control mPSCs, which are 5 pA (ratio of 1). We will try to do a better job describing this in the resubmission.
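
      The distinction between a flat and a sloped ratio plot can be sketched in a few lines. This is an illustrative toy, not the analysis behind Supplemental figure 1: the helper `ratio_plot`, the requirement of equal sample sizes, and all amplitude values are assumptions.

```python
def ratio_plot(control, treated):
    """Rank-order both amplitude lists and pair them by rank.

    Returns (treated_amplitude, treated/control ratio) tuples. Equal
    sample sizes are assumed here for simplicity (assumption).
    """
    if len(control) != len(treated):
        raise ValueError("this sketch assumes equal numbers of events")
    c = sorted(control)
    t = sorted(treated)
    return [(t_i, t_i / c_i) for c_i, t_i in zip(c, t)]


# Hypothetical control mPSC amplitudes in pA (illustrative values only).
control = [5.0, 8.0, 12.0, 20.0, 40.0]

# Case 1: every event is scaled by one factor (2X), giving a flat ratio plot.
uniform = [2.0 * a for a in control]

# Case 2: larger events are scaled more, giving a positive slope.
graded = [6.0, 10.4, 16.8, 32.0, 80.0]

for label, treated in (("uniform", uniform), ("graded", graded)):
    ratios = [round(r, 2) for _, r in ratio_plot(control, treated)]
    print(label, ratios)
```

      A single multiplicative factor yields the same ratio at every rank, whereas ratios that rise with amplitude indicate that larger events were scaled by a larger factor.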

      Given that figure 2B also shows warping at the tail ends of similar distributions, how is this to be interpreted?

      The left side of the ratio plot shows evidence consistent with the idea that mIPSCs are dropping into the noise after CNQX treatment (similar to above argument), while most of the distribution suggests mIPSCs are reduced to 50% by CNQX treatment. On the right side of the ratio plot the values appear to mostly increase. We are not sure why this is happening, but it looks like some mIPSCs are not purely multiplicative at 0.5, particularly in TTX. It is also important to point out that this is a relatively small percent of the total population and the biggest mPSCs can vary to a great degree from one cell to the next. We will discuss this in the resubmission.

      • The readability of the figures is poor. Some of them have inconsistent boundary boxes, bizarre axes, text that appears skewed as if the figures were quickly thrown together and stretched to fit.

      We will address these issues in the resubmission.

      • I'm concerned about the optogenetic restoration of activity experiment. Cortical pyramidal neuron mean firing rates are log normally distributed and span multiple orders of magnitude. The stimulation experiments can only address the total firing at a network-level - given that a network level "mean" is meaningless in a lognormal distribution, how are we to think about the effect of this manipulation when it comes to individual neurons homeostatically stabilizing their own activities? In essence, the argument is made at the single-neuron level, but the experiment is conducted with a network-level resolution.

      As described above, we do not have the capacity to know what the actual firing rate of a particular neuron was before and after introducing a drug, and so we cannot absolutely say that we have restored the original firing rates of neurons. However, there is reason to believe that this is achieved to some extent. Our optogenetic stimulation is only 50-100 ms long and activates a subset of neurons. This is sufficient to provide a synaptic barrage that then triggers a full-blown network burst in which the majority of spikes occur, but this is after the light is off. In other words, the optogenetic light pulse only initiates what becomes a normal network burst, which fortunately allows the individual cells to express their relatively normal (pre-drug) activity pattern. In our previous study we showed that this is the case for individual units - the spiking of an individual unit during a burst is similar before and after CNQX/optostim (see Figure 4b and Suppl. Fig 4 in Fong et al. 2015 Nat. Comm.). We are not claiming that we have restored spiking to exactly the pre-drug state, but that we bring it back toward those levels, and we see that this is associated with a return of the mIPSC amplitude to near control levels. We will include a description of this in the resubmission.
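
      The reviewer's concern about network-level means under a lognormal distribution can be illustrated with simulated data. The sketch below is only a toy: the lognormal parameters, the sample size, and the variable names are assumptions and are not derived from the recordings discussed here.

```python
import math
import random
import statistics

rng = random.Random(1)

# Hypothetical per-neuron firing rates (Hz) drawn from a lognormal
# distribution; mu and sigma are illustrative assumptions.
rates = [rng.lognormvariate(0.0, 1.5) for _ in range(10000)]

mean_rate = statistics.fmean(rates)
median_rate = statistics.median(rates)
# The geometric mean is the mean computed on the log scale.
geo_mean = math.exp(statistics.fmean([math.log(r) for r in rates]))
# Fraction of neurons firing below the arithmetic (network-level) mean.
frac_below_mean = sum(r < mean_rate for r in rates) / len(rates)

print(f"arithmetic mean  = {mean_rate:.2f} Hz")
print(f"median           = {median_rate:.2f} Hz")
print(f"geometric mean   = {geo_mean:.2f} Hz")
print(f"fraction < mean  = {frac_below_mean:.2f}")
```

      Because a few high-rate units dominate the arithmetic mean, most simulated neurons fire below it, which is why unit-level comparisons such as those cited from Fong et al. (2015) are more informative than a network average.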

      • Line 198-99: multiplicativity is not a requirement of a homeostatic mechanism.

      • Line 264-265 - again, neither multiplicativity nor synaptic mechanisms are fundamentally any more necessary for a homeostatic locus than anything else that can modulate firing rate via negative feedback.

      Agreed, see above discussion of homeostat requirement. Will adjust these statements in our resubmission.

      • 277: do you mean AMPAR?

      We were not clear enough here. We actually do mean GABAR. The idea is that CTZ increases network activity and thus increases both AMPAergic and GABAergic transmission. We will clarify this in the resubmission.

      • Example: Figure 1A is frustratingly unreadable. The axes on the raster insets are microscopic, the arrows are strangely large, and it seems unnecessary to fill so much real estate with 4 rasters. Only one is necessary to show the concept of a network burst. The effect of time+CNQX on the frequency of burst is shown in B and C.

      • Example: Figure 2 appears warped and hastily assembled. Statistical indications are shown within and outside of bounding boxes. Axes are not aligned. Labels are not aligned. Font sizes are not equal on equivalent axes.

      We will adjust these issues in the resubmission.

      • The discussion should include mention of the limitations and/or constraints of drawing general conclusions from cell culture.

      We agree and will adjust the discussion. Also, this is why we cited studies that argue GABAergic neurons have a particularly important role in homeostatic regulation of firing following sensory deprivations in vivo.

      • The discussion should include mention of the role of developmental age in the expression of specific mechanisms. It is highly likely that what is studied at ~P14 is specific to early postnatal development.

      We will discuss caveats of cortical cultures at DIV 14-20.

      It is essential to ensure that the data presented in the paper adequately supports the conclusions drawn. A more cautious approach in interpreting the results may lead to a stronger argument and a more robust understanding of the underlying mechanisms at play.

      Agreed.

      Reviewer #2 (Public Review):

      Synaptic scaling has long been proposed as a homeostatic mechanism for the regulation of the activity of individual neurons and networks. The question of whether homeostasis is controlled by neuronal spiking or by the activation of specific receptor populations in individual synapses has remained open. In a previous work, the Wenner group had shown that upscaling of glutamatergic transmission is triggered by direct blockade of glutamate receptors rather than by the concomitant reduction in firing rate (Nat Comm 2015). In this manuscript they investigate the mechanisms regulating scaling of GABA-mediated responses in cortical cell cultures using whole-cell recordings to detect GABAergic currents and multielectrode arrays to monitor global firing activity, and find that spiking plays a fundamental role in scaling.

      Initially, the authors show that chronic blockade (24 h) of glutamatergic transmission by CNQX first reduces spontaneous spiking (at 2 h), but later (24 h) firing grows back towards higher frequencies, suggesting a compensatory mechanism. Then it is shown that either chronic CNQX treatment or TTX causes a reduction in the amplitude of GABAergic mIPSCs. Effects of CNQX on IPSCs are then reverted by replacing spontaneous network firing by chronic optogenetic stimulation of the entire culture, also indicating that GABAergic transmission is homeostatically regulated by global firing. Enhancing glutamatergic transmission with CTZ increases mIPSC amplitude, while addition of TTX in the presence of CTZ causes the opposite effect. Finally, increasing spiking activity using bicuculline also increases mIPSC amplitude, and the authors conclude that spiking activity rather than neurotransmission controls homeostatic GABA scaling. The manuscript shows interesting properties in the regulation of global GABAergic transmission and highlights the important role of spiking activity in triggering GABA scaling. However, it is strongly recommended to address some caveats in order to better support the conclusions presented in the manuscript.

      Major points:

      1) The reason why CNQX does not completely eliminate spiking is unclear (Fig. 1). What is the circuit mechanism by which spiking continues, although at lower frequency, in the absence of AMPA-mediated transmission, and what is the mechanism by which spiking frequency grows back after 24 h (still in the absence of AMPA transmission)?

      Is it possible that NMDA-mediated transmission takes over and triggers a different type of network plasticity?

      The bursting in AMPAR blockade is due to the remaining NMDA receptor-mediated transmission. We showed this in our previous study in Suppl. Figures 2 and 6 of Fong et al., 2015 Nat. Comm. Our ability to optically induce normal-looking bursts of spikes was also dependent on NMDAR activation. Further, in Dr Fong’s PhD dissertation it was shown that the bursting activity was abolished when AMPA and NMDA receptors were both blocked. There are likely many factors that contribute to the recovery of activity, and certainly one of them is likely to be the weakening of inhibitory GABAergic currents. These points will be discussed in the resubmission.

      2) A possible activation of NMDARs should be considered. One would think that experiments involving chronic glutamatergic blockade could have been conducted in the presence of NMDAR blockers. Why was this not the case?

      Unfortunately, it was not possible to optogenetically restore normal bursting in the presence of NMDAR blockade (even when AMPAergic transmission was intact), as NMDARs appeared to be critical for the optical restoration of the normal duration of the burst (see Suppl. Figure 6 Fong et al., 2015 Nat. Comm). The reviewer raises an excellent point about a possible NMDAR contribution to altered synaptic strength, however. It is likely that NMDAR signaling is reduced in the presence of CNQX since burst frequency was reduced along with AMPAR-mediated depolarizations. We cannot rule out the possibility that NMDAR signaling could contribute to the alterations in GABAergic mIPSCs and will discuss this in the resubmission. However, previous work suggests that 24/48 hour block of NMDARs (APV) did not trigger AMPAergic scaling in cortical or hippocampal cultures (see Figure 1 Turrigiano et al., 1998 Nature and Suppl. Figure 4 Sutton et al., 2006 Cell). Moreover, our previous study showed that restoring NMDAergic transmission optogenetically, at least to some extent, had no influence on AMPAergic scaling (Fong et al., 2015, Nat. Comm.). Regardless, we cannot rule out a role for NMDAergic transmission in GABAergic scaling and this discussion will be included in the resubmission.

      Also, experiments with global ChR2 stimulation with coincident pre and postsynaptic firing might also activate NMDARs and result in additional effects that should be taken into consideration for the global scaling mechanism.

      To be clear, our optical stimulation was turned off before the vast majority of spiking that occurred in the bursts, which played out in a relatively natural manner (see lower panel of Figure 3B optogenetic stimulation – short duration only at onset of burst – we will make this clearer in resubmission). Therefore, we were unlikely to trigger significant synchronous activation that does not normally occur in network bursts.

      3) Cultures exposed to CTZ to enhance AMPA receptors generated variable results (Fig. 5), somewhat increasing spiking activity in a non-significant manner but, at the same time, strengthening mIPSC amplitude. This result seems to suggest that spiking might be involved in GABAergic scaling, but it does not seem to prove it. Then, addition of TTX that blocked spiking reduced mIPSC amplitude. It was concluded here that the ability of CTZ to enhance GABAergic currents was primarily due to spiking, rather than the increase in AMPA-mediated currents. However, in addition to blocking action potentials, TTX would also prevent activation of AMPARs in the presence of CTZ due to the lack of glutamatergic release. Therefore, under these conditions, an effect of glutamatergic activation on GABAergic scaling cannot be ruled out.

      These concerns were very similar to reviewer 1’s first comments. We will address these issues in the resubmission, but to briefly repeat our responses: We are going a step beyond most scaling studies by assessing MEA-wide firing rate, but this still provides an incomplete picture of the particular cells that we target for patch recordings in terms of their firing before and after a drug. Further, we see considerable variability in the effect on firing rate from culture to culture, which we will better recognize in the resubmission. Finally, while the CTZ results are not conclusive, taken together with the optogenetic results we think our results are most consistent with the idea that GABAergic scaling is a strong candidate as a spike rate homeostat.

      4) The sample size is not mentioned in any figure. How many cells/culture dishes were used in each condition?

      The individual dots represent either individual cells for mIPSC amplitude or individual cultures in MEA experiments. Number of cultures for figures were: Figure 2 – con = 10, TTX = 3, CNQX = 6; Figure 4 – CNQX = 4, con = 10, CNQX/photostim = 6; Figure 5 – ethanol = 3, CTZ = 3, CTZ + TTX = 3; Figure 6 – con = 10, bicuculline = 4. We will include the number of cultures for mIPSC amplitude experiments in the figure legends upon resubmission.

      5) Cortical cultures may typically contain about 5-10% GABAergic interneurons and 90-95 % pyramidal cells. One would think that scaling mechanisms occurring in pyramidal cells and interneurons could be distinct, with different impact on the network. Although for whole-cell recordings the authors selected pyramidal looking cells, which might bias recordings towards excitatory neurons, naked eye selection of recording cells is quite difficult in primary cultures. Some of the variability in mIPSC amplitude values (Fig. 2A for example) might be attributed to the cell type? One could use cultures where interneurons are fluorescently labeled to obtain an accurate representation. The issue of the possible differential effects of scaling in pyramidal cells vs. interneurons and the consequences in the network should be discussed.

      We will include this discussion in the resubmission. Briefly, we chose large cells, which will be predominantly glutamatergic neurons as suggested by the reviewer. Ultimately, even among glutamatergic principal cells there may be variability in the response to drug application. All of these issues could contribute to variability and we will expand our description of the variability in our results, including that based on cellular heterogeneity.

      Reviewer #3 (Public Review):

      This paper concerns whether scaling (or homeostatic synaptic plasticity; HSP) occurs similarly at GABA and Glu synapses and comes to the surprising conclusion that these are regulated separately. This is surprising because these were thought to be co-regulated during HSP and in fact, the major mechanisms thought to underlie downscaling (TTX or CNQX driven), retinoic acid and TNF, have been shown to regulate both GABARs and AMPARs directly. (As a side note, it is unclear that the manipulations used in Joseph and Turrigiano represent HSP, and so might not be relevant). Thus the main result, that GABA HSP is dissociable from Glu HSP, is novel and exciting. This suggests either different mechanisms underlie the two processes, or that under certain conditions, another mechanism is engaged that scales one type of synapse and not the other.

      However, strong claims require strong evidence, and the results presented here only address GABA HSP, relying on previous work from this lab on Glu HSP (Fong, et al., 2015). But the previous experiments were done in rat cultures, while these experiments are done in mice and at somewhat different ages (DIV). Even identical culture systems can drift over time (possibly due to changes in the components of B27 or other media and supplements). Therefore it is necessary to demonstrate in the same system the dissociation. To be convincing, they need to show the mEPSCs for Fig 4, clearly showing the dissociation. Doing the same for Fig 5 would be great, but I think Fig 4 is the key.

      We understand the concern of the reviewer as we do see significant variability within our cultures and they were plated in different places, by different people, in different species (rat vs mouse). Therefore, in the resubmission to strengthen the conclusions we will repeat our optogenetic studies restoring activity in the presence of AMPAergic blockade in our mouse cortical cultures and measuring AMPA mEPSCs to assess scaling.

      The paper also suggests that only receptor function or spiking could control HSP, and therefore if it is not receptor function then it must be spiking. This seems like a false dichotomy; there are of course other options. Details in the data may suggest that spiking is not the (or the only) homeostat, as TTX and CNQX cause identical changes in mIPSC amplitude but have different effects on spiking. Further, in Fig 5, CTZ had a minimal effect on spiking but a large effect on mIPSCs. Similar issues appear in Fig 6, where the induction of increased spiking is highly variable, with many cells showing control levels or lower spiking rates. Yet the synaptic changes are robust, across all cells. Overall, this is not persuasive that spiking is necessarily the homeostat for GABA synapses.

      Together, our results argue against AMPAR or GABAR activation as a trigger for GABAergic scaling, which differs from our results for AMPAergic scaling. These points alone are important to recognize. While changes in spiking do not perfectly follow the changes in GABAergic scaling, they do always trend in the right direction. As mentioned above, total spiking activity is only one measure of spiking. It is possible that these drugs alter the pattern of spiking that translates into an altered calcium transient that is important for triggering the plasticity. Again, it is important to note that we are going a step beyond most homeostatic plasticity studies that add a drug and simply assume it is having an effect on spiking (e.g. CNQX was initially thought to completely abolish spiking, but clearly does not). Based on the variability that we observe and the nature of our MEA recordings, we cannot precisely determine how the total activity or pattern of activity changes with drug application in the specific cells that we target for whole-cell recordings. However, we believe our results are more consistent with our proposal that GABAergic scaling is a strong candidate as a spike rate homeostat. Regardless, in the resubmission we will include a broader discussion about these possibilities, and the reality that there could be multiple homeostatic mechanisms that act to recover spiking activity.

      The paper also suggests that the timing of the GABA changes coincides with the spiking changes, but while they have the time course of the spiking changes and recovery, they only have the 24h time point for synaptic changes. It is impossible to conclude how the time courses align without more data.

      We can only say that by the 24 hour CNQX time point, when overall spiking is recovered, that GABAergic scaling has already occurred. We will state this more clearly in the resubmission.

    1. Author Response:

      We are grateful to the editors for getting our study reviewed, and are pleased that the reviewers found value in our findings. We plan to submit a revision that we believe can resolve much of the remaining doubt about the major conclusions.

      Our current understanding is that much of the uncertainty stems from extensive diversity among synapses. The FM-dye de-staining technique does have single synapse resolution, so it should be possible to develop new kinds of analysis that can make each of our points at the level of individual synapses. For a preview, see Figure 2D (explained in lines 126-141), and Figure 2-Figure supplement 5 of the current version.

    1. Author Response

      The following is the authors’ response to the original reviews.

      We thank the reviewers for their time in evaluating the strengths and weaknesses of our manuscript.

      We are pleased to see that all reviewers recognized the high significance of our work, noting that the manuscript addresses the “longstanding question of which cell types are infected during congenital or perinatal rubella virus infection”. As noted by reviewer 1, “This study reveals a new cellular target that will have important implications for basic studies on rubella virus-host interactions and for the potential development of therapies or improved vaccines targeting this virus. As the rubella virus is a pathogen of high concern during human pregnancy, this study also has important implications in the field of neonatal infectious diseases”.

      Below, we provide responses (in blue) to specific critiques:

      Reviewer #1 (Public Review):

      A weakness is that the current data do not provide information on the full replicative potential of the rubella virus in microglia, or whether the virus persists in this system.

      See our response below. Briefly, we include new experimental evidence from primary tissue, microglia-transplanted organoids, and Vero cells to further characterize the dynamics of viral infection.

      Reviewer #1 (Recommendations for the authors):

      Most of the viral assays in the brain slices and organoids examine viral protein synthesis, which is a surrogate for genome replication. However, basic virological characterization is lacking and would improve the robustness of the model and its potential utility to understand better rubella virus-microglia interactions. Questions the authors should consider with new experiments include:

      Are new virions produced? Can viruses be detected in the media?

      Or, are the infections abortive, with viral protein synthesis occurring, but no virus production?

      We performed RV titering experiments in dissociated microglia co-cultured with other cell types, as well as Vero cells as a control. While we can detect a robust increase in viral titer from Vero cells, it fell below detection levels in microglia co-cultures. See Author response image 1. We now include these data in Supplementary Figure 2D.

      Author response image 1.

      Rubella virus titering experiment performed in Vero cells (positive control) or dissociated microglia co-cultures. In primary microglia co-cultures, viral titer falls below detection levels after several days of infection.

      While we could not detect an increase in viral particles from microglia mixed cultures, we confirmed the presence of GFP from the RV-GFP reporter construct, and we believe this serves as proof that the virus can infect microglia cells and lead to production of functional viral protein (Author response image 2, Figure 1E-F of the current manuscript):

      Author response image 2.

      We also observed an increase in RV RNA over time in tissue slice infections, using qPCR (Author response image 3, not included in the manuscript).

      Author response image 3.

      Modest increase in RV RNA over time in brain slice infections. Rubella virus RNA measured by qPCR relative to GAPDH gene, in n=3 samples (2 technical replicates each condition). Brain slices were exposed to RV, then collected at end of inoculation (4 hours post infection), or at 3 or 5 days post infection, and processed for RNA extraction and RT-qPCR.

      How long do the infections persist in the model? What is the fate of infected microglia over time? Time courses to monitor infection and cell health would be useful.

      We performed a longer infection with RV in organoids transplanted with microglia, and after two weeks of infection, we can detect multiple microglia cells positive for the RV capsid. These data are now included in Figure 4 of the current manuscript.

      Author response image 4.

      After 2 weeks post infection, microglia remain positive for RV capsid.

      Reviewer #2 (Public Review):

      Weaknesses

      The set of data is rather descriptive. It suggests that microglia are the predominant brain target of RV in vivo, without identifying the targeting mechanism that provides cell type specificity. Moreover, what are the diffusible cues released from the brain environment that increase microglia infection and RV replication?

      We agree with the reviewer that identifying molecular mechanisms that underlie this phenotype will be very interesting to explore in future research, and we acknowledge the limitation of the study in the Discussion.

      It is unclear why brain organoids not supplemented by microglia are susceptible to RV inoculation.

      We could not detect RV capsid in organoids without microglia after 72 hours of inoculation. We attribute any changes seen at the level of single cell transcriptomics in the absence of microglia transplantation to exposure to virus-associated particles, including but not limited to viral RNA species, viral proteins, or even other components of the viral stocks made in Vero cells. These factors may induce transcriptomic differences even in the absence of RV infection. In the text, we take care to refer to these conditions as “Rubella virus-exposed” rather than “Rubella virus-infected”. We now include the following panel from Author response image 5 in Figure 4B of the current manuscript.

      Author response image 5.

      Organoids without microglia do not show positive RV immunofluorescence.

      Reviewer #2 (Recommendations for the authors):

      Several points could be further addressed to improve the data set and shed more light on some aspects of this manuscript:

      • Figure 1. Additional microglia markers should be used to reinforce the evidence that microglia cells are the principal RV targets. Since Iba1 is a marker of activated microglia, does RV have a selective tropism to all microglia or only to activated ones in human fetal brain slices?

      The reviewer brings up an interesting point that, in our mind, can be separated into two independent questions:

      1. Are Iba1-positive cells bona fide microglia, or are there other cell populations of macrophage/monocyte origin that are labeled with Iba1? Therefore, additional markers should be used for immunolabeling;

      2. Is RV infection selective for microglia “activation” status, when only immune-primed cells can be infected?

      For the first point, we have previously shown that in the developing human brain, virtually all Iba1-positive cells are also P2RY12-positive (unpublished; Author response image 6). Therefore, in primary human brain slices, there is a negligible amount of non-microglia macrophages. However, in culture, microglia rapidly lose their “homeostatic” identity, including P2RY12 expression, as early as six hours after ex vivo extraction (Gosselin et al., 2017; DOI: 10.1126/science.aal3222).

      Author response image 6.

      P2RY12 co-localizes with Iba1 in primary brain tissue from gestational week 17.5, including cells with more ameboid morphology (arrows)

      However, in organoids at 2 weeks post-RV exposure, we found microglia with both ameboid and more ramified morphology (Author response image 7). It would be challenging and beyond the scope of this manuscript to use morphology or Iba1 intensity levels to determine cause and effect as microglia activation state relates to RV infectivity (i.e. do activated microglia preferentially get infected with the virus, or do infected microglia become activated and upregulate Iba1 levels and change morphology).

      Author response image 7.

      Examples of microglia with round (top) and ramified (bottom) morphology that co-localize with RV capsid staining.

      Regarding RV tropism in the 2D culture of microglia, some Iba- cells are infected by RV as they show capsid staining. What are these cells? Are neurons and/or glia also susceptible to RV in vitro infection? Are non-microglial cells getting RV infected in the absence of microglia?

      In the absence of microglia cells, a small proportion of non-microglia cells get infected with RV. There is no statistically significant difference in the number of cells that get infected with RV in the presence or absence of microglia across different cell types. We add these data as Supplement Figure 3.

      Author response image 8.

      Rubella infection in non-microglia cells. A. Representative images of different cell types depleted of microglia. Cell cultures were stained for RV capsid (green) and DAPI. B. Quantification of total cells that are positive for RV capsid across conditions. C. Quantification of RV+ cells that are not microglia across different cell populations. No statistically significant difference was detected in RV infectivity in cells co-cultured with or without microglia.

      • Figure 3. The low rate of Rubella virus infection in homogenous CD11b+ cell culture raises the question of whether the Rubella virus can infect microglia at a specific activation stage. It is also surprising that there is no infection of such cell population (also CD11b+) alone while cultured in 2D, as reported in figure 2. Why such a difference?

      It is well established that culture of microglial cells isolated from brain tissue alters their molecular properties, which likely alters the cell surface protein composition. In the revised discussion, we include activation as a possible mechanism that will require further investigation.

      • Fig 4A-B, it is unclear whether organoids that are not engrafted with microglia get infected upon RV (with active viral replication) inoculation. If non-microglia-supplemented organoids are indeed infected and allow RV replication, this suggests that organoids might not be the ideal system to model human fetal brain RV infection at GW18-23.

      We could not detect RV capsid in organoids without microglia after 72 hours of inoculation. We now include the following panel from Author response image 9 in Figure 4.

      Author response image 9.

      Organoids without microglia do not show positive RV immunofluorescence.

      • Figure 4E, why are cells derived from microglia-free organoids so much enriched in the UMAP plots as compared to the other organoid condition? Is RV impacting cell fitness, proliferation, or neurodifferentiation?

      This perceived difference is due to data presentation. Based on cell proportions, cells from organoids that were treated with microglia are more represented in the scRNAseq data, and this difference most likely comes from user-introduced imbalance in cell loading and possible cell losses during demultiplexing (Author response image 10, panel A). Cell number composition across different conditions and cell types, including RV and MG treatment, are shown in Supplement Figure 4 of the current manuscript (Author response image 10, panel B).

      Contribution of each condition can be visualized via UCSC single cell data browser: https://cells.ucsc.edu/?ds=rubella-organoids

      Author response image 10.

      Data composition depending on condition. A. Cell number contribution from organoids with and without microglia. B. Contribution of each condition to each cluster composition.

      • Figure 4F-H. If microglia is the predominant target for RV in the brain, why are microglia-free organoids susceptible to RV and who are the other cellular targets, whose infection leads to activation of interleukin pathway genes and dysregulation of brain developmental markers in selected subpopulations (RGCs, ENs..).

      Thank you for bringing up this point. We did not detect any appreciable RV genomic RNA in our published single cell data, nor did we identify RV capsid in the RV-exposed organoids without microglia. Our experiments on dissociated cell cultures show that a small population (~1-4%) of other cell types was positive for the RV capsid, including neuron-enriched and glial-enriched fractions (Author response image 11; Supplementary Figure 3C in current manuscript). We expect a similar proportion of non-microglia cells to be infected in the brain organoids. One possible explanation for the robust interferon response even in the absence of productive infection in other cell types is exposure to virions and virus-associated particles, including but not limited to viral RNA species, viral proteins, or even other components of the viral stocks made in Vero cells (which is a cell line that should not produce interferons, but may produce other unmeasured cytokines as a virally infected cell culture).

      Author response image 11.

      Quantification of RV+ cells that are not microglia across different cell populations. No statistically significant difference was detected in RV infectivity in cells cultured with or without microglia.

      • QRT-PCR validations of some of these key brain targets should be performed.

      We agree with the reviewer that further validation of the predicted molecular changes downstream of Rubella exposure would be valuable. We have opted to validate IFITM3 and NOVA1 expression differences using immunostaining. The results are consistent with our predictions from scRNAseq, and the data are presented in revised Figures 5 and 6 of the current manuscript.

      Reviewer #3 (Public Review):

      Weaknesses of the paper: Overall, additional control experiments are needed to support the stated conclusions. Affinity chromatography is used to purify microglia and other cell types, but the overall cell enrichment is not quantified.

      We appreciate the reviewer's concern. However, affinity-based enrichment rarely guarantees purity, and we do not believe that an accurate estimate of purity would alter the biological interpretation of the data.

      In cell mixing experiments, the authors do not rule out the possibility that the added non-microglia cells also become infected, releasing additional infectious viruses. The finding that a diffusible factor is required for RV infection would be unusual if not unprecedented; therefore, additional data are required to support this claim and rule out other interpretations.

      We provide quantification of non-microglia cells that are positive for RV capsid in the presence and absence of microglia. A small proportion (~1-4%) of non-microglia cells get infected with the virus and can potentially release more of the virus (see Author response image 12), but we do not know how this newly produced virus would be different from the one that was applied to the cells directly. To follow up our co-culture experiments, we wanted to exclude the possibility of microglia engulfing RV-infected cells in co-cultures; therefore, we separated the two cell fractions by a liquid-permeable membrane (Figure 3 of the current manuscript). It is possible that factors secreted by other cell populations in the transwell assay experiments act on microglia cells to upregulate a yet unidentified receptor on the microglia surface or another infection-dependent molecule, rendering them infectable by the virus.

      We re-phrase the text by de-emphasizing “soluble factors” and focusing on excluding phagocytosis of infected cells as a possible mechanism of RV capsid immunoreactivity in microglia cells.

      Author response image 12.

      Rubella infection in non-microglia cells. A. Representative images of different cell types depleted of microglia. Cell cultures were stained for RV capsid (green) and DAPI. B. Quantification of total cells that are positive for RV capsid across conditions. C. Quantification of RV+ cells that are not microglia across different cell populations. No statistically significant difference was detected in RV infectivity in cells co-cultured with or without microglia.

      The methods section would be improved by including details about the iPSC line that was used.

      We include the following section in Materials and Methods:

      iPSC lines.

      All work related to human iPS cells has been approved by the UCSF Committee on Human Research and the UCSF GESCR (Gamete, Embryo, and Stem Cell Research) Committee. Human iPS cell line “WTC-10” derived from healthy 30-year-old Japanese male fibroblasts was from the Conklin Lab, UCSF (Bershteyn et al., 2017; Kreitzer et al., 2013). Human iPSC line “13325” was derived from 9-year-old female fibroblasts originally obtained from Coriell cell repository. Human iPSC line “1323-4” derived from healthy 48-year-old Caucasian female fibroblasts (gift from the Conklin Lab, UCSF) was used for immunofluorescence validation analysis as we found that this line generates more reproducible brain organoid differentiations.

      and by a more thorough description of virus-specific details, including the numbers of infectious particles added per volume of incubation media.

      We now include the following data in the Materials and Methods section:

      Rubella virus infection

      Cells cultured in 2D were inoculated by adding RV stock virus to culture media in 1:1 dilution (250 ul of media to the equal volume of viral stock, 1.75x10^5 total ffu/well) to achieve a multiplicity of infection (MOI) of 2. After four hours, media was exchanged with fresh cell culture media. Cortical brain slices were treated with 500 ul of RV viral stock (3.5x10^5 total ffu/slice) applied over the slice culture filter for four hours, and then the viral culture media was removed and replaced with fresh slice culture media. Organoids were treated in 6-well plates with 2 ml of 1:1 dilution of viral stock:organoid maintenance media (7x10^5 total ffu) for four hours, and then viral media was exchanged for fresh media. For all experimental conditions, cells were fixed and processed for downstream analysis at 72 hours post infection. Supernatant from non-infected Vero cells (mock) or heat-inactivated RV (65 °C, 30 mins) was used as control.
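
      As a consistency check on the doses stated above, the following minimal sketch back-calculates the stock titer implied by each condition and the number of cells per 2D well implied by an MOI of 2. It assumes that the stated doses are multiples of 10^5 ffu, that the 2 ml of 1:1 dilution used for organoids contains 1 ml of viral stock, and that MOI is simply total ffu divided by cell number. The variable and function names are illustrative only and do not come from the manuscript.

```python
# Stock volume applied per condition, in ml. The organoid value is an
# assumption: 2 ml of a 1:1 dilution is taken to contain 1 ml of stock.
STOCK_VOLUME_ML = {"2D well": 0.25, "brain slice": 0.5, "organoid well": 1.0}

# Total dose per condition in focus-forming units (ffu), as stated in the text.
TOTAL_FFU = {"2D well": 1.75e5, "brain slice": 3.5e5, "organoid well": 7e5}

MOI_2D = 2  # multiplicity of infection stated for 2D cultures


def implied_titer(total_ffu, stock_volume_ml):
    """Stock titer (ffu/ml) implied by a dose and the stock volume used."""
    return total_ffu / stock_volume_ml


def implied_cells(total_ffu, moi):
    """Cell number implied by a dose and an MOI (MOI = ffu / cells)."""
    return total_ffu / moi


titers = {name: implied_titer(TOTAL_FFU[name], STOCK_VOLUME_ML[name])
          for name in TOTAL_FFU}
cells_per_well = implied_cells(TOTAL_FFU["2D well"], MOI_2D)

for name, titer in titers.items():
    print(f"{name}: implied stock titer {titer:.2e} ffu/ml")
print(f"2D well: implied cell number at MOI {MOI_2D} is {cells_per_well:.0f}")
```

      Under these assumptions, all three conditions imply the same stock titer, so the stated doses are mutually consistent. The implied cell number per well is a back-calculation, not a plating density reported by the authors.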

      In addition to immunofluorescence, adding additional data to demonstrate and quantify virus infection (PCR and plaque assays, or immunofluorescence using an anti-double-stranded RNA antibody such as J2) from the infected brain slices and organoids would provide greater assurance that the virus is indeed replicating under the experimental conditions.

      We performed an RV titering experiment in dissociated microglia co-cultured with other cell types, as well as in Vero cells as a control. While we can detect a robust increase in viral titer from Vero cells, it fell below detection levels in microglia co-cultures. We now include these data in Supplementary Figure 2D.

      Author response image 13.

      Rubella virus titering experiment performed in Vero cells (positive control) or dissociated microglia co-cultures. In primary microglia co-cultures, viral titer falls below detection levels after several days of infection.

      Unfortunately, we did not find J2 staining informative because we could detect signal in both wild type RV infection conditions and in heat-inactivated RV, presumably due to native dsRNA species present in cells. We did not detect any increase or difference in the pattern of staining between RV and heat-inactivated virus-exposed conditions (Author response image 14; not included in the manuscript).

      Author response image 14.

      J2 antibody labels dsRNA in both RV-exposed and control heat-inactivated virus conditions, presumably due to native dsRNA that is not unique to the viral replication.

      Organoid imaging with immunofluorescence would be very informative in demonstrating the presence of microglia and also in showing which cells are virus-infected in the context of organoid structures.

      We provide images from 72 hr and 2-week RV infections, giving a zoomed-out view of organoids with microglia and RV capsid staining. We also provide images at 72 hr post-infection in organoids without microglia (Author response image 15; Figure 4C in current manuscript).

      Author response image 15.

      Microglia in organoids co-localize with RV capsid staining.

      GenBank accession numbers are listed for the recombinant RV and GFP-RV reporter, but a search using those numbers did not locate the deposits--perhaps the deposits were very recent?

      Information for both viral constructs is now available on GenBank:

      M33 RV strain can be found here: https://www.ncbi.nlm.nih.gov/nuccore/OM816674

      RV-GFP can be found here: https://www.ncbi.nlm.nih.gov/nuccore/OM816675

      The authors incorrectly refer to the GFP virus as a new strain; it is not a viral strain and should be referred to as a reporter virus.

      Thank you, we changed the description to

      “To confirm functional transcription and translation of the viral genome, a new reporter construct of RV designed to express GFP within the non-structural P150 gene was generated (RV-GFP, GenBank Accession OM816675)”

      Given that the authors show that Vero cell cultures are infected by the Rubella virus in the absence of other cells, additional evidence is needed to demonstrate that a diffusible factor from other cells enables microglia to be infected by the Rubella virus.

      We have revised the manuscript to indicate that our data is consistent with the possibility that a diffusible factor is involved. Our experiment utilizing transwell assay argues against phagocytosis and physical interactions as primary drivers, but future studies will be needed to determine if soluble factors are involved.

      The authors did not detect Rubella virus transcripts in the single-cell RNA sequencing experiment, nor was a microglia cluster found.

      Indeed, microglia recovery using scRNAseq is very inefficient. We note this limitation in the discussion.

      Innate immune responses can be activated in the presence of viral particles but without virus replication, as in inactivated viral vaccines; therefore changes in interferon responses do not necessarily prove virus replication.

      We agree with the reviewer on this point: it is difficult, if at all possible, to entirely eliminate the possibility that some of the transcriptomic changes, particularly the interferon responses, are induced by exposure to viral particles rather than by viral replication. We have revised the manuscript to more rigorously describe the conditions as “RV-exposed”.

      Figure 4: it would be helpful to define the abbreviations used in the figure legend (e.g. IPC, RG, EN). In the volcano plots, the gene names are blocked by the dots, and the figure becomes very pixelated when enlarged to read the text.

      We have added abbreviations and replaced the figure files with higher resolution images (Figure 6 in current manuscript).

      The value of including Supplemental Figure 2 (MOG) is not clear because it receives little mention in the text and also seems to be previously published data that could be cited.

      We have removed the figure and replaced it with a citation and a link to the Cell Browser.

      Supplemental Figure 4: In panel G, the legend shows "YH10" and "13325". These terms are not described in the Figure legend, nor did a search of the manuscript identify these terms. In its current form Supp. Fig. 4G is not interpretable. In addition, it would be clearer to use the term "RV-infected" instead of "treated" to describe the addition of the virus.

      We have expanded the Methods section to include the description of different organoid lines and added a revised legend for Supplementary Figure 4. We do not provide evidence of RV infecting organoids without microglia, therefore we have revised the claims that organoid cells become infected with the virus and replaced it with “RV-exposed” to better reflect the conditions studied.

      Reviewer #3 (Recommendations for the authors):

      1) Demonstrate and quantify virus replication to provide data to complement the imaging. In order of data quality, plaque assays would be most convincing in demonstrating infection and release of infectious virus, while a time course of PCR on RV transcripts would support a conclusion of replicating virus. Further, staining with an anti-double-stranded RNA antibody (J2) would represent evidence of virus replication.

      In response to the reviewer’s comment, we performed an RV titering experiment in dissociated microglia co-cultured with other cell types, as well as Vero cells control. While we can detect a robust increase in viral titer from Vero cells, it fell below detection levels in microglia co-cultures. We now include these data in Supplementary Figure 2D.

      Author response image 16.

      Rubella virus titering experiment performed in Vero cells (positive control) or dissociated microglia co-cultures. In primary microglia co-cultures, viral titer falls below detection levels after several days of infection.

      We detected a very modest increase in RV RNA in infected brain slices over time using RT-qPCR (see Author response image 17, not included in current manuscript).

      Author response image 17.

      Modest increase in RV RNA over time in brain slice infections. Rubella virus RNA measured by qPCR relative to GAPDH gene, in n=3 samples (2 technical replicates each condition). Brain slices were exposed to RV, then collected at end of inoculation (4 hours post infection), or at 3 or 5 days post infection, and processed for RNA extraction and RT-qPCR.
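
      For readers unfamiliar with reference-gene normalization, the calculation behind a "relative to GAPDH" readout can be sketched as follows. This is an illustration only: the response does not state which quantification method was used, so the sketch assumes the common 2^(-ΔCt) convention with roughly 100% amplification efficiency, and all Ct values and function names are hypothetical.

```python
def relative_abundance(ct_target: float, ct_reference: float) -> float:
    """Target abundance relative to a reference gene, as 2^-(Ct_target - Ct_reference).

    Assumes both amplicons double every cycle (about 100% efficiency).
    """
    return 2.0 ** -(ct_target - ct_reference)


def fold_change(ct_target_late: float, ct_ref_late: float,
                ct_target_early: float, ct_ref_early: float) -> float:
    """Fold change of normalized target abundance between two time points."""
    late = relative_abundance(ct_target_late, ct_ref_late)
    early = relative_abundance(ct_target_early, ct_ref_early)
    return late / early


# Hypothetical Ct values: viral RNA vs. GAPDH at 4 hours and at 5 days.
early = relative_abundance(ct_target=25.0, ct_reference=20.0)
change = fold_change(24.0, 20.0, 25.0, 20.0)
print(f"Relative abundance at 4 h: {early}, fold change by day 5: {change}")
```

      With these made-up values, a one-cycle drop in the viral Ct at constant GAPDH Ct corresponds to a two-fold increase, which is the scale of change one would describe as modest.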

      Unfortunately, we did not find J2 staining informative because we could detect signal in both wild type RV infection conditions and in heat-inactivated RV, presumably due to native dsRNA species present in cells. We did not detect any increase or difference in the pattern of staining between RV and heat-inactivated virus-exposed conditions (Author response image 18; not included in the manuscript).

      Author response image 18.

      J2 antibody labels dsRNA in both RV-exposed and control heat-inactivated virus conditions, presumably due to native dsRNA that is not unique to the viral replication.

      We utilized FISH to detect negative-stranded (non-genomic) RV RNA as an alternative to J2 to indicate RNA replication. However, it proved to be not very sensitive, as a small quantity of negative-strand RV RNA could be detected in highly infected Vero cells, but negative-strand RV RNA was not detected in more modestly infected microglia (based on positive-strand RV RNA quantification), as in Author response image 19, not included in current manuscript.

      Author response image 19.

      FISH probes to positive strand (genomic) and negative strand (replication template) RV RNA in Vero cells and microglia co-cultures. A: representative images of Vero cells infected with RV (top row) or Zika virus as control (bottom row). At 72hpi, cells were fixed and processed for immunofluorescence with anti-RV capsid antibody (RVcap) or Zika virus antibody (Zika4G2), and then FISH was performed using probes to positive strand (+) or negative strand (-) RV RNA. Negative strand RV RNA was difficult to visualize at low-power magnification, and required quantification within cell borders defined by wheat germ agglutinin staining, with results in panel B. B: In Vero cells, negative strand RV RNA is detected in strongly infected cells. Infection strength was determined by intensity of RV capsid immunofluorescence staining and positive strand RV RNA (RVcap/(+) 2/3 indicates robust infection, RVcap/(+) 1 indicates weak infection). ZIKVinf = Zika virus infected control. C: In microglia co-cultures, positive strand RV RNA is detected in cells with RV capsid immunopositivity (RVcap_pos). RVinf = RV infected. RVHI = heat-inactivated RV. D: In microglia co-cultures, negative strand RV RNA quantification is not significantly different between mock, heat-inactivated RV (RVHI), or RV-infected conditions (RVinf), including cells with weak positive-strand RV RNA (RVinf, (+)<8) or cells with stronger positive-strand RV RNA (RVinf, (+)>=8). Two biological replicates (bHR60 and bHR61), n indicates number of cells counted.

      While we could not detect an increase in the viral particles from microglia mixed cultures, we confirmed the presence of GFP from the RV-GFP reporter construct, and we believe it serves as proof that the virus can infect microglia cells and lead to production of functional viral protein (see Author response image 20, Figure 1E-F of the current manuscript).

      Author response image 20.

      Thus, overall we detect replication of viral RNA and protein (qPCR, RV-GFP), but not an appreciable increase in released newly-made virions. The discussion now reflects this more clearly in the current manuscript.

      2) The claim of requiring a diffusible factor to enable RV infection requires additional data. A suggestion would be to include further characterization of affinity-purified cells to define the levels of cell enrichment and to determine which other cell types are present. It is also important to test the RV infection of the fractionated cell types alone before adding to the microglia, in order to demonstrate whether RV is replicating in cell types other than microglia.

      We performed quantifications of RV capsid-positive cells in each of the affinity-purified cell populations: neuron-enriched (purified with PSA-NCAM beads), glia-enriched (PSA-NCAM depleted cell fraction), or non-microglia fraction (“Flow through”, depleted of CD11b+ cells). We show that across each condition, we have low infectivity (ranging from ~1 to 4% of total cell population) after 72 hours post-infection. We include these data in Supplementary Figure 3.

      Author response image 21.

      Rubella infection in non-microglia cells. A. Representative images of different cell types depleted of microglia. Cell cultures were stained for RV capsid (green) and DAPI. B. Quantification of total cells that are positive for RV capsid across conditions. C. Quantification of RV+ cells that are not microglia across different cell populations. No statistically significant difference was detected in RV infectivity in cells co-cultured with or without microglia.

      Another approach to limit cell heterogeneity would be to use iPSC-derived cells, which are highly enriched for a single, specific cell type, to test the requirement for additional cell types to achieve RV infection of microglia.

      In our prior publication (Popova et al. 2021) we identified a number of molecular differences between primary and iPSC-derived microglia. iPSC-derived microglia-like cells could show differences in infection tropism from primary microglia, and those results may be difficult to interpret biologically. We agree with the reviewer that iPSC-derived cells would be an interesting model; however, there are now several distinct protocols for deriving microglia-like cells from pluripotent stem cells, and we feel that embarking on a protocol comparison project would fall outside the scope of the current manuscript.

      3) Consider a longer organoid infection. The authors did not identify viral RNA transcripts in their organoid scRNAseq data after a 72-hour infection. Although the 72-hour time point seems right for cells in 2D culture, it’s possible that the infection in the organoids is slower because the virus has to spread inwardly. It would be worth trying a time course out to 2 weeks, collecting organoids every few days and then imaging and doing PCR or plaque assays. Zoomed-out views that show immunofluorescence of the entire organoid would also be beneficial in assessing organoid quality and immunofluorescent staining to identify cell types.

      We performed longer RV infection for two weeks and now present data on RV capsid in microglia in 72 hrs and 2 weeks post-infection (Author response image 22, Figure 4C of the current manuscript). We have also validated one of the scRNAseq-generated gene candidates in combination with different cell type markers and present data on whole organoids immunostained with NeuN for neurons and EOMES for intermediate progenitor cells that demonstrate the overall structure of the organoids (Author response image 23; Figure 6 of the current manuscript).

      Author response image 22.

      Microglia in organoids co-localize with RV capsid staining. Organoids with microglia were exposed to RV for 72 hrs or two weeks.

      Author response image 23.

      Organoids labeled with splice regulator NOVA1 (magenta), neuronal marker NeuN (green) and intermediate progenitor cell marker EOMES (cyan).

    1. Author Response

      Reviewer #1 (Public Review):

      While the CTD human brain organoids show a decrease in Cr (in absence of Cr in the culture medium) as compared to control organoids (4 times less), they are not devoid of Cr. Do these organoids express the two enzymes allowing Cr synthesis (AGAT and GAMT), and in which brain cell types? If yes, how to explain the decrease in Cr in the CTD organoids?

      There is a lack of functional CRT in the CTD human brain organoids. The basal level of creatine in CTD human brain organoids is significantly lower than in healthy human brain organoids. Intracerebral creatine synthesis depends on the differential expression of the AGAT and GAMT enzymes and relies on functional CRT to transport the GAA intermediate. The literature indicates that the two enzymes are rarely co-expressed (Braissant et al., 2001, PMID: 11165387), meaning that the GAA intermediate needs to be transported by CRT to neurones for complete creatine synthesis. Although we observed slight mRNA expression of the AGAT and GAMT enzymes, creatine synthesis is not effective because the GAA intermediate cannot be transported into cells expressing GAMT, owing to the non-functional creatine transporter in the CTD human brain organoids.

      The rescue experiment, re-establishing a functional Cr transporter (CRT or SLC6A8) in the CTD human brain organoids, is very interesting, as this may help the design and development of new treatments for CTD. However, authors claim that the functional CRT expressed in the rescued CTD organoids was expressed in each cell. This may be a difficulty in the development of new CTD treatments, as CRT should be expressed in neurons and oligodendrocytes, but not in astrocytes. Authors may want to comment on this point.

      As shown in Figure S2C, the whole brain organoid in the rescue experiment shows expression of the GFP protein, and thus also of the co-expressed wild-type CRT. In these experiments we did not perform a detailed cellular characterization of the rescued organoids; this may be a task for our next experiments, to characterize precisely the cell-specific CRT expression and function in the rescued brain organoids. Accordingly, in the revised version of the manuscript we will correct the statement on page 6: “SLC6A8 expressing brain organoids showed GFP fluorescence in the whole area of the organoid (Fig S2C).”

    1. Author Response

      Reviewer #2 (Public Review):

      The current work was basically a follow-up of a previous study in juvenile mice, and the results were also very similar to the juvenile results (Sommeijer et al., 2017). One possible interpretation of the results is that the lack of OD plasticity in adult V1 and dLGN was caused by an early blockade of the development of the inhibitory circuit in dLGN, which retains the dLGN in an immature stage till adulthood. The authors indeed claimed in the discussion that the 2-day OD shift is intact in juvenile dLGN and V1 in KO mice, and provided evidence in a supplementary figure that GABAergic and cholinergic synapse numbers are similar between WT and KO mice. However, the 7-day OD shift is indeed defective in juvenile V1 and dLGN in KO mice (Sommeijer et al., 2017), and it is possible that this early functional deficit didn't induce a structural remodeling in adulthood. To better support the authors' claim that the lack of adult V1 OD plasticity is specifically due to reduced dLGN synaptic inhibition, the authors should generate conditional KO mice in which dLGN synaptic inhibition is disrupted only in adulthood.

      In order to address this point it is important to discuss the plasticity deficits in dLGN and V1 separately.

      Concerning V1 plasticity: We have previously shown that brief MD during the standard critical period induces an OD shift in V1 of mice lacking thalamic synaptic inhibition in dLGN (Sommeijer et al, 2017). OD plasticity induced by brief MD is a hallmark of critical period plasticity in V1, and it thus seems unlikely that critical period onset in V1 is defective or that development of V1 is halted in an immature state that does not support OD plasticity in thalamus-specific GABRA1 deficient mice.

      The observed plasticity deficit during the critical period was limited to the second stage of the OD shift in V1, which requires long-term monocular deprivation. The straightforward explanation for this result and our current findings is that both during the critical period and in adulthood, the second stage of OD plasticity in V1 induced by long-term monocular deprivation requires thalamic plasticity or inhibition. The proposed alternative, that lack of thalamic synaptic inhibition during development results in a possible lack of structural change in V1 that would cause a lifelong deficiency selectively affecting OD plasticity induced by long-term monocular deprivation, is not impossible but requires many more assumptions.

      Concerning dLGN plasticity: The simplest explanation for the observed lack of OD plasticity in dLGN is that it is a direct consequence of the absence of synaptic inhibition in the KO mice. However, an alternative explanation could indeed be that dLGN is kept in an immature (pre-critical period-like) state due to the developmental absence of synaptic inhibition. This situation would be analogous to that in V1 of GAD65 deficient mice (which have reduced GABA release), in which OD plasticity cannot be induced by brief monocular deprivation during the critical period or in adulthood (Fagiolini and Hensch, 2000). Because this deficit can be reversed by treating the mice with benzodiazepines (positive allosteric modulators of GABA receptors) at any age, it is thought that development of V1 in GAD65 mice is halted in a pre-critical period-like state until inhibition is strengthened. We cannot exclude that something similar occurs in dLGN of mice lacking thalamic synaptic inhibition, although we did not observe any changes in hallmarks of dLGN maturity, such as reduced receptive field size (Fig. 1C), and increased cholinergic and inhibitory bouton densities (Suppl. Fig. 1).

      However, if the analogy with the developmental deficit in V1 of GAD65 deficient mice is valid, the reduced plasticity is still likely to be a direct consequence of reduced inhibition. In GAD65 deficient mice, long-term monocular deprivation during the critical period causes a full OD shift, showing that no additional deficits (besides reduced inhibition) limit OD plasticity in V1 of these mice (Fagiolini and Hensch, 2000). And, as already mentioned, increasing inhibition rescues OD plasticity in GAD65 KO mice. Thus, the immature state of V1 in these mice is probably a situation in which inhibitory tone is too low to support efficient OD plasticity. In dLGN, knocking out GABRA1 at a later age could therefore also create a situation in which inhibition is too low to support thalamic OD plasticity, which is not different from the situation in which the gene is inactivated at birth. Only if lack of synaptic inhibition in thalamus affects another, unknown developmental process that is of importance later in life to support OD plasticity in dLGN would the proposed experiment result in a different outcome. We are not convinced that this scenario is likely enough to justify repeating most of this study, but now using mice in which GABRA1 is inactivated in dLGN through bilateral AAV-cre injections.

      Independently of the exact cause of the plasticity deficit in dLGN, our results make clear that a cortical plasticity deficit in adulthood can have a thalamic origin, which we believe is an important insight that is highly relevant.

      2) The authors found that in juveniles, dLGN OD shift is dependent on V1 feedback, but not in adults. However, a recent work showed that the effects of V1 silencing on dLGN OD plasticity could differ with various starting points and duration of the V1 silencing and MD (Li et al., 2023). Could the authors provide more details of the MD and V1 silencing for an in-depth discussion?

      We would be happy to include some discussion about this interesting new paper in a revised manuscript. Some of the results may appear to contradict our findings. Most strikingly, the study by Li et al does not find evidence for OD plasticity in dLGN of 60-day old mice after 7 days of monocular deprivation. This seems to be at odds with the current work and with that of (Jaepel et al 2017) and (Huh et al. 2020). However, in the (Li et al, 2022) study, only the binocular neurons, which responded to both contralateral and ipsilateral stimuli, were included to measure the OD. This has important consequences for assessing OD and its plasticity. To illustrate: if dLGN neurons are monocularly responsive to the contralateral eye and become binocular after deprivation of the contralateral eye, they are excluded from analysis before deprivation but included after. This would cause an underestimation of the size of this OD shift. In our experiments, all dLGN neurons with receptive fields within 30° of the center of the visual field were included in the analysis, potentially explaining the different outcome of the studies.
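
      The selection effect described above can be illustrated with a toy calculation. This is a minimal sketch, not the analysis used in either study: it assumes the common definition ODI = (contra - ipsi) / (contra + ipsi), a hypothetical responsiveness threshold of 0.1 for calling a neuron binocular, and made-up response values for three neurons.

```python
def odi(contra: float, ipsi: float) -> float:
    """Ocular dominance index: +1 is purely contralateral, -1 purely ipsilateral."""
    return (contra - ipsi) / (contra + ipsi)


def mean_odi(cells, binocular_only: bool = False, threshold: float = 0.1) -> float:
    """Mean ODI over (contra, ipsi) pairs, optionally keeping only binocular cells."""
    if binocular_only:
        cells = [(c, i) for c, i in cells if c > threshold and i > threshold]
    if not cells:
        raise ValueError("no cells left after selection")
    return sum(odi(c, i) for c, i in cells) / len(cells)


# Hypothetical responses (arbitrary units): two contralateral-only neurons
# and one binocular neuron, before and after contralateral-eye deprivation.
before = [(1.0, 0.0), (1.0, 0.0), (0.75, 0.25)]
after = [(0.5, 0.5), (0.5, 0.5), (0.5, 0.5)]

shift_all = mean_odi(after) - mean_odi(before)
shift_binocular = (mean_odi(after, binocular_only=True)
                   - mean_odi(before, binocular_only=True))

print(f"ODI shift, all neurons:       {shift_all:.2f}")
print(f"ODI shift, binocular neurons: {shift_binocular:.2f}")
```

      In this toy population the shift computed over all neurons is about -0.83, whereas restricting the analysis to binocular neurons gives -0.5, because the two initially monocular neurons only enter the binocular sample after deprivation.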

      Also, Li et al observed that an OD shift in dLGN was still present after silencing V1 at p24. This observation is not necessarily at odds with our observation that the OD shift reduces at p30 upon silencing V1, as we find that the ODI does not return to normal but remains slightly lower (though not significantly so). Moreover, the age and the duration of deprivation were different and as mentioned before, analysis was performed differently.

      Interestingly, an excitotoxic lesion of V1 was found to alter OD in dLGN during development and affect OD plasticity in dLGN at various ages in the work of Li et al. This suggests that continuous crosstalk between thalamus and cortex during development guides plasticity, possibly optimizing thalamocortical and corticothalamic connections. The continued absence of corticothalamic feedback is likely to have a much larger impact on dLGN plasticity than the acute silencing we performed.

      Fagiolini M, Hensch TK. Inhibitory threshold for critical-period activation in primary visual cortex. Nature. 2000 Mar 9;404(6774):183-6.

      Huh CYL, Abdelaal K, Salinas KJ, Gu D, Zeitoun J, Figueroa Velez DX, Peach JP, Fowlkes CC, Gandhi SP. Long-term Monocular Deprivation during Juvenile Critical Period Disrupts Binocular Integration in Mouse Visual Thalamus. J Neurosci. 2020 Jan 15;40(3):585-604. doi: 10.1523/JNEUROSCI.1626-19.2019

      Jaepel J, Hübener M, Bonhoeffer T, Rose T. Lateral geniculate neurons projecting to primary visual cortex show ocular dominance plasticity in adult mice. Nat Neurosci. 2017 Dec;20(12):1708-1714

      Li N, Liu Q, Zhang Y, Yang Z, Shi X, Gu Y. Cortical feedback modulates distinct critical period development in mouse visual thalamus. iScience. 2022 Dec 7;26(1):105752.

      Sommeijer JP, Ahmadlou M, Saiepour MH, Seignette K, Min R, Heimel JA, Levelt CN. Thalamic inhibition regulates critical-period plasticity in visual cortex and thalamus. Nat Neurosci. 2017 Dec;20(12):1715-1721.

    1. Author Response

      We sincerely thank the reviewers for investing their valuable time in assessing our manuscript. We understand the considerable effort involved in the review process, and we will make use of these suggestions in order to make the revised manuscript more complete in terms of explanation, discussion, additional simulations, experiments and analyses.

      -Specifically, we will experimentally and computationally investigate how activation via anti-CD3 antibodies relates to our mechanism.

      -We will also utilize a weaker pMHC binder in the pMHC-mediated T cell activation experiments.

      -We will improve the description of the function of the FG loop and the role of the connecting peptide (CP).

      -Furthermore, we will improve our description of and justification for the simulation methodology. We want to emphasize that all potentials have been described, and we will draw attention to these methodological descriptions where needed.

      The reviewers also suggested a number of additional simulations that are probably beyond our current capability. These include:

      -simulations of TCR in complex with a weaker agonist

      -simulations of the proline and alanine TCR mutants in complex with a pMHC.

      While we agree that such simulations would provide new insights into the mechanism of TCR triggering, they simply are not feasible at this time. We will give a more detailed explanation for these arguments in the revised manuscript.

      Below, please find our point-by-point planned action items:

      Reviewer #1 (Public Review):

      The manuscript entitled: "TCR-pMHC complex formation triggers CD3 dynamics" by Van Eerden et al. mainly uses coarse-grained molecular dynamics to probe the dynamic changes, in terms of CD3ε spatial arrangements around 226 TCRs, before and after the engagements of MCC/I-Ek. The broader distributions of CD3ε iso-occupancies after pMHC binding correlate with the decreases of TCR-CD3 contacts and extensions of TCR conformations. Given the observed release of motion restrictions upon antigen recognition, the authors proposed a "drawbridge" model to describe the initial triggering processes from pMHC association to TCR straightening, FG-loop getaway, and CD3 enhanced mobility. In addition, the authors briefly investigated the functional effects of the rigidified connecting peptide (CP) in T-cell activation using in silico and in vitro mutagenesis. The manuscript raises an important and exciting hypothesis about the allostery of TCR-CD3 during TCR triggering; however, because the current evidence, both computational and experimental, is not yet convincing in supporting their conclusions, I would like to see additional work before supporting the publication of this manuscript in eLife. See details below:

      1) As mentioned by the authors, the TCR triggering and T cell activation have been illustrated by a number of models, such as mechanosensing and kinetic proofreading, "in which TCRs discriminate agonistic from antagonistic pMHCs." However, the critical feature of antigen discrimination is lacking in the drawbridge model. So far, the CD3ε movements qualitatively distinguish on and off states. The simulation of the antagonist or weaker binder would strengthen the manuscript by demonstrating the relevance of CD3ε mobility in the triggering mechanism. 226 TCR associated with K99E/I-Ek has been resolved in Ref (DOI: 10.4049/jimmunol.1100197), which can potentially serve as the "intermediate" system to formulate the gradual increase of CD3ε dynamics.

      Planned actions:

      -Explain why the current study cannot easily address pMHC discrimination

      -Explain why simulation of antagonist or weaker binding pMHC is technically difficult

      2) The linkage between conserved motifs in CP and CD3ε mobility is less apparent to this reviewer. The notion of the rigidified hinge (PP) requires further clarification. Computationally, the details of fine-grained simulations are required to justify the origin of the apparent mobility increase in PP. The direct comparison between Fig. 2 and Fig. 7 can help assess the relevance of CP through the alignment by FG-loop at a fixed direction in polar coordinates. Experimentally, anti-CD3 positive experiments and, ideally, another antagonist on 3A9 TCRs can strengthen the current functional assay. The baseline level of TCR expression (after positive selection) and 0h activation (Fig. S8) are missing.

      Planned actions:

      -Provide additional analysis of the role of CP as a hinge

      -Better clarify the FG simulation methodology

      -Align the CG and the FG polar plots

      -Perform experiments with anti-CD3 antibody 2C11

      -Perform additional experiment using weaker agonist (HEL peptide mutant)

      -Measure baseline-level TCR expression

      -Perform T cell activation experiments at t=0 h

      3) Regarding the section "The TCRβ FG loop acts as a gatekeeper," besides contact analysis, additional motion analysis, such as RMSF or PCA, can further establish the importance of FG loops.

      Planned actions:

      -Perform additional analyses of FG loop dynamics
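
      As an indication of what such a motion analysis involves, a per-atom root-mean-square fluctuation (RMSF) can be computed as sketched below. This is a minimal pure-Python illustration, not the authors' pipeline: it assumes the trajectory frames have already been aligned to a common reference, and the two-atom trajectory is invented for the example.

```python
import math


def rmsf(trajectory):
    """Per-atom RMSF from a list of frames.

    Each frame is a list of (x, y, z) tuples, one per atom. Frames are
    assumed to be pre-aligned, so fluctuations reflect internal motion.
    """
    n_frames = len(trajectory)
    n_atoms = len(trajectory[0])
    result = []
    for atom in range(n_atoms):
        # Time-averaged position of this atom.
        mean = [sum(frame[atom][k] for frame in trajectory) / n_frames
                for k in range(3)]
        # Mean squared deviation from the average position.
        msd = sum(
            sum((frame[atom][k] - mean[k]) ** 2 for k in range(3))
            for frame in trajectory
        ) / n_frames
        result.append(math.sqrt(msd))
    return result


# Invented example: atom 0 is static, atom 1 oscillates along x.
toy_trajectory = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],
]
fluctuations = rmsf(toy_trajectory)
print(fluctuations)
```

      Applied to loop residues, higher RMSF values would indicate greater flexibility; comparing the profile between the unbound and pMHC-bound simulations is one way such an analysis could be framed.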

      4) The discussion on anti-CD3 antibody effects and their potential contribution to CD3 mobility is highly recommended.

      Planned actions:

      -We will add the discussion of anti-CD3 antibody effects

      Reviewer #2 (Public Review):

      In this research article a new allosteric mechanism for T cell receptor (TCR) triggering upon peptide-MHC complex binding is presented in which conformational change in the TCR facilitates activation by controlling CD3 dynamics around the TCR. The authors find that the Cb FG loop acts as a gatekeeper and Cb connecting peptide acts as a hinge to control TCR flexibility.

      As an initial result, the authors set up two sets of simulations - TCR-CD3 and pMHC-TCR-CD3 in POPC bilayers - and identified that the CD3e chains exhibit a wider range of mobility in the pMHC-TCR-CD3 system as compared to the TCR-CD3 system. Next, they examined the contacts between all subunits during the course of both simulations and established that CD3g and CD3eg made far fewer contacts with TCRb in the pMHC-TCR-CD3 simulations. Next, they identified that the TCR is extended in the pMHC-TCR-CD3 simulations, with a larger tilt angle of 150°, and that the FG loop acts as a gatekeeper that allows CD3 movements upon pMHC binding. Finally, mutations in the Cb connecting peptide region indicated that a rigidified TCR leads to a hypersensitive TCR, as shown both by simulations and by in vitro experiments.

      The following major concerns must be addressed.

      Major concerns:

      1) The simulations were performed with intracellular regions unfolded and free from the membrane. A more complete system should have the intracellular regions embedded in the membrane. An NMR structure of CD3e is available (Xu et al., Cell, 2008) and could have been modeled into the TCR-CD3 system before the simulation. Prakaash et al. (PLoS Comput Biol, 2021) studied cytoplasmic domain motions in their simulation experiments.

      Planned actions:

      -Explain why we cannot perform adequate additional simulations of ITAMs

      2) Comparing Fig. 2C and Fig. 7C, the movement of CD3eg is more restricted in Fig. 7C. Is this because the simulation time is lower in the mutation experiments?

      Planned actions:

      -Explain the differences between the CG and FG polar plots

      3) Only TCR-CD3 simulations were performed for PP and AA mutants. A simulation with pMHC (pMHC-TCRmutants-CD3) should be performed to show increased CD3 mobility.

      Planned actions:

      -Explain why TCR-CD3-pMHC simulations of the mutants are not feasible at this time

      4) Using CD3e antibody, OKT3, for activation instead of pMHC as a separate experiment would add more value to this study. They can look at CD3 mobility and TCR elongation in the system with OKT3 antibody and compare it to the CD3 mobility and TCR elongation with the pMHC system. They can also use OKT3 with AA and PP mutants and perform both simulation and in vitro activation experiments.

      Planned actions:

      -Perform anti-CD3 (2C11) experiments

      -Perform CG simulation of TCR with CD3 Fab fragment

      -Explain why we cannot perform FG simulations of TCR mutants with CD3

      5) The activation experimental data is rather underwhelming. The difference between WT and PP in 2hr experiment at 0.016 ug/mL looks exceedingly low. A stronger TCR-pMHC system should be considered for the in vitro activation experiments.

      Planned actions:

      -Explain that this is a dilution curve, which is why at lower concentrations the effect is smaller, but at higher concentrations the effect is clear

      6) There is some concern that the scientific work lacks solid experimental functional data and novelty, given earlier TCR-CD3 simulation studies (Pandey et al., 2021, eLife) that already reported flexibility and elongation of the complex.

      Planned actions:

      -Explain the similarities and differences between this and Pandey’s work; clarify how our study contributes novel findings

      Reviewer #3 (Public Review):

      The authors first explore structural differences of unbound TCR-CD3 complexes and pMHC-bound TCR-CD3 complexes with coarse-grained simulations. In the simulations with pMHC-bound complexes, the transmembrane (TM) domains of the TCR-CD3 complex and of pMHC are embedded in two opposing membrane patches. In the pMHC membrane patch, a pore is created and stabilised in the simulation setup with the aim to allow water transport in and out of the compartment between the membranes. The authors report a more upright conformation of the TCR extracellular (EC) domain in the simulations in which this EC domain is bound to pMHC, compared to simulations with unbound TCR, and postulate an allosteric signalling model based on these apparent conformational changes and associated changes in TCR-CD3 quaternary arrangements. Subsequently, the authors identify a GxxG motif in the TCRbeta connecting peptide between EC domain and TM domain as putative hinge in allosteric signalling, and explore the effect of double proline and double alanine substitutions in atomistic simulations and experiments.

      While these simulation and experimental setups and the addressed questions are of interest in the field, the following weaknesses prevail in my overall assessment of the work:

      (1) I am not convinced that the reported coarse-grained simulation results are sound or allow the conclusions stated in the work to be drawn. In the simulations with a pMHC-bound TCR-CD3 complex, the intermembrane distance in the setup shown in Figure S1 appears excessively large and likely leads to a rather strong force in the membrane-vertical direction and to the reported upright conformation of the TCR EC domain. This upright conformation thus appears to be a consequence of force from the simulation setup, rather than a consequence of pMHC binding alone as suggested by the authors. While the membrane pore in principle allows water exchange, the relaxation of the intermembrane distance resulting from this water exchange in the 10 microsecond long simulation trajectories is not (but needs to be) addressed. This relaxation eventually would lead to an equilibrated membrane separation, in which essentially no force is exerted on the TCR-pMHC EC complex. However, I suspect that this computationally demanding equilibration is not achieved in the simulations, with the consequence that forces on the TCR-pMHC EC complex in the membrane-vertical direction remain.

      In addition, I am not convinced that the Martini force field of the coarse-grained simulations allows a reliable assessment of the quaternary interactions between the TCR and CD3 EC domains. Getting protein structures and interactions right in coarse-grained simulations is notoriously difficult. In simulations with the coarse-grained Martini force field, secondary protein structures are constrained as a standard procedure, and the authors also use a recommended Go-potential procedure, likely to stabilise tertiary protein structures. The quaternary interactions between the TCR EC domain and the pMHC EC domain are modelled by rather strong harmonic constraints to prevent dissociation. While the treatment of the quaternary interactions between the TCR EC domain and the CD3 EC domains in the simulations is not (but needs to be) addressed in detail, I suspect that there are no additional, or only weak constraints to stabilise these interactions. In any case, I think that a faithful representation of these quaternary interactions is beyond the reach of the Martini force field, as is the reported diffusion of the CD3 EC domains around the TCR EC domain, which plays a central role in the allosteric mechanism proposed by the authors (see Fig 2 and 5).

      Planned actions:

      -We will provide further description and justification for the CG simulations

      (2) The allosteric model suggested by the authors is motivated in an introduction that appears to omit central controversial aspects in the field, as well as experimental evidence that is not compatible with allosteric conformational changes in the TCR. These aspects are:

      • The mechanosensor model is controversial. In original versions of this model, a transversal force has been postulated to be required for T cell activation. However, more recent single-molecule force-sensor experiments reported in J Goehring et al., Nat Commun 12, 1 (2021) provide no evidence for a scenario in which transversal forces beyond 2 pN are associated with T cell activation.

      • The role of catch bonds is controversial. Evidence for TCR catch bonds has been mainly obtained in experimental setups using the biomembrane force probe, in which force is applied to TCRs on the surface of T cells, but is not reproduced in experimental setups using isolated TCRs, see e.g. L Limozin et al., PNAS 116, 16943 (2019)

      • Ref. 1 of the manuscript prominently discusses the kinetic segregation model of T cell activation, which is not (but needs to be) addressed in the introduction. In this model, exclusion of CD45 from close-contact zones around pMHC-bound TCRs triggers T cell activation. The model is supported by evidence from diverse experiments, see for example M Aramesh et al., PNAS 118, e2107535118 (2021) and Ref. 1. At least part of this evidence is not compatible with mechanosensing or allosteric models of T cell activation.

      Planned actions:

      -We will add the requested literature references and include a better description of the kinetic segregation model

    1. Author Response:

      The major criticism from the reviewers is that factors other than high-impact rare variants – such as environmental factors or epistasis – could have produced the complex tail architecture that we test for and detect. While we did explain this point in the Discussion, we agree with the reviewers that this should have been emphasized more and earlier in the manuscript.

      Regarding suggestions for more complex simulations and methods, we absolutely agree that much more work is needed here to produce optimised inference of all the causes of complex tail architecture. We are performing multiple projects at various stages of completion that we hope will contribute to this, but we felt that this was a good stopping-point in this project to publish what we had completed so far, in order to: (1) introduce the idea of inferring complex genetic architecture from siblings without requiring genetic data, (2) outline an initial theoretical framework for inferring complex tail architecture from sibling data, (3) provide simple tests powered to identify enrichments of de novo or ‘Mendelian’ variants in the tails (albeit tests that make several strong simplifying assumptions), (4) enable others interested in the topic to build upon this work now. However, we plan to expand our simulations and analyses in a revised manuscript based on reviewer feedback.

      We thank the reviewers for their comments about the value of our work, its mathematical robustness and the promise of our method.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      […] Overall, the authors build a convincing case for TEs being an important source of regulatory information. I don't have any issues with the analysis, but I am concerned about the sweeping claims made in the title. Once you get rid of eQTLs that could be altered by either SNPs or TIPs and include only those insertions that show strong evidence of selection, the number of genes is reduced to only 30. And even in those cases, the observed linkage is just that, not definitive evidence for the involvement of TEs. Although clearly beyond the scope of this analysis, transgenic constructs with the TEs present or removed, or even segregating families, would have been far more convincing. 

      We note that the referee thinks that we "built a convincing case for TEs being an important source of regulatory information". This is what we wanted to convey in the title, where we were careful not to claim that TEs are the most important contributor to gene expression variability in rice populations. However, we agree with the referee that the title may be improved to better describe the results presented. We have therefore changed the title to "Transposons are an important contributor to gene expression variability under selection in rice populations".

      With respect to demonstrating causality by removing or introducing the TEs, this is indeed work we plan to do but that, as stated by the referee, is beyond the scope of this analysis.

      The fact that many of the eQTL-TIPs were relatively old is interesting because it suggests that selection in domesticated rice was on pre-existing variation rather than new insertions. This may strengthen the argument because those older insertions are less likely to be purged due to negative effects on gene expression. Given that the sequence of these TEs is likely to have diverged from others in the same family, it would have been interesting to see if selection in favor of a regulatory function had caused these particular insertions to move away from more typical examples of the family. 

      The TIP-eQTLs are from different classes, superfamilies and families, and the number of TIP-eQTLs of the same family is too small to deduce sequence commonalities (4.6 TIP-eQTLs/family in indica and 3.6 TIP-eQTLs/family in japonica). On the other hand, the effect of TIPs on expression can be positive or negative (we show that it is in fact often negative). In the latter case, a plausible scenario would be the insertion inactivating a promoter element, and in this case it would be the insertion itself, and not the actual sequence of the TE, that would be selected.

      Also, previous work done in our lab has shown that TEs can amplify and mobilize transcription factor binding sites that are bound by the TF even when they are not close to a gene and therefore probably not directly affecting gene expression (Hénaff et al., 2014, The Plant Journal). In that case, the sequence of the eQTL TEs and those that are far away from genes will not necessarily differ.

      Reviewer #2 (Public Review):

      In this manuscript, Castanera et al. investigated how transposable elements (TEs) altered gene expression in rice and how these changes were selected during the domestication of rice. Using GWAS, the authors found many TE polymorphisms in the proximity of genes to be correlated to distinct gene expression patterns between O. sativa ssp. japonica and O. sativa ssp. indica and between two different growing conditions (wet and drought). Thereby, the authors found some evidence of positive selection on some TE polymorphisms that could have contributed to the evolution of the different rice subspecies. These findings are underlined by some examples, which illustrate how changes in the expression of some specific genes could have been advantageous under different conditions. In this work, the authors manage to show that TEs should not be ignored when investigating the domestication of rice as they could have played an important role in contributing to the genetic diversity that was selected. However, this study stops short of identifying causations as the used method, GWAS, can only identify promising correlations. Nevertheless, this study contributes interesting insights into the role TEs played during the evolution of rice and will be of interest to a broader audience interested in the role TEs played during the evolution of plants in general.

      We agree with the referee that the results presented do not allow conclusions on causality, and we have been careful not to claim otherwise in the manuscript. We plan to add or remove TEs using CRISPR/Cas9 approaches to address this but, in line with referee 1's comment, we think this is beyond the scope of this analysis.

      ---------- 

      Reviewer #1 (Recommendations For The Authors): 

      Everything that I need to say is provided in the public portion of my review. 

      Reviewer #2 (Recommendations For The Authors): 

      Major concerns:

      1. The authors compare the proportion of the variance explained by the most significant TIP and SNP on the observed eQTLs associated with TIPs and SNPs. Thereby the authors conclude that TIPs explain more variance than SNPs. If I am not mistaken the GWAS was run separately for TIPs and SNPs, however, I am wondering if running the GWAS on the combined TIP and SNP dataset might be the better way to compare the variance explained by TIPs and SNPs on gene expression differences. It would be nice to see if these results also hold true if a TIP and SNP combined dataset is used, as the most significant marker in a GWAS might not be the causal mutation but might just be linked to the causal mutation. Further, in the TIP dataset, the number of markers is only 45k and in the SNP dataset, it is 1 000k, which could bias the GWAS toward finding markers that explain more of the variation in the dataset with fewer markers.

      We addressed the reviewer concern by using two complementary approaches, whose results are described in the text (lines 119-121) and in the new Figure 1-figure supplement 1.

      First, we addressed the concern regarding the independent GWAS for TIPs and SNPs vs a combined strategy. For this, we built new japonica/indica genotype matrices combining the TIP and SNP matrices and ran eQTL mapping again. Using the same strategy (association + FDR adjustment), we recovered 100% of the previous TIP-eQTLs and 99% of the previous SNP-eQTLs. We repeated the same analysis (proportion of expression variance), and the results were mostly the same (Figure 1-figure supplement 1A).
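
      The "association + FDR adjust" step can be illustrated with a minimal sketch. The response does not state which FDR procedure was used, so the Benjamini-Hochberg adjustment below is an assumption, and the marker names and p-values are invented for illustration only.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical marker-gene association p-values (not from the manuscript).
markers = ["m1", "m2", "m3", "m4"]
raw_p = [0.001, 0.2, 0.03, 0.8]

adj_p = benjamini_hochberg(raw_p)
# Keep associations passing an assumed 5% FDR threshold.
significant = [name for name, q in zip(markers, adj_p) if q < 0.05]
print(significant)
```

      In this toy case only the first marker passes the threshold, although two raw p-values are below 0.05.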

      Second, we addressed the two concerns (combined genotypes and different numbers of TIP and SNP markers) using a single approach. SNP matrices were LD pruned using an r2 of 0.9 and then subsampled to the exact number of TIPs (Indica = 30,396, Japonica = 25,168). We verified that these SNPs covered the 12 rice chromosomes well. SNP and TIP genotypes were then merged into a single matrix, and eQTL mapping was repeated for each of the subspecies and conditions using the same parameters as in the previous version of the manuscript. 100% of the previously reported TIP-eQTL associations were found using this new approach. Nevertheless, we found a substantial drop in sensitivity for the SNP-eQTLs (only 15-20% of the previous associations were detected), possibly due to the strong reduction in the number of SNPs (> 95%), which results in a much lower number of markers at < 5Kb from genes. We repeated the analysis of Figure 1D and observed very similar results (Figure 1-figure supplement 1D). A large number of TIP-eQTL associations do not coincide with SNP-eQTLs (74% in indica, 83% in japonica), indicating that TIP-eQTL mapping is complementary to SNP-eQTL mapping, as it uncovers additional associations (note that in this case the overlap between TIP-eQTLs and SNP-eQTLs is lower than in the previous analysis due to the lower sensitivity of SNP-eQTL mapping using fewer markers). In the cases where both a TIP and a SNP coincide as eQTL, TIPs explained slightly more variance than SNPs in both indica and japonica (in 54% of the cases TIP variance > SNP variance).
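
      The prune, subsample and merge steps described above can be sketched as follows. This is only an illustration under stated assumptions: the response does not name the software used, real LD pruning is normally done in sliding windows along each chromosome with dedicated tools, and the greedy all-pairs pruning, the function names and the toy genotypes here are hypothetical.

```python
import random
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 dosages)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic marker: correlation undefined, treat as unlinked
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return (sxy * sxy) / (sxx * syy)


def ld_prune(snps, threshold=0.9):
    """Greedy pruning: keep a SNP only if its r2 with every kept SNP is below threshold."""
    kept = {}
    for name, geno in snps.items():
        if all(r_squared(geno, other) < threshold for other in kept.values()):
            kept[name] = geno
    return kept


def subsample(markers, n, seed=0):
    """Randomly keep n markers (seeded so the sketch is reproducible)."""
    rng = random.Random(seed)
    names = rng.sample(sorted(markers), min(n, len(markers)))
    return {name: markers[name] for name in names}


def merge(tips, snps):
    """Combine both marker types into one matrix, prefixing names to avoid clashes."""
    merged = {f"TIP:{k}": v for k, v in tips.items()}
    merged.update({f"SNP:{k}": v for k, v in snps.items()})
    return merged


# Toy genotypes for six accessions (invented for illustration).
snps = {
    "s1": [0, 1, 2, 0, 1, 2],
    "s2": [0, 1, 2, 0, 1, 2],  # in perfect LD with s1, so it is pruned
    "s3": [2, 0, 1, 1, 0, 2],
    "s4": [0, 0, 1, 2, 2, 1],
}
tips = {"t1": [0, 1, 0, 1, 0, 1], "t2": [1, 1, 0, 0, 1, 0]}

pruned = ld_prune(snps, threshold=0.9)
subsampled = subsample(pruned, n=len(tips))  # match the number of TIPs
merged = merge(tips, subsampled)
print(sorted(merged))
```

      Matching the SNP count to the TIP count removes the marker-number imbalance raised by the reviewer, at the cost of the lower SNP-eQTL sensitivity noted in the response.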

      2. Line 146 to 152: in this section, the authors describe overlaps between TIP-eQTLs in two different growth conditions, however, in the text it is not mentioned if the TIPs have the same effect on gene expression in the two conditions or if the gene expression is up-regulated in one condition but down-regulated in the other. This information would be interesting to have here, especially as the authors go on to say that only a small number of TIP-eQTLs are stress-specific. The same comment also goes for the eQTL overlap described on lines 167 to 170. 

      We checked the effect type (positive or negative) of TIP-eQTLs in both scenarios (associations shared between wet/dry conditions, and associations shared between subspecies). In both cases, 100 % of the shared TIP-eQTLs have the same effect type in the two conditions or subspecies. We have updated the text accordingly (Lines 55-157 and Lines 179-181)
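
      A check of this kind amounts to comparing the sign of the estimated effect for each shared association. The sketch below is a hypothetical illustration: the identifiers and effect sizes are invented and deliberately include one discordant pair to show what the check would flag, so they do not reproduce the 100% concordance reported above.

```python
def same_direction(effect_a, effect_b):
    """True when two non-zero effect estimates have the same sign."""
    return (effect_a > 0) == (effect_b > 0)


# (TIP, gene) -> (effect in condition A, effect in condition B); invented values.
shared = {
    ("TIP_1", "gene_A"): (0.8, 0.5),
    ("TIP_2", "gene_B"): (-1.2, -0.4),
    ("TIP_3", "gene_C"): (0.3, -0.6),  # discordant on purpose
}

concordant = [key for key, (a, b) in shared.items() if same_direction(a, b)]
fraction = len(concordant) / len(shared)
print(f"{fraction:.0%} of shared associations have the same effect direction")
```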

      3. Lines 192 to 196: the authors mention that the frequency of non-eQTL-TIPs was at the same frequency in indica and japonica, which is in contrast to eQTL-TIPs. However, on line 132 it is mentioned that eQTL-TIPs were overrepresented in 1 kb regions upstream of genes. Hence, is the pattern of the frequency of non-eQTL-TIPs being at the same frequency in indica and japonica also observed in the 1 kb regions upstream of genes and/or if the distribution of non-eQTL-TIPs is matched to one of the eQTL-TIPs? Or is this pattern driven by non-eQTL-TIPs far away from genes?

      We checked the frequencies of TIPs at 1Kb upstream of genes and found that the general pattern is maintained, with the frequencies of non-eQTL TIPs being more correlated than those of TIP-eQTLs. We have included this information (lines 204-206) and added a new supplementary file (Figure 2-figure supplement 2)

      4. In the discussion, the authors could briefly discuss how linked selection affecting TIPs could contribute to the observed results. After reading the second example in the result section where one of the example TIPs (TIP_50059) is found on the Hap B which contains "some additional structural differences" (line 290), I was left wondering how much of the increase in TIP frequency can be attributed to genetic hitchhiking? And how much of the results could be caused by linked selection, especially when considering that structural variations are not included in the GWAS analyses. 

      We agree with the referee that some of the TIP eQTLs described here might not be the actual cause of expression variability (e.g., a TIP linked with the causal mutation), although we cannot know the exact fraction. This is stated in several places in the results and discussion sections. However, the fact that TIPs tend to explain more variance than SNPs and that TIP eQTLs, but not SNP eQTLs, tend to concentrate in the upstream proximal region of genes where most transcription regulatory sequences are located (Figure 1), suggests that TIP eQTLs could be causal more frequently than SNP eQTLs. We revised the text to ensure that we convey this message appropriately.

      Minor comments: 

      • Lines 80 to 83: the description of the rice phylogeny should be moved to the introduction. 

      Done (Lines 68-72)

      • Line 177 to 186: It was unclear to me whether the authors checked if the ancestral rice population lacked the TIPs described in this section as recently inserted in the indica and japonica ssp. It would be nice to add this information to this section.

      Thanks to the referee's comment, we noticed an imprecision in the text. The approximate 1/3 of subspecies-specific TIP-eQTLs refers to the TIPs at 3% MAF (i.e., some of these insertions could be present at > 3% in indica, but at < 3% MAF in japonica). We now indicate only the TIPs that are truly specific to one of the two subspecies (frequency is zero in one of the two) and looked for their presence in rufipogon:

      59 insertions are indica-specific. Of those, 33 are present in rufipogon.

      21 insertions are japonica-specific. Of those, 5 are present in rufipogon.

      We have incorporated this information in the manuscript (Lines 185-189). The species-specific TIPs are also available in the Supplementary File 3.
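
      The stricter definition used here (frequency of exactly zero in one subspecies) can be sketched as a simple classification. The frequencies and TIP names below are invented for illustration and are not taken from Supplementary File 3.

```python
# Hypothetical insertion frequencies per population.
freqs = {
    "TIP_a": {"indica": 0.12, "japonica": 0.0, "rufipogon": 0.05},
    "TIP_b": {"indica": 0.0, "japonica": 0.30, "rufipogon": 0.0},
    # Below 3% in japonica but not absent, so not strictly specific.
    "TIP_c": {"indica": 0.40, "japonica": 0.02, "rufipogon": 0.10},
}


def specific_to(freq):
    """Return the subspecies a TIP is strictly specific to, or None."""
    if freq["indica"] > 0 and freq["japonica"] == 0:
        return "indica"
    if freq["japonica"] > 0 and freq["indica"] == 0:
        return "japonica"
    return None


summary = {
    "indica": {"specific": 0, "in_rufipogon": 0},
    "japonica": {"specific": 0, "in_rufipogon": 0},
}
for name, freq in freqs.items():
    subspecies = specific_to(freq)
    if subspecies is None:
        continue
    summary[subspecies]["specific"] += 1
    if freq["rufipogon"] > 0:
        summary[subspecies]["in_rufipogon"] += 1

print(summary)
```

      An insertion such as the hypothetical TIP_c would count as subspecies-specific under a 3% MAF filter but not under the strict zero-frequency definition.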

      • Line 353: "have two of more TIPs" should be "two or more" 

      Done (Line 369)

      • Figure 1D: Using a square layout instead of a rectangle layout for the plot will make it easier to interpret. 

      Done.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      […] This novel system could serve as a powerful tool for loss-of-function experiments that are often used to validate a drug target. Not only this tool can be applied in exogenous systems (like EGFRdel19 and KRASG12R in this paper), the authors successfully demonstrated that ARTi can also be used in endogenous systems by CRISPR knocking in the ARTi target sites to the 3'UTR of the gene of interest (like STAG2 in this paper).

      We thank the referee for highlighting the novelty and potential of the ARTi system.

      ARTi enables specific, efficient, and inducible suppression of these genes of interest, and can potentially improve therapeutic target validations. However, the system cannot be easily generalized as there are some limitations in this system:

      • The authors claimed in the introduction section that CRISPR/Cas9-based methods are associated with off-target effects; however, the authors' system requires the use of CRISPR/Cas9 to knock out a given endogenous gene or to knock in ARTi target sites to the 3' UTR of the gene of interest. Though the authors used a transient CRISPR/Cas9 system to minimize the potential off-target effects, the advantages of ARTi over CRISPR are likely less than claimed.

      We thank the reviewer for raising these very valid concerns about potential off-target effects related to the CRISPR/Cas9-based gene knockout or engineering of endogenous ARTi target sites. In contrast to conventional RNAi- and CRISPR-based approaches, such off-target effects can be investigated prior to loss-of-function experiments through comparison between parental and engineered cells, which in the absence of CRISPR-induced off-target events are expected to be identical. Subsequent ARTi experiments provide full control over RNAi-induced off-target activities through comparison of target-site engineered and parental cells. However, we agree that undetected CRISPR/Cas9-induced off-target events cannot be ruled out in a definitive way, which we have pointed out in our revised manuscript.

      • Instead of generating gene-specific loss-of-function triggers for every new candidate gene, the authors identified a universal and potent ARTi to ensure standardized and controllable knockdown efficiency. It seems this would save time and effort in validating loss-of-function siRNAs/sgRNAs for each gene. However, users will still have to design and validate the best sgRNA to knock out endogenous genes or to knock in ARTi target sites by CRISPR/Cas9. The latter is by no means trivial. Users will need to design and clone an expression construct for their cDNA replacement construct of interest, which will still be challenging for big proteins.

      We fully agree that the required design of gene-specific sgRNAs and subsequent CRISPR-engineering steps are by no means trivial. However, we believe that decisive advantages of the method, in particular the robustness of LOF perturbations and additional means for controlling off-target activities, can make ARTi an investment that pays off. In our experience, much time can be lost in the search for effective LOF reagents, and even when these are found, questions about off-target activity remain. While ARTi overcomes many of these challenges by providing a standardized experimental workflow, we do not propose to replace all other LOF approaches by this method. Instead, we would position ARTi as a unique orthogonal approach for the stringent validation and in-depth characterization of candidate target genes, as we have highlighted in our revised discussion.

      • The approach of knocking-out an endogenous gene followed by replacement of a regulatable gene can also be achieved using regulated degrons, and by tet-regulated promoters included in the gene replacement cassette. The authors should include a discussion of the merits of these approaches compared with ARTi.

      We thank the reviewer for pointing out these alternative LOF methods. We had already included a brief discussion of chemical-genetic LOF methods based on degron tags. While we certainly share the current excitement about degron technologies, they inevitably require changes to the coding sequence of target proteins, which can alter their regulation and function in ways that are hard to control for. In our revised discussion, we have added a brief comparison to conventional tet-regulatable expression systems, which unlike ARTi require the use of ectopic tet-responsive promoters. Overall, we would position ARTi as an orthogonal tool that enables inducible and reversible LOF perturbations without changing the coding sequence and the endogenous transcriptional control of candidate target genes.

      Reviewer #2 (Public Review):

      […] The ARTi system is based on expression of a transgene with an artificial RNAi target site in the 3'-UTR as well as a TET-inducible miR-E-based shRNAi. Using this system, the authors convincingly show that they can target strong oncogenes such as EGFRdel19 or KRasG12 as well as synthetic lethal interactions (STAG1/2) in various human cancer cell lines in vivo and in vitro.

      The system is very innovative, likely easy to be established and used by the scientific community and thus very meaningful.

      We thank the reviewer for her/his enthusiasm about ARTi.

      Reviewer #1 (Recommendations For The Authors):

      • The authors claimed that ARTi enables specific, efficient, inducible, and reversible suppression of any gene of interest. However, there are no experiments supporting the reversible suppression of their gene of interest. Data are required to support this statement.

      We thank the reviewer for pointing this out. The statement about the reversibility of ARTi-mediated effects was based on a rich body of literature that has demonstrated the reversibility of Tet-shRNAmir-induced LOF perturbations and associated phenotypes. As ARTi employs the same Tet-shRNAmir expression vectors, we have no reason to believe that this feature would be lost. However, since we have not demonstrated this in our study, we have removed this statement in our revised manuscript.

      • In Figure 1E, the authors did make the point by including trametinib treated samples as positive controls. However, the trametinib treated samples also made the transcriptome changes in the ARTi groups hard to read. I wonder what the PCA analysis will look like if the authors exclude the trametinib treated groups.

      In Figure 1E, we used PCA as a common and easy-to-digest visualization tool to showcase the neutrality of ARTi shRNAmirs. Given the complete absence of significantly deregulated genes for all three ARTi shRNAmirs (Figure 1F), we believe that a PCA analysis of just these samples would merely represent experimental noise and not yield additional insights.

      • This universal and potent ARTi should ensure standardized and controllable knockdown efficiency, however, the knockdown efficiency for KRASG12R is not as potent as that for EGFRdel19. The authors should discuss the differences.

      We thank the reviewer for pointing this out. The exact level of knockdown on the protein level is hard to determine due to detection limits of the method used. The differences in the extent of mRNA knockdown could be attributable to cleavage efficiencies due to potential secondary structures in the respective mRNAs. We suspect that the KRASG12R transgene is expressed at higher levels than EGFRdel19. We might therefore still be looking at the same overall magnitude of knockdown. As we did not perform a detailed analysis of the respective knockdown levels, we do not feel comfortable stating differences in knockdown levels and therefore do not think that addressing potential differences is justified.

      • It is interesting to see that, unlike other cancer types, tumor burdens did not decrease after inducing knockdown of STAG1 in STAG2 knockout HCT116 lines in Figure 2L. Have the authors examined senescence markers in this set of mice?

      We have not investigated these markers and thank the reviewer for this suggestion. More detailed analyses of the phenotype are planned.

      • Have the authors carefully examined the transcriptome changes induced or if not across all targets at least in the case of ARTi knock into the 3'UTR of STAG1?

      We thank the reviewer for this suggestion. This would indeed be interesting to conduct for STAG1/2, especially for genes with an integration of the ARTi into the 3’UTR. The reason why we did not perform this analysis with our cell lines is that we used a construct that also adds an AID tag to STAG1 (STAG1_AID_V5_P2A_Blasti_STOP_ARTi), as outlined in the methods section. After the engineering, STAG1 carries the ARTi sequence in the 3’UTR but is also fused to AID::V5. In addition a P2A::Blasticidin_resistance Protein is made from the same transcript. We chose to use this complex strategy with the aim of comparing AID mediated degradation with ARTi-mediated knockdown. Unfortunately, the AID-based approach did not work, and we were not able to observe a reduction in protein levels. We however observed lower expression of STAG1 in the engineered versus the parental cells, likely caused by the tag, and consequently did not conduct gene expression analyses, as we would not be able to assess if transcriptome changes could be solely ascribed to the changes in the 3’UTR. The knockdown levels are hence only analyzed on the protein level.

      Reviewer #2 (Recommendations For The Authors):

      This is a fantastic paper, easy to read and provides a very meaningful new and innovative approach for drug target validation. I think the manuscript could be further improved by adding a section to the discussion outlining other approaches that could be used to solve the same problem. For example, Bill Kaelin came up with a strategy of expressing shRNA- or sgRNA-resistant and rtTA- or tTA-regulated cDNAs of essential gene-of-interest followed by sh/sgRNA-mediated ablation of the endogenous gene (e.g.PMID: 28082722), which is conceptually quite similar to the ARTi approach. Similarly, people have used conditional degron tags such as AID tags, dTags, HALOTags, IHZF3 degrons or SMASh either knocked into the endogenous locus or as rescue transgene. Comparing and contrasting the pros and cons of these methods to the ARTi-based approach would be certainly beneficial to the readers.

      We thank the referee for pointing out these alternative LOF methods. We certainly share the current excitement about various degron tags and are applying them in our own research. In our view, a major downside of these strategies is that they inevitably require changes to the coding sequence of target proteins, which can alter their regulation and function in ways that are hard to predict and control for. We had briefly mentioned this distinguishing feature in our discussion. The strategy proposed by Bill Kaelin, i.e. rescue of the endogenous gene through Tet-regulated expression of sh/sgRNA-resistant cDNAs, indeed shares many features of the ARTi system, but requires expression of the candidate target from an ectopic promoter element. In contrast, ARTi enables similar perturbations of candidate genes without altering their endogenous transcriptional regulation – a feature that we have highlighted in our revised discussion.

      All my other comments outlined below should be considered minor and are not essential.

      1, Suppl Fig. 1C: Please explain what the red star means. How can the knock-out be more than 100%? Please specify what the controls are. Why does shRNA660 exhibit no knockdown at all?

      The red star indicates ARTi-shRNAmirs that were selected for further characterization. Depicted GFP knockdown levels are normalized to the performance of shRen.713, a well-characterized potent control shRNA targeting Renilla Luciferase. Values of more than 100% mean that the respective shRNA exceeded the effect of shRen.713. shRNA.660 served as a neutral control, as its target site was not included in the reporter construct. We thank the reviewer for bringing up these points, which we have clarified in the legend.
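
      The normalization described above can be illustrated with a minimal sketch. It assumes, purely for illustration, that knockdown is measured as the fractional loss of GFP signal relative to an untreated reporter sample and is then expressed as a percentage of the knockdown achieved by the reference shRNA (shRen.713 in the response). The function name, argument names, and numbers are hypothetical and are not taken from the study.

```python
def relative_knockdown(gfp_test, gfp_untreated, gfp_reference):
    """Return knockdown of a test shRNA as a percentage of a reference shRNA.

    All three arguments are reporter GFP intensities (arbitrary units):
    gfp_test      - cells expressing the shRNA being evaluated
    gfp_untreated - reporter cells without any shRNA (assumed baseline)
    gfp_reference - cells expressing the reference shRNA
    """
    if gfp_untreated <= 0:
        raise ValueError("untreated GFP signal must be positive")
    # Fractional loss of GFP signal for the test and reference shRNAs.
    kd_test = 1.0 - gfp_test / gfp_untreated
    kd_reference = 1.0 - gfp_reference / gfp_untreated
    if kd_reference <= 0:
        raise ValueError("reference shRNA shows no knockdown")
    return 100.0 * kd_test / kd_reference


# Hypothetical intensities: the reference removes 90% of the signal.
stronger = relative_knockdown(gfp_test=50, gfp_untreated=1000, gfp_reference=100)
neutral = relative_knockdown(gfp_test=1000, gfp_untreated=1000, gfp_reference=100)
print(stronger > 100.0)  # a shRNA more potent than the reference exceeds 100%
print(neutral)           # a shRNA without a target site scores 0.0
```

      Under this assumed definition, a value above 100% simply means that the test shRNA reduced the reporter signal further than the reference did, and a neutral control lacking a target site scores zero.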

      2, x-axis label in Suppl Fig. 1D is missing

      We thank the referee for spotting this and have added this information to the figure and its legend.

      3, I would argue that ARTi6634 also has a slight effect in MV4-11 similar to its effect to RN2. Maybe add that to the text.

      We thank the reviewer and have added this observation to our revised text.

      4, Suppl. Figure Legend 1F - specify which cell line was used (HT-1080 presumably)

      We apologize for this mistake and now have indicated the cell line in the legend.

      5, Fig. 2A and E, it might be nice to add the dsRED fusion to the schematics so that the reader sees the difference between the endogenous and the exogenous. One could then also change the color to red instead of blue.

      We thank the reviewer for this suggestion and adapted the figure accordingly.

      6, Fig. 2B - In the third lane, there appears to be a residual band of the endogenous EGFR despite the fact that it should be KO. Is this an EGFR wt lysate with EGFR::dsRED::ARTi overexpression and as such a typo in the legend or is this a non-complete KO? It might be beneficial to label the legend with EGFR::dsRED::ARTi instead of EGFR::ARTi and have one arrow depicting EGFR and one additional arrow showing the EGFR::dsRED fusion (as in Fig. 1F).

      We thank the reviewer for this insightful comment. We interpret the WB signal in lane three as potential cleavage/degradation products of the transgene as all signal disappears upon ARTi-mediated knockdown. Due to space reasons, we would prefer to keep the label as it is. The exact nature of the transgene is stated in the text and in the methods section.

      7, Suppl Fig. 2d: It is interesting that there is such a huge upregulation of DUSP6 in cells that express EGFR::ARTi compared to parental. The figure legend states: expression levels of DUSP6 in parental and engineered PC-9 cells. I assume the first box (EGFR::ARTi -/ dox -) is the parental line? Is there really a 5x upregulation of DUSP6 upon overexpression of EGFR::ARTi compared to parental (despite the fact that the exogenous EGFR::ARTi is expressed to similar levels compared to the endogenous EGFR)? Please clarify a little better which of the cells are parental and which are EGFR KO and which are transduced with EGFR::ARTi. Might suffice to just explain in the supplemental figure legend that expression of the exogenous EGFR::ARTi in EGFR KO cells leads to increased expression of ERK targets such as DUSP6 and EPHA2 etc.

      We thank the reviewer for this comment. We ascribe the increased expression of DUSP6 to the forced expression of the oncogenic variant of EGFR (EGFRdel19) while only a subset of EGFR genes in PC-9 cells is mutated and the rest is wild-type. Therefore, the net-output of EGFR signaling would be higher, even if the EGFR protein levels were exactly the same, as the EGFR gene is only present in the oncogenic form in the engineered cells but a mixture of mutant and wild-type proteins would make up the EGFR pool in the parental cells. The figure legend was changed accordingly, highlighting that DUSP6 is a MAPK downstream gene.

      8, Suppl Fig. 2e: Similar to my comment #7. Expression of endogenous EGFR is lost upon KO of EGFR, but cyclinD1 expression as well as expression of other ERK target genes increases upon loss of the endogenous EGFR gene with concomitant expression of EGFR::ARTi. It is nice to see that most of those genes are down-regulated upon DOX treatment. However, CyclinD1 is strongly up-regulated - any idea why? Might be nice to comment on this in the supplemental material to make it easy for the reader to interpret the data.

      We agree with the reviewer that the direct MAPK target genes follow the expected pattern of strong downregulation. We have not studied the expression of CCND1 in detail and therefore cannot comment on the mechanistic basis of this observation.

      9, Fig. 2F - might be nice to provide some densitometry data to quantify the effect of ARTi-mediated KRasG12R knock-down.

      We thank the reviewer for this suggestion and apologize that this data is not available for this study. We will include densitometry data in upcoming studies involving ARTi. As the observed knockdown was almost complete and hence readily observable by eye, we did not measure the effects using densitometry. In addition, we would like to mention that the sensor assay contains a quantitative analysis of the knockdown levels.

      10, Fig. 2I, it might be nice to add the V5 tag to the schematic and mention the V5 tag in the text: ... and homozygously inserted ARTi target sites into the 3'-UTR as well as a V5 tag to the endogenous STAG1 alleles (Fig. 2i)

      We thank the reviewer for the suggestion and explained the exact makeup of the construct better in the main text. We would however like to keep the figure as simple as possible and put the focus on the endogenous engineering here.

      11, Fig. 2J - might be nice to provide some densitometry data to quantify the effect of ARTi-mediated STAG1::V5 knock-down.

      We thank the reviewer for this suggestion and apologize that this data is not available for this study. We will include densitometry data in upcoming studies involving ARTi. As the observed knockdown was almost complete and hence readily observable by eye, we did not measure the effects using densitometry. In addition, we would like to mention that the sensor assay contains a quantitative analysis of the knockdown levels.

      12, Suppl. Fig 4B: the authors write: 'Western blotting confirmed ... the homozygous insertion of the targeting cassette into the STAG1 locus, ...' . I think the WB nicely shows insertion of the V5 tag into the STAG1 locus, but I think WB cannot show homozygous insertion. The fact that in Suppl Fig 1B STAG1 expression is (almost) completely ablated, is a good indication, but in Fig. 2J, there is still about 50% expression. As such, proving homozygous insertion by PCR/Sanger sequencing or densitometry over several experiments or just rephrasing the text a little might be beneficial.

      We agree with the reviewer and have adapted the respective passage in the main text.

      Competing interests statement: A patent application related to the design and use of the ARTi system entitled ‘Methods and molecules for RNA interference (RNAi)’ has been submitted by T.H., M.H., J.Z. and R.N. to the European Patent Office (application EP21217407.2).

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reply to Public Reviews:

      Reply to Reviewer #1:

      This is a carefully performed and well-documented study to indicate that the FUS protein interacts with the GGGGCC repeat sequence in Drosophila fly models, and the mechanism appears to include modulating the repeat structure and mitigating RAN translation. They suggest FUS, as well as a number of other G-quadruplex binding RNA proteins, are RNA chaperones, meaning they can alter the structure of the expanded repeat sequence to modulate its biological activities.

      Response: We would like to thank the reviewer for her/his time in evaluating our manuscript. We are very pleased that the reviewer highly appreciated our manuscript.

      1. Overall this is a nicely done study with nice quantitation. It remains somewhat unclear from the data and discussions in exactly what way the authors mean that FUS is an RNA chaperone: is FUS changing the structure of the repeat or does FUS binding prevent it from folding into alternative in vivo structure?

      Response: We appreciate the reviewer’s constructive comments. Indeed, we showed that FUS changes the higher-order structures of GGGGCC [G4C2] repeat RNA in vitro, and that FUS suppresses G4C2 RNA foci formation in vivo. According to the established definition, RNA chaperones are proteins that change the structures of misfolded RNAs without using ATP, thereby maintaining proper RNA folding (Rajkowitsch et al., 2007). Thus, we consider that FUS can be classified as an RNA chaperone. To clarify these interpretations, we revised the manuscript as follows.

      (1) On page 10, line 215-219, the sentence “These results were in good agreement with our previous study on SCA31 showing the suppressive effects of FUS and other RBPs on RNA foci formation of UGGAA repeat RNA as RNA chaperones …” was changed to “These results were in good agreement with … RNA foci formation of UGGAA repeat RNA through altering RNA structures and preventing aggregation of misfolded repeat RNA as RNA chaperones …”.

      (2) On page 17, line 363-366, the sentence “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure, as evident by CD and NMR analyses (Figure 5), suggesting its functional role as an RNA chaperone.” was changed to “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure as evident by CD and NMR analyses (Figure 5, Figure 5—figure supplement 2), and suppresses RNA foci formation in vivo (Figures 3A and 3B), suggesting its functional role as an RNA chaperone.”

      Reply to Reviewer #2:

      Fujino et al. provide interesting data describing the RNA-binding protein, FUS, for its ability to bind the RNA produced from the hexanucleotide repeat expansion of GGGGCC (G4C2). This binding correlates with reductions in the production of toxic dipeptides and reductions in toxic phenotypes seen in (G4C2)30+ expressing Drosophila. Both FUS and G4C2 repeats of >25 are associated with ALS/FTD spectrum disorders. Thus, these data are important for increasing our understanding of potential interactions between multiple disease genes. However, further validation of some aspects of the provided data is needed, especially the expression data.

      Response: We would like to thank the reviewer for her/his time for evaluating our manuscript and also for her/his important comments that helped to strengthen our manuscript.

      Some points to consider when reading the work:

      1. The broadly expressed GMR-GAL4 driver leads to variable tissue loss in different genotypes, potentially confounding downstream analyses dependent on viable tissue/mRNA levels.

      Response: We thank the reviewer for this constructive comment. In the RT-qPCR experiments (Figures 1E, 3C, 4G, 6D and Figure 1—figure supplement 1C), the amounts of G4C2 repeat transcripts were normalized to those of gal4 transcripts expressed in the same tissue, to avoid potential confounding derived from the difference in tissue viability between genotypes, as the reviewer pointed out. To clarify this process, we have made the following change to the revised manuscript.

      (1) On page 30, line 548-550, the sentence “The amounts of G4C2 repeat transcripts were normalized to those of gal4 transcripts in the same sample” was changed to “The amounts of G4C2 repeat transcripts were normalized to those of gal4 transcripts expressed in the same tissue to avoid potential confounding derived from the difference in tissue viability between genotypes”.
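
      The reasoning behind this normalization can be sketched with a short worked example. The sketch below uses the common comparative Ct (2^-ΔΔCt) calculation and assumes 100% amplification efficiency; the response does not state which quantification method was actually used, so both the method and the Ct values here are illustrative assumptions only.

```python
def relative_expression(ct_target, ct_reference, ct_target_ctrl, ct_reference_ctrl):
    """Comparative Ct estimate of target transcript level versus a control genotype.

    ct_target / ct_reference           - Ct values for the repeat transcript and
                                         the gal4 reference in the test genotype
    ct_target_ctrl / ct_reference_ctrl - the same two Ct values in the control
    Assumes perfect doubling per PCR cycle (an idealization).
    """
    delta_test = ct_target - ct_reference            # normalize to gal4 in the same sample
    delta_ctrl = ct_target_ctrl - ct_reference_ctrl  # same normalization for the control
    return 2.0 ** -(delta_test - delta_ctrl)


# Hypothetical case 1: tissue loss halves the recovered RNA, so both Ct values
# rise by one cycle; the normalized ratio is unchanged.
print(relative_expression(25.0, 21.0, 24.0, 20.0))  # 1.0

# Hypothetical case 2: only the repeat transcript drops by one cycle while
# gal4 is unchanged; the normalized ratio is halved.
print(relative_expression(25.0, 20.0, 24.0, 20.0))  # 0.5
```

      Because the reference transcript is expressed in the same cells as the repeat transcript, a loss of tissue shifts both measurements together and cancels out, whereas a genuine change in repeat RNA abundance does not.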

      2. The relationship between FUS and foci formation is unclear and should be interpreted carefully.

      Response: We appreciate the reviewer’s important comment. We apologize for the lack of clarity. We showed the relationship between FUS and RNA foci formation in our C9-ALS/FTD flies: FUS suppresses RNA foci formation (Figures 3A and 3B), and conversely, knockdown of endogenous caz, a Drosophila homologue of FUS, enhances it (Figures 4E and 4F). We consider that FUS suppresses RNA foci formation through altering RNA structures and preventing aggregation of misfolded G4C2 repeat RNA as an RNA chaperone. To clarify these interpretations, we revised the manuscript as follows.

      (1) On page 10, line 215-219, the sentence “These results were in good agreement with our previous study on SCA31 showing the suppressive effects of FUS and other RBPs on RNA foci formation of UGGAA repeat RNA as RNA chaperones …” was changed to “These results were in good agreement with … RNA foci formation of UGGAA repeat RNA through altering RNA structures and preventing aggregation of misfolded repeat RNA as RNA chaperones …”.

      (2) On page 17, line 363-366, the sentence “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure, as evident by CD and NMR analyses (Figure 5), suggesting its functional role as an RNA chaperone.” was changed to “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure as evident by CD and NMR analyses (Figure 5, Figure 5—figure supplement 2), and suppresses RNA foci formation in vivo (Figures 3A and 3B), suggesting its functional role as an RNA chaperone.”

      Reply to Reviewer #3:

      In this manuscript Fujino and colleagues used C9-ALS/FTD fly models to demonstrate that FUS modulates the structure of (G4C2) repeat RNA as an RNA chaperone, and regulates RAN translation, resulting in the suppression of neurodegeneration in C9-ALS/FTD. They also confirmed that FUS preferentially binds to and modulates the G-quadruplex structure of (G4C2) repeat RNA, followed by the suppression of RAN translation. The potential significance of these findings is high since C9ORF72 repeat expansion is the most common genetic cause of ALS/FTD, especially in Caucasian populations and the DPR proteins have been considered the major cause of the neurodegenerations.

      Response: We would like to thank the reviewer for her/his time for evaluating our manuscript. We are grateful to the reviewer for the insightful comments, which were very helpful for us to improve the manuscript.

      1. While the effect of an RBP as an RNA chaperone on (G4C2) repeat expansion is supposed to be dose-dependent according to (G4C2)n RNA expression, the first experiment of the screening for RBPs in C9-ALS/FTD flies lacks this concept. It is uncertain if the RBPs of the groups "suppression (weak)" and "no effect" had weak or no RNA chaperone activity or if the expression of the RBP was insufficient, and if the RBPs of the group "enhancement" exacerbated the toxicity derived from (G4C2)89 RNA or the expression of the RBP was excessive. The optimal dose of any RBPs that bind to (G4C2) repeats may be able to neutralize the toxicity without the reduction of (G4C2)n RNA.

      Response: We appreciate the reviewer’s constructive comments. We employed site-directed transgenesis to establish the RBP fly lines, to ensure equivalent expression levels of the inserted transgenes. We also evaluated the toxic effects of the overexpressed RBPs themselves by crossbreeding with control EGFP flies, as shown in Figure 1A. To clarify these points, we have made the following changes to the revised manuscript.

      (1) On page 8, line 166-168, the sentence “The variation in the effects of these G4C2 repeat-binding RBPs on G4C2 repeat-induced toxicity may be due to their different binding affinities to G4C2 repeat RNA, and their different roles in RNA metabolism.” was changed to “The variation in the effects of these G4C2 repeat-binding RBPs on G4C2 repeat-induced toxicity may be due to their different binding affinities to G4C2 repeat RNA, and the different toxicity of overexpressed RBPs themselves.”.

      (2) On page 29, line 519-522, the sentence “By employing site-specific transgenesis using the pUASTattB vector, each transgene was inserted into the same locus of the genome, and was expected to be expressed at the equivalent levels.” was added.

      2. In relation to issue 1, the rescue effect of FUS on the fly expressing (G4C2)89 (FUS-4) in Figure 4-figure supplement 1 seems weaker than the other flies expressing both FUS and (G4C2)89 in Figure 1 and Figure 1-figure supplement 2. The expression level of both FUS protein and (G4C2)89 RNA in each line is important from the viewpoint of therapeutic strategy for C9-ALS/FTD.

      Response: We appreciate the reviewer’s important comment. The FUS-4 transgene is expected to be expressed at a level equivalent to that of the FUS-3 transgene, since they are inserted into the same locus of the genome by site-directed transgenesis. Thus, we suppose that the weaker suppressive effect of FUS-4 coexpression on G4C2 repeat-induced eye degeneration can be attributed to the C-terminal FLAG tag that is fused to the FUS protein expressed in the FUS-4 fly line. Since the caz fly expresses caz protein that is also fused to a FLAG tag at the C-terminus, we used this FUS-4 fly line to directly compare the effect of caz on G4C2 repeat-induced toxicity to that of FUS.

      3. While hallmarks of C9ORF72 are the presence of DPRs and the repeat-containing RNA foci, the loss of function of C9ORF72 is also considered to somehow contribute to neurodegeneration. It is unclear if FUS reduces not only the DPRs but also the protein expression of C9ORF72 itself.

      Response: We thank the reviewer for this comment. We agree that not only DPRs, but also toxic repeat RNA and the loss-of-function of C9ORF72 jointly contribute to the pathomechanisms of C9-ALS/FTD. Since Drosophila has no homolog corresponding to the human C9orf72 gene, the effect of FUS on C9orf72 expression cannot be assessed. Our fly models are useful for evaluating toxic gain-of-function pathomechanisms such as RNA foci formation and RAN translation, and the association between FUS and loss-of-function of C9ORF72 is beyond the scope of this study.

      4. In Figure 5E-F, it cannot be distinguished whether FUS binds to GGGGCC repeats or the 5' flanking region. The same experiment should be done by using FUS-RRMmut to elucidate whether FUS binding is the major mechanism for this translational control. Authors should show that FUS binding to long GGGGCC repeats is important for RAN translation.

      Response: We would like to thank the reviewer for these insightful comments. Following the reviewer’s suggestion, we performed the in vitro translation assay again using FUS-RRMmut, which loses the binding ability to G4C2 repeat RNA as evident by the filter binding assay (Figure 5A), instead of BSA. The results are shown in the figures of Western blot analysis below. The addition of FUS to the translation system suppressed the expression levels of GA-Myc efficiently, whereas that of FUS-RRMmut did not. FUS decreased the expression level of GA-Myc at as low as 10nM, and nearly eliminated RAN translation activity at 100nM. At 400nM, FUS-RRMmut weakly suppressed the GA-Myc expression levels probably because of the residual RNA-binding activity. These results suggest that FUS suppresses RAN translation in vitro through direct interactions with G4C2 repeat RNA.

      Unfortunately, RAN translation from short G4C2 repeat RNA was not investigated in our translation system, although the previous study reported the low efficacy of RAN translation from short G4C2 repeat RNA (Green et al., 2017).

      Author response image 1.

      (A) Western blot analysis of the GA-Myc protein in the samples from in vitro translation.

      (B) Quantification of the GA-Myc protein levels.

      We have made the following changes to the revised manuscript.

      (1) Figure 5F was replaced with new Figures 5F and 5G.

      (2) On page 14-15, line 326-330, the sentence “Notably, the addition of FUS to this system decreased the expression level of GA-Myc in a dose-dependent manner, whereas the addition of the control bovine serum albumin (BSA) did not (Figure 5F).” was changed to “Notably, upon the addition to this translation system, FUS suppressed RAN translation efficiently, whereas FUS-RRMmut did not. FUS decreased the expression levels of GA-Myc at as low as 10nM, and nearly eliminated RAN translation activity at 100nM. At 400nM, FUS-RRMmut weakly suppressed the GA-Myc expression levels probably because of the residual RNA-binding activity (Figure 5F and 5G).”.

      (3) On page 15, line 330-332, the sentence “Taken together, these results indicate that FUS suppresses RAN translation from G4C2 repeat RNA in vitro as an RNA chaperone.” was changed to “Taken together, these results indicate that FUS suppresses RAN translation in vitro through direct interactions with G4C2 repeat RNA as an RNA chaperone.”.

      (4) On page 37, line 720-723, the sentence “For preparation of the FUS protein, the human FUS (WT) gene flanked at the 5′ end with an NdeI recognition site and at the 3′ end with a XhoI recognition site was amplified by PCR from pUAST-FUS.” was changed to “For preparation of the FUS proteins, the human FUS (WT) and FUS-RRMmut genes flanked at the 5′ end with an NdeI recognition site and at the 3′ end with a XhoI recognition site was amplified by PCR from pUAST-FUS and pUAST-FUS-RRMmut, respectively.”.

      (5) On page 41, line 816-819, the sentence “FUS or BSA at each concentration (10, 100, and 1,000 nM) was added for translation in the lysate.” was changed to “FUS or FUS-RRMmut at each concentration (10, 100, 200, 400, and 1,000 nM) was preincubated with mRNA for 10 min to facilitate the interaction between FUS protein and G4C2 repeat RNA, and added for translation in the lysate.”.

      5. It is not possible to conclude, as the authors have, that G-quadruplex-targeting RBPs are generally important for RAN translation (Figure 6), without showing whether RBPs that do not affect (G4C2)89 RNA levels lead to decreased DPR protein level or RNA foci.

      Response: We appreciate the reviewer’s critical comment. Following the suggestion by the reviewer, we evaluated the effect of these G-quadruplex-targeting RBPs on RAN translation. We additionally performed immunohistochemistry of the eye imaginal discs of fly larvae expressing (G4C2)89 and these G-quadruplex-targeting RBPs. As shown in the figures of immunohistochemistry below, we found that coexpression of EWSR1, DDX3X, DDX5, and DDX17 significantly decreased the number of poly(GA) aggregates. The results suggest that these G-quadruplex-targeting RBPs, like FUS, regulate RAN translation.

      Author response image 2.

      (A) Immunohistochemistry of poly(GA) in the eye imaginal discs of fly larvae expressing (G4C2)89 and the indicated G-quadruplex-targeting RBPs.

      (B) Quantification of the number of poly(GA) aggregates.

      We have made the following changes to the revised manuscript.

      (1) Figures 6E and 6F were added.

      (2) On page 6-7, line 135-137, the sentence “In addition, other G-quadruplex-targeting RBPs also suppressed G4C2 repeat-induced toxicity in our C9-ALS/FTD flies.” was changed to “In addition, other G-quadruplex-targeting RBPs also suppressed RAN translation and G4C2 repeat-induced toxicity in our C9-ALS/FTD flies.”.

      (3) On page 15, line 344-346, the sentence “As expected, these RBPs also decreased the number of poly(GA) aggregates in the eye imaginal discs (Figures 6E and 6F).” was added.

      (4) On page 15, line 346-347, the sentence “Their effects on G4C2 repeat-induced toxicity and repeat RNA expression were consistent with those of FUS.” was changed to “Their effects on G4C2 repeat-induced toxicity, repeat RNA expression, and RAN translation were consistent with those of FUS.”

      (5) On page 16, line 355-357, the sentence “Thus, some G-quadruplex-targeting RBPs regulate G4C2 repeat-induced toxicity by binding to and possibly by modulating the G-quadruplex structure of G4C2 repeat RNA.” was changed to “Thus, some G-quadruplex-targeting RBPs regulate RAN translation and G4C2 repeat-induced toxicity by binding to and possibly by modulating the G-quadruplex structure of G4C2 repeat RNA.”

      (6) On page 19, line 417-421, the sentence “We further found that G-quadruplex-targeting RNA helicases, including DDX3X, DDX5, and DDX17, which are known to bind to G4C2 repeat RNA (Cooper-Knock et al., 2014; Haeusler et al., 2014; Mori et al., 2013a; Xu et al., 2013), also alleviate G4C2 repeat-induced toxicity without altering the expression levels of G4C2 repeat RNA in our Drosophila models.” was changed to “We further found that G-quadruplex-targeting RNA helicases, …, also suppress RAN translation and G4C2 repeat-induced toxicity without altering the expression levels of G4C2 repeat RNA in our Drosophila models.”.

      Reply to Recommendations For The Authors:

      1) It is not clear from the start that the flies they generated with the repeat have an artificial vs human intronic sequence ahead of the repeat. It would be nice if they presented somewhere the entire sequence of the insert. The reason being that it seems they also tested flies with the human intronic sequence, and the effect may not be as strong (line 234). In any case, in the future, with a new understanding of RAN translation, it would be nice to compare different transgenes, and so as much transparency as possible would be helpful regarding sequences. Can they include these data?

      Response: We thank the editors and reviewers for this comment. We apologize for the lack of clarity. We used artificially synthesized G4C2 repeat sequences when generating constructs for (G4C2)n transgenic flies, so these constructs do not contain human intronic sequence ahead of the G4C2 repeat in the C9orf72 gene, as explained in the Materials and Methods section. To clarify the difference between our C9-ALS/FTD fly models and LDS-(G4C2)44GR-GFP fly model (Goodman et al., 2019), we have made the following change to the revised manuscript.

      (1) Schema of the LDS-(G4C2)44GR-GFP construct was presented in Figure 3—figure supplement 1.

      Furthermore, to maintain transparency of the study, we have provided the entire sequence of the insert as the following source file.

      (2) The artificial sequences inserted in the pUAST vector for generation of the (G4C2)n flies were presented in Figure 1—figure supplement 1—source data 1.

      2) It is really nice how they quantitated everything and showed individual data points.

      Response: We thank the editors and reviewers for appreciating our data analysis method. All individual data points and statistical analyses are summarized in source data files.

      3) So when they call FUS an RNA chaperone, are they simply meaning it is changing the structure of the repeat, or could it just be interacting with the repeat to coat the repeat and prevent it from folding into whatever in vivo structures? Can they speculate on why some RNA chaperones lead to presumed decay of the repeat and others do not? Can they discuss these points in the discussion? Detailed mechanistic understanding of RNA chaperones that ultimately promote decay of the repeat might be of highly significant therapeutic benefit.

      Response: We appreciate these critical comments. Indeed, we showed that FUS changes the higher-order structures of G4C2 repeat RNA in vitro, and that FUS suppresses G4C2 RNA foci formation. According to the established definition, RNA chaperones are proteins that change the structures of misfolded RNAs without using ATP, thereby maintaining proper RNA folding (Rajkowitsch et al., 2007). Thus, we consider that FUS can be classified as an RNA chaperone. To clarify these interpretations, we revised the manuscript as follows.

      (1) On page 10, line 215-219, the sentence “These results were in good agreement with our previous study on SCA31 showing the suppressive effects of FUS and other RBPs on RNA foci formation of UGGAA repeat RNA as RNA chaperones …” was changed to “These results were in good agreement with … RNA foci formation of UGGAA repeat RNA through altering RNA structures and preventing aggregation of misfolded repeat RNA as RNA chaperones …”.

      (2) On page 17, line 363-366, the sentence “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure, as evident by CD and NMR analyses (Figure 5), suggesting its functional role as an RNA chaperone.” was changed to “FUS directly binds to G4C2 repeat RNA and modulates its G-quadruplex structure as evident by CD and NMR analyses (Figure 5, Figure 5—figure supplement 2), and suppresses RNA foci formation in vivo (Figures 3A and 3B), suggesting its functional role as an RNA chaperone.”

      Besides these RNA chaperones, we observed the expression of IGF2BP1, hnRNPA2B1, DHX9, and DHX36 decreased G4C2 repeat RNA expression levels. In addition, we recently reported that hnRNPA3 reduces G4C2 repeat RNA expression levels, leading to the suppression of neurodegeneration in C9-ALS/FTD fly models (Taminato et al., 2023). We speculate these RBPs could be involved in RNA decay pathways as components of the P-body or interactors with the RNA deadenylation machinery (Tran et al., 2004; Katahira et al., 2008; Geissler et al., 2016; Hubstenberger et al., 2017), possibly contributing to the reduced expression levels of G4C2 repeat RNA. To clarify these interpretations, we revised the manuscript as follows.

      (3) On page 18, line 392-398, the sentences “Similarly, we recently reported that hnRNPA3 reduces G4C2 repeat RNA expression levels, leading to the suppression of neurodegeneration in C9-ALS/FTD fly models (Taminato et al., 2023). Interestingly, these RBPs have been reported to be involved in RNA decay pathways as components of the P-body or interactors with the RNA deadenylation machinery (Tran et al., 2004; Katahira et al., 2008; Geissler et al., 2016; Hubstenberger et al., 2017), possibly contributing to the reduced expression levels of G4C2 repeat RNA.” were added.

      4) What is the level of the G4C2 repeat when they knock down caz? Is it possible that knockdown impacts the expression level of the repeat? Can they show this (or did they and I miss it)?

      Response: We thank the editors and reviewers for this comment. The expression levels of G4C2 repeat RNA in (G4C2)89 flies were not altered by the knockdown of caz, as shown in Figure 4G.

      5) A puzzling point is that FUS is supposed to be nuclear, so where is FUS in the brain in their lines? They suggest it modulates RAN translation, and presumably, that is in the cytoplasm. Is FUS when overexpressed now in part in the cytoplasm? Is the repeat dragging it into the cytoplasm? Can they address this in the discussion? If FUS is never found in vivo in the cytoplasm, then it raises the point that the impact they find of FUS on RAN translation might not reflect an in vivo situation with normal levels of FUS.

      Response: We appreciate these important comments. We agree with the editors and reviewers that FUS is mainly localized in the nucleus. However, FUS is known as a nucleocytoplasmic shuttling RBP that can transport RNA into the cytoplasm. Indeed, FUS is reported to facilitate transport of actin-stabilizing protein mRNAs to function in the cytoplasm (Fujii et al., 2005). Thus, we consider that FUS binds to G4C2 repeat RNA in the cytoplasm and suppresses RAN translation in this study.

      6) When they are using 2 copies of the driver and repeat, are they also using 2 copies of FUS? These are quite high levels of transgenes.

      Response: We thank the editors and reviewers for this comment. We used only 1 copy of FUS when using 2 copies of GMR-Gal4 driver. Full genotypes of the fly lines used in all experiments are described in Supplementary file 1.

      7) In Figure5-S1, FUS colocalizing with (G4C2)RNA is not clear. High-magnification images are recommended.

      Response: We appreciate this constructive comment on the figure. Following the suggestion, high-magnification images are added in Figure 5—figure supplement 1.

      8) I also suggest that the last sentence of the Discussion be revised as follows: Thus, our findings contribute not only to the elucidation of C9-ALS/FTD, but also to the elucidation of the repeat-associated pathogenic mechanisms underlying a broader range of neurodegenerative and neuropsychiatric disorders than previously thought, and it will advance the development of potential therapies for these diseases.

      Response: We appreciate this recommendation. We have made the following change based on the suggested sentence.

      (1) On page 20-21, line 455-459, “Thus, our findings contribute not only towards the elucidation of repeat-associated pathogenic mechanisms underlying a wider range of neuropsychiatric diseases than previously thought, but also towards the development of potential therapies for these diseases.” was changed to “Thus, our findings contribute to the elucidation of the repeat-associated pathogenic mechanisms underlying not only C9-ALS/FTD, but also a broader range of neuromuscular and neuropsychiatric diseases than previously thought, and will advance the development of potential therapies for these diseases.”.

      Authors’ comment on previous eLife assessment:

      We thank the editors and reviewers for appreciating our study. We mainly evaluated the function of human FUS protein on RAN translation and G4C2 repeat-induced toxicity using Drosophila expressing human FUS in vivo, and the recombinant human FUS protein in vitro. To validate that FUS functions as an endogenous regulator of RAN translation, we additionally evaluated the function of Drosophila caz protein as well. We are afraid that the first sentence of the eLife assessment, that is, “This important study demonstrates that the Drosophila FUS protein, the human homolog of which is implicated in amyotrophic lateral sclerosis (ALS) and related conditions, …” is somewhat misleading. We would be happy if you modify this sentence like “This important study demonstrates that the human FUS protein, which is implicated in amyotrophic lateral sclerosis (ALS) and related conditions, …”.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Recommendations For The Authors):

      This is a list of suggestions the authors could use to improve the details of the manuscript:

      - it is not immediately clear what is meant by "modular" on line 38 and the corresponding paragraph. This aspect is not mentioned or developed in the Results.

      - the discussion of remapping vectors on lines 119-137 is particularly illuminating. It could have been interesting to generate surrogate manifolds separated by arbitrary remapping vectors and see how much the alignment metric (Procrustes shape) is sensitive to the dimensionality or amplitude of remapping vectors.

      - A visual comparison between Fig 1 D and H suggests a difference between the manifold geometry in experiments and in the model. It seems that the embedding dimensionality of ring manifolds is higher in the data than in the model. Is that the case? It could have been interesting to explore how much embedding dimensionality influences the alignment metric.

      - I could not find information about the initialization of the connectivity weights. An important possibility is that the degree of alignment (and the organization of remapping vectors) depends on the strength of initial random connectivity.

      - It might have been interesting to comment on the relationship between the top three PCs in Fig 1 and the three readout vectors. To what extent are they aligned?

      - I found panels C and G in Fig 1 somewhat difficult to read. In panel C, the remapping seems to be aligned to the same position across all trials. This is not the case in panel G. I am not certain what the comparison is meant to convey, but it would help to have a similar alignment in C and G. Similarly, I was not sure what to conclude from the matrix in the right part of panel C, perhaps the legend should be expanded.

      - the comparison with remapping models of Misha Tsodyks could be expanded. The current discussion implies that the model of Romani & Tsodyks leads to less alignment than found in trained networks, but no direct evidence is given for that statement as far as I can tell.

      Reviewer #2 (Recommendations For The Authors):

      Minor points:

      All mentions of 'modularity' should be replaced with 'compositionality'.

      I found Supplementary Figure 2 highly confusing. I thought it was meant to help understand the analysis in Figure 1K and related figures. In the end, I never really understood what was happening in these figures. Do authors make perturbations along these different coding dimensions and compare the resulting maps? I wasn't sure what exactly the authors were calculating cosine similarity for. Maybe more exposition on this in the methods would help other readers as well.

      Was there any behavioral difference when the maps were not aligned?

      Why did the authors only go up to 10 contexts? Was this dependent on size of the network? Sorry if I missed this.

      Are remapping events aligned to unit axes? Would this change with different nonlinearities? This could be interesting in the context of (Driscoll et al. 2022) and (Whittington et al. 2022).

      Reviewer #3 (Recommendations For The Authors):

      Cueva, Ardalan, et al. 2021 arXiv:2111.01275 showed that RNNs trained to remember two circular variables develop a toroidal geometry to store this information, so consider citing this in your section on the toroidal manifolds.

      We thank the reviewers for their thoughtful comments. We appreciate that all three reviewers affirmed the importance of our work and the rigor of our approach. We believe that no major weaknesses were identified by the reviews. In our view, the comparisons between recurrent neural network models and experimental data are one of the most important contributions of our work, and all reviewers agreed that this was a core strength of the manuscript.

      The reviewers highlighted several future modeling directions that are raised by our results and that we did not explore in the manuscript. For example, Reviewer 2 suggests that we train networks on a navigation task alone, freeze the weights, and then train on a context discrimination task. We agree that this kind of contextual learning paradigm is of interest and could provide insight into biological remapping, such as that observed by Low et al. (2021). We also agree with Reviewer 3’s broader point that “There are many choices that must be made when simulating RNNs and there is a growing awareness that these choices can influence the kinds of solutions RNNs develop.” It is notable that we were able to reproduce the qualitative features of the experimental data without finely tuning hyperparameters (we used default settings in PyTorch layers), using a very basic training protocol (gradient descent with gradient clipping), and without adding any hand-crafted regularization (though we agree that regularization could make the RNN solution look even more like the data).

      We believe that readers will benefit from reading the reviewers' suggestions, which are insightful and well-motivated. Having weighed the reviewer comments carefully, we feel that our manuscript stands as a complete scientific story. We hope that the public reviewer comments will inspire future investigations to fully explore these possibilities and unpack their outcomes at a level of detail that would not be possible in the context of our manuscript.

      Thus, we have chosen to implement the following minor changes suggested by the reviewers, which we hope will improve the clarity of the text and figures (summarized below). These changes do not alter the fundamental content of the manuscript.

      Text:

      • We corrected a few minor typos.

      • We updated the citations to follow the eLife citation style.

      • To address comments from Reviewers 1 and 2: we reworded the final paragraph of the Introduction (p. 3) to remove the term “modularity” and clarify our main finding. Those sentences now read, “The RNN geometry and algorithmic principles readily generalized from a simple task to more complex settings. Furthermore, we performed a new analysis of experimental data published in Low et al.26 and found a similar geometric structure in neural activity from a subset of sessions with more than two stable spatial maps.”

      • To address comments from Reviewer 1: in the first paragraph of the Results section A recurrent neural network model of 1D navigation and context inference remaps between aligned ring manifolds (p. 3), we added the sentence, “Remapping was not aligned to particular track positions, rewards, or landmarks.” to clarify that experimental result from Low et al. (2021).

      • To address comments from Reviewer 3: in the final paragraph of the Results section Aligned toroidal manifolds emerge in a 2D generalization of the task (p. 11) we clarified that models were trained “to estimate position on a 2D circular track.” We also added a citation to Cueva, Ardalan et al. (2021) with the following sentence, “Notably, each toroidal manifold alone is reminiscent of networks trained to store two circular variables without remapping.”

      • To address a question from Reviewer 2: in the final paragraph of the Results section Manifold alignment generalizes to three or more maps (p. 13), we added the following clarification: “In Supplemental Figure 3, we show that RNNs are capable of solving this task with larger numbers of latent states (more than three; for simplicity, we consider up to 10 states).”

      • To address a comment from Reviewer 1: in the fourth paragraph of the Discussion (p. 17), we removed the sentence, “Notably, our model captured aspects of the data that these previous forward-engineered models did not explore—namely, that the ring manifolds corresponding to the correlated spatial maps were much more aligned than expected by chance and than strictly required by the task.” to focus on the key point in the following sentence that, “forward-engineered models provide insights into how neural circuits may remap, but do not answer why they do so.”

      • To address comments from Reviewers 1 and 2: we reworded the penultimate paragraph of the Discussion (p. 17–18) to clarify our findings and remove the term “modularity” (except when referencing papers that themselves use that term (Driscoll et al., 2022; Yang et al., 2019)). Those sentences now read:

      “When RNN architecture is explicitly designed to include dedicated neural subpopulations, these subpopulations can improve model performance on particular types of tasks (Beiran et al., 2021; Dubreuil et al., 2022). Thus, there is an emerging conclusion that RNNs use simple dynamical motifs as building blocks for more general and complex computations, which our results support. In particular, aligned ring attractors are a recurring, dynamical motif in our results, appearing first in a simple task setting (2 maps of a 1D environment) and subsequently as a component of RNN dynamics in more complex settings (e.g., as sub-manifolds of toroidal attractors in a 2D environment, see Figure 4). We can therefore conceptualize a pair of aligned ring manifolds as a dynamical “building block” that RNNs utilize to solve higher-dimensional generalizations of the task. Intriguingly, our novel analysis of neural data from Low et al. (2021) revealed that similar principles may hold in biological circuits—when three or more spatial maps were present in a recording, the pairs of ring manifolds tended to be aligned.”

      • To address questions from Reviewers 2 and 3: in the first paragraph of the Methods section RNN Model and Training Procedure (p. 21), we added the sentence: “The connection weights were randomly initialized from the uniform distribution 𝑈(−√(1/N), √(1/N)), which is the default initialization scheme in PyTorch.”

      • To address a question from Reviewer 2: we added a third paragraph to the Methods section Manifold Geometry Analysis (p. 23), as follows:

      “In Figure 1K, 4G, 5G, and Supplementary Figure 2B, we calculate the angles between the input and output weights and the position subspace or remapping dimension. To find this angle, we calculated the cosine similarity between each weight vector and each subspace. Cosine similarity of 0 indicates that the weights were orthogonal to the subspace, while a similarity of 1 indicates that the weight vector was contained within the subspace.”

      • To address a question from Reviewer 1: we added the following sentence to the second paragraph of the Methods section Experimental Data (p. 24), “We performed the same analysis of trial-by-trial spatial stability to obtain the similarity matrices in Figure 1C and G.”
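
      The trial-by-trial similarity analysis referred to in the bullet above can be sketched roughly as follows. This is a minimal illustration, assuming that each trial's population activity is flattened into a single vector (neurons × position bins) and that pairs of trials are compared by Pearson correlation; the function names and the toy firing rates are hypothetical and are not taken from the manuscript or from Low et al. (2021).

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (NaN if either is constant)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        return float("nan")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def trial_similarity(rates):
    """rates[t][n][p] is the rate of neuron n at position bin p on trial t.

    Returns the trials x trials matrix of correlations between the
    flattened population firing patterns of every pair of trials.
    """
    flat = [[r for neuron in trial for r in neuron] for trial in rates]
    return [[pearson(a, b) for b in flat] for a in flat]


# Hypothetical toy session: 2 neurons x 3 position bins, 4 trials.
# Trials 0-1 follow one spatial map, trials 2-3 follow another.
map_1 = [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]]
map_2 = [[3.0, 1.0, 2.0], [1.0, 3.0, 2.0]]
session = [map_1, map_1, map_2, map_2]
similarity = trial_similarity(session)
```

      In such a matrix, blocks of trials that share a map appear as squares of high correlation along the diagonal. A clustering step such as k-means (not shown here) would then assign each trial to a map.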

      Figures and legends:

      • To address a question from Reviewer 1: in Figure 1C and G, we added x-axis labels to the similarity matrices to clarify that these are trial-by-trial correlations.

      • To address a question from Reviewer 1: we expanded the Figure 1C legend to clarify the experimental results as follows:

      Old legend:

      (C, left) An example medial entorhinal cortex neuron switches between two maps of the same track (top, raster; bottom, average firing rate by position; red, map 1; black, map 2). (C, right/top) Network-wide trial-by-trial correlations for the spatial firing pattern of all co-recorded neurons in the same example session (colorbar indicates correlation). (C, right/bottom) k-means map assignment.

      New legend:

      (C, left) An example medial entorhinal cortex neuron switches between two maps of the same track (top, spikes by trial and track position; bottom, average firing rate by position across trials from each map; red, map 1; black, map 2). (C, right/top) Correlation between the spatial firing patterns of all co-recorded neurons for each pair of trials in the same example session (dark gray, high correlation; light gray, low correlation). The population-wide activity is alternating between two stable maps across blocks of trials. (C, right/bottom) K-means clustering of spatial firing patterns results in a map assignment for each trial.

      • To address comments from Reviewer 3: in the legend of Figure 4C, we added the sentence “Note that the true tori are not linearly embeddable in 3 dimensions, so this projection is an approximation of the true torus structure.”

      • To address a question from Reviewer 2: we expanded the legend for Supplementary Figure 2 to clarify the purpose of the figure schematics as follows:

      Old legend:

      (A)  Schematic showing the orthogonalization of the position and context input and output weights.

      (B)  Reproduced from Figure 1K.

      (C-D) Schematic: How a single velocity input (blue arrows) updates the position estimate (yellow to red points) from the starting position (blue points).

      (C)  Velocity input lies in the position tuning subspace (gray plane)(hypothetical). Note that the same velocity input results in different final positions.

      (D)  Velocity input is orthogonal to the position tuning subspace (observed).

      (E)  Schematic of possible flow fields in each of the three planes (numbers correspond to planes in C and D), which would result in the correct positional estimate given orthogonal velocity inputs at different positions (D).

      New legend:

      (A)  Schematic showing the relative orientation of the position output weights and the context input and output weights to the position and state tuning subspaces.

      (B)  Reproduced from Figure 1K.

      (C-D) Schematic to interpret why the position input weights are orthogonal to the position tuning subspace. These schematics illustrate how a single velocity input (blue arrows) updates the position estimate (yellow to red points) from a given starting position (blue points).

      (C, not observed) Velocity input lies in the position tuning subspace (gray plane). Note that the same velocity input pushes the network clockwise or counterclockwise along the ring depending on the circular position.

      (D, observed) Velocity input is orthogonal to the position tuning subspace and pushes neural activity out of the subspace.

      (E) Schematic of possible flow fields in each of three planes (numbers correspond to planes in C and D). We conjecture that these dynamics would enable a given orthogonal velocity input to nonlinearly update the position estimate, resulting in the correct translation around the ring regardless of starting position (as in D).
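
      Two of the Methods additions quoted above, the uniform weight initialization and the cosine similarity between a weight vector and a subspace, can be illustrated with a short sketch. It makes several assumptions that the response does not spell out: N is taken to be the number of hidden units (as in PyTorch's documented default for its recurrent layers), and the cosine similarity to a subspace is taken to be the norm of the vector's projection onto the subspace divided by the norm of the vector. All function names are hypothetical.

```python
import math
import random


def init_uniform(n_rows, n_cols, n_hidden, rng):
    """Draw a weight matrix from U(-sqrt(1/N), sqrt(1/N)), assuming N = n_hidden."""
    bound = math.sqrt(1.0 / n_hidden)
    return [[rng.uniform(-bound, bound) for _ in range(n_cols)]
            for _ in range(n_rows)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(basis, tol=1e-12):
    """Gram-Schmidt: orthonormal vectors spanning the same subspace as `basis`."""
    ortho = []
    for v in basis:
        w = list(v)
        for q in ortho:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are linearly dependent
            ortho.append([wi / norm for wi in w])
    return ortho


def cosine_to_subspace(vec, basis):
    """Cosine of the angle between `vec` and its projection onto span(basis).

    0 means vec is orthogonal to the subspace; 1 means vec lies inside it.
    """
    vec_norm = math.sqrt(dot(vec, vec))
    if vec_norm == 0.0:
        raise ValueError("cosine similarity is undefined for the zero vector")
    q = orthonormalize(basis)
    proj_norm = math.sqrt(sum(dot(vec, qi) ** 2 for qi in q))
    return proj_norm / vec_norm
```

      Under this definition, an input weight vector that is orthogonal to the position tuning subspace (as in panel D) gives a value of 0, while a vector lying in the subspace (as in the hypothetical panel C) gives 1. For example, `cosine_to_subspace(w, [pc1, pc2])` would compare a weight vector `w` with a plane spanned by two placeholder basis vectors `pc1` and `pc2`.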

    1. Author Response:

      The following is the authors' response to the original reviews.

      We thank both reviewers for their comments, which have suggested changes that have improved the manuscript.

      Reviewer #1 (Public Review): 

      […] A weakness in the methodology is the link to tissue tension and conclusions about tissue mechanics. Methods that directly affect tissue tension and a more thorough and systematic application of laser ablation experiments would be needed to profoundly investigate mechanosensation and consequential effects on tissue tension by the various genetic perturbations.

      Response: In revision, we have added some additional experiments that examine altered tension.

      While the in-silico analysis of competing for F-actin binding sites for βH-Spec and myosin appears logical and supports the authors' claims, no point mutation or truncations were used to test these results in vivo.

      In its current structure the manuscript's strength, the genetic perturbations, is compromised by missing clear assessments of knockdown efficiencies early in the manuscript and other controls such as the actual effect on myosin by ROCK overactivation. 

      Response: In revision, we reorganized the manuscript and figures to document the knockdown efficiency earlier in the manuscript, and have added additional figure panels illustrating the effects of altered tension on myosin levels.

      Reviewer #2 (Public Review):

      […] The authors suggest that Ajuba is required for the effect of beta-heavy spectrin. However, it is still formally possible that this could be a parallel pathway that is being masked by the strong phenotype of Ajuba RNAi flies. 

      Response: While it is formally true that the genetic requirement for Jub could reflect a role in parallel to, rather than downstream of, spectrins, our conclusion that spectrins act through Jub is based not only on the genetic requirement for Jub, but also on the influence of spectrins on junctional tension and Jub localization, which indicate that spectrins influence Jub activity in a manner consistent with their affecting the Hippo pathway through Jub.

      One of the major points of the manuscript is the observation that alpha- and beta-heavy-spectrin are potentially working independently and not as part of a spectrin tetramer. This is mostly dependent on the observation that alpha- and beta-heavy-spectrin appear to have non-overlapping localizations at the membrane and the fact that alpha- and beta-heavy-spectrin localize at the membrane seemingly independently. It is not entirely obvious that a potential lack of colocalization and the fact that protein localization at the membrane is not affected when the other partner is absent is sufficient to argue that alpha- and beta-heavy-spectrin do not form a complex. Moreover, it is possible that the spectrin complexes are only formed in specific conditions (e.g. by modulating tissue tension). 

      Response: Our results argue that alpha- and beta-heavy-spectrin do not form a detectable complex in the wing disc under the conditions examined, and thus that they act independently in this context. However, we agree that it is possible that they could function together in other contexts, e.g. in other tissues or under different conditions, and we have revised the text in the Discussion to note this.

      If indeed spectrins function independently, would it not be expected to see additive effects when both spectrins are depleted? 

      Response: Not necessarily, since both alpha- and beta-heavy-spectrin act through Jub, and there may be a limit as to how much Yki activity can be increased by Jub (e.g. the increases in wing size induced by spectrin RNAi are similar to the increases in wing size observed with constitutive recruitment of Jub through alpha-catenin mutation (Alegot et al 2019)).

      Related to the two previous points, the fact that the authors suggest that both alpha- and beta-heavy-spectrin regulate Hippo signaling via Ajuba would be consistent with the necessity of an alpha- and beta-heavy-spectrin complex being formed. How would the authors explain that both spectrins require Ajuba function but work independently? 

      Response: The different spectrins both affect Jub because they both affect cytoskeletal tension, but our results suggest that they act in different ways to affect tension. We have made some revisions to the Discussion section to try to make this clearer.

      Another major point of the manuscript is the potential competition between beta-heavy-spectrin and myosin for F-actin binding. The authors suggest that there is a mutual antagonism between the two proteins regarding apical F-actin. However, this has not been formally assessed. Moreover, despite the arguments put forward in the discussion, it seems hard to justify a competition for F-actin when beta-heavy-spectrin seems to be unable to compete with myosin. Myosin can displace beta-heavy-spectrin from F-actin but the reciprocal effect seems unlikely given the in vitro data. 

      Response: We show in vivo, in vitro, and in silico data that are all consistent with the inference that beta-heavy-spectrin and myosin compete for binding to F-actin. As the reviewer notes, and as we discuss, the in vitro competition experiments were limited because, for technical reasons, we were unable to increase the protein concentrations further. We also note that our in vitro experiments used an active form of myosin, which binds F-actin much more strongly than inactive myosin.

      Reviewer #1 (Recommendations For The Authors):

      While the flow of experiments is logical in general, I see major problems regarding the structure of the manuscript and essential controls:

      • It is very confusing to have samples (kst-CRISPRa) in figures 1-3 that were not introduced in the text until the second-last paragraph of the results. I would suggest introducing this elegant overexpression experiment early in the manuscript as it fits well in the scope of these experiments or alternatively (if the authors prefer) make a new figure containing all the data regarding the overexpression in the end. 

      Response: We have now moved these results to a new figure (new Fig 7) that is described later in the text.

      • At the beginning of the manuscript, essential controls regarding the knockdown efficiency are missing in the main figure. Many of the key experiments are based on KD and as a reader, I want to assess their efficiency. Only in Figure 4, at the end of the manuscript, KST and α-Spec KD efficiency is revealed - this should be shown earlier and quantified properly. While reading the manuscript in its current form, the doubt remains that differences e.g. in α-Spec and KST KD can be explained by varying knockdown efficiencies as their levels can't be assessed. 

      Response: We have now moved these results to a new supplemental figure (Fig 1-supplement 1) that is cited earlier in the text.

      • On a similar line, in Figure 5 where myosin activity is perturbed, induction or repression of myosin activity is only suggested but not formally shown. The authors have to demonstrate that this is indeed the case by showing the myosin signal, ideally accompanied by measurement of tissue tension. 

      Response: This was not included because we and others have assessed these manipulations in earlier publications. However, as requested we have now added a supplemental figure (Fig 6 supplement 1) showing myosin levels in these genotypes.

      • On p. 7, the authors claim that "The epistasis of jub to kst suggests that βH-Spec regulates wing size through its tension-dependent regulation of Jub." While the authors show that KST KD increases myosin and junctional Jub, and that the wing overgrowth phenotype of KST KD depends on Jub, the tension-dependency was not demonstrated. To make that claim, the tension profile should be perturbed e.g. by overexpression of rok, myosin mutants (as the authors do in Fig 5) and the effect on Jub should be analyzed. Induction of tension in these conditions should be measured by laser ablation or a suitable alternative method. It might well be that the induction of Jub in KST KD is not via tension but an alternative mechanism such as the release of steric hindrance, interaction competition, etc. Also: Does KD of Jub affect spectrin localization? 

      Response: The effect of tension on Jub, and the effects of the myosin activity changes we employed on tension, have been analyzed in prior publications (eg Rauskolb et al 2014). To further address the issue raised by the reviewer here as to whether Kst affects Jub and wing growth via tension, we have also now added an additional experiment (Fig 3 supplement 1) in which we decreased tension in a βH-Spec RNAi wing disc by simultaneously expressing RNAi targeting Rok. The results show that the wing growth and Jub accumulation associated with βH-Spec RNAi are suppressed by Rok RNAi, consistent with our conclusion that these effects are mediated via cytoskeletal tension.

      As KD of Jub alters the pattern of myosin accumulation in wing discs (Rauskolb et al 2019) it could be expected to have a complementary influence on βH-Spec localization, but we have not examined this.

      • The authors make a very strong point in saying "The influence of βH-Spec on junctional tension is thus a direct consequence of its competition with myosin for overlapping binding sites on F-actin." While the authors provide some in vitro and in silico evidence, it was for example not possible to outcompete myosin by increasing levels of KST CH1-CH2 domains in vitro (for possible reasons the authors discuss). More importantly, the hypothesis that competition for actin binding is the definite cause of the antagonizing effect was not tested in vivo. Overexpression of a mutant version of KST that is unable to bind F-actin, or that has an increased affinity (etc) for actin was not tested. Such an experiment would be very valuable to enrich this manuscript but at least, claims like that have to be less bold and need to be written in a more speculative language. 

      Response: We consider creating and analyzing mutant forms of Kst in vivo to be beyond the scope of this manuscript, but as suggested we have now modified the text highlighted by the Reviewer to be more cautious.

      Further points: 

      • Why does the thickness of the wing disc epithelium change due to KST and α Spec KD, the authors should introduce this experiment better and draw a proper conclusion. Is there any relocalization of myosin along the apical-basal axis? Can the authors speculate about the differences between KST and α Spec KD? 

      Response: The epithelium thickness changes with α-Spec KD, but does not change with Kst KD. We think the explanation is provided by work from the Pan lab (done mainly in pupal eyes), which reported decreased cortical tension and increased apical area when α-Spec is lost. The interpretation in essence is that with the loss of attachment of F-actin to membranes along the lateral sides of the cells, the sides of the cells are "softer" and the cells expand laterally and thus also (by conservation of volume) shorten apical-basally. This is somewhat speculative, and it's not a focus of our study, but we have added some text to try to explain this better. Myosin localization along the apical-basal axis was not visibly altered, but it is harder to analyze as it is very weak compared to junctional myosin.

      • Given the authors' observation of differences in the relative localization of KST and α Spec (Figure 4), proper quantification of KST, α Spec and myosin levels along the apical-basal cell axis would be important. This would also ease data interpretation. 

      Response: We have now added a higher resolution image and also a line scan of Kst, α-Spec and Myo in a new supplemental figure (Fig 6 supplement 1).

      • KD of α Spec seems to induce myosin activity more, causes a bigger reduction of wing thickness, a stronger induction of Jub, and a similar effect on wing size. What led the authors to focus on KST rather than α Spec regarding the detailed analysis of myosin competition? 

      Response: Our observations identify a competition between Kst and myosin, but we have no indication that α-Spec competes with myosin. (It's conceivable that β-Spec might also compete with myosin in some contexts, but wing discs would not be a good place to examine this because the localization profiles of β-Spec and Myosin are so different).

      • A big criticism regarding the figures is the bad color choice which makes it difficult to decipher the fluorescent signals. Likewise, the labels are difficult to read with the present coloring. They should really be changed. 

      Response: We have now changed the single color images to gray scale (for multi-color images we retain RGB coloring).

      A minor point: 

      • To make the manuscript more accessible for researchers outside the Drosophila field, I'd suggest adding explanatory labels for Drosophila-specific terms such as hyperactive myosin for sqhEE, a scheme to show where UAS-dcr2 is active, explain the purpose of Rfp expression as a control for tissue specificity, etc. 

      Response: We have added some explanations to the text to try to make this clearer.

      Reviewer #2 (Recommendations For The Authors):

      Major points:

      In lines 99-101, the authors mention that Deng et al., 2015 report that the depletion of spectrins leads to an increase in pMLC, with no associated changes in the colocalization of myosin and F-actin. It is more accurate to mention that Deng et al. suggest that the levels of a GFP-tagged rescue construct of MLC (Sqh) are unchanged in alpha-spectrin mutants, although this was not formally quantified. Moreover, there was not a formal assessment of colocalization between MLC and F-actin, but rather a suggestion that F-actin levels are unaffected by the alpha-spectrin mutation. Finally, Deng et al. mostly analyzed alpha-spectrin so it remains possible that the new results shown by the authors are compatible with the initial observations from Deng and colleagues. 

      Response: As suggested, we revised the text to note that Deng et al., 2015 specifically examined Sqh:GFP. While we agree that our focus is more on Kst and Deng et al focused on α-Spec, we also examined α-Spec, and as described our results examining Myosin and Jub differ from what was reported by Deng et al 2015.

      As mentioned above, it is still possible that spectrins and Ajuba are working in parallel and Ajuba is not necessarily downstream of spectrins. The strong phenotype of Ajuba RNAi flies in adult wings could mask the effect of spectrins. Are the results similar in other settings, such as in the absence of Dicer2? Also, can Ajuba RNAi phenotypes be modified by overexpression of spectrins? This would provide further evidence of a link to Ajuba function. 

      Response: While formally it is true that the genetic requirement for Jub could reflect a role in parallel to, rather than downstream of, spectrins, our conclusion that spectrins act through Jub is based not only on the genetic requirement for Jub, but also on the influence of spectrins on junctional tension and Jub localization, which indicate that spectrins influence Jub activity in a manner consistent with their affecting the Hippo pathway through Jub.

      We would not expect over-expression of spectrins in a jub RNAi background to further reduce Hippo signaling, and as the jub RNAi phenotype is much stronger than the Kst over-expression phenotype even if there were an effect it would likely be difficult to detect.

      Regarding the potential independent functions of spectrins, it would be interesting to determine if alpha- and beta-heavy-spectrin can still interact at the level of the AJ despite the fact that their distributions appear to be partly non-overlapping. Would it be possible to assess this using PLA? If an interaction is not detected via PLA, it would be more convincing that spectrins are functioning independently. 

      Response: We have now performed this experiment, and no significant signal was detected by PLA. As a control, we used identical antibodies (GFP and α-Spec) to conduct PLA on α-Spec and β-Spec, and we did detect signal by PLA. These results (included in a revised Figure 4) further support the conclusion that α-Spec and βH-Spec are not physically associated in wing discs.

      Related to this point, if the spectrins work independently, it is reasonable to assume that they could display additive effects. Is this the case? If alpha- and beta-heavy-spectrin are simultaneously depleted are the phenotypes more severe than either depletion alone? 

      Response: We disagree here. Both alpha- and beta-heavy-spectrin act through tension and Jub, and there is likely a limit as to how much Yki activity can be increased by this pathway. For example, the increases in wing size induced by spectrin RNAi are similar to the increases in wing size observed with constitutive recruitment of Jub through alpha-catenin mutation (Alegot et al 2019), which may thus represent the maximum increase that can be induced through this pathway (as there are multiple, independent factors that regulate Hippo signaling).

      Authors should modulate membrane tension and assess if this affects the localization of alpha- and beta-heavy-spectrin and, specifically, their colocalization, as their interaction could be regulated. 

      Response: As reported, we do see effects of tension on βH-Spec localization. We would not expect significant effects of membrane tension on α-Spec localization, but we consider analysis of this outside the scope of this manuscript.

      In lines 185-187, the authors mention that beta-spectrin depletion does not affect beta-heavy-spectrin localization. Interestingly, Figure 4E appears to show that the levels of Kst-YFP appear to be lower in the beta-spectrin-depleted tissue. The localization of beta-heavy-spectrin is not necessarily affected but the overall levels could be. 

      Response: Indeed, the levels appear slightly lower, but elucidating the reason for this will require further experiments that are beyond the scope of this manuscript (we suspect it is because cytoskeletal tension increases in β-Spec-depleted tissue as it does in α-Spec-depleted tissue, which based on our observations should decrease levels of Kst near junctions). The key point of these experiments was to show that α-Spec localization does not require βH-Spec, but does require β-Spec, which supports our conclusion that in wing discs α-Spec forms a complex with β-Spec but not with βH-Spec.

      In lines 200-203, the authors state that beta-heavy-spectrin and myosin colocalize extensively at the apical region. However, this colocalization is not as clear as stated. Do the authors have alternative data that suggests that the two proteins are indeed colocalizing? Would it be possible to perform PLA to detect a potential colocalization? 

      Response: Unfortunately we do not have antibodies against both proteins that work well enough for PLA. However, we quantified the co-localization by analysis of Pearson's correlation coefficient, as reported in the manuscript. We also added an additional higher magnification image, and a line scan, in a supplemental figure (Fig. 6 supplement 1).
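
      As an illustration of the quantification named above, the following is a minimal sketch of Pearson's correlation coefficient computed over paired pixel intensities from two channels. It is not the authors' analysis pipeline: the function name `pearson_r` and the intensity values are hypothetical, and a real analysis would typically restrict the calculation to a region of interest and handle background subtraction.

```python
import math


def pearson_r(channel_a, channel_b):
    """Pearson's r between two equal-length lists of pixel intensities.

    Each list holds the flattened intensities of one channel, in the same
    pixel order. Values near +1 indicate that the two signals rise and fall
    together; values near 0 indicate no linear relationship.
    """
    if len(channel_a) != len(channel_b):
        raise ValueError("channels must contain the same number of pixels")
    if len(channel_a) < 2:
        raise ValueError("at least two pixels are required")

    mean_a = sum(channel_a) / len(channel_a)
    mean_b = sum(channel_b) / len(channel_b)

    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(channel_a, channel_b))
    var_a = sum((a - mean_a) ** 2 for a in channel_a)
    var_b = sum((b - mean_b) ** 2 for b in channel_b)

    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant channel")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical intensities: channel B is a scaled copy of channel A,
# and channel C is its mirror image.
chan_a = [10.0, 20.0, 30.0, 40.0]
chan_b = [25.0, 45.0, 65.0, 85.0]
chan_c = [85.0, 65.0, 45.0, 25.0]

r_same = pearson_r(chan_a, chan_b)
r_opposite = pearson_r(chan_a, chan_c)
```

      Because the coefficient is insensitive to linear scaling of either channel, it compares the spatial pattern of the two signals rather than their absolute brightness.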

      Authors should try to assess and quantify colocalization with F-actin for both beta-heavy-spectrin and myosin in wild-type conditions and when the levels (and/or activity) for each of them are modulated. 

      Response: We have added quantification of the co-localization of βH-Spec with F-actin and of myosin with F-actin to the revised manuscript.

      Minor points: 

      In lines 122-124, the authors should clarify the relevance of the observation that alpha-spectrin knockdown affects the thickness of the wing disc epithelium. 

      Response: We have added some text to try to elaborate on this.

      In the intro, it is perhaps necessary to mention that there are conflicting reports regarding the role of spectrins in the regulation of cell proliferation, at least in the follicular epithelium. For instance, Ng et al., 2016 argued that spectrins do not regulate cell proliferation in FECs. 

      Response: Rather than wading into a detailed discussion of issues that are peripheral to this study, we modified the text in the Introduction to avoid implying that spectrins control cell proliferation in the ovary.

      In Figures 1, 2, 3, and 4 (and respective supplements), it is encouraged that, wherever appropriate, the authors mark the different compartments or the relevant boundary using dashed lines, to more clearly indicate the regions to compare. 

      Response: We have now done this.

      In Figure 2, supplement 1 panels C and D should have an indication of the genotype for clarity. 

      Response: We have now added this.

      In lines 362-367, the authors suggest that other actin-binding proteins are likely to influence the role of beta-heavy-spectrin. Have the authors tested the role of spectrin interactors such as Ankyrin and Adducin?

      Response: No, we have not examined this.

    1. Author Response:

      The following is the authors' response to the original reviews.

      We were pleased with the overall enthusiastic comments of the reviewers:

      • Reviewer #1: “This manuscript by Mahlandt, et al. presents a significant advance in the manipulation of endothelial barriers with spatiotemporal precision”

      • Reviewer #2: “The immediate and repeatable responses of barrier integrity changes upon light-on and light-off switches are fascinating and impressive.”

      • Reviewer #3: “, these molecular tools will be of broad interest to cell biologists interested in this family of GTPases.”

      We thank the reviewers for their fair and constructive comments that helped us to improve the manuscript.

      Reviewer #1 (Recommendations For The Authors):

      1) This paper is likely to attract a diverse audience. However, the order of data presented in this manuscript can be confusing or challenging to follow for the naive reader. This is because the tool characterization is split into two parts: before the barrier strength assay (selection of optogenetic platform and tool expression) and after (characterization of cell morphology with global and local optogenetic stimulation). Reorganizing the results such that the barrier strength results follows from an understanding of individual cell responses to stimulation may improve the ability of this readership to understand the factors at play in the changes in barrier strength observed when opto-RhoGEFs are activated.

      We appreciate this idea. We initially structured the paper in the proposed order, but then decided that we wanted to put more focus on the barrier strength results by presenting them already in the second figure. Therefore, we prefer to keep this order of figures.

      2) While the description of the selection of iLID as the study's optogenetic platform is clear, a better job could be done motivating the need for engineering new optogenetic tools for the control of GEF recruitment. Given that iLID-based tools for GEFs of RhoA, Rac1, and Cdc42 already exist, some of which are cited in the introduction, more information on why these tools were not used would be helpful: were these tools tested in endothelial cells and found lacking?

      The original system has the domain structure DHPH-tagRFP-SspB. We wanted to work with an SspB-FP-GEF construct instead, which would allow easy exchange of the FP and the DHPH domain. This modular approach allowed us to generate and compare the mCherry, iRFP647 and HaloTag versions. We do not want to claim that we engineered an entirely new optogenetic tool, but rather that we optimized an existing one with different tags. To make this clearer, we added: ‘The membrane tag of the original iLID was changed to an optimized anchor. In addition, we modified the sequence of the domains to SspB, tag, GEF to simplify the exchange of GEF and genetically encoded tag. A set of plasmids with different fluorescent tags was created for more flexibility in co-imaging.’

      3) Comment on the reason behind using DHPH vs. DH domains for each GEF is needed.

      We have previously found (and this is supported by biochemical analysis of GEF activity) that the selected domains provide the best activity. We will add a reference and the following to the text: ‘Their catalytic active DHPH domains were used for ITSN1 and TIAM1 (Reinhard et al., 2019). In case of p63 the DH domain only was used, because the PH domain of p63 inhibits the GEF activity (Van Unen et al., 2015) (Fig. 1E).’

      4) Since multiple Rho GTPases (e.g., RhoA, RhoB, RhoC) exist and Rho is used as the name of the GTPase family, please use RhoA where applicable for clarity.

      Since the RhoGEFp63 will activate RhoA/B/C we would rather not refer to RhoA only. We will clarify this in the text: ‘Three GEFs were selected, ITSN1, TIAM1 and RhoGEFp63, which are known to specifically activate respectively Cdc42, Rac and Rho and their isoforms.’

      5) A brief comment on the use of HeLa cells for protein engineering and characterization (versus the endothelial cells motivated in the introduction) may be helpful.

      We added the following to the text: ‘HeLa cells were used for the tool optimization because of easier handling and higher transfection rate in comparison to endothelial cells.’

      Minor suggestions:

      In figure 1C, line sections showing intensity profiles before and after protein dimerization might further emphasize the change in biosensor localization.

      We are not fans of intensity profiles, as the profile depends strongly on the position of the line and it essentially turns a 2D image into 1D data for a single image. We therefore prefer to keep the quantification as shown in panel 1B (which shows data from multiple cells).

      Reviewer #2 (Recommendations For The Authors):

      1) The study has analyzed the effects of light-induced activation of the three optogenetic constructs in endothelial cells on their barrier function (electrical resistance) at high cell density and correlated the findings with the cellular overlap-producing effects on endothelial cells cultured at sparse cell density. It should be tried to show these effects at a cell density where these light-induced effects increase electrical resistance. Lifeact with different chromophores in adjacent cells might be useful.

      We had attempted to measure the overlap in a monolayer by taking advantage of the Halotag and the variety of dyes available by staining one pool of cells red with JF 552 nm and the other far red with the JF 635 nm dye. However, the cells need at least 24 h to form a monolayer and by then they had exchanged the dye and red and far red pool could not be distinguished any longer.

      Therefore, we used the Lck-mTq2-iLID construct, which already marks the plasma membrane of the cells. We created a mosaic monolayer of cells expressing mScarlet-CaaX and cells expressing Lck-mTq2-iLID + SspB-HaloTag-TIAM(DHPH). We observed an increase in the overlap between cells under this condition. The results have been added to figure 4 - figure supplement 2I&J. To the text we added:

      ‘Additionally, cell-cell membrane overlap increased about 20 %, upon photo-activation of OptoTIAM, in a mosaic expression monolayer (figure 4 - figure supplement 2I,J, Animation 22)’

      2) The authors correctly state that some reports have shown that S1P can increase endothelial barrier function in VE-cadherin independent ways and these are related to Rac and Cdc42. This was also shown for Tie-2 in vitro and even in vitro in the absence of VE-cadherin and should also be mentioned.

      We added the following to the text: ‘Not only S1P promotes endothelial barrier independent from VE-cadherin, also Tie2 can increase barrier resistance in the absence of VE-cadherin (Frye et al. 2015).’

      Since a blocking antibody against VE-cadherin was used, a negative control antibody should be tested which also binds to endothelial cells.

      To visualize the cell-cell junctions in the experiment shown in Supplemental Fig 3.1, we added a non-blocking VE-cadherin antibody that is directly labeled with ALEXA 647 and shows normal junction morphology. These experiments already give an indication that the live labeling antibody of VE-cadherin does not disturb the junction morphology. However, when we added the blocking antibody against VE-cadherin, known to interfere with the trans-interactions of VE-cadherin, a rapid disruption of the junctions is observed.

      Additionally, previous work has shown that the VE-cadherin labeling antibody does not interfere with junction dynamics and function (see Figure 2A, Kroon et al. 2014, ‘Real-time imaging of endothelial cell-cell junctions during neutrophil transmigration under physiological flow’, JoVE). We have added the figures below, showing that addition of the control IgG and VE-cadherin 55-7H1 Abs at the timepoint marked by the dotted line did not interfere with the resistance, whereas the blocking Ab drastically reduced resistance. We have added this reference to the results: ‘Previous work has shown the specific blocking effect of this antibody in comparison to the VE-cadherin (55-7H1) labeling antibody (Kroon et al., 2014).’

      Author response image 1.

      Reviewer #3 (Recommendations For The Authors):

      Additional comments for the authors:

      1) The introduction is very long and would benefit from a more concise emphasis on the information required to put the work and results in context and understand their importance.

      Comment: we appreciate the comment of the reviewer. However, we wish to introduce the topic and the tools thoroughly and therefore we chose to keep the introduction as it is.

      2) The N-terminal membrane-binding domain does not homogeneously translocate to the plasma membrane, since lck is a raft-associated kinase. Please comment on this.

      In our hands, the Lck is among the most selective and efficient tags for plasma membrane localization (https://doi.org/10.1101/160374). We do observe homogeneous translocation, but our resolution is limited to ~200 nm and so we cannot exclude that the Lck concentrates in structures smaller than 200 nm. Given the robust performance of the lck-based iLID anchor in the optogenetics experiments, we think that the Lck anchor is a good choice.

      3) Figure 1D is not very clear. What does 25 or 36% change mean? If iLID tg is conjugated to these sequences, its cytosolic localization should be reduced versus iLID alone. Is this what the graph wants to express? If so, please, label properly the ordinate axis in the graph (% of non-tagged iLID values?)

      The graph represents the recruitment efficiency of SspB to the plasma membrane for the two different membrane tags that target iLID to the plasma membrane. The recruitment efficiency was measured as the depletion of SspB-mScarlet intensity in the cytosol upon light activation, and is expressed as a percentage change.

      We added the following to the title of the graph: ‘SspB recruitment efficiency for Plasma Membrane tagged iLID.’
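
      To make the ordinate of this graph concrete, the sketch below computes a percentage depletion of cytosolic intensity from a pre-activation and a post-activation measurement. This is one plausible reading of the description above, not the authors' actual script: the function name `cytosolic_depletion_percent` and all intensity values are hypothetical, and background subtraction and bleaching correction are assumed to have been applied beforehand.

```python
def cytosolic_depletion_percent(intensity_before, intensity_after):
    """Percentage drop in mean cytosolic intensity upon light activation.

    A positive value means the fluorescent SspB fusion left the cytosol,
    which is read here as recruitment to the membrane-anchored iLID.
    """
    if intensity_before <= 0:
        raise ValueError("pre-activation intensity must be positive")
    return 100.0 * (intensity_before - intensity_after) / intensity_before


# Hypothetical mean cytosolic intensities (arbitrary units) for single cells,
# given as (before, after) pairs for two different membrane anchors.
anchor_one_cells = [(200.0, 150.0), (180.0, 135.0)]
anchor_two_cells = [(200.0, 128.0), (150.0, 96.0)]


def mean_depletion(cells):
    """Average the per-cell percentage depletion over a list of cells."""
    values = [cytosolic_depletion_percent(b, a) for b, a in cells]
    return sum(values) / len(values)


anchor_one_mean = mean_depletion(anchor_one_cells)
anchor_two_mean = mean_depletion(anchor_two_cells)
```

      Expressing the change relative to each cell's own pre-activation intensity makes cells with different expression levels comparable.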

      4) Supplemental figures in the main text. Fig S1D in the text refers to data in Fig S1E and Fig S1E is supposed to be Fig S1F? (page 11).

      That is correct. The mistakes have been corrected (and this is now renamed to figure 1 - figure supplement 1E and 1F).

      5) Figure 3. Contribution of VE-cadherin. Other junctional complexes, such as tight junctions may also intervene. However, these results would also suggest that cell-substrate adhesion rather than cell-cell junctions may modulate the barrier properties, as it has been previously demonstrated for example by imatinib-mediated activation of Rac1 (Aman et al. Circulation 2012). The ECIS system used to measure TEER in the quantitative barrier function assays can modulate these measurements and discriminate between paracellular permeability (Rb) and cell-substrate adhesion (alpha). Please, provide whether the optogenetic modulation of these GTPases does indeed regulate Rb or alpha.

      ‘The measured impedance is made up of two components: capacitance and resistance. At relatively high AC frequencies (> 32,000 Hz) more current capacitively couples directly through the plasma membranes. At relatively low frequencies (≤ 4000 Hz), the current flows in the solution channels under and between adjacent endothelial cells’ (https://www.biophysics.com/whatIsECIS.php).

      Therefore, the high frequency impedance is representing cell-substrate adhesion whereas the low frequency responds more strongly to changes in cell-cell junction connections.

      We only measured at 4000 Hz, representing the paracellular permeability. We chose a single frequency to maximize time resolution.

      We have added this extra comment to the legend of the figure: ‘(B) Resistance of a monolayer of BOECs stably expressing Lck-mTurquoise2-iLID, solely as a control (grey), and either SspB-HaloTag-TIAM1(DHPH) (purple) / ITSN1(DHPH) (blue) or p63RhoGEF(DH) (green) measured with ECIS at 4000 Hz, representing paracellular permeability, every 10 s.’

    1. Author Response

      eLife assessment

      In this work, the authors provide important mechanistic insights into how the intracellular effector protein Calcineurin B homologous protein 3 (CHP3) can be regulated in a calcium-independent manner to expose its lipid binding site. Compelling evidence demonstrates a binding partner protein (NHE1) triggers a conformation change and exposure of the myristoyl group in CHP3 resulting in membrane association. This provides mechanistic insight into the signalling mechanisms achieved by CHP3 in a target-dependent manner, which will be of broad scientific interest.

      Thank you for providing an accompanying eLife assessment. As we slightly modified the name of the novel mechanism to meet the suggestion of reviewer 2, and to emphasize the binding to a lipid membrane, we suggest the following update:

      “In this work, the authors provide important mechanistic insights into how the intracellular effector protein Calcineurin B homologous protein 3 (CHP3) can be regulated in a calcium-independent manner to expose its lipid membrane binding site. Compelling evidence demonstrates a binding partner protein (NHE1) triggers a conformation change and exposure of the myristoyl group in CHP3 resulting in membrane association. This provides mechanistic insight into the signalling mechanisms achieved by CHP3 in a target-binding dependent manner, which will be of broad scientific interest.”

      Reviewer #1 (Public Review):

      This study examines the effects of Ca2+ and NHE1 peptide binding on the conformation of CHP3, one of three related calcineurin-homologous proteins. One question that is addressed is whether Ca2+ binding triggers membrane association of the myristoyl group, a so-called "Ca2+-myristoyl switch". This is convincingly demonstrated to not be the case by the experiment in Figure 6B: unlike myristoylated recoverin, mCHP3 does not show enhanced association with liposomes. In the presence of a target peptide, however, myristoylation enhances membrane association. Curiously, this interaction is not Ca2+ dependent, but the membrane association of the non-myristoylated CHP3 is Ca2+-dependent.

      My concerns with this study relate to physiological relevance. First, it is unclear if Ca2+ binding has a regulatory function in any of the CHP proteins. The authors state that CHP1 and CHP2 have Ca2+ binding affinities <100 nM, so these proteins are likely saturated with Ca2+ under all physiological conditions. On the other hand, CHP3 binds Ca2+ with a Kd of 8 micromolar (in the presence of physiological concentrations of Mg2+) so it will be largely unbound under most normal cellular concentrations of Ca2+ which are in the submicromolar range. Free Ca2+ rarely reaches 1 micromolar under non-pathological concentrations, and if it does, the fraction of CHP3 bound to Ca2+ should be estimated for context. Given these caveats, I am not convinced that experiments done with millimolar concentrations of Ca2+ (e.g., Figures 2, 3, 6) are physiologically informative.

      Precise knowledge of the distinct and isoform-specific molecular basis of the important physiological roles of calcineurin homologous proteins is only emerging. Here, we ruled out the suggested Ca2+-myristoyl switch and showed that, instead, target binding (NHE1 peptide) induces membrane association of myristoylated CHP3. With respect to the Ca2+ response, we showed in this work and in previous studies that all CHPs undergo Ca2+-induced conformational changes, a feature that is required for EFCaBPs to act as Ca2+ sensors. Millimolar Ca2+ concentrations are commonly used in this type of in vitro characterization to ensure uniform conformational states of the protein, so we followed this approach. We agree that in future studies the distinct molecular responses to Ca2+ signals have to be studied in a cellular context. So far, one can state that for CHP1 and CHP2, affinities for Ca2+ were reported with Kd values of ~90 nM determined in vitro in the absence of Mg2+. This is close to the cellular Ca2+ concentration in the resting cell, but would not lead to saturation of all CHP1 or CHP2 molecules in the cell with Ca2+. The presence of Mg2+ in the cell may further attenuate the affinity of CHPs for Ca2+. One cannot exclude that CHP1 and CHP2 could respond to Ca2+ signals in the cell. For target-free CHP3, a Kd of 3.5 µM for Ca2+ in the presence of Mg2+ was reported, so it is unlikely to respond to Ca2+ signals. However, target binding (at least for NHE1) does not require the presence of Ca2+ (as shown in the present study), and target binding can increase the Ca2+-binding affinity of EFCaBPs up to 100-fold (reported 45-fold for CHP1 and 42-fold for CHP2). Target-bound CHP3 might have an affinity for Ca2+ that enables a response to Ca2+ signals.
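
      The occupancy argument above can be made explicit with a rough estimate. The sketch below assumes simple single-site equilibrium binding, where the fraction bound equals [Ca2+] / (Kd + [Ca2+]). This is a simplification for EF-hand proteins, which may have more than one site and cooperative behaviour. The Kd values are those quoted in this response. The free Ca2+ concentrations (0.1 µM and 1 µM) are illustrative choices, not measurements from the study.

```python
def fraction_bound(free_ca_um, kd_um):
    """Fraction of protein with Ca2+ bound, for one independent site.

    Both arguments are in micromolar. Assumes free Ca2+ is not depleted
    by binding, i.e. the protein is dilute relative to the ion.
    """
    if free_ca_um < 0 or kd_um <= 0:
        raise ValueError("concentration must be >= 0 and Kd must be > 0")
    return free_ca_um / (kd_um + free_ca_um)


KD_CHP1_CHP2_UM = 0.09   # ~90 nM, reported in vitro without Mg2+
KD_CHP3_UM = 3.5         # target-free CHP3, in the presence of Mg2+

RESTING_CA_UM = 0.1      # illustrative resting free Ca2+
ELEVATED_CA_UM = 1.0     # illustrative elevated free Ca2+

chp1_resting = fraction_bound(RESTING_CA_UM, KD_CHP1_CHP2_UM)
chp3_resting = fraction_bound(RESTING_CA_UM, KD_CHP3_UM)
chp3_elevated = fraction_bound(ELEVATED_CA_UM, KD_CHP3_UM)

for label, value in [
    ("CHP1/CHP2 at 0.1 uM", chp1_resting),
    ("CHP3 at 0.1 uM", chp3_resting),
    ("CHP3 at 1 uM", chp3_elevated),
]:
    print(f"{label}: {value:.0%} bound")
```

      Under these assumptions, roughly half of CHP1 or CHP2 would be Ca2+-bound at rest, whereas target-free CHP3 would remain mostly unbound even at 1 µM free Ca2+, in line with the statements above.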

      Reviewer #2 (Public Review):

      The manuscript by Becker and coworkers describes a target-binding myristoyl switch in the calcium-binding EF hand protein CHP3 using one of its targets, the NHE1. The work uses a suite of biophysical methods including SEC, nanoDSF, fluorescence, and native MS, to address conformations, ligand binding (Ca2+, Mg2+, NHE1), and liposome association, pinpointing a conformation switch which they term a target-dependent myristoyl switch. The strength of the manuscript is a convincing mapping of the different conformations and the conclusion that target binding, and not Ca2+ binding is necessary to expel the lipid from the protein, and that this jointly enhances membrane binding. It would have been even stronger if additional structural data had been included to address the properties of the different states and hence support if there indeed are changes in dynamics and flexibility.

      We are thankful to Reviewer #2 for a number of valuable comments on our manuscript which we addressed systematically to enhance description and discussion of our results. Specifically, we clarified the use of conformation, flexibility, state, dynamics and now consistently refer to distinct states of the protein (Ca2+-, Mg2+- and apo-state) as well as defined conformations (open, closed and target-bound). We agree that structural characterization is important, yet, it is beyond the focus of the present biochemical and biophysical characterization and needs to be addressed in future studies.

      Reviewer #3 (Public Review):

      This work provides new insights into the regulation of the intracellular effector protein Calcineurin B homologous protein 3 (CHP3). The authors precisely delineate how intracellular calcium signals and myristoylation affect the binding of CHP3 to lipid membranes and the sodium/proton exchanger NHE1. Different mechanisms are known to trigger the exposure of the myristoyl-moiety in the calcium-binding protein family and CHP3 was proposed to use a "calcium-myristoyl switch", which leads to exposure of the myristoyl group due to conformational changes in the protein triggered by calcium-binding. Becker and Fuchs et al. now demonstrate that CHP3 uses a novel mechanism, in which not calcium-binding but binding to the target protein NHE1 triggers exposure of its myristoyl-group. This paper represents a detailed functional characterization of CHP3 and the maximum level of mechanistic interpretation that can be achieved without high-resolution structural information.

      The conclusions of this paper are fully supported by the data.

      Strengths

      The protein biochemistry is of an exceptionally high level, both with respect to the quality of the material and the stringency with which the authors assess and assure the protein quality. The authors purify CHP3 without any affinity tags, and thus in its most representative relevant state. Their validations indicate that complete myristoylation of CHP3 is achieved and that all protein is functional with respect to calcium binding.

      The authors go to extensive lengths to convince themselves of the quality of their data and their interpretation. They use an extensive amount of replicates, including both biological and technical replicates. Assays and experimental procedures are verified using model proteins, such as Recoverin. In addition, the authors employ an extensive set of complementary approaches to assure their observations are universal.

      We highly appreciate the positive feedback of Reviewer #3 on our experimental design and quality of biochemical data.

      Weaknesses

      A small weakness is the fact that the interpretation in terms of mechanistic insights contributed by some of the assays employed is rather limited, resulting in comparably unprecise descriptions of the state of the protein such as "affects the conformation and/or flexibility of CHP3" or the "open" and "closed" conformations. As indicated by the authors, structural studies are required to precisely detail the conformational states and delineate their mechanism of action.

      We updated the manuscript for a stringent use of the descriptions “conformation”, “state” and “flexibility” to match terminology commonly used for EFCaBPs.

      The authors imply that the major form of CHP3 is the myristoylated state. However, it remains unclear whether the source of the biological material, which appears to be membrane-only, already implies a significant experimental bias that only allows (or highly favors) the identification of myristoylated CHP3. Without a calcium-signal, unmyristoylated CHP may not associate with membranes, or be less strong, resulting in its depletion upon isolation of the vesicles.

      We agree that our data are based on membrane fractions, so referring to the “major form of CHP3” was misleading. We updated two sentences as follows: “Finally, we investigated the N-terminal myristoylation status of membrane associated CHP3 in vivo using liquid-chromatography coupled mass spectrometry (LC-MS/MS). ………Together, this suggests that myristoylated CHP3 is both NHE1-associated and membrane-anchored in agreement with a target-induced exposure and membrane integration of the N-terminal myristoyl moiety.”

    1. Author Response

      Reviewer #1 (Public Review):

      The Introduction starts by setting up a straw-man argument, claiming that the assumption is that gene expression is set up as stable expression domains that undergo little or no subsequent change. I don't think that any current developmental biologist thinks this is true. The references used to support this claim are from the 1990s up to the early 2000s. There are numerous examples since then that show that developmental gene expression is dynamic as a rule.

      Our argument might seem like a straw man to developmental biologists who work in the field of pattern formation or who are aware of the latest advances in that field. However, a look at current publications on developmental enhancers reveals that the dominant model with which enhancer biologists interpret their data is still the French Flag model (specifically, the eve-stripe-2 model of enhancer function). We meant to address this audience, and attempted to clarify this from the very beginning by stating that “Much of our models of how enhancers work during development relies on the assumption that …”. Please note that we are talking here about “models of how enhancers work”, not models of pattern formation in general.

      The Introduction then continues as a rather detailed review of enhancers, Tribolium methodology, tools for identifying enhancers, and more. The Introduction cites 99 references, which seems excessive for what is essentially an experimental paper. Significant parts of the Introduction can be trimmed or removed. There is no need to mention all the tools available for Tribolium if they are not used in the described experiments. A thorough analysis of the advantages and disadvantages of different modes of ATAC-seq is also beyond the scope of the Introduction. The authors should explain why they chose the tools they chose without excessive background.

      In the revised manuscript, we shortened the discussion of Tribolium methodologies and imaging techniques. However, we think that the paragraph discussing ATAC-seq strategies is important to justify why we took the effort to cut the embryos and perform tissue-specific ATAC-seq analysis, instead of performing whole-embryo ATAC-seq.

      Having said that, the Introduction actually overlooks a lot of significant work that is relevant to the subject of the paper. Specifically, the authors completely ignore all of the work on development in hemimetabolous insects such as Oncopeltus and Gryllus - the omission is glaring. There has been a lot of relevant work on dynamic gene expression patterns coming out of these species.

      You are indeed right, and we apologize for the omission. We have now added citations to relevant work on those two insects to the manuscript.

      The experimental setup involves cutting embryos into three sections at two time points. The results then discuss differences in "space" and "time" but there is no discussion of the embryological meaning of these terms. What is happening at the two time points from a developmental perspective? What is the difference between the three sections? There is a lot of relevant development going on at these stages and important regional differences, which have been well-studied in Tribolium and in other insects but are not even mentioned.

      A good point. Correlating chromatin landscape changes with embryological events is an interesting question that needs further analysis and the application of ATAC-seq to additional time points. We chose to leave this to future work (possibly using single-cell ATAC-seq). In this work, we restricted our analysis to the benefits of applying time- and tissue-specific ATAC-seq in predicting active enhancers. We added a note on this point in the Discussion.

      In the preliminary results of the ATAC-seq analysis, it is clear that there are significant differences between the sections, which should come as no surprise, but fairly minor differences between the same section at the two time points. This could be because the two time points are pretty close together at a stage when there is a lot of repetitive patterning going on. A possible interpretation, which the authors don't mention because it goes against their main thesis, is that maybe most of the processes that are taking place at this stage are not dynamic enough to show up at the temporal resolution they have applied. This is worth at least a mention.

      We agree with this observation. We would like to draw the reviewer’s attention to our statement “Together, our findings indicate that changes in chromatin accessibility in Tribolium at this developmental stage are primarily associated with space rather than time…”. A detailed analysis of chromatin dynamics across time would require more time points, which we plan to collect in future work.

      The authors link each accessible site to the nearest gene when looking at putative enhancer function. This is a risky assumption since there are many examples of enhancer sites that are far upstream or downstream of the target gene and often closer to an unrelated gene than to the target gene. The authors should at least acknowledge this problem with their functional annotation.

      The reviewer is correct that, particularly in large eukaryotic genomes, enhancers are often located far away from their target genes. We have no comprehensive enhancer-target data that would enable us to perform a more accurate analysis. Furthermore, the assumption that, for at least some enhancers, the nearest gene is also the target, and hence provides insight into the function of the enhancer itself, seems reasonable given the relatively compact organization of the Tribolium genome. In any case, the analysis was presented only as one of several sanity checks for our ATAC-seq data; to streamline the manuscript, we no longer include it in the current version.
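
      For reference, the nearest-gene heuristic discussed in this exchange can be sketched as follows. This is only an illustration under stated assumptions, not the pipeline used in the manuscript: the function name, the gene names, and the coordinates are hypothetical, and each gene is reduced to a single transcription start site on one chromosome.

```python
from bisect import bisect_left

def nearest_gene(peak_pos, genes):
    """Return (gene_name, distance) for the gene whose TSS is closest to peak_pos.

    genes: non-empty list of (tss_position, gene_name) tuples for one
    chromosome, sorted by position. Ties go to the upstream gene.
    """
    positions = [pos for pos, _ in genes]
    i = bisect_left(positions, peak_pos)
    # Only the genes immediately left and right of the peak can be nearest.
    candidates = genes[max(i - 1, 0):i + 1]
    tss, name = min(candidates, key=lambda g: abs(g[0] - peak_pos))
    return name, abs(tss - peak_pos)

# Hypothetical gene models and accessible-site midpoints (not real data).
genes = [(1_000, "geneA"), (50_000, "geneB"), (52_000, "geneC")]
for peak in (900, 30_000, 51_500):
    print(peak, nearest_gene(peak, genes))
```

      Reporting the distance alongside the gene name makes the reviewer's caveat visible: a site 20 kb from its "nearest" gene is a much weaker assignment than one 100 bp away, and neither distance proves regulation.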

      In the Discussion, the authors claim that contrary to how it may seem, the question they are addressing is not a "fringe problem". Once again, I think this is a straw man. No active researcher thinks that the question of dynamic regulation of gene expression during development is a fringe problem. On the contrary, most researchers will accept that this is one of the most interesting and important questions in current developmental biology.

      This whole argument was removed from the Discussion in the revised manuscript.

      Perhaps the most significant problem with the manuscript is that it is all built around the premise of enhancer switching between dynamic enhancers and static enhancers. The authors find one site that is consistent with their prediction for a dynamic enhancer and one site - regulating a different gene - that is consistent with their prediction for a static enhancer and claim that they have provided support for their model. I think this claim is grossly exaggerated. They present data that can be seen as consistent with their model but are a long way from providing evidence for it.

      We actually thought we were cautious enough about this. Nowhere in our text did we mention that our data “support” the enhancer switching model. We stated quite early (in the abstract, actually) that:

      “We found our data consistent with a model in which the timing of gene expression during embryonic pattern formation is mediated by a balancing act between enhancers that induce rapid changes in gene expressions (that we call ‘dynamic enhancers’) and enhancers that stabilizes gene expressions (that we call ‘static enhancers’).”

      To make this message clearer, we added the following sentence to the abstract of the revised manuscript: “However, more data is needed for a strong support for this or any other alternative models.” And again at the end of the Introduction: “While these data are in line with our Enhancer Switching model, more data is needed as a strong support for the model.” Also, at the end of the Results section examining runB enhancer dynamics, we stated: “However, this merely shows that runB activity dynamics are consistent with our model, but is still far from strongly supporting the model (more on that in the Discussion).” Also for the Results section on enhancer hbA dynamics: “Again, this merely shows that hbA activity dynamics are consistent with our model, but is still far from strongly supporting it.”

      Moreover, in the opening paragraph of the Discussion, we explicitly and quite openly addressed this point and suggested what kinds of observations and experiments would be needed in the future to qualify as “strong support” for the model. We even ran simulations of what one should expect to observe in enhancer deletion experiments if the model is correct (Figure 7).

      But it seems that discussing the enhancer switching model in detail gives the impression that it is central to the paper. In our view, our experimental system is quite general and does not depend on that model; we mention it as an example of how an alternative model of enhancer regulation could be relevant to the problem of dynamic gene expression. This would not be obvious without this or a similar model, even a hypothetical one. But since our presentation evidently gives the impression that our claims are stronger than they really are, we altered our phrasing in the Introduction of the revised manuscript to make our point clearer:

      “Despite its potential inaccuracies, the Enhancer Switching model exemplifies the type of alternative frameworks we need to explore in order to elucidate the mechanisms driving the generation of gene expression waves during development. Consequently, an appropriate model system is required, allowing us to test not only the Enhancer Switching model but also any other prospective model that provides a satisfactory explanation for the initiation of gene expression waves at the enhancer level.”

      We hope that this addresses the reviewer’s quite legitimate concerns.

      Like the Introduction, the Discussion includes long paragraphs (lines 450-480) that are more suitable for a review/hypothesis paper. The data presented in this manuscript has little relevance to the question of kinematic vs. trigger waves, and therefore there is no real reason for the question to be discussed here.

      We have now significantly shortened the discussion.

      Reviewer #2 (Public Review):

      Open questions:

      What happens with the runB enhancer at later stages of embryogenesis? With what kind of dynamics do the anterior-most stripes fade and does that agree with the model? Do they show the same dynamics throughout segmentation? I think later stages need to be shown because the prediction from the model would be that the dynamics are repeated with each wave. I am not so sure about the prediction for ageing stripes – yet it would have been interesting to see the model prediction and the activity of the static enhancer.

      Yes, the dynamics repeat in the germband. This is shown in Supplementary Figure 8. The dynamics in the germband were shown by visualizing yellow mRNA and intronic probes. MS2 imaging could not be used because the embryo dives into the yolk for a while, after which it becomes difficult to capture the germband in the right orientation for imaging. We are currently working on using light-sheet microscopy to image germband stages.

      I understand that the mRNA of the reporter gene yellow is more stable than the runt mRNA. This might interfere with the possibility to test your prediction for static enhancers: The criterion is that the stripes should increase in strength as the wave migrates towards the anterior. You show this for runB – but given that yellow has a more stable transcript – could this lead to a “false positive” increase in intensity with the slower migration and accumulation of transcripts? I would feel more comfortable with the statement that this is a static enhancer if you could exclude that the signal is blurred by an artifact based on different mRNA stability. What about re-running the simulation (with the parameters that have shown to well reflect endogenous runt mRNA levels) but increasing the parameter for the stability of the mRNA? Are static and dynamic enhancers still distinguishable? The claim of having found a static enhancer rests on this increase in signal, hence, other explanations need to be excluded carefully.

      Good questions. Note that runB reporter dynamics were examined not only by visualizing yellow mRNAs (which indeed seem to be more stable than endogenous run mRNA; see Supplementary Figure 10), but also using MS2 (with virtually zero mRNA stability, although stability was simulated in the shown movies to display virtual mRNA dynamics) and intronic yellow mRNA (showing de novo transcription; Supplementary Figure 10; you will need to zoom in to see intronic de novo transcripts). The expected dynamics of a static enhancer reporter are quite distinctive: expression progressively increases at first as it propagates from posterior to anterior, then progressively decreases as it slows down and stabilizes at the anterior, and eventually fades. This full range of dynamics is evident in germband embryos stained for intronic yellow to show de novo transcription of the runB enhancer reporter (Supplementary Figure 10; you will need to zoom in to see intronic de novo transcripts).

      Running the simulation of the model with different degradation rates for the enhancer reporter made the static enhancer’s expression either less or more persistent, but gave the same overall result: static enhancer expression is diminished at the very posterior but high as its expression wave exits the growth zone/SAZ. This is consistent not only with yellow mRNA expression of runB, but with its intronic expression as well (Supplementary Figure 10; you will need to zoom in to see intronic de novo transcripts).

      What about the head domain of the runB enhancer (e.g. Fig. 6A lowest row): This seems to be different from endogenous expression in your work and in Choe et al. Is that aspect different from endogenous expression and can this be reconciled with your model?

      Yes, indeed this aspect cannot be explained by our model. We believe that head patterning in insects is regulated by a different regulatory network. This network might be (de)-activated by missing repressors in the selected DNA segment for runB enhancer. We mentioned this issue in the revised manuscript.

      The claim of similar dynamics of expression visualized by in situ and MS2 in vivo relies on comparing Fig. 6C with 6A. To compare these two panels, I would need to know to what stage in A the embryo in C should be compared. Actually, the stripe in 6C appears more crisp than the stripes in 6A.

      Were the enhancer dynamics tested in vivo at later stages as well? I would appreciate a clear statement on what stages can be visualized and where the technical boundaries are because this will influence any considerations by others using this system.

      It is difficult to be that precise about the timing of a process as dynamic in space and time as the one we are studying. We believe that Figure 6D shows clearly that runB activity dynamics are similar to endogenous run expression.

      How do the reported accessibility dynamics of runA enhancer correlate with the activity of the reporter: E.g. is the enhancer open in the middle body region but closed at the posterior part of the embryo? Or is it closed at the anterior – and if so: why is there a signal of the reporter in the head?

      You show that chromatin accessibility dynamics help in identifying active enhancers. Is this idea new or is it based on previous experience with Drosophila (e.g. PMID: 29539636 or works cited in https://doi.org/10.1002/bies.201900188)? Or in what respect is this novel?

      Our manuscript contributes to the growing body of evidence confirming that accessibility per se does not imply activity. Of course, this is not a new idea, but given the widespread use of accessibility as a proxy for enhancer activity in the genomics community, we do feel it is important to reiterate the message. As the reviewer correctly indicates, several published findings point to a correlation between accessibility dynamics and enhancer activity. However, to our knowledge, this is the first example in Tribolium. It is important to point out that what “dynamic” means strongly depends on the experimental design. Even in Drosophila, not enough studies have been conducted to fully understand the relationship (e.g., ideally, this should be done on a continuous time scale and at the single-cell level). We acknowledge in the manuscript that this relationship has been observed before in other species (and have added the references suggested by the reviewer, for which we are very grateful), but still believe that our observations are highly significant to the Tribolium community.

      Reviewer #3 (Public Review):

      I have two major concerns: First, the claim about differential accessibility being related to enhancer activity is not really established from the presented data, in my view. This needs to be clarified. (I do believe in the claim to some extent, but not based on presented evidence.)

      We agree with the reviewer that more data – and, more importantly, independent replication – are necessary to confirm this finding. Please refer to our response to your comment regarding the statistical significance of the findings.

      Second, the evidence in support of the Enhancer Switching model for runt should be accompanied by identification of and spatiotemporal profiling of the “speed regulator”, if this is not established yet.

      Experiments supporting the role of Cad as a speed regulator for both pair-rule and gap genes have been published in El-Sherif et al PLOS Genetics 2014 and Zhu et al PNAS 2017. We added a comment stressing this fact.

      In addition to these two concerns, the simulations of the Enhancer Switching model need to be described, at least in the outline, in the Methods section.

      Done

    1. Author Response

      Reviewer #1 (Public Review):

      Specifically, the authors define "efficacy" (eta) of a ligand as the fractional change in binding free energy between the open and the closed states of the channel.

      We assume that the word in quotes is a typo; ղ is efficiency, not efficacy (now given the symbol λ). We now emphasize the distinction immediately after Eq. 2.

      1) One concern regards the clustering of the data sets in Fig. 5 into exactly 5 eta-classes. First, two clusters contain only two data points each. Second, the proposed "catch&hold LFER model" (Fig. 2) does not predict the existence of a discrete number of such eta-classes. How strong is the evidence that there are exactly 5 classes as opposed to a continuum of possible eta values.

      Statistical (x-means cluster) analysis indicates that the 23 agonists segregate into 5 ղ classes. Groups with only 2 members (plus the intercept) are less well defined (Fig 4) but are supported by the 5 mutational ղ classes (Fig. 7). (see above)

      2) The authors do not discuss the uniqueness of the proposed model.

      see above. Ln 405 Induced fits are common.

      In fact, it seems to me that the existence of eta-classes might be explained just as well by an alternative model which assumes a single gating mechanism for the receptor,

      We are not sure what a “single gating mechanism” means. Does non-single refer to i) the 2-stage induced fit (catch-hold LFER)? … ղ classes make this conclusion unavoidable. ii) our conjecture that there are 5 different C versus O binding site structural pairs…? Energy derives from structure, so the 5 energy ratios indicate 5 structural pairs. iii) multiple steps inside gating (ϕ)? …So far there have not been any alternative explanations for the organized map of ϕ. iv) catch itself?... Evidence for this induced fit is given in Fig 2 and 7 SI, and on Ln 528-547 we discuss the implications of kon to C versus O. Ln 405 Local ‘Induced fit’ rearrangements in enzymes are common. We think the evidence is strong for the bottom scheme in Fig 2A.

      but distinct patterns of ligand-protein interactions for the different agonists.

      ղ classes derive from distinct interactions for different agonists, but what these are and whether the ‘contact number’ idea is useful are uncertain (see above).

      The pore opening-associated increase in agonist affinity is typically caused by a tightening of the substrate binding site (often called clamshell closure) …

      Ln 379-386 In the Discussion we now relate catch-hold to induced fit

      Ln 455, 461-463, 471-474 Fig 2SI and the induced fit to clamshell closure

      Reviewer #2 (Public Review):

      This is an interesting manuscript with a worthwhile approach to receptor mechanisms. The paper contains an impressive amount of new data. These single molecule concentration response curves have been compiled with care and the authors deserve great credit for obtaining these data.

      Ln 233 ղ can be estimated from a CRC built from whole-cell currents…

      Ln 150 …or indeed any method that estimates KdC and KdO (for example binding assays, or perhaps in silico simulations of AC and AO structures)
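
      As a rough illustration of how efficiency follows from the two equilibrium dissociation constants, the sketch below assumes the definition summarized by Reviewer #1 (the fractional change in binding free energy between the closed and open states), normalized by the open-state energy and with both constants expressed in molar units. The function name and the example constants are hypothetical and are not values from the manuscript.

```python
from math import log

def efficiency(kd_closed, kd_open):
    """Fraction of agonist binding energy converted on going from C to O.

    Assumes binding free energy is proportional to ln(Kd), so that
        efficiency = (dG_O - dG_C) / dG_O = 1 - ln(KdC) / ln(KdO),
    with KdC and KdO in molar units (1 M reference state).
    """
    if not (0 < kd_open < kd_closed < 1):
        raise ValueError("expected 0 < KdO < KdC < 1 (molar units)")
    return 1.0 - log(kd_closed) / log(kd_open)

# Hypothetical constants: 100 uM affinity for the closed state, 10 nM for the open state.
kd_c = 1e-4
kd_o = 1e-8
print(round(efficiency(kd_c, kd_o), 3))  # 0.5
```

      Because only the ratio of logarithms enters, any method that yields both constants (single-channel CRCs, whole-cell CRCs, binding assays) gives an efficiency estimate under this definition, which is the point made in the two line references above.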

      I judge the main result to be that there are different values of the recently-proposed agonist-related quantity "efficiency".

      Ln 21, 26-27, 535-547 OK, but to us the most interesting insight is that in AChRs binding IS gating.

      These values are clustered into 5 quite closely spaced groups. The authors propose that these groups are the same whether considering mutations in the binding site or different agonists.

      see above

      It was unclear to me in several places, what new data and what old data are included in each figure. Therefore readers may have difficulty judging the claimed advance. This difficulty is not helped by the discussion, which includes some previous findings as "results".

      see above.

      A further weakness is that it is unclear how general or how specific these concepts are. The authors assert that they are, by definition, completely universal. However, we do not have reference to previous work or current data on any other receptor than the muscle nicotinic. I could not square the concept that "every receptor works like this" with the evident lack of desire to demonstrate this for any other receptor.

      Ln 132-136 There are reasons to think that receptors in general work according to Figure 1A. A thermalized ligand (for instance TriMA, MW 60) has the momentum of only ~3 water molecules. A momentum sensor would have terrible signal/noise.

      Reviewer #3 (Public Review):

      This work attempts to introduce a new attribute of the receptor- efficiency, a fraction of an agonist binding energy consumed by conformational transition of the receptor from resting to active (open) states. Furthermore, the authors use an impressive set of experimental data (single channel recordings with 23 agonists and 53 mutations) to measure the efficiency for each agonist and mutant receptor. All the estimated efficiencies fall into a few groups and inside each of the efficiency groups there is a strong correlation between agonist affinity and receptor opening efficacy.

      The main finding in this study is that estimated efficiencies fall into 5 groups.

      see above.

      There is no clear description of the method how the efficiencies were allocated into different groups. Most importantly, it is not clear if the method used takes into account the uncertainty of the efficiency estimate. The study does not show any statistical metrics of the efficiency estimates as well as any other calculated variable such as dissociation equilibrium constants to resting or open states. Surely, the uncertainty of the efficiency should matter especially considering how near the efficiency group values are (eg. difference about 10% between 0.51 and 0.56 or 0.41 and 0.45).

      see above

      All the tested agonists fell into groups according to the efficiency value attributed to them. It is difficult to see why some of the agonists belong to the same group. For example, it is not obvious at all why such agonists as epibatidine, decamethonium and TMP are in the same group. The question, I guess, arises if this grouping based on efficiency has any predictability value. Furthermore, if a series of mutations with the same agonist fall into different groups, the prediction power of this approach is very limited if one attempts to design a new agonist or look for a new mutation.

      see above and Ln 548-561 (last para of text). Efficiency is a relatively new idea. This report is one of only a few on the subject. More experiments with different receptors by more labs using other approaches are needed to ascertain whether ղ is general.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript will interest cognitive scientists, neuroimaging researchers, and neuroscientists interested in the systems-level organization of brain activity. The authors describe four brain states that are present across a wide range of cognitive tasks and determine that the relative distribution of the brain states shows both commonalities and differences across task conditions.

      The authors characterized the low-dimensional latent space that has been shown to capture the major features of intrinsic brain activity using four states obtained with a Hidden Markov Model. They related the four states to previously-described functional gradients in the brain and examined the relative contribution of each state under different cognitive conditions. They showed that states related to the measured behavior for each condition differed, but that a common state appears to reflect disengagement across conditions. The authors bring together a state-of-the-art analysis of systems-level brain dynamics and cognitive neuroscience, bridging a gap that has long needed to be bridged.

      The strongest aspect of the study is its rigor. The authors use appropriate null models and examine multiple datasets (not used in the original analysis) to demonstrate that their findings replicate. Their thorough analysis convincingly supports their assertion that common states are present across a variety of conditions, but that different states may predict behavioural measures for different conditions. However, the authors could have better situated their work within the existing literature. It is not that a more exhaustive literature review is needed; it is that some of their results are unsurprising given the work reported in other manuscripts; some of their work reinforces or is reinforced by prior studies; and some of their work is not compared to similar findings obtained with other analysis approaches. While space is not unlimited, some of these gaps are important enough that they are worth addressing:

      We appreciate the reviewer’s thorough read of our manuscript and positive comments on its rigor and implications. We agree that the original version of the manuscript insufficiently situated this work in the existing literature. We have made extensive revisions to better place our findings in the context of prior work. These changes are described in detail below.

      1) The authors' own prior work on functional connectivity signatures of attention is not discussed in comparison to the latest work. Neither is work from other groups showing signatures of arousal that change over time, particularly in resting state scans. Attention and arousal are not the same things, but they are intertwined, and both have been linked to large-scale changes in brain activity that should be captured in the HMM latent states. The authors should discuss how the current work fits with existing studies.

      Thank you for raising this point. We agree that the relationship between low-dimensional latent states and predefined activity and functional connectivity signatures is an important and interesting question in both attention research and more general contexts. Here, we did not empirically relate the brain states examined in this study to the functional connectivity signatures previously investigated in our lab (e.g., Rosenberg et al., 2016; Song et al., 2021a) because the research question and methodological complexities deserve separate attention that goes beyond the scope of this paper. Therefore, we conceptually addressed the reviewer’s question on how functional connectivity signatures of attention are related to the brain states that were observed here. Next, we asked how arousal relates to the brain states by indirectly predicting arousal levels of each brain state based on its activity patterns’ spatial resemblance to the predefined arousal network template (Goodale et al., 2021).

      Latent states and dynamic functional connectivity

      Previous work suggested that, on medium time scales (~20-60 seconds), changes in functional connectivity signatures of sustained attention (Rosenberg et al., 2020) and narrative engagement (Song et al., 2021a) predicted changes in attentional states. How do these attention-related functional connectivity dynamics relate to latent state dynamics, measured on a shorter time scale (1 second)?

      Theoretically, there are reasons to think that these measures are related but not redundant. Both HMM and dynamic functional connectivity provide summary measures of the whole-brain functional interactions that evolve over time. Whereas HMM identifies recurring low-dimensional brain states, dynamic functional connectivity used in our and others’ prior studies captures high-dimensional dynamical patterns. Furthermore, while the mixture Gaussian function utilized to infer emission probability in our HMM infers the states from both the BOLD activity patterns and their interactions, functional connectivity considers only pairwise interactions between regions of interest. Thus, on the theoretical grounds that brain states can be characterized at multiple scales and with different methods (Greene et al., 2023), we can hypothesize that both measures could (and perhaps should be able to) capture brain-wide latent state changes. For example, if we were to apply k-means clustering to the sliding window-based dynamic functional connectivity as in Allen et al. (2014), the resulting clusters could arguably be similar to the latent states derived from the HMM.

      However, there are practical reasons why the correspondence between our prior dynamic functional connectivity models and current HMM states is difficult to test directly. A time point-by-time point matching of the HMM state sequence and dynamic functional connectivity is not feasible because, in our prior work, dynamic functional connectivity was measured in a sliding time window (~20-60 seconds), whereas the HMM state identification is conducted at every TR (1 second). An alternative would be to concatenate all time points that were categorized as each HMM state to compute representative functional connectivity of that state. This “splicing and concatenating” method, however, disrupts continuous BOLD-signal time series and has not previously been validated for use with our dynamic connectome-based predictive models. In addition, the difference in time series lengths across states would make comparisons of the four states’ functional connectomes unfair.
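
      To make the “splicing and concatenating” alternative concrete, the following minimal sketch groups TRs by their HMM state label and computes one correlation matrix per state. It is an illustration only, not an analysis from the manuscript: the function names and the toy time series are hypothetical, and it uses plain Pearson correlation with no preprocessing.

```python
from collections import defaultdict
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (non-zero variance assumed)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def statewise_fc(bold, states):
    """Concatenate the TRs assigned to each state and correlate regions within each state.

    bold:   list of TRs, each a list of regional BOLD values.
    states: one state label per TR.
    Returns (fc, n_trs): a region-by-region matrix and a TR count per state.
    """
    by_state = defaultdict(list)
    for frame, label in zip(bold, states):
        by_state[label].append(frame)  # temporal adjacency is lost at this step
    fc, n_trs = {}, {}
    for label, frames in by_state.items():
        regions = list(zip(*frames))   # one spliced time series per region
        fc[label] = [[pearson(a, b) for b in regions] for a in regions]
        n_trs[label] = len(frames)
    return fc, n_trs

# Toy data (hypothetical): 7 TRs, 2 regions, interleaved state labels.
bold = [[1, 2], [1, 3], [2, 4], [2, 2], [3, 6], [3, 1], [4, 8]]
states = ["DMN", "SM", "DMN", "SM", "DMN", "SM", "DMN"]
fc, n_trs = statewise_fc(bold, states)
print(n_trs)
```

      Both caveats raised above are visible in the sketch: the spliced series no longer respects the original ordering of the BOLD signal, and the per-state TR counts differ, so the correlation estimates for the states are not equally reliable.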

      One main focus of our manuscript was to relate brain dynamics (HMM state dynamics) to static manifold (functional connectivity gradients). We agree that a direct link between two measures of brain dynamics, HMM and dynamic functional connectivity, is an important research question. However, due to some intricacies that needed to be addressed to answer this question, we felt that it was beyond the scope of our paper. We are eager, however, to explore these comparisons in future work which can more thoroughly address the caveats associated with comparing models of sustained attention, narrative engagement, and arousal defined using different input features and methods.

      Arousal, attention, and latent neural state dynamics

      Next, the reviewer posed an important question about the relationship between arousal, attention, and latent states. The current study was designed to assess the relationship between attention and latent state dynamics. However, previous neuroimaging work showed that low-dimensional brain dynamics reflect fluctuations in arousal (Raut et al., 2021; Shine et al., 2016; Zhang et al., 2023). Behavioral studies showed that attention and arousal hold a non-linear relationship; for example, mind-wandering states are associated with lower arousal and externally distracted states are associated with higher arousal, although both states indicate low attention (Esterman and Rothlein, 2019; Unsworth and Robison, 2018, 2016).

      To address the reviewer’s suggestion, we wanted to test if our brain states reflected changes in arousal, but we did not collect relevant behavioral or physiological measures. Therefore, to indirectly test for relationships, we predicted levels of arousal in brain states by applying the “arousal network template” defined by Dr. Catie Chang’s group (Chang et al., 2016; Falahpour et al., 2018; Goodale et al., 2021). The arousal network template was created from resting-state fMRI data to predict arousal levels indicated by eye monitoring and electrophysiological signals. In the original study, the arousal level at each time point was predicted by the correlation between the BOLD activity pattern of each TR and the arousal template. The more similar the whole-brain activation pattern was to the arousal network template, the more aroused the participant was predicted to be at that moment. This activity pattern-based model was generalized to fMRI data during tasks (Goodale et al., 2021).

      We correlated the arousal template to the activity patterns of the four brain states that were inferred by the HMM. The DMN state was positively correlated with the arousal template (r=0.264) and the SM state was negatively correlated with the arousal template (r=-0.303) (Author response image 1). These values were not tested for significance because they were single observations. While speculative, this may suggest that participants are in a high arousal state during the DMN state and a low arousal state during the SM state. Together with our results relating brain states to attention, it is possible that the SM state is a common state indicating low arousal and low attention. On the other hand, the DMN state, a signature of a highly aroused state, may benefit gradCPT task performance but not necessarily engagement with a sitcom episode. However, because this was a single observation and we did not collect a physiological measure of arousal to validate this indirect prediction result, we did not include the result in the manuscript. We hope to more directly test this question in future work with behavioral and physiological measures of arousal.
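
      The template-matching step described above amounts to a spatial Pearson correlation between an activity pattern and the arousal template. The sketch below illustrates it under stated assumptions: the four-parcel vectors are made-up numbers, not the published template or the HMM state maps, and the function name is hypothetical.

```python
from math import sqrt

def spatial_correlation(pattern, template):
    """Pearson correlation across parcels between an activity pattern and a template."""
    n = len(pattern)
    mp, mt = sum(pattern) / n, sum(template) / n
    spt = sum((p - mp) * (t - mt) for p, t in zip(pattern, template))
    spp = sum((p - mp) ** 2 for p in pattern)
    stt = sum((t - mt) ** 2 for t in template)
    return spt / sqrt(spp * stt)

# Hypothetical 4-parcel example; real templates span the whole brain.
template = [0.8, -0.2, 0.5, -0.6]
state_patterns = {
    "DMN": [0.6, -0.1, 0.4, -0.5],
    "SM": [-0.7, 0.3, -0.2, 0.4],
}
scores = {name: spatial_correlation(p, template) for name, p in state_patterns.items()}
for name, r in scores.items():
    print(name, round(r, 3))
```

      Applied to one mean pattern per state, this yields a single coefficient per state, which is why the reported values could not be tested for significance; applied to every TR instead, the same function would give a time-resolved arousal index.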

      Author response image 1.

      Changes made to the manuscript

      Importantly, we agree with the reviewer that a theoretical discussion about the relationships between functional connectivity, latent states, gradients, as well as attention and arousal was a critical omission from the original Discussion. We edited the Discussion to highlight past literature on these topics and encourage future work to investigate these relationships.

      [Manuscript, page 11] “Previous studies showed that large-scale neural dynamics that evolve over tens of seconds capture meaningful variance in arousal (Raut et al., 2021; Zhang et al., 2023) and attentional states (Rosenberg et al., 2020; Yamashita et al., 2021). We asked whether latent neural state dynamics reflect ongoing changes in attention in both task and naturalistic contexts.”

      [Manuscript, page 17] “Previous work showed that time-resolved whole-brain functional connectivity (i.e., paired interactions of more than a hundred parcels) predicts changes in attention during task performance (Rosenberg et al., 2020) as well as movie-watching and story-listening (Song et al., 2021a). Future work could investigate whether functional connectivity and the HMM capture the same underlying “brain states” to bridge the results from the two literatures. Furthermore, though the current study provided evidence of neural state dynamics reflecting attention, the same neural states may, in part, reflect fluctuations in arousal (Chang et al., 2016; Zhang et al., 2023). Complementing behavioral studies that demonstrated a nonlinear relationship between attention and arousal (Esterman and Rothlein, 2019; Unsworth and Robison, 2018, 2016), future studies collecting behavioral and physiological measures of arousal can assess the extent to which attention explains neural state dynamics beyond what can be explained by arousal fluctuations.”

      2) The 'base state' has been described in a number of prior papers (for one early example, see https://pubmed.ncbi.nlm.nih.gov/27008543). The idea that it might serve as a hub or intermediary for other states has been raised in other studies, and a discussion of the similarities or differences between those studies and this one would provide better context for the interpretation of the current work. One of the intriguing findings of the current study is that the incidence of this base state increases during sitcom watching, the strongest evidence to date that it has a cognitive role and is not merely a configuration of activity that the brain must pass through when making a transition.

      We greatly appreciate the reviewer’s suggestion of prior papers. We were not aware of previous findings of the base state at the time of writing the manuscript, so it was reassuring to see consistent findings. In the Discussion, we highlighted the findings of Chen et al. (2016) and Saggar et al. (2022). Both studies highlighted the role of the base state as a “hub”-like transition state. However, as the reviewer noted, these studies did not address the functional relevance of this state to cognitive states because both were based on resting-state fMRI.

      In our revised Discussion, we write that our work replicates previous findings of the base state that consistently acted as a transitional hub state in macroscopic brain dynamics. We also note that our study expands this line of work by characterizing what functional roles the base state plays in multiple contexts: The base state indicated high attentional engagement and exhibited the highest occurrence proportion as well as longest dwell times during naturalistic movie watching. The base state’s functional involvement was comparatively minor during controlled tasks.

      [Manuscript, page 17-18] “Past resting-state fMRI studies have reported the existence of the base state. Chen et al. (2016) used the HMM to detect a state that had “less apparent activation or deactivation patterns in known networks compared with other states”. This state had the highest occurrence probability among the inferred latent states, was consistently detected by the model, and was most likely to transition to and from other states, all of which mirror our findings here. The authors interpret this state as an “intermediate transient state that appears when the brain is switching between other more reproducible brain states”. The observation of the base state was not confined to studies using HMMs. Saggar et al. (2022) used topological data analysis to represent a low-dimensional manifold of resting-state whole-brain dynamics as a graph, where each node corresponds to brain activity patterns of a cluster of time points. Topologically focal “hub” nodes were represented uniformly by all functional networks, meaning that no characteristic activation above or below the mean was detected, similar to what we observe with the base state. The transition probability from other states to the hub state was the highest, demonstrating its role as a putative transition state.

      However, the functional relevance of the base state to human cognition had not been explored previously. We propose that the base state, a transitional hub (Figure 2B) positioned at the center of the gradient subspace (Figure 1D), functions as a state of natural equilibrium. Transitioning to the DMN, DAN, or SM states reflects incursion away from natural equilibrium (Deco et al., 2017; Gu et al., 2015), as the brain enters a functionally modular state. Notably, the base state indicated high attentional engagement (Figure 5E and F) and exhibited the highest occurrence proportion (Figure 3B) as well as the longest dwell times (Figure 3—figure supplement 1) during naturalistic movie watching, whereas its functional involvement was comparatively minor during controlled tasks. This significant relevance to behavior verifies that the base state cannot simply be a byproduct of the model. We speculate that susceptibility to both external and internal information is maximized in the base state—allowing for roughly equal weighting of both sides so that they can be integrated to form a coherent representation of the world—at the expense of the stability of a certain functional network (Cocchi et al., 2017; Fagerholm et al., 2015). When processing rich narratives, particularly when a person is fully immersed without having to exert cognitive effort, a less modular state with high degrees of freedom to reach other states may be more likely to be involved. The role of the base state should be further investigated in future studies.”

      3) The link between latent states and functional connectivity gradients should be considered in the context of prior work showing that the spatiotemporal patterns of intrinsic activity that account for most of the structure in resting state fMRI also sweep across functional connectivity gradients (https://pubmed.ncbi.nlm.nih.gov/33549755/). In fact, the spatiotemporal dynamics may give rise to the functional connectivity gradients (https://pubmed.ncbi.nlm.nih.gov/35902649/). HMM states bear a marked resemblance to the high-activity phases of these patterns and are likely to be closely linked to them. The spatiotemporal patterns are typically obtained during rest, but they have been reported during task performance (https://pubmed.ncbi.nlm.nih.gov/30753928/) which further suggests a link to the current work. Similar patterns have been observed in anesthetized animals, which also reinforces the conclusion of the current work that the states are fundamental aspects of the brain's functional organization.

      We appreciate the comments that relate spatiotemporal patterns, functional connectivity gradients, and the latent states derived from the HMM. Our work was also inspired by the papers that the reviewer suggested, especially Bolt et al. (2022), which compared the results of numerous dimensionality reduction and clustering algorithms and suggested three spatiotemporal patterns that seemed to be commonly supported across algorithms. We originally cited these studies throughout the manuscript, but did not discuss them comprehensively. We have revised the Discussion to situate our findings within past work that used resting-state fMRI to study low-dimensional latent brain states.

      [Manuscript, page 15-16] “This perspective is supported by previous work that has used different methods to capture recurring low-dimensional states from spontaneous fMRI activity during rest. For example, to extract time-averaged latent states, early resting-state analyses identified task-positive and task-negative networks using seed-based correlation (Fox et al., 2005). Dimensionality reduction algorithms such as independent component analysis (Smith et al., 2009) extracted latent components that explain the largest variance in fMRI time series. Other lines of work used time-resolved analyses to capture latent state dynamics. For example, variants of clustering algorithms, such as co-activation patterns (Liu et al., 2018; Liu and Duyn, 2013), k-means clustering (Allen et al., 2014), and HMM (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017), characterized fMRI time series as recurrences of and transitions between a small number of states. Time-lag analysis was used to identify quasiperiodic spatiotemporal patterns of propagating brain activity (Abbas et al., 2019; Yousefi and Keilholz, 2021). A recent study extensively compared these different algorithms and showed that they all report qualitatively similar latent states or components when applied to fMRI data (Bolt et al., 2022). While these studies used different algorithms to probe data-specific brain states, this work and ours report common latent axes that follow a long-standing theory of large-scale human functional systems (Mesulam, 1998). Neural dynamics span principal axes that dissociate unimodal to transmodal and sensory to motor information processing systems.”

      Reviewer #2 (Public Review):

      In this study, Song and colleagues applied a Hidden Markov Model to whole-brain fMRI data from the unique SONG dataset and a grad-CPT task, and in doing so observed robust transitions between low-dimensional states that they then attributed to specific psychological features extracted from the different tasks.

      The methods used appeared to be sound and robust to parameter choices. Whenever choices were made regarding specific parameters, the authors demonstrated that their approach was robust to different values, and also replicated their main findings on a separate dataset.

      I was mildly concerned that similarities in some of the algorithms used may have rendered some of the inter-measure results as somewhat inevitable (a hypothesis that could be tested using appropriate null models).

      This work is quite integrative, linking together a number of previous studies into a framework that allows for interesting follow-up questions.

      Overall, I found the work to be robust, interesting, and integrative, with a wide-ranging citation list and exciting implications for future work.

      We appreciate the reviewer’s comments on the study’s robustness and future implications. Our work was highly motivated by the reviewer’s prior work.

      Reviewer #3 (Public Review):

      My general assessment of the paper is that the analyses done after they find the model are exemplary and show some interesting results. However, the method they use to find the number of states (Calinski-Harabasz score instead of log-likelihood), the model they use generally (HMM), and the fact that they don't show how they find the number of states on HCP, with the Schaefer atlas, and do not report their R^2 on a test set is a little concerning. I don't think this per se impedes their results, but it is something that they can improve. They argue that the states they find align with long-standing ideas about the functional organization of the brain and align with other research, but they can improve their selection for their model.

      We appreciate the reviewer’s thorough read of the paper, evaluation of our analyses linking brain states to behavior as “exemplary”, and important questions about the modeling approach. We have included detailed responses below and updated the manuscript accordingly.

      Strengths:

      • Use multiple datasets, multiple ROIs, and multiple analyses to validate their results

      • Figures are convincing in the sense that patterns clearly synchronize between participants

      • Authors select the number of states using the optimal model fit (although this turns out to be a little more questionable due to what they quantify as 'optimal model fit')

      We address this concern on page 30-31 of this response letter.

      • Replication with Schaefer atlas makes results more convincing

      • The analyses around the fact that the base state acts as a flexible hub are well done and well explained

      • Their comparison of synchrony is well-done and comparing it to resting-state, which does not have any significant synchrony among participants is obvious, but still good to compare against.

      • Their results with respect to similar narrative engagement being correlated with similar neural state dynamics are well done and interesting.

      • Their results on event boundaries are compelling and well done. However, I do not find their Chang et al. results convincing (Figure 4B). It could just be that the different medium explains the differences in DMN response, but to me these seem like altogether different patterns that cannot be 100% explained by their method/results.

      We entirely agree with the reviewer that the Chang et al. (2021) data are different in many ways from our own SONG dataset. Whereas data from Chang et al. (2021) were collected while participants listened to an audio-only narrative, participants in the SONG sample watched and listened to audiovisual stimuli. They were scanned at different universities in different countries with different protocols by different research groups for different purposes. That is, there are numerous reasons why we would expect the model should not generalize. Thus, we found it compelling and surprising that, despite all of these differences between the datasets, the model trained on the SONG dataset generalized to the data from Chang et al. (2021). The results highlighted a robust increase in the DMN state occurrence and a decrease in the base state occurrence after the narrative event boundaries, irrespective of whether the stimulus was an audiovisual sitcom episode or a narrated story. This external model validation was a way that we tested the robustness of our own model and the relationship between neural state dynamics and cognitive dynamics.

      • Their result that, when there is no event, 50% of the transitions into the DMN state come from the base state is interesting and strong. However, it is unclear if this is just for the sitcom or also for Chang et al.'s data.

      We apologize for the lack of clarity. We show the statistical results of the two sitcom episodes as well as Chang et al.’s (2021) data in Figure 4—figure supplement 2 in our original manuscript. Here, we provide the exact values of the base-to-DMN state transition probability, and how they differ across moments after event boundaries compared to non-event boundaries.

      For sitcom episode 1, the probability of base-to-DMN state transition was 44.6 ± 18.8% at event boundaries versus 62.0 ± 10.4% at non-event boundaries (FDR-p = 0.0013). For sitcom episode 2, the probability of base-to-DMN state transition was 44.1 ± 18.0% at event boundaries versus 62.2 ± 7.6% at non-event boundaries (FDR-p = 0.0006). For the Chang et al. (2021) dataset, the probability of base-to-DMN state transition was 33.3 ± 15.9% at event boundaries versus 58.1 ± 6.4% at non-event boundaries (FDR-p < 0.0001). Thus, our result, “At non-event boundaries, the DMN state was most likely to transition from the base state, accounting for more than 50% of the transitions to the DMN state” (pg 11, line 24-25), holds true for both the internal and external datasets.

      • The involvement of the base state as being highly engaged during the comedy sitcom and the movie are interesting results that warrant further study into the base state theory they pose in this work.

      • It is good that they make sure SM states are not just because of head motion (P 12).

      • Their comparison between functional gradient and neural states is good, and their results are generally well-supported, intuitive, and interesting enough to warrant further research into them. Their findings on the context-specificity of their DMN and DAN state are interesting and relate well to the antagonistic relationship in resting-state data.

      Weaknesses:

      • Authors should train the model on part of the data and validate on another

      Thank you for raising this issue. To the best of our knowledge, past work that applied the HMM to the fMRI data has conducted training and inference on the same data, including initial work that implemented HMM on the resting-state fMRI (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017) as well as more recent work that applied HMMs to the task or movie-watching fMRI (Cornblath et al., 2020; Taghia et al., 2018; van der Meer et al., 2020; Yamashita et al., 2021). That is, the parameters—emission probability, transition probability, and initial probability—were estimated from the entire dataset and the latent state sequence was inferred using the Viterbi algorithm on the same dataset.

      However, we were also aware of the potential problem this may have. Therefore, in our recent work asking a different research question in another fMRI dataset (Song et al., 2021b), we trained an HMM on a subset of the dataset (moments when participants were watching movie clips in the original temporal order) and inferred latent state sequence of the fMRI time series in another subset of the dataset (moments when participants were watching movie clips in a scrambled temporal order). To the best of our knowledge, this was the first paper that used different segments of the data to fit and infer states from the HMM.

      In the current study, we wanted to capture brain states that underlie brain activity across contexts. Thus, we presented the same-dataset training and inference procedure as our primary result. However, for every main result, we also showed results where we separated the data used for model fitting and state inference. That is, we fit the HMM on the SONG dataset, primarily report the inference results on the SONG dataset, but also report inference on the external datasets that were not included in model fitting. The datasets used were the Human Connectome Project dataset (Van Essen et al., 2013), Chang et al. (2021) audio-listening dataset, Rosenberg et al. (2016) gradCPT dataset, and Chen et al. (2017) Sherlock dataset.

      However, to further address the reviewer’s concern about whether the HMM fit is reliable when applied to held-out data, we computed the reliability of the HMM inference by conducting cross-validation and split-half reliability analyses.

      (1) Cross-validation

      To separate the dataset used for HMM training and inference, we conducted cross-validation on the SONG dataset (N=27) by training the model with the data from 26 participants and inferring the latent state sequence of the held-out participant.

      First, we compared the robustness of the model training by comparing the mean activity patterns of the four latent states fitted at the group level (N=27) with the mean activity patterns of the four states fitted across cross-validation folds. Pearson’s correlations between the group-level vs. cross-validated latent states’ mean activity patterns were r = 0.991 ± 0.010, with a range from 0.963 to 0.999.

      Second, we compared the robustness of model inference by comparing the latent state sequences that were inferred at the group level vs. from held-out participants in a cross-validation scheme. All fMRI conditions had mean similarity higher than 90%; Rest 1: 92.74 ± 5.02 %, Rest 2: 92.74 ± 4.83 %, GradCPT face: 92.97 ± 6.41 %, GradCPT scene: 93.27 ± 5.76 %, Sitcom ep1: 93.31 ± 3.92 %, Sitcom ep2: 93.13 ± 4.36 %, Documentary: 92.42 ± 4.72 %.

      Third, with the latent state sequences inferred from cross-validation, we replicated the analysis of Figure 3 to test for synchrony of the latent state sequences across participants. The cross-validated results were highly similar to manuscript Figure 3, which was generated from the group-level analysis. Mean synchrony of latent state sequences are as follows: Rest 1: 25.90 ± 3.81%, Rest 2: 25.75 ± 4.19 %, GradCPT face: 27.17 ± 3.86 %, GradCPT scene: 28.11 ± 3.89 %, Sitcom ep1: 40.69 ± 3.86%, Sitcom ep2: 40.53 ± 3.13%, Documentary: 30.13 ± 3.41%.
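
      The leave-one-participant-out scheme and the sequence comparison can be sketched as follows. This is an illustrative outline rather than our analysis code: it assumes that "similarity" is the percentage of time points carrying the same state label, that state labels have already been aligned between the group-level and cross-validated models, and it omits the HMM fitting itself. The participant IDs and state sequences are made up.

```python
def leave_one_out(participants):
    """Yield (training participants, held-out participant) for each fold."""
    for i, held_out in enumerate(participants):
        yield participants[:i] + participants[i + 1:], held_out


def sequence_similarity(seq_a, seq_b):
    """Percentage of time points at which two state sequences agree."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and equal in length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical participant IDs; the SONG dataset would give 27 folds.
folds = list(leave_one_out(["sub-01", "sub-02", "sub-03"]))

# Made-up state sequences for one participant (one label per TR).
group_level = ["DMN", "DMN", "base", "SM", "DAN", "base", "base", "DMN"]
held_out = ["DMN", "DMN", "base", "SM", "base", "base", "base", "DMN"]
similarity = sequence_similarity(group_level, held_out)  # 7 of 8 TRs agree
```

      In each fold, the model would be trained on the first element of the pair and used to decode the second; the per-participant similarities are then averaged within each fMRI condition.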

      Author response image 2.

      (2) Split-half reliability

      To test for the internal robustness of the model, we randomly assigned SONG dataset participants to two groups and fit the HMM separately in each. Similarities (Pearson’s correlations) between the two groups’ activation patterns were DMN: 0.791, DAN: 0.838, SM: 0.944, base: 0.837. The similarities of the covariance patterns were DMN: 0.995, DAN: 0.996, SM: 0.994, base: 0.996.
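
      Because HMM state labels are arbitrary, states from two independently fitted models have to be paired before their patterns can be correlated. The sketch below shows one way to do this, by choosing the pairing that maximizes the summed Pearson correlation. It is an assumption for illustration and not necessarily the matching procedure we used; `split_half`, `match_states`, and the toy patterns are hypothetical.

```python
import random
from itertools import permutations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def split_half(participants, seed=0):
    """Randomly assign participants to two groups of (nearly) equal size."""
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


def match_states(patterns_a, patterns_b):
    """Pair states across two models so that the summed correlation is maximal.

    Exhaustive search over permutations, which is cheap for K = 4 (24 candidates).
    Returns (order, correlations); order[i] is the state in B paired with state i in A.
    """
    k = len(patterns_a)
    best = max(
        permutations(range(k)),
        key=lambda order: sum(
            pearson(patterns_a[i], patterns_b[j]) for i, j in enumerate(order)
        ),
    )
    return best, [pearson(patterns_a[i], patterns_b[j]) for i, j in enumerate(best)]


participants = [f"sub-{i:02d}" for i in range(1, 28)]  # 27 hypothetical IDs
group_a, group_b = split_half(participants)

# Toy mean activity patterns (3 states x 4 regions); B is a noisy, reordered copy of A.
patterns_a = [[1, 0, -1, 0], [0, 1, 0, -1], [1, 1, -1, -1]]
patterns_b = [[0.1, 0.9, 0.0, -1.0], [0.9, 1.1, -1.0, -1.0], [1.0, 0.1, -0.9, 0.0]]
order, correlations = match_states(patterns_a, patterns_b)
```

      For larger K, the exhaustive search could be replaced by an assignment solver, but with four states the brute-force version is simpler to verify.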

      Author response image 3.

      We further validated the split-half reliability of the model using the HCP dataset, which contains data from a larger sample (N=119). Similarities (Pearson’s correlations) between the two groups’ activation patterns were DMN: 0.998, DAN: 0.997, SM: 0.993, base: 0.923. The similarities of the covariance patterns were DMN: 0.995, DAN: 0.996, SM: 0.994, base: 0.996.

      Together the cross-validation and split-half reliability results demonstrate that the HMM results reported in the manuscript are reliable and robust to the way we conducted the analysis. The result of the split-half reliability analysis is added in the Results.

      [Manuscript, page 3-4] “Neural state inference was robust to the choice of 𝐾 (Figure 1—figure supplement 1) and the fMRI preprocessing pipeline (Figure 1—figure supplement 5) and consistent when conducted on two groups of randomly split-half participants (Pearson’s correlations between the two groups’ latent state activation patterns: DMN: 0.791, DAN: 0.838, SM: 0.944, base: 0.837).”

      • Comparison with just PCA/functional gradients is weak in establishing whether HMMs are good models of the timeseries. Especially given that the HMM does not explain a lot of variance in the signal (~0.5 R^2 for only 27 brain regions) for PCA. I think they don't report their own R^2 of the timeseries

      We agree with the reviewer that the PCA that we conducted to compare with the explained variance of the functional gradients was not directly comparable because PCA and gradients utilize different algorithms to reduce dimensionality. To make more meaningful comparisons, we removed the data-specific PCA results and replaced them with data-specific functional gradients (derived from the SONG dataset). This allows us to directly compare SONG-specific functional gradients with predefined gradients (derived from the resting-state HCP dataset from Margulies et al. [2016]). We found that the degrees to which the first two predefined gradients explained whole-brain fMRI time series (SONG: r² = 0.097, HCP: 0.084) were comparable to the amount of variance explained by the first two data-specific gradients (SONG: r² = 0.100, HCP: 0.086). Thus, the predefined gradients explain as much variance in the SONG data time series as SONG-specific gradients do. This supports our argument that the low-dimensional manifold is largely shared across contexts, and that the common HMM latent states may tile the predefined gradients.

      These analyses and results were added to the Results, Methods, and Figure 1—figure supplement 8. Here, we only attach changes to the Results section for simplicity, but please see the revised manuscript for further changes.

      [Manuscript, page 5-6] “We hypothesized that the spatial gradients reported by Margulies et al. (2016) act as a low-dimensional manifold over which large-scale dynamics operate (Bolt et al., 2022; Brown et al., 2021; Karapanagiotidis et al., 2020; Turnbull et al., 2020), such that traversals within this manifold explain large variance in neural dynamics and, consequently, cognition and behavior (Figure 1C). To test this idea, we situated the mean activity values of the four latent states along the gradients defined by Margulies et al. (2016) (see Methods). The brain states tiled the two-dimensional gradient space with the base state at the center (Figure 1D; Figure 1—figure supplement 7). The Euclidean distances between these four states were maximized in the two-dimensional gradient space, compared to a chance where the four states were inferred from circular-shifted time series (p < 0.001). For the SONG dataset, the DMN and SM states fell at more extreme positions of the primary gradient than expected by chance (both FDR-p values = 0.004; DAN and SM states, FDR-p values = 0.171). For the HCP dataset, the DMN and DAN states fell at more extreme positions on the primary gradient (both FDR-p values = 0.004; SM and base states, FDR-p values = 0.076). No state was consistently found at the extremes of the secondary gradient (all FDR-p values > 0.021).

      We asked whether the predefined gradients explain as much variance in neural dynamics as latent subspace optimized for the SONG dataset. To do so, we applied the same nonlinear dimensionality reduction algorithm to the SONG dataset’s ROI time series. Of note, the SONG dataset includes 18.95% rest, 15.07% task, and 65.98% movie-watching data whereas the data used by Margulies et al. (2016) was 100% rest. Despite these differences, the SONG-specific gradients closely resembled the predefined gradients, with significant Pearson’s correlations observed for the first (r = 0.876) and second (r = 0.877) gradient embeddings (Figure 1—figure supplement 8). Gradients identified with the HCP data also recapitulated Margulies et al.’s (2016) first (r = 0.880) and second (r = 0.871) gradients. We restricted our analysis to the first two gradients because the two gradients together explained roughly 50% of the entire variance of functional brain connectome (SONG: 46.94%, HCP: 52.08%), and the explained variance dropped drastically from the third gradients (more than 1/3 drop compared to second gradients). The degrees to which the first two predefined gradients explained whole-brain fMRI time series (SONG: r² = 0.097, HCP: 0.084) were comparable to the amount of variance explained by the first two data-specific gradients (SONG: r² = 0.100, HCP: 0.086; Figure 1—figure supplement 8). Thus, the low-dimensional manifold captured by Margulies et al. (2016) gradients is highly replicable, explaining brain activity dynamics as well as data-specific gradients, and is largely shared across contexts and datasets. This suggests that the state space of whole-brain dynamics closely recapitulates low-dimensional gradients of the static functional brain connectome.”

      The reviewer also pointed out that the PCA-gradient comparison was weak in establishing whether HMMs are good models of the time series. However, we would like to point out that the purpose of the comparison was not to validate the performance of the HMM. Instead, we wanted to test whether the gradients introduced by Margulies et al. (2016) could act as a generalizable low-dimensional manifold of brain state dynamics. To argue that the predefined gradients are a shared manifold, these gradients should explain SONG data fMRI time series as much as the principal components derived directly from the SONG data. Our results showed comparable r², both in predefined gradient vs. data-specific PC comparisons and predefined gradient vs. data-specific gradient comparisons, which supported our argument that the predefined gradients could be the shared embedding space across contexts and datasets.

      The reviewer pointed out that an r² of ~0.5 does not explain enough variance in the fMRI signal. However, we respectfully disagree with this point because there is no established criterion for what constitutes a high or low r² for this type of analysis. Of note, previous literature that also applied PCA to fMRI time series (Author response image 4A and 4B) (Lynn et al., 2021; Shine et al., 2019) also found that the cumulative explained variance of the top 5 principal components is around 50%. Author response image 4C shows the cumulative variance of the resting-state functional connectome explained by the gradients (Margulies et al., 2016).

      Author response image 4.

      Finally, the reviewer pointed out that the r² of the HMM-derived latent sequence to the fMRI time series should be reported. However, there is no standardized way of measuring the explained variance of the HMM inference. There is no report of explained variance in the traditional HMM-fMRI papers (Baker et al., 2014; Chen et al., 2016; Vidaurre et al., 2018, 2017). Rather than r², the HMM computes the log likelihood of the model fit. However, because log likelihood values are dependent on the number of data points, studies do not report log likelihood values nor do they use these metrics to interpret the goodness of model fit.

      To ask whether the goodness of the HMM fit was significantly above chance, we compared the log likelihood of the HMM to the log likelihood distribution of null HMM fits. First, we extracted the log likelihood of the HMM fit to the real fMRI time series. We then built a null distribution by fitting HMMs to circular-shifted fMRI time series, iterated 1,000 times. The log likelihood of the real model was significantly higher than the chance distribution, with a z-value of 2182.5 (p < 0.001). This indicates that the HMM explained a large variance in our fMRI time series data, significantly above chance.
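
      The logic of this permutation test can be sketched without the HMM itself. In the outline below, `score` is a hypothetical stand-in for the model's log likelihood (here, a toy cross-region product), and each region is shifted by an independent random offset, which is an assumption made for the illustration. The sketch shows only how a z-value and an empirical p-value are derived from a circular-shift null distribution.

```python
import random
from statistics import mean, stdev


def circular_shift(series, offset):
    """Rotate a time series by `offset` samples, wrapping around the end."""
    offset %= len(series)
    return series[-offset:] + series[:-offset]


def null_z_score(data, score, n_iter=1000, seed=0):
    """z-score and one-sided p-value of score(data) against a circular-shift null.

    data: list of regions, each a list of samples over time.
    score: callable returning one number per dataset (stand-in for a log likelihood).
    """
    rng = random.Random(seed)
    n_time = len(data[0])
    observed = score(data)
    null = []
    for _ in range(n_iter):
        # Shifting regions independently preserves each region's autocorrelation
        # but breaks the temporal alignment between regions.
        shifted = [circular_shift(region, rng.randrange(1, n_time)) for region in data]
        null.append(score(shifted))
    z = (observed - mean(null)) / stdev(null)
    p = (1 + sum(s >= observed for s in null)) / (1 + n_iter)
    return z, p


# Toy data: two regions with identical integer time series (perfectly coupled).
rng = random.Random(1)
x = [rng.randint(-5, 5) for _ in range(50)]
data = [x, list(x)]


def coupling(d):
    """Toy statistic: sum of products between the two regions."""
    return sum(a * b for a, b in zip(d[0], d[1]))


z, p = null_z_score(data, coupling)
```

      With a real HMM, `score` would refit the model to each shifted dataset and return its log likelihood, which is the costly step in practice.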

      • Authors do not specify whether they also did cross-validation for the HCP dataset to find 4 clusters

      We apologize for the lack of clarity. When we computed the Calinski-Harabasz score with the HCP dataset, three was chosen as the optimal number of states (Author response image 5A). When we set K as 3, the HMM inferred the DMN, DAN, and SM states (Author response image 5C). The base state was included when K was set to 4 (Author response image 5B). The activation pattern similarities of the DMN, DAN, and SM states were r = 0.981, 0.984, and 0.911, respectively.
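
      For reference, the Calinski-Harabasz score is the ratio of between-cluster to within-cluster dispersion, each normalized by its degrees of freedom. The following is a plain-Python sketch of the standard definition, not our analysis code. Treating each time point as a sample and each inferred state as its cluster label is the assumed usage, and the one-dimensional toy data are made up.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score for a labeled set of points.

    CH = [B / (k - 1)] / [W / (n - k)], where B and W are the between- and
    within-cluster sums of squared distances to the relevant centroid.
    Raises ZeroDivisionError if every point sits exactly on its centroid.
    """
    n, dim = len(points), len(points[0])
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    k = len(clusters)
    if not 1 < k < n:
        raise ValueError("need between 2 and n - 1 clusters")

    def centroid(pts):
        return [sum(p[d] for p in pts) / len(pts) for d in range(dim)]

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    overall = centroid(points)
    between = within = 0.0
    for pts in clusters.values():
        c = centroid(pts)
        between += len(pts) * sq_dist(c, overall)
        within += sum(sq_dist(p, c) for p in pts)
    return (between / (k - 1)) / (within / (n - k))


# Toy 1-D data with two obvious groups.
points = [[0.0], [1.0], [10.0], [11.0]]
good = calinski_harabasz(points, [0, 0, 1, 1])  # labels follow the groups
bad = calinski_harabasz(points, [0, 1, 0, 1])   # labels cut across the groups
```

      To select K, the score would be computed on the inferred state sequence for each candidate number of states, and the K with the highest score retained.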

      Author response image 5.

      We did not use K = 3 for the HCP data replication because we were not trying to test whether these four states would be the optimal set of states in every dataset. Although the Calinski-Harabasz score chose K = 3 because it showed the best clustering performance, this does not mean that the base state is not meaningful to this dataset. Likewise, the latent states that are inferred when we increase/decrease the number of states are also meaningful states. For example, in Figure 1—figure supplement 1, we show an example of the SONG dataset’s latent states when we set K to 7. The seven latent states included the DAN, SM, and base states, the DMN state was subdivided into DMN-A and DMN-B states, and the FPN state and DMN+VIS state were included. Setting a higher number of states like K = 7 would mean that we are capturing brain state dynamics in a higher dimension than when using K = 4. Because we are utilizing a higher number of states, a model set to K = 7 would inevitably capture a larger variance of fMRI time series than a model set to K = 4.

      The purpose of latent state replication with the HCP dataset was to validate the generalizability of the DMN, DAN, SM, and base states. Before characterizing these latent states’ relevance to cognition, we needed to verify that these latent states were not simply overfit to the SONG dataset. The fact that the HMM revealed a similar set of latent states when applied to the HCP dataset suggested that the states were not merely specific to SONG data.

      To make our points clearer in the manuscript, we emphasized that we are not arguing for the four states to be the exclusive states. We made edits to Discussion as follows.

      [Manuscript, page 16] “Our study adopted the assumption of low dimensionality of large-scale neural systems, which led us to intentionally identify only a small number of states underlying whole-brain dynamics. Importantly, however, we do not claim that the four states will be the optimal set of states in every dataset and participant population. Instead, latent states and patterns of state occurrence may vary as a function of individuals and tasks (Figure 1—figure supplement 2). Likewise, while the lowest dimensions of the manifold (i.e., the first two gradients) were largely shared across datasets tested here, we do not argue that it will always be identical. If individuals and tasks deviate significantly from what was tested here, the manifold may also differ along with changes in latent states (Samara et al., 2023). Brain systems operate at different dimensionalities and spatiotemporal scales (Greene et al., 2023), which may have different consequences for cognition. Asking how brain states and manifolds—probed at different dimensionalities and scales—flexibly reconfigure (or not) with changes in contexts and mental states is an important research question for understanding complex human cognition.”

      • One of their main contributions is the base state but the correlation between the base state in their Song dataset and the HCP dataset is only 0.399

      This is a good point. However, there is precedent for lower spatial pattern correlation of the base state compared to other states in the literature.

      Compared to the DMN, DAN, and SM states, the base state did not show characteristic activation or deactivation of functional networks. Most of the functional networks showed activity levels close to the mean (z = 0). With this flattened activation pattern, relatively low activation pattern similarity was observed between the SONG base state and the HCP base state.

      In Figure 1—figure supplement 6, we write, “The DMN, DAN, and SM states showed similar mean activity patterns. We refrained from making interpretations about the base state’s activity patterns because the mean activity of most of the parcels was close to z = 0”.

      A similar finding has been reported in a previous work by Chen et al. (2016) that discovered the base state with HMM. State 9 (S9) of their results is comparable to our base state. They report that even though the spatial correlation coefficient of the brain state from the split-half reliability analysis was the lowest for S9 due to its low degrees of activation or deactivation, S9 was stably inferred by the HMM. The following is a direct quote from their paper:

      “To the best of our knowledge, a state similar to S9 has not been presented in previous literature. We hypothesize that S9 is the “ground” state of the brain, in which brain activity (or deactivity) is similar for the entire cortex (no apparent activation or deactivation as shown in Fig. 4). Note that different groups of subjects have different spatial patterns for state S9 (Fig. 3A). Therefore, S9 has the lowest reproducible spatial pattern (Fig. 3B). However, its temporal characteristics allowed us to distinguish it consistently from other states.” (Chen et al., 2016)

      Thus, we believe our data and prior results support the existence of the “base state”.

      • Figure 1B: Parcellation is quite big but there seems to be a gradient within regions

      This is a function of the visualization software. Mean activity (z) is the same for all voxels within a parcel. To visualize the 3D contours of the brain, we chose an option in the nilearn Python function that smooths the mean activity values based on the surface-reconstructed anatomy.

      In the original manuscript, the Methods state, “The brain surfaces were visualized with nilearn.plotting.plot_surf_stat_map. The parcel boundaries in Figure 1B are smoothed from the volume-to-surface reconstruction.”

      • Figure 1D: Why are the DMNs further apart between SONG and HCP than the other states

      To address this question, we first tested whether the position of the DMN states in the gradient space is significantly different for the SONG and HCP datasets. We generated surrogate HMM states from the circular-shifted fMRI time series and positioned the four latent states and the null DMN states in the 2-dimensional gradient space (Author response image 6).

      Author response image 6.
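
      As a minimal sketch of the circular-shift idea behind the surrogate states (not the code used for the analysis), each time series can be rotated by an offset so that its values and autocorrelation are preserved while its temporal alignment is broken. The function names and the choice of an independent random offset per region are assumptions made for illustration.

```python
import random


def circular_shift(series, offset):
    """Rotate a list by `offset` samples, wrapping values around the end."""
    n = len(series)
    offset %= n
    if offset == 0:
        return list(series)
    return series[-offset:] + series[:-offset]


def make_surrogate(data, rng):
    """Shift every region's time series by its own random offset.

    `data` is a list of per-region time series (lists of equal length).
    Drawing one offset per region is an assumption of this sketch.
    """
    n_timepoints = len(data[0])
    return [circular_shift(ts, rng.randrange(1, n_timepoints)) for ts in data]


rng = random.Random(0)  # fixed seed so the toy example is reproducible
toy = [[0, 1, 2, 3, 4], [10, 11, 12, 13, 14]]
surrogate = make_surrogate(toy, rng)
```

      Fitting the same model to many such surrogates would give a null distribution against which the observed states can be compared.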

      We next tested whether the Euclidean distance between the SONG dataset’s DMN state and the HCP dataset’s DMN state is larger than would be expected by chance (Author response image 7). To do so, we took the difference between the DMN state positions and compared it to the 1,000 differences generated from the surrogate latent states. The DMN states of the SONG and HCP datasets did not significantly differ in the Gradient 1 dimension (two-tailed test, p = 0.794). However, as the reviewer noted, the positions differed significantly in the Gradient 2 dimension (p = 0.047). The DMN state leaned more towards the Visual gradient in the SONG dataset, whereas it leaned more towards the Somatosensory-Motor gradient in the HCP dataset.

      Author response image 7.
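
      The comparison of an observed difference against surrogate differences can be sketched as follows. This is an illustrative two-tailed test with an add-one correction; the function name and the exact p-value formula are assumptions and may differ from what was used for the values reported above.

```python
def two_tailed_p(observed, null_values):
    """Share of null values at least as far from the null mean as `observed`."""
    null_mean = sum(null_values) / len(null_values)
    observed_deviation = abs(observed - null_mean)
    n_extreme = sum(
        abs(value - null_mean) >= observed_deviation for value in null_values
    )
    # The add-one correction keeps p above zero for a finite null sample.
    return (1 + n_extreme) / (1 + len(null_values))


# Toy null distribution standing in for the surrogate differences.
toy_null = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_value = two_tailed_p(2.0, toy_null)
```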

      Though we cannot claim an exact reason for this across-dataset difference, we note a distinctive difference between the SONG and HCP datasets. Both datasets largely included resting-state, controlled tasks, and movie watching. The SONG dataset included 18.95% of rest, 15.07% of task, and 65.98% of movie watching. The task only contained the gradCPT, i.e., sustained attention task. On the other hand, the HCP dataset included 52.71% of rest, 24.35% of task, and 22.94% of movie watching. There were 7 different tasks included in the HCP dataset. It is possible that different proportions of rest, task, and movie watching, and different cognitive demands involved with each dataset may have created data-specific latent states.

      • Page 5 paragraph starting at L25: Their hypothesis that functional gradients explain large variance in neural dynamics needs to be explained more, is non-trivial especially because their R^2 scores are so low (Fig 1. Supplement 8) for PCA

      We address this concern on pages 21-23 of this response letter.

      • Generally, I do not find the PCA analysis convincing and believe they should also compare to something like ICA or a different model of dynamics. They do not explain their reasoning behind assuming an HMM, which is an extremely simplified idea of brain dynamics meaning they only change based on the previous state.

      We appreciate this perspective. We replaced the comparison between the Margulies et al. (2016) gradients and the SONG-specific PCA with a more direct comparison between the Margulies et al. (2016) gradients and SONG-specific gradients, as described on pages 21-23 of this response letter.

      More broadly, we elected to use HMM because of recent work showing correspondence between low-dimensional HMM states and behavior (Cornblath et al., 2020; Taghia et al., 2018; van der Meer et al., 2020; Yamashita et al., 2021). We also found the model’s assumption—a mixture Gaussian emission probability and first-order Markovian transition probability—to be the most suited to analyzing the fMRI time series data. We do not intend to claim that other data-reduction techniques would not also capture low-dimensional, behaviorally relevant changes in brain activity. Instead, our primary focus was identifying a set of latent states that generalize (i.e., recur) across multiple contexts and understanding how those states reflect cognitive and attentional states.
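
      For readers less familiar with the model, the two assumptions named above are commonly written as follows. The notation is illustrative rather than taken from the manuscript, and it assumes one multivariate Gaussian per state: $z_t$ is the latent state at time $t$, $x_t$ is the observed activity vector, $A$ is the transition matrix, and $\mu_k$ and $\Sigma_k$ are the mean and covariance of state $k$.

```latex
\begin{aligned}
P(z_t = j \mid z_{t-1} = i,\, z_{t-2}, \ldots, z_1) &= P(z_t = j \mid z_{t-1} = i) = A_{ij},
  \qquad \sum_{j=1}^{K} A_{ij} = 1, \\
p(x_t \mid z_t = k) &= \mathcal{N}(x_t;\, \mu_k,\, \Sigma_k).
\end{aligned}
```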

      Although a comparison of possible data-reduction algorithms is out of the scope of the current work, an exhaustive comparison of different models can be found in Bolt et al. (2022). The authors compared dozens of latent brain state algorithms spanning zero-lag analysis (e.g., principal component analysis, principal component analysis with Varimax rotation, Laplacian eigenmaps, spatial independent component analysis, temporal independent component analysis, hidden Markov model, seed-based correlation analysis, and co-activation patterns) to time-lag analysis (e.g., quasi-periodic pattern and lag projections). Bolt et al. (2022) write, “a range of empirical phenomena, including functional connectivity gradients, the task-positive/task-negative anticorrelation pattern, the global signal, time-lag propagation patterns, the quasiperiodic pattern and the functional connectome network structure, are manifestations of the three spatiotemporal patterns.” That is, many previous findings that used different methods essentially describe the same recurring latent states. A similar argument was made in previous papers (Brown et al., 2021; Karapanagiotidis et al., 2020; Turnbull et al., 2020).

      We agree that the HMM is a simplified idea of brain dynamics. We do not argue that four states can fully explain the complexity and flexibility of cognition. Instead, we hoped to show that there are different dimensionalities at which brain systems can operate, and that they may have different consequences for cognition. We “simplified” neural dynamics to a discrete sequence of a small number of states. However, what is fascinating is that these overly “simplified” brain state dynamics can explain certain cognitive and attentional dynamics, such as event segmentation and sustained attention fluctuations. We highlight this point in the Discussion.

      [Manuscript, page 16] “Our study adopted the assumption of low dimensionality of large-scale neural systems, which led us to intentionally identify only a small number of states underlying whole-brain dynamics. Importantly, however, we do not claim that the four states will be the optimal set of states in every dataset and participant population. Instead, latent states and patterns of state occurrence may vary as a function of individuals and tasks (Figure 1—figure supplement 2). Likewise, while the lowest dimensions of the manifold (i.e., the first two gradients) were largely shared across datasets tested here, we do not argue that it will always be identical. If individuals and tasks deviate significantly from what was tested here, the manifold may also differ along with changes in latent states (Samara et al., 2023). Brain systems operate at different dimensionalities and spatiotemporal scales (Greene et al., 2023), which may have different consequences for cognition. Asking how brain states and manifolds—probed at different dimensionalities and scales—flexibly reconfigure (or not) with changes in contexts and mental states is an important research question for understanding complex human cognition.”

      • For the 25- ROI replication it seems like they again do not try multiple K values for the number of states to validate that 4 states are in fact the correct number.

      In the manuscript, we do not argue that four will be the optimal number of states in every dataset. (We actually predict that this may differ depending on the amount of data, participant population, tasks, etc.) Instead, we claim that the four states identified in the SONG dataset are not specific (i.e., overfit) to that sample, but rather recur in independent datasets as well. More broadly, we argue that the complexity and flexibility of human cognition stem from the fact that computation occurs at multiple dimensions and that the low-dimensional states observed here are robustly related to cognitive and attentional states. To prevent misunderstanding of our results, we emphasized in the Discussion that we are not arguing for a fixed number of states. A paragraph included in our response to the previous comment (page 16 in the manuscript) illustrates this point.

      • Fig 2B: Colorbar goes from -0.05 to 0.05 but values are up to 0.87

      We apologize for the confusion. The current version of the figure is correct. The figure legend states, “The values indicate transition probabilities, such that values in each row sums to 1. The colors indicate differences from the mean of the null distribution where the HMMs were conducted on the circular-shifted time series.”

      We recognize that this complicates the interpretation of the figure. However, after much consideration, we decided that it was valuable to show both the actual transition probabilities (values) and their difference from the mean of null HMMs (colors). The values demonstrate the Markovian property of latent state dynamics, with a high probability of remaining in the same state at consecutive moments and a low probability of transitioning to a different state. The colors indicate that the base state is a transitional hub state by illustrating that the DMN, DAN, and SM states are more likely to transition to the base state than would be expected by chance.
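
      The two quantities shown in the figure can be illustrated with a small sketch. Here the transition probabilities are estimated by counting transitions in a decoded state sequence, which is an assumption of this example (the fitted HMM transition matrix could be used instead), and the function names are hypothetical.

```python
def transition_matrix(states, n_states):
    """Row-normalized transition counts from a sequence of state labels."""
    counts = [[0] * n_states for _ in range(n_states)]
    for previous, current in zip(states, states[1:]):
        counts[previous][current] += 1
    probabilities = []
    for row in counts:
        total = sum(row)
        # Rows for states that are never left stay at zero.
        probabilities.append([c / total if total else 0.0 for c in row])
    return probabilities


def difference_from_null(observed, null_matrices):
    """Observed probabilities minus the element-wise mean of null matrices."""
    n = len(observed)
    n_null = len(null_matrices)
    return [
        [observed[i][j] - sum(m[i][j] for m in null_matrices) / n_null
         for j in range(n)]
        for i in range(n)
    ]


toy_sequence = [0, 0, 0, 1, 1, 0]
observed = transition_matrix(toy_sequence, n_states=2)
```

      In this framing, the matrix returned by `transition_matrix` corresponds to the printed values and the output of `difference_from_null` corresponds to the colors.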

      • P 16 L4 near-critical, authors need to be more specific in their terminology here especially since they talk about dynamic systems, where near-criticality has a specific definition. It is unclear which definition they are looking for here.

      We agree that our explanation was vague. Because we do not have evidence for this speculative proposal, we removed the mention of near-criticality. Instead, we focus on our observation that the base state is a transitional hub state within a metastable system.

      [Manuscript, page 17-18] “However, the functional relevance of the base state to human cognition had not been explored previously. We propose that the base state, a transitional hub (Figure 2B) positioned at the center of the gradient subspace (Figure 1D), functions as a state of natural equilibrium. Transitioning to the DMN, DAN, or SM states reflects incursion away from natural equilibrium (Deco et al., 2017; Gu et al., 2015), as the brain enters a functionally modular state. Notably, the base state indicated high attentional engagement (Figure 5E and F) and exhibited the highest occurrence proportion (Figure 3B) as well as the longest dwell times (Figure 3—figure supplement 1) during naturalistic movie watching, whereas its functional involvement was comparatively minor during controlled tasks. This significant relevance to behavior verifies that the base state cannot simply be a byproduct of the model. We speculate that susceptibility to both external and internal information is maximized in the base state—allowing for roughly equal weighting of both sides so that they can be integrated to form a coherent representation of the world—at the expense of the stability of a certain functional network (Cocchi et al., 2017; Fagerholm et al., 2015). When processing rich narratives, particularly when a person is fully immersed without having to exert cognitive effort, a less modular state with high degrees of freedom to reach other states may be more likely to be involved. The role of the base state should be further investigated in future studies.”

      • P16 L13-L17 unnecessary

      We prefer to have the last paragraph as a summary of the implications of this paper. However, if the length of this paper becomes a problem as we work towards publication with the editors, we are happy to remove these lines.

      • I think this paper is solid, but my main issue is with using an HMM, never explaining why, not showing inference results on test data, not reporting an R^2 score for it, and not comparing it to other models. Secondly, they use the Calinski-Harabasz score to determine the number of states, but not the log-likelihood of the fit. This clearly creates a bias in what types of states you will find, namely states that are far away from each other, which likely also leads to the functional gradient and PCA results they have. Where they specifically talk about how their states are far away from each other in the functional gradient space and correlated to (orthogonal) components. It is completely unclear to me why they used this measure because it also seems to be one of many scores you could use with respect to clustering (with potentially different results), and even odd in the presence of a log-likelihood fit to the data and with the model they use (which does not perform clustering).

      (1) Showing inference results on test data

      We address this concern on pages 19-21 of this response letter.

      (2) Not reporting R^2 score

      We address this concern on pages 21-23 of this response letter.

      (3) Not comparing the HMM model to other models

      We address this concern on pages 27-28 of this response letter.

      (4) The use of the Calinski-Harabasz score to determine the number of states rather than the log-likelihood of the model fit

      To our knowledge, the log-likelihood of the model fit is not used to select the number of states in the HMM literature, because the log-likelihood tends to increase monotonically as the number of states increases. Baker et al. (2014) illustrate this problem, writing:

      “In theory, it should be possible to pick the optimal number of states by selecting the model with the greatest (negative) free energy. In practice however, we observe that the free energy increases monotonically up to K = 15 states, suggesting that the Bayes-optimal model may require an even higher number of states.”

      Similarly, the following figure shows the log-likelihood estimated from the SONG dataset. As in Baker et al. (2014), the log-likelihood increased monotonically as the number of states increased (Author response image 8, right). Measures such as AIC and BIC, which account for the number of parameters, have the same issue of monotonic increase.

      Author response image 8.
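
      For reference, AIC and BIC penalize the log-likelihood by the number of free parameters, as in the sketch below. The parameter count assumes a Gaussian HMM with one full covariance matrix per state; that structure and the function names are assumptions made for illustration, not a description of the fitted model.

```python
import math


def gaussian_hmm_n_params(n_states, n_features):
    """Free parameters of a Gaussian HMM with full covariances (assumed)."""
    initial = n_states - 1                    # initial state distribution
    transitions = n_states * (n_states - 1)   # each row sums to 1
    means = n_states * n_features
    covariances = n_states * n_features * (n_features + 1) // 2
    return initial + transitions + means + covariances


def aic(log_likelihood, n_params):
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_samples):
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_samples) - 2 * log_likelihood


n_params_example = gaussian_hmm_n_params(n_states=4, n_features=2)
```

      When the number of time points is large, the gain in log-likelihood from adding a state can outweigh these penalties, which is one way the criteria can keep favoring more states.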

      Because there is “no straightforward data-driven approach to model order selection” (Baker et al., 2014), past work has used different approaches to decide on the number of states. For example, Vidaurre et al. (2018) iterated over a range of the number of states to repeat the same HMM training and inference procedures 5 times using the same hyperparameters. They selected the number of states that showed the highest consistency across iterations. Gao et al. (2021) tested the clustering performance of the model output using the Calinski-Harabasz score. The number of states that showed the highest within-cluster cohesion compared to the across-cluster separation was selected as the number of states. Chang et al. (2021) applied HMM to voxels of the ventromedial prefrontal cortex using a similar clustering algorithm, writing: “To determine the number of states for the HMM estimation procedure, we identified the number of states that maximized the average within-state spatial similarity relative to the average between-state similarity”. In our previous paper (Song et al., 2021b), we reported both the reliability and clustering performance measures to decide on the number of states.

      In the current manuscript, the model consistency criterion from Vidaurre et al. (2018) was ineffective because the HMM inference was extremely robust (i.e., always inferring the exact same sequence) due to a large number of data points. Thus, we used the Calinski-Harabasz score as our criterion for the number of states selected.
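
      As a rough illustration of the criterion, the Calinski-Harabasz score is the ratio of between-cluster dispersion to within-cluster dispersion, each divided by its degrees of freedom. The sketch below is a plain-Python version written for this letter's readers; it is not the implementation used in the analysis, and the toy data are invented.

```python
def calinski_harabasz(points, labels):
    """Calinski-Harabasz score for `points` grouped by integer `labels`."""
    n = len(points)
    dim = len(points[0])
    clusters = sorted(set(labels))
    k = len(clusters)
    overall_mean = [sum(p[d] for p in points) / n for d in range(dim)]

    between = 0.0  # dispersion of cluster centroids around the overall mean
    within = 0.0   # dispersion of points around their own centroid
    for cluster in clusters:
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        between += len(members) * sum(
            (centroid[d] - overall_mean[d]) ** 2 for d in range(dim)
        )
        within += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return (between / (k - 1)) / (within / (n - k))


toy_points = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
good_score = calinski_harabasz(toy_points, [0, 0, 1, 1])  # compact clusters
poor_score = calinski_harabasz(toy_points, [0, 1, 0, 1])  # mixed clusters
```

      Under this criterion, one would compute the score for the state assignments obtained at each candidate K and keep the K with the highest score.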

      We agree with the reviewer that the selection of the number of states is critical to any study that implements HMM. However, the field lacks a consensus on how to decide on the number of states in the HMM, and the Calinski-Harabasz score has been validated in previous studies. Most importantly, the latent states’ relationships with behavioral and cognitive measures give strong evidence that the latent states are indeed meaningful states. Again, we are not arguing that the optimal set of states in any dataset will be four nor are we arguing that these four states will always be the optimal states. Instead, the manuscript proposes that a small number of latent states explains meaningful variance in cognitive dynamics.

      • Grammatical error: P24 L29 rendering seems to have gone wrong

      Our intention was correct here. To avoid confusion, we changed “($_{\text{number of participants}}C_{2}$ iterations)” to “($_{N}C_{2}$ iterations, where N = number of participants)” (page 26 in the manuscript).

      Questions:

      • Comment on subject differences, it seems like they potentially found group dynamics based on stimuli, but interesting to see individual differences in large-scale dynamics, and do they believe the states they find mostly explain global linear dynamics?

      We agree with the reviewer that whether low-dimensional latent state dynamics explain individual differences—above and beyond what could be explained by the high-dimensional, temporally static neural signatures of individuals (e.g., Finn et al., 2015)—is an important research question. However, because the SONG dataset was collected in a single lab, with a focus on covering diverse contexts (rest, task, and movie watching) over 2 sessions, we were only able to collect 27 participants. Due to this small sample size, we focused on investigating group-level, shared temporal dynamics and across-condition differences, rather than on investigating individual differences.

      Past work has studied individual differences (e.g., behavioral traits like well-being, intelligence, and personality) using the HMM (Vidaurre et al., 2017). In the lab, we are working on a project that investigates latent state dynamics in relation to individual differences in clinical symptoms using the Healthy Brain Network dataset (Ji et al., 2022, presented at SfN; Alexander et al., 2017).

      Finally, the reviewer raises an interesting question about whether the latent state sequence that was derived here mostly explains global linear dynamics as opposed to nonlinear dynamics. We have two responses: one methodological and one theoretical. First, methodologically, we defined the emission probabilities as a linear mixture of Gaussian distributions for each input dimension with the state-specific mean (mean fMRI activity patterns of the networks) and variance (functional covariance across networks). Therefore, states are modeled with an assumption of linearity of feature combinations. Second, theoretically, recent work argues in favor of nonlinearity of large-scale neural dynamics, especially as tasks get richer and more complex (Cunningham and Yu, 2014; Gao et al., 2021). However, whether low-dimensional latent states should be modeled nonlinearly (that is, whether linear algorithms are insufficient at capturing latent states compared to nonlinear algorithms) is still unknown. We agree with the reviewer that the assumption of linearity is an interesting topic in systems neuroscience. However, together with prior work which showed how numerous algorithms, either linear or nonlinear, recapitulated a common set of latent states, we argue that the HMM provides a strong low-dimensional model of large-scale neural activity and interaction.

      • P19 L40 why did the authors interpolate incorrect or no-responses for the gradCPT runs? It seems more logical to correct their results for these responses or to throw them out since interpolation can induce huge biases in these cases because the data is likely not missing at completely random.

      Interpolating the RTs of the trials without responses (omission errors and incorrect trials) is a standardized protocol for analyzing gradCPT data (Esterman et al., 2013; Fortenbaugh et al., 2018, 2015; Jayakumar et al., 2023; Rosenberg et al., 2013; Terashima et al., 2021; Yamashita et al., 2021). This analysis choice rests on the assumption that sustained attention is a continuous attentional state; the RT, a proxy for the attentional state in the gradCPT literature, is a noisy measure of a smoothed, continuous attentional state. Thus, the RTs of the trials without responses are interpolated and the RT time courses are smoothed by convolving with a Gaussian kernel.
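
      The two steps can be sketched as follows. This is an illustrative version only: the use of linear interpolation, the nearest-value fill at the edges, the kernel width, and the function names are assumptions of the example rather than the parameters of the published protocol.

```python
import math


def interpolate_missing(rts):
    """Linearly interpolate None entries; edge gaps take the nearest value."""
    known = [i for i, value in enumerate(rts) if value is not None]
    if not known:
        raise ValueError("at least one trial must have a response time")
    filled = list(rts)
    for i, value in enumerate(rts):
        if value is not None:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:
            filled[i] = rts[right]
        elif right is None:
            filled[i] = rts[left]
        else:
            weight = (i - left) / (right - left)
            filled[i] = rts[left] + weight * (rts[right] - rts[left])
    return filled


def gaussian_smooth(values, sigma):
    """Smooth with a Gaussian kernel, renormalizing the weights at the edges."""
    radius = max(1, math.ceil(3 * sigma))
    offsets = range(-radius, radius + 1)
    kernel = [math.exp(-0.5 * (k / sigma) ** 2) for k in offsets]
    smoothed = []
    for i in range(len(values)):
        weighted_sum = 0.0
        weight_total = 0.0
        for k, w in zip(offsets, kernel):
            j = i + k
            if 0 <= j < len(values):
                weighted_sum += w * values[j]
                weight_total += w
        smoothed.append(weighted_sum / weight_total)
    return smoothed


# None marks trials without a usable response time (toy values, in seconds).
toy_rts = [0.6, None, 0.8, 0.7, None, None, 1.0]
rt_course = gaussian_smooth(interpolate_missing(toy_rts), sigma=1.0)
```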

      References

      Abbas A, Belloy M, Kashyap A, Billings J, Nezafati M, Schumacher EH, Keilholz S. 2019. Quasiperiodic patterns contribute to functional connectivity in the brain. Neuroimage 191:193–204.

      Alexander LM, Escalera J, Ai L, Andreotti C, Febre K, Mangone A, Vega-Potler N, Langer N, Alexander A, Kovacs M, Litke S, O’Hagan B, Andersen J, Bronstein B, Bui A, Bushey M, Butler H, Castagna V, Camacho N, Chan E, Citera D, Clucas J, Cohen S, Dufek S, Eaves M, Fradera B, Gardner J, Grant-Villegas N, Green G, Gregory C, Hart E, Harris S, Horton M, Kahn D, Kabotyanski K, Karmel B, Kelly SP, Kleinman K, Koo B, Kramer E, Lennon E, Lord C, Mantello G, Margolis A, Merikangas KR, Milham J, Minniti G, Neuhaus R, Levine A, Osman Y, Parra LC, Pugh KR, Racanello A, Restrepo A, Saltzman T, Septimus B, Tobe R, Waltz R, Williams A, Yeo A, Castellanos FX, Klein A, Paus T, Leventhal BL, Craddock RC, Koplewicz HS, Milham MP. 2017. Data Descriptor: An open resource for transdiagnostic research in pediatric mental health and learning disorders. Sci Data 4:1–26.

      Allen EA, Damaraju E, Plis SM, Erhardt EB, Eichele T, Calhoun VD. 2014. Tracking whole-brain connectivity dynamics in the resting state. Cereb Cortex 24:663–676.

      Baker AP, Brookes MJ, Rezek IA, Smith SM, Behrens T, Probert Smith PJ, Woolrich M. 2014. Fast transient networks in spontaneous human brain activity. Elife 3:e01867.

      Bolt T, Nomi JS, Bzdok D, Salas JA, Chang C, Yeo BTT, Uddin LQ, Keilholz SD. 2022. A Parsimonious Description of Global Functional Brain Organization in Three Spatiotemporal Patterns. Nat Neurosci 25:1093–1103.

      Brown JA, Lee AJ, Pasquini L, Seeley WW. 2021. A dynamic gradient architecture generates brain activity states. Neuroimage 261:119526.

      Chang C, Leopold DA, Schölvinck ML, Mandelkow H, Picchioni D, Liu X, Ye FQ, Turchi JN, Duyn JH. 2016. Tracking brain arousal fluctuations with fMRI. Proc Natl Acad Sci U S A 113:4518–4523.

      Chang CHC, Lazaridi C, Yeshurun Y, Norman KA, Hasson U. 2021. Relating the past with the present: Information integration and segregation during ongoing narrative processing. J Cogn Neurosci 33:1–23.

      Chang LJ, Jolly E, Cheong JH, Rapuano K, Greenstein N, Chen P-HA, Manning JR. 2021. Endogenous variation in ventromedial prefrontal cortex state dynamics during naturalistic viewing reflects affective experience. Sci Adv 7:eabf7129.

      Chen J, Leong YC, Honey CJ, Yong CH, Norman KA, Hasson U. 2017. Shared memories reveal shared structure in neural activity across individuals. Nat Neurosci 20:115–125.

      Chen S, Langley J, Chen X, Hu X. 2016. Spatiotemporal Modeling of Brain Dynamics Using Resting-State Functional Magnetic Resonance Imaging with Gaussian Hidden Markov Model. Brain Connect 6:326–334.

      Cocchi L, Gollo LL, Zalesky A, Breakspear M. 2017. Criticality in the brain: A synthesis of neurobiology, models and cognition. Prog Neurobiol 158:132–152.

      Cornblath EJ, Ashourvan A, Kim JZ, Betzel RF, Ciric R, Adebimpe A, Baum GL, He X, Ruparel K, Moore TM, Gur RC, Gur RE, Shinohara RT, Roalf DR, Satterthwaite TD, Bassett DS. 2020. Temporal sequences of brain activity at rest are constrained by white matter structure and modulated by cognitive demands. Commun Biol 3:261.

      Cunningham JP, Yu BM. 2014. Dimensionality reduction for large-scale neural recordings. Nat Neurosci 17:1500–1509.

      Deco G, Kringelbach ML, Jirsa VK, Ritter P. 2017. The dynamics of resting fluctuations in the brain: Metastability and its dynamical cortical core. Sci Rep 7:3095.

      Esterman M, Noonan SK, Rosenberg M, Degutis J. 2013. In the zone or zoning out? Tracking behavioral and neural fluctuations during sustained attention. Cereb Cortex 23:2712–2723.

      Esterman M, Rothlein D. 2019. Models of sustained attention. Curr Opin Psychol 29:174–180.

      Fagerholm ED, Lorenz R, Scott G, Dinov M, Hellyer PJ, Mirzaei N, Leeson C, Carmichael DW, Sharp DJ, Shew WL, Leech R. 2015. Cascades and cognitive state: Focused attention incurs subcritical dynamics. J Neurosci 35:4626–4634.

      Falahpour M, Chang C, Wong CW, Liu TT. 2018. Template-based prediction of vigilance fluctuations in resting-state fMRI. Neuroimage 174:317–327.

      Finn ES, Shen X, Scheinost D, Rosenberg MD, Huang J, Chun MM, Papademetris X, Constable RT. 2015. Functional connectome fingerprinting: Identifying individuals using patterns of brain connectivity. Nat Neurosci 18:1664–1671.

      Fortenbaugh FC, Degutis J, Germine L, Wilmer JB, Grosso M, Russo K, Esterman M. 2015. Sustained attention across the life span in a sample of 10,000: Dissociating ability and strategy. Psychol Sci 26:1497–1510.

      Fortenbaugh FC, Rothlein D, McGlinchey R, DeGutis J, Esterman M. 2018. Tracking behavioral and neural fluctuations during sustained attention: A robust replication and extension. Neuroimage 171:148–164.

      Fox MD, Snyder AZ, Vincent JL, Corbetta M, Van Essen DC, Raichle ME. 2005. The human brain is intrinsically organized into dynamic, anticorrelated functional networks. Proc Natl Acad Sci U S A 102:9673–9678.

      Gao S, Mishne G, Scheinost D. 2021. Nonlinear manifold learning in functional magnetic resonance imaging uncovers a low-dimensional space of brain dynamics. Hum Brain Mapp 42:4510–4524.

      Goodale SE, Ahmed N, Zhao C, de Zwart JA, Özbay PS, Picchioni D, Duyn J, Englot DJ, Morgan VL, Chang C. 2021. fMRI-based detection of alertness predicts behavioral response variability. Elife 10:1–20.

      Greene AS, Horien C, Barson D, Scheinost D, Constable RT. 2023. Why is everyone talking about brain state? Trends Neurosci.

      Greene DJ, Marek S, Gordon EM, Siegel JS, Gratton C, Laumann TO, Gilmore AW, Berg JJ, Nguyen AL, Dierker D, Van AN, Ortega M, Newbold DJ, Hampton JM, Nielsen AN, McDermott KB, Roland JL, Norris SA, Nelson SM, Snyder AZ, Schlaggar BL, Petersen SE, Dosenbach NUF. 2020. Integrative and Network-Specific Connectivity of the Basal Ganglia and Thalamus Defined in Individuals. Neuron 105:742-758.e6.

      Gu S, Pasqualetti F, Cieslak M, Telesford QK, Yu AB, Kahn AE, Medaglia JD, Vettel JM, Miller MB, Grafton ST, Bassett DS. 2015. Controllability of structural brain networks. Nat Commun 6:8414.

      Jayakumar M, Balusu C, Aly M. 2023. Attentional fluctuations and the temporal organization of memory. Cognition 235:105408.

      Ji E, Lee JE, Hong SJ, Shim W (2022). Idiosyncrasy of latent neural state dynamic in ASD during movie watching. Poster presented at the Society for Neuroscience 2022 Annual Meeting.

      Karapanagiotidis T, Vidaurre D, Quinn AJ, Vatansever D, Poerio GL, Turnbull A, Ho NSP, Leech R, Bernhardt BC, Jefferies E, Margulies DS, Nichols TE, Woolrich MW, Smallwood J. 2020. The psychological correlates of distinct neural states occurring during wakeful rest. Sci Rep 10:1–11.

      Liu X, Duyn JH. 2013. Time-varying functional network information extracted from brief instances of spontaneous brain activity. Proc Natl Acad Sci U S A 110:4392–4397.

      Liu X, Zhang N, Chang C, Duyn JH. 2018. Co-activation patterns in resting-state fMRI signals. Neuroimage 180:485–494.

      Lynn CW, Cornblath EJ, Papadopoulos L, Bertolero MA, Bassett DS. 2021. Broken detailed balance and entropy production in the human brain. Proc Natl Acad Sci 118:e2109889118.

      Margulies DS, Ghosh SS, Goulas A, Falkiewicz M, Huntenburg JM, Langs G, Bezgin G, Eickhoff SB, Castellanos FX, Petrides M, Jefferies E, Smallwood J. 2016. Situating the default-mode network along a principal gradient of macroscale cortical organization. Proc Natl Acad Sci U S A 113:12574–12579.

      Mesulam MM. 1998. From sensation to cognition. Brain 121:1013–1052.

      Munn BR, Müller EJ, Wainstein G, Shine JM. 2021. The ascending arousal system shapes neural dynamics to mediate awareness of cognitive states. Nat Commun 12:1–9.

      Raut RV, Snyder AZ, Mitra A, Yellin D, Fujii N, Malach R, Raichle ME. 2021. Global waves synchronize the brain’s functional systems with fluctuating arousal. Sci Adv 7.

      Rosenberg M, Noonan S, DeGutis J, Esterman M. 2013. Sustaining visual attention in the face of distraction: A novel gradual-onset continuous performance task. Attention, Perception, Psychophys 75:426–439.

      Rosenberg MD, Finn ES, Scheinost D, Papademetris X, Shen X, Constable RT, Chun MM. 2016. A neuromarker of sustained attention from whole-brain functional connectivity. Nat Neurosci 19:165–171.

      Rosenberg MD, Scheinost D, Greene AS, Avery EW, Kwon YH, Finn ES, Ramani R, Qiu M, Todd Constable R, Chun MM. 2020. Functional connectivity predicts changes in attention observed across minutes, days, and months. Proc Natl Acad Sci U S A 117:3797–3807.

      Saggar M, Shine JM, Liégeois R, Dosenbach NUF, Fair D. 2022. Precision dynamical mapping using topological data analysis reveals a hub-like transition state at rest. Nat Commun 13.

      Schaefer A, Kong R, Gordon EM, Laumann TO, Zuo X-N, Holmes AJ, Eickhoff SB, Yeo BTT. 2018. Local-Global Parcellation of the Human Cerebral Cortex from Intrinsic Functional Connectivity MRI. Cereb Cortex 28:3095–3114.

      Shine JM. 2019. Neuromodulatory Influences on Integration and Segregation in the Brain. Trends Cogn Sci 23:572–583.

      Shine JM, Bissett PG, Bell PT, Koyejo O, Balsters JH, Gorgolewski KJ, Moodie CA, Poldrack RA. 2016. The Dynamics of Functional Brain Networks: Integrated Network States during Cognitive Task Performance. Neuron 92:544–554.

      Shine JM, Breakspear M, Bell PT, Ehgoetz Martens K, Shine R, Koyejo O, Sporns O, Poldrack RA. 2019. Human cognition involves the dynamic integration of neural activity and neuromodulatory systems. Nat Neurosci 22:289–296.

      Smith SM, Fox PT, Miller KL, Glahn DC, Fox PM, Mackay CE, Filippini N, Watkins KE, Toro R, Laird AR, Beckmann CF. 2009. Correspondence of the brain’s functional architecture during activation and rest. Proc Natl Acad Sci 106:13040–13045.

      Song H, Finn ES, Rosenberg MD. 2021a. Neural signatures of attentional engagement during narratives and its consequences for event memory. Proc Natl Acad Sci 118:e2021905118.

      Song H, Park B-Y, Park H, Shim WM. 2021b. Cognitive and Neural State Dynamics of Narrative Comprehension. J Neurosci 41:8972–8990.

      Taghia J, Cai W, Ryali S, Kochalka J, Nicholas J, Chen T, Menon V. 2018. Uncovering hidden brain state dynamics that regulate performance and decision-making during cognition. Nat Commun 9:2505.

      Terashima H, Kihara K, Kawahara JI, Kondo HM. 2021. Common principles underlie the fluctuation of auditory and visual sustained attention. Q J Exp Psychol 74:705–715.

      Tian Y, Margulies DS, Breakspear M, Zalesky A. 2020. Topographic organization of the human subcortex unveiled with functional connectivity gradients. Nat Neurosci 23:1421–1432.

      Turnbull A, Karapanagiotidis T, Wang HT, Bernhardt BC, Leech R, Margulies D, Schooler J, Jefferies E, Smallwood J. 2020. Reductions in task positive neural systems occur with the passage of time and are associated with changes in ongoing thought. Sci Rep 10:1–10.

      Unsworth N, Robison MK. 2018. Tracking arousal state and mind wandering with pupillometry. Cogn Affect Behav Neurosci 18:638–664.

      Unsworth N, Robison MK. 2016. Pupillary correlates of lapses of sustained attention. Cogn Affect Behav Neurosci 16:601–615.

      van der Meer JN, Breakspear M, Chang LJ, Sonkusare S, Cocchi L. 2020. Movie viewing elicits rich and reliable brain state dynamics. Nat Commun 11:1–14.

      Van Essen DC, Smith SM, Barch DM, Behrens TEJ, Yacoub E, Ugurbil K. 2013. The WU-Minn Human Connectome Project: An overview. Neuroimage 80:62–79.

      Vidaurre D, Abeysuriya R, Becker R, Quinn AJ, Alfaro-Almagro F, Smith SM, Woolrich MW. 2018. Discovering dynamic brain networks from big data in rest and task. Neuroimage, Brain Connectivity Dynamics 180:646–656.

      Vidaurre D, Smith SM, Woolrich MW. 2017. Brain network dynamics are hierarchically organized in time. Proc Natl Acad Sci U S A 114:12827–12832.

      Yamashita A, Rothlein D, Kucyi A, Valera EM, Esterman M. 2021. Brain state-based detection of attentional fluctuations and their modulation. Neuroimage 236:118072.

      Yeo BTT, Krienen FM, Sepulcre J, Sabuncu MR, Lashkari D, Hollinshead M, Roffman JL, Smoller JW, Zöllei L, Polimeni JR, Fisch B, Liu H, Buckner RL. 2011. The organization of the human cerebral cortex estimated by intrinsic functional connectivity. J Neurophysiol 106:1125–1165.

      Yousefi B, Keilholz S. 2021. Propagating patterns of intrinsic activity along macroscale gradients coordinate functional connections across the whole brain. Neuroimage 231:117827.

      Zhang S, Goodale SE, Gold BP, Morgan VL, Englot DJ, Chang C. 2023. Vigilance associates with the low-dimensional structure of fMRI data. Neuroimage 267.

    1. Author Response

      Reviewer #1 (Public Review):

      Here the authors set out to disentangle neural responses to acoustic and linguistic aspects of speech. Participants heard a short story, which could be in a language they understood or did not (French vs. Dutch stories, presented to Dutch listeners). Additional predictors included a combination of acoustic and linguistic factors: Acoustic, Phoneme Onsets, Phoneme Surprisal, Phoneme Entropy, and Word Frequency. Accuracy of reconstruction of the acoustic amplitude envelope was used as an outcome measure.

      The use of continuous speech and the use of comprehended vs. uncomprehended speech are both significant strengths of the approach. Overall, the analyses are largely appropriate to answer the questions posed.

      1) The reconstruction accuracies (e.g., R^2 values Figure 1) seem lower perhaps than might be expected - some direct comparisons with prior literature would be welcome here. Specifically, the accuracies in Figure 1A are around .002-.003 whereas the range seen in some other papers is about an order of magnitude or more larger (e.g. Broderick et al. 2019 J Neurosci; Ding and Simon 2013 J Neurosci).

      We thank the reviewer for their constructive comments and careful review of our paper. The reviewer's point hinges on whether the reported reconstruction accuracies come from the whole brain/sensor space (as in our submission), from selected channels (Broderick et al. 2019), or from selected sources (Ding and Simon 2013). Moreover, we used the R2 score as our measure of reconstruction accuracy, which is generally of a different order of magnitude than the correlation coefficients used in Ding and Simon (2013). Crucially, when we now select the “auditory cortex,” we can report reconstruction accuracies around the language network on the same scale as in the previous studies. In Figure 2 A and B (Figure 1 in the first version of the manuscript), we averaged the model accuracies of every source point over the whole brain, without selecting any region of interest, to test whether each speech feature incrementally increases the averaged model accuracy. This is a more conservative method than selecting the sources with a stronger response to the stimuli (e.g., the average R2 value over all participants of the acoustic model in auditory cortex is 0.01187 for French stories and 0.01315 for Dutch stories, which is similar in magnitude to e.g. Broderick et al. 2019 J Neurosci). TRF accuracies in brain regions outside the language network are quite small, so the average accuracy in Figure 2 A and B is almost an order of magnitude lower than in previous studies. (Ding and Simon 2013 J Neurosci: “To reduce computational complexity, the MEG sensors in each hemisphere were compressed into 3 components using denoising source separation”; their accuracy averaged over all subjects is around 0.2 because they used correlation, not R2, as the measure of accuracy and backward modeling (decoding) instead of forward modeling. Reconstruction accuracies of decoding models are usually higher than those of forward models. Broderick et al. 2019 J Neurosci: averaged across frontocentral channels, the average R2 over all subjects is 0.0171.) Figure 2 C shows the sources where the accuracies of the base acoustic model were significantly different from 0. Reconstruction accuracies around the language network are on a similar scale to those in the previous studies. Figure 2 D shows the sources where each feature significantly improved the reconstruction accuracy compared to the previous model. These accuracy values are smaller than the accuracies of the base acoustic model because they show how much each speech feature incrementally increased the accuracy (e.g., Phoneme Onset accuracy = (accuracy of the model with Acoustic features + Phoneme Onset) – (accuracy of the model with Acoustic features only)). The figure captions have been updated in the manuscript.

      Figure 2. A) Accuracy improvement (averaged over the sources in the whole brain) by each feature for Dutch stories. B) Accuracy improvement (averaged over the sources in the whole brain) by each feature for French stories. Braces in A and B show the significance of the contrasts (differences between consecutive models; **** <0.0001, *** <0.001, ** <0.01, * <0.05) in linear mixed-effects models (Tables 2 and 3). C) Source points where the accuracies of the base acoustic model were significantly different from 0. D) Source points where the reconstruction accuracies of the model were significantly different from those of the previous model. Accuracy values show how much each linguistic feature increased the reconstruction accuracy compared to the previous model.
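
      The incremental-accuracy logic described in this response can be illustrated with a minimal sketch. It is not the authors' pipeline: the data are simulated, the fit is a plain in-sample least-squares regression rather than a cross-validated TRF on time-lagged features, and all names (`ols_predict`, `phoneme_onset_gain`, the toy effect sizes) are assumptions made for illustration. It shows two things from the text: the gain of a feature is the R2 of the larger model minus the R2 of the smaller nested model, and an R2 score sits on a different scale than a correlation coefficient.

```python
import numpy as np


def r2_score(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def ols_predict(X, y):
    """In-sample ordinary least squares prediction with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return A @ beta


# Hypothetical toy data: one simulated source time course driven weakly by
# two acoustic predictors and one phoneme-onset predictor, plus noise.
rng = np.random.default_rng(0)
n_samples = 5000
acoustic = rng.standard_normal((n_samples, 2))
phoneme_onset = rng.standard_normal((n_samples, 1))
response = (0.10 * acoustic[:, 0]
            + 0.05 * phoneme_onset[:, 0]
            + rng.standard_normal(n_samples))

# Nested models: acoustic only, then acoustic + phoneme onset.
pred_base = ols_predict(acoustic, response)
pred_full = ols_predict(np.hstack([acoustic, phoneme_onset]), response)
r2_base = r2_score(response, pred_base)
r2_full = r2_score(response, pred_full)

# Incremental accuracy attributed to the phoneme-onset feature.
phoneme_onset_gain = r2_full - r2_base

# For in-sample OLS with an intercept, R2 equals the squared Pearson r,
# so a correlation of about 0.1 corresponds to an R2 of about 0.01.
r_full = np.corrcoef(response, pred_full)[0, 1]

print(f"R2 base={r2_base:.4f}, R2 full={r2_full:.4f}, "
      f"gain={phoneme_onset_gain:.4f}, r={r_full:.3f}")
```

      Because R2 behaves like a squared correlation, accuracies reported as correlations (as in decoding studies) will look roughly an order of magnitude larger than R2 values for effects of this size, which is the scale difference the response points to.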

      2) One theoretical point relevant to this and similar studies concerns the use of acoustic envelope reconstruction accuracy as the dependent measure. On the one hand, reconstruction accuracy provides an objective measure of "success", and a satisfying link between stimulus and brain activity. On the other hand, as the authors point out, envelope reconstruction is probably not the primary goal of listeners in a conversation: comprehension is. Some discussion of the implications of envelope reconstruction accuracy might be useful in guiding interpretation of the current work, and importantly, helping the field as a whole grapple with this issue.

      Overall, the results support the authors' conclusions that acoustic edges and phoneme features are treated differently depending on whether a listener comprehends the language being spoken. In particular, phoneme features contribute to a greater degree when language is comprehended, whereas acoustic edges contribute similarly regardless of comprehension. These findings are important in part because of prior work suggesting that acoustic edges are critically important for "chunking" continuous speech into linguistic units; the current results re-center language units (phonemes) as critical to comprehension.

      Reviewer #2 (Public Review):

      In this study, the authors used an audiobook listening paradigm and encoding analysis of MEG to examine the independent contributions to MEG responses of putative acoustic and phoneme-level linguistic features in speech and their modulation by higher-level sentence/discourse constraints and language proficiency. The results indicate that:

      1) Acoustic and phoneme features do indeed make independent contributions to MEG responses in frontotemporal language regions (with a left-hemisphere bias for phoneme features).

      2) Brain responses to acoustic and phoneme features are enhanced when sentence/discourse constraints are low (i.e. when word entropy is high).

      3) While brain responses to phoneme features are enhanced when the language is comprehended (or word entropy is high), the opposite is observed for acoustic features.

      These results are taken to support widely held views on the nature of information flow during language processing. On the one hand, processing is hierarchical, consistent with finding 1 above. On the other hand, information flow between lower and high-levels of language processing is also flexible and interactive (finding 2) and modulated by behavioural goals (finding 3).

      This is a methodologically sophisticated study with useful findings that I think will be of interest to the burgeoning community investigating 'neural speech tracking' and also to the wider community interested in language processing and predictive coding. Moreover, the evidence appears convincing.

      I thought the impact was somewhat limited by the results presentation, which I think missed some key details and made the study somewhat hard to follow (but this issue can be addressed).

      Perhaps more major, I do wonder about the novelty of the study as each of the main findings has precedent in the literature. Finding 1 (e.g. Brodbeck, Simon et al.), Finding 2 (e.g. Broderick, Lalor et al.; Molinaro et al.), Finding 3 (e.g. Brodbeck, Simon et al. although here the manipulation of behavioural goals was through a cocktail party listening manipulation and there was no opposing modulation of acoustic vs phoneme level representations). Thus, while the study appears well executed, overall I am unsure how significant the advance is. Related to this point, the study's findings and theoretical interpretations (e.g. the brain as a hierarchical 'filter') are consistent with widely held views of language processing (at least within cognitive neuroscience) and so again I question the potential advance of the study.

      We thank the reviewer for bringing this up. We began this work with the aim of replicating patterns seen in the literature, which is especially important in the burgeoning area of neural tracking of speech and language. Our key extension of these findings is that we can show that phonemic features are encoded more strongly both in a comprehended language compared to an uncomprehended language and as a function of word-level statistical information, and that there is a tradeoff between the encoding of acoustic and linguistic features. As the Reviewer mentions, there is a patchwork of consistent findings from very different experimental circumstances, but to obtain strong evidence for the “tradeoff” of hierarchical feature encoding, it is all the more crucial to have a design like ours, in which features can be compared directly and acoustic differences are carefully controlled in contrast to the presence of linguistic features and language comprehension.

      Our results are consistent with Molinaro et al. (2021), as we also provide support for a cost minimization perspective rather than the perception facilitation perspective discussed there. However, it is important to note that Molinaro et al. only examined the tracking of acoustic features, specifically the speech envelope, using the Phase Locking Value, and did not examine the contribution of lower-level linguistic features. Secondly, Molinaro et al. used a condition-based experimental design, in contrast to our naturalistic stimulus approach. In our study, we aimed to investigate the dynamics of encoding both acoustic and linguistic features, and we applied a multivariate linear regression method to low- and high-constraining words which ‘naturally’ occurred in our audiobook stimulus across languages. Our results revealed a trade-off between the encoding of acoustic and linguistic features that depended on the level of comprehension. Specifically, in the comprehended language, the predictability of the following word had a greater influence on the tracking of phoneme features than of acoustic features, while in the uncomprehended language this trend was reversed. To the best of our knowledge, Brodbeck et al. (2020) showed an effect of attention on the tracking of acoustic features only, in a cocktail party setting, and did not investigate the encoding of linguistic features. Brodbeck et al. (2018) showed that linguistic features are represented only in the attended speech, but they did not explicitly compare the acoustic features as in the previous study. Both studies used mixed speech and investigated the effect of attention rather than comprehension. In our study, we investigated the effect of comprehension when both stimuli were attended. We found that linguistic features are represented even in the uncomprehended language, in contrast to the unattended speech in Brodbeck et al. (2018), although less strongly than in the comprehended language. Additionally, one of the goals of this study was to investigate the effect of context on the representations of acoustic and phoneme-level features. The opposing modulation of acoustic and phonemic features in our study was driven by contextual information. However, as we also mention in the discussion, we do not expect an effect of context in the uncomprehended language, so the modulation of acoustic features there could be related to statistical chunking of the acoustic signal for frequent words, essentially reflecting recognition of single function words such as le, la, un, une.

      We have now revised the Discussion (changes are highlighted in red in the revised manuscript) to clarify the advance of this study and how it adds to previous studies.

    1. Author Response

      eLife assessment

      Mizukami et al. propose a scenario for the evolutionary origin of the coronary artery in amniotes by comparing the morphologies of the vasculatures across several species and developmental timepoints. They show that the coronary arteries of non-amniotes most closely resemble embryonic amniote aortic subepicardial vessels (ASVs), which are replaced by the true coronary arteries during amniote development. While the identification of common vascular structures in diverse taxa is a valuable contribution, additional developmental evidence is needed to confirm that such vessels are truly homologous.

      We have extensively revised our paper by including additional animal data and references. While we were unable to obtain useful data on lungfish or coelacanth, we have obtained new data related to the physiology of the coronary artery, which have been added to Fig. 7. We have also attempted to compare blood vessels at the molecular level, but found that gene expression patterns in blood vessels throughout the body were not always conserved between lineages, making it difficult to make comparisons between amphibians and amniotes. However, based on comparative morphological analysis using newly added three-dimensional data, it is reasonable to consider the amniotes' ASVs and amphibians' ASV-like vessels to be homologous.

      Reviewer #3 (Public Review):

      Mizukami et al. compare the structure of the coronary arteries in multiple species of amniotes, amphibians, and fish. By selecting species from each of these taxa, the authors were able to evaluate modifications to the coronary arteries during key evolutionary transitions. In mice and quail, they show two populations of vessels that are visible on the developing heart: true coronary arteries on the ventricle and a second population of vessels on the outflow tract known as the ASV. They found that in amphibians, outflow tract vessels were present but ventricular coronary arteries were completely absent. In zebrafish (a more ancestral species) an arterial branch off the rostral section of the hypobranchial artery was shown to have similar anatomical features to outflow tract vessels found in higher organisms. These zebrafish outflow tract arteries also appeared conserved in several chondrichthyan specimens. The authors conclude that rearrangement of the outflow tract vasculature or hypobranchial arteries in fish during evolution could be homologous to the ASV population of coronary arteries in amphibians and amniotes. These data give new insight into the evolutionary origins of the coronary vasculature.

      Major Points

      1) The manuscript presents important data on the coronary vascular structure of several different species. However, these data alone do not conclusively demonstrate whether the developmental origins of ASV-like vessels are homologous. Therefore, care should be taken when concluding that the outflow tract vessels found in all different species are conserved features. While this is a reasonable hypothesis and should be presented, the manuscript could be improved by also discussing alternate explanations. For example, ASVs in mice originate during embryonic development, while in fish and amphibians outflow tract vessels are formed only in mature animals.

      We have added data on mice and amphibians (e.g., Fig. 2) and substantially revised the overall development and discussion of the paper. Morphological homology is evident for ASVs and amphibian ASV-like vessels, but the homologous relationship with the hypobranchial artery only suggests a similarity in the embryonic region.

      Comparisons of the developmental timing of various structures among different vertebrate lineages reveal that heterochronic shifts are not uncommon. For example, ossification of the head skeleton and vertebrae occurs during the fetal stage in amniotes, but after hatching in larval amphibians and teleosts. A similar trend is observed in the development of the limb bud (paired fins). Overall, the larval stages of amphibians and teleosts are comparable to the fetal stages of amniotes for many structures. We do not consider this particularly unusual, so we did not include it in the text.

      2) Figure 3 A-D: The authors state that "the ASV ran through the outflow tract, then entered the aortic root before reaching the ventricle to form a secondary orifice". Do the authors have serial sections to conclude that the vessel branching off the carotid runs the length of the aorta and is continuous with an orifice at the aortic root? The endothelial projection off the aorta in panel C could reasonably be an independent projection. For example, Chen et al., described similar looking projections in the base of the aorta that were not attached to external vessels. A whole mount approach would be the most convincing to show the attachments of the ASV vessel.

      We added whole-mount immunohistochemistry data. Please refer to Figs. 2 and S2.

      3) Figure 3E: Similar as above, how is it concluded that the orifice is continuous with the ASV and that this projection is not the coronary artery stem?

      As for quail, we could not achieve as clear a whole-mount staining as in mice. It was also difficult to trace the route in sections because, in quail, ASVs are not restricted to a few lines as in mice but form a plexus of small vessels. We therefore added detailed data from mice (Figs. 2 and S2) and emphasized that the position of the orifice in quail is exactly the same as that in mice.

      4) The discussion section could be improved by making some statements more consistent, using more precise or appropriate terminology accepted in the field, and being more cognizant of how the authors' findings fit within the history of the field. For example, when referring to coronary arteries, please clarify whether this refers to ASV/ outflow tract coronary arteries, or true ventricular coronary arteries. In addition, the first sentence of the discussion makes it seem like the origins of coronary arteries were unknown prior to this study, however, their origins have been described in multiple papers previously. The authors could revise their statement to acknowledge these previous findings.

      We rewrote the entire text to clarify what each "coronary artery" refers to. We also changed the first section of the discussion as suggested by the reviewer.

    1. Author Response:

      The following is the authors' response to the current reviews.

      We appreciate the thoughtful critiques of the reviewers. While we agree that performing additional experiments and analyses probing the sensitivity of the technique would be useful for future studies, we are unable to perform additional experiments as our lab has closed. We share this technique as a starting point for further investigation, but it may need to be modified for success in other contexts. We have provided details of the scenarios (life stage, feeding, day, number of ticks) where we successfully sequenced B. burgdorferi from ticks, as well as one where we did not (unfed nymphs) as a starting point. We will clarify in proofing that our qPCR experiments show that we capture the vast majority of B. burgdorferi flaB mRNA from our input samples, suggesting that we are likely capturing the majority of the B. burgdorferi.

      In this work, we were most interested in using RNA-seq to perform differential expression analysis between annotated mRNAs across our timepoints. We have provided the number of genes detected in each sample (92% of annotated transcripts on average) as well as the median number of reads covering each gene (604 on average) in the supplemental file containing sequencing statistics. This coverage is highly reproducible across replicates, with an average Pearson correlation of 0.99 between gene expression levels (as Transcripts Per Million) between any two replicates. These data and the fact that many of the gene expression changes we observed align with previous observations of others give us confidence in our differential expression analysis. For those interested in tRNAs or sRNAs, we think that it would be best to modify the protocol to focus specifically on capturing those sequences in the library preparation. We encourage others interested in other aspects of our data to download it and explore it.
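
      The coverage metrics quoted in this response (fraction of genes detected, median reads per gene, and correlation of Transcripts Per Million between replicates) can be sketched as follows. This is only an illustrative sketch, not the authors' analysis code: the count matrix and gene lengths are made-up toy values, the function names are hypothetical, and the 10-read detection threshold is an assumption borrowed from the DESeq2 remark elsewhere in the response.

```python
import numpy as np


def coverage_summary(counts, min_reads=10):
    """Per-sample fraction of genes with >= min_reads and median reads per gene."""
    counts = np.asarray(counts, dtype=float)
    frac_detected = np.mean(counts >= min_reads, axis=0)
    median_reads = np.median(counts, axis=0)
    return frac_detected, median_reads


def tpm(counts, gene_lengths_bp):
    """Transcripts Per Million: length-normalise, then scale each sample to 1e6."""
    counts = np.asarray(counts, dtype=float)
    rate = counts / np.asarray(gene_lengths_bp, dtype=float)[:, None]
    return rate / rate.sum(axis=0) * 1e6


def mean_pairwise_pearson(expr):
    """Mean Pearson r over all pairs of samples (columns)."""
    corr = np.corrcoef(np.asarray(expr).T)
    upper = np.triu_indices_from(corr, k=1)
    return corr[upper].mean()


# Hypothetical toy matrix: 4 genes (rows) x 3 replicates (columns).
counts = np.array([[100, 120,  90],
                   [  5,  12,   8],
                   [600, 640, 580],
                   [  0,   0,   1]])
gene_lengths_bp = np.array([900, 450, 1500, 300])

frac_detected, median_reads = coverage_summary(counts)
expr_tpm = tpm(counts, gene_lengths_bp)
replicate_r = mean_pairwise_pearson(expr_tpm)

print(frac_detected, median_reads, round(replicate_r, 3))
```

      Whether the correlation is taken on raw or log-transformed TPM changes the number noticeably for skewed expression data, so that choice should be stated alongside any reported replicate correlation.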

      We will correct remaining wording issues in proofing.

      —————

      The following is the authors' response to the original reviews.

      Dear Reviewing Editor,

      We thank you and the reviewers for the thoughtful comments on our manuscript, and we are excited to submit a revised version of our manuscript “Longitudinal map of transcriptome changes in the Lyme pathogen Borrelia burgdorferi during tick-borne transmission.” In response to the reviews, we have made the following changes to our manuscript:

      1. We updated the text for increased clarity around experimental details, including statistical analyses.

      2. We added additional details about the mapping of non-Bb reads as well as more information about Bb read coverage.

      3. We compared our differentially expressed genes to 4 previous studies of global transcriptional changes in different tick feeding contexts.

      4. We updated the discussion to address these comparisons as well as caveats of our study more directly.

      Please see our responses to individual comments below.

      Reviewer #1 (Public Review):

      In this study, Sapiro et al sought to develop technology for a transcriptomic analysis of B. burgdorferi directly from infected ticks. The methodology has exciting implications to better understand pathogen RNA profiles during specific infection timepoints, even beyond the Lyme spirochete. The authors demonstrate successful sequencing of the B. burgdorferi transcriptome from ticks and perform mass spectrometry to identify possible tick proteins that interact with B. burgdorferi. This technology and first dataset will be useful for the field. The study is limited in that no transcripts/proteins are followed up by additional experiments and no biological interactions/infectious-processes are investigated.

      Critiques and Questions:

      We thank the reviewer for these thoughtful critiques and helping us improve our manuscript.

      This study largely develops a method and is a resource article. This should be more directly stated in the abstract/introduction.

      We edited the abstract and introduction to more directly state that we are sharing a new method and a resource for future investigations. (Lines 29-32; 101-103)

      Details of the infection experiment are currently unclear and more information in the results section is warranted. State the species of tick and life-stage (larval vs nymphal ticks) used for experiments. For RNA-seq, are mice infected and ticks naïve, or are ticks infected and transmitting Borrelia to uninfected mice?

      We updated the results section to more clearly state the tick species and life stage and to make it more clear that infected ticks are transmitting Bb to naïve mice. (Lines 113-115)

      What is the limit of detection for this protocol? Experimental data should be provided about the number of B. burgdorferi required to perform this approach.

      We performed this protocol on pools of 6 (for later feeding stages) to 14 (for early stages) infected nymphs. Published studies (PMID: 7485694, PMID: 11682544) suggest that one day after attachment, there may be a few thousand Bb per tick, suggesting what we’ve measured here may come from on the order of 10^4 Bb. We were not able to capture consistent data from Bb from unfed ticks, which may be due to lower numbers or to an altered transcriptional state caused by lack of nutrients in the unfed tick. We updated the discussion to reflect some of these limitations and uncertainties. (Lines 461-465)

      More information regarding RNA-seq coverage is required. Line 147-148 "read coverage was sufficient"; what defines sufficient? Browser images of RNA-seq data across different genes would be useful to visualize the read coverage per gene. What is the distribution of reads among tRNAs, mRNAs, UTRs, and sRNAs?

      As we were interested in differential expression analysis, we defined sufficient as the number of reads needed per gene to determine statistically significant expression changes across days, which with DESeq2 is typically 10 reads. We reworded this section for clarity and added additional information about the median number of reads per gene, which is also useful in thinking about differential expression analysis. (Lines 163-170) As we chose to focus on differential expression analysis here, we believe these are the most relevant metrics to cover.

      My lab group was excited about the data generated from this paper. Therefore, we downloaded the raw RNA-seq data from GEO and ran it through our RNA-seq computational pipeline. Our QC analysis revealed that day 4 samples have a different GC% pattern and that a high percentage of E. coli sequences were detected. This should be further investigated and addressed in the paper: Are other bacteria being enriched by this method? Why would this be unique to day 4 samples? Does this affect data interpretation?

      We appreciate the interest in our data and pointing out this anomaly. We found that the day 4 samples do have a high percentage of reads that mapped to a bacterial species, Pseudomonas fulva, rather than ticks as we expected. (The reads that map to E. coli also map to P. fulva.) We have updated the results to include this information (Lines 156-165). We believe this is likely due to contamination from collecting ticks after they have fallen off mice in cages on day 4, rather than pulling ticks off the mice as in days 1-3. Unfortunately, as our lab has shut down, we cannot investigate the source further. We do think the high percentage of P. fulva reads suggests that other bacteria can be enriched with the anti-Bb antibody we used. We’ve updated the discussion to highlight this caveat. (Lines 459-460)

      While the presence of these bacterial reads did lower our overall Bb mapping rate and necessitate deeper sequencing for the day 4 samples, the Bb sequencing coverage of these samples is on par with samples from the other days in terms of percentage of genes with at least 10 reads and median number of reads per gene. Fewer than 0.0002% of the reads that map to Bb genes in any day 4 sample also map to P. fulva. We found that this small fraction of reads is dispersed across 334 genes in which an average of 0.05% (maximally 2.3%) of day 4 reads also map to P. fulva. Therefore, these bacterial reads do not change our interpretation of the results comparing gene expression across days, including day 4.

      Comprehensive data comparisons of this study and others are warranted. While the authors note examples of known differentially expressed genes (like lines 235-241), how does this global study compare to other global approaches? Are new expression patterns emerging with this RNA-seq approach compared to other methods? What differences emerged from day 1 to day 4 ticks compared to differences observed in unfed to fed ticks or fed ticks to DMC experiments? Directly compare to the following studies (PMID: 11830671; PMID: 25425211; PMID: 36649080).

      We added comparisons of our list of DE genes to those noted to change between “unfed tick” and “fed tick” culture conditions (PMID: 11830671 and 12654782), as well as fed nymph to DMC (PMID: 25425211 and 36649080) (Lines 231-252, Figure S4). These comparisons pointed us to two main findings: that global changes to Bb in different culture conditions generally agreed with the most dramatic changes we saw in our data, and that the timing of expression increases during feeding may relate to whether genes are more highly expressed in fed ticks or in mammalian conditions. Overall, the majority of our DE genes have been identified in at least one of these studies or in the other studies we compared to outlining RpoS, Rrp1, and RelBbu regulons. As many of these studies were asking slightly different questions and using different conditions and vastly different technology, we would expect some differences to arise from different contexts and some to be purely technical. The genes that were not seen in these previous studies tended to follow the same functional patterns we saw overall, heavily skewing towards genes of unknown function, outer surface proteins, and a handful of genes related to other functions. With the current state of the functional annotation of the genome, it is difficult to assess whether these amount to new expression patterns in and of themselves, so we focused on the overall trends in our data rather than those that were different from other studies.

      Details about the categorization of gene functions should be further described. The authors use functional analysis from Drecktrah et al., 2015, but that study also lacks details of how that annotation file was generated. Here, the authors seem to have supplemented the Drecktrah et al., 2015 list with bacteriophage and lipoprotein predictions, which are the same categories on which they focus their findings. Have they introduced a bias to these functional groups? While it can be noted that many lipoproteins are upregulated (or comment on specific gene classes), there are even more "unknown" proteins upregulated. I argue that not much can be inferred from functional analysis given the current annotation of the B. burgdorferi genome.

      We strongly agree that the current annotation of the Bb genome makes it difficult to perform meaningful global functional analysis, but we feel it is useful to get a general overview of gene functions. We described our methods for classifying genes into functional categories in the methods, in which we relied on previously published papers to make our best estimate of gene category (noted for each gene in Table S4). Due to the lack of annotations for many genes, we focused on the relatively well-defined category of lipoproteins, as these are overrepresented as a group in our upregulated genes, as well as phage genes, which are not necessarily overrepresented, but are still interesting to us. We hope that others will look at the data (particularly in Table S4, but also Table S3, or download the raw data and do their own analysis) with their own interests and biases and dig more into genes that we did not highlight specifically. We provide this data as a resource with the hope that some of the genes of unknown function that we see change here will be the subject of future functional studies so that this is less of a problem in the future.

      Reviewer #1 (Recommendations For The Authors):

      In general, the paper is well written and digestible for a broad audience. However, some of the figure graphics are unnecessary and take away from the data. Please label tick species and tick life-stage in Figure 1 drawings. The legend of Figure 1 requires citations. The Figure 4B graphic is unnecessary and the colors are confusing as they are too similar to the color palette of Figure 4A, where the colors have meaning. The Figure 5A graphic is unnecessary and takes away from the data embedded within it.

      We more clearly labeled the species in Figure 1 and added citations to the legend. We have simplified Figures 4A and 5A for clarity.

      Clarify lines 220-259 and Figure 3. What days are being compared? Downregulated genes should also be commented on.

      We considered our set of differentially expressed genes as those that changed two-fold (multiple hypothesis adjusted p-value < 0.05) in any of the three comparisons shown in Figure 2 (day 1 to day 2, day 1 to day 3, day 1 to day 4). We clarified this at multiple points in the results (e.g., Line 273). We commented on downregulated genes throughout, although as there were fewer genes and the magnitude of change was smaller, we focused more on upregulated genes.

      Line 327-329, state numbers not percentages. How many Bb proteins were actually detected?

      We updated this section to include numbers (Lines 371-374). In concordance with our sequencing data, we found (and were looking for) mainly tick proteins in this experiment.

      Data availability: B. burgdorferi and tick oligo sequences used for DASH should be provided in a supplemental table.

      We added a supplemental table of these sequences (Table S9). Please note they have been previously published in Dynerman et al. 2020 and Ring et al. 2022.

      Reviewer #2 (Recommendations For The Authors):

      The manuscript is overall well written and easy to follow. The data are compelling and support the conclusions. The discussion of this work is however highly insufficient and needs to be thoroughly edited:

      - Statistical analysis: The authors mention that DESeq2 was used. Please provide information on the type and the stringency of the tests used for differential gene expression analysis, including any additional potential correction for p-values (Bonferroni). The authors mention that genes with fold changes >2 were used for analysis, yet there is no information on the p-value cut off or if the genes with fold changes >2 were statistically significant. Please provide detail and rationale for the analysis.

      We clarified in the results and methods (Lines 200, 642-644) that we required an adjusted p-value < 0.05 from DESeq2’s Wald test with Benjamini-Hochberg correction along with a two-fold change when determining our genes of interest. As small fold changes showed statistically significant differences, we chose to set a fold change cutoff in most of our analysis to help us focus on the most highly expressed genes, like other studies we compared our data to. We included all of the DESeq2 results in Table S3 so that others may explore the data with different cutoffs if desired.
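      For readers who want to re-filter Table S3 with different cutoffs, the selection rule described above can be sketched in plain Python. This is a minimal illustration, not the authors' code and not DESeq2 itself (which also applies steps such as independent filtering before adjustment). The function names, the input tuple format, and the gene labels in the usage note are assumptions, and a two-fold change is taken here to mean |log2 fold change| >= 1.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    n = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(n), key=lambda i: pvalues[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(n, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

def select_de_genes(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Keep genes passing both the fold-change and adjusted p-value cutoffs.

    results: list of (gene, log2_fold_change, raw_pvalue) tuples
    (hypothetical format, assumed for this sketch).
    """
    padj = benjamini_hochberg([p for _, _, p in results])
    return [
        gene
        for (gene, lfc, _), q in zip(results, padj)
        if abs(lfc) >= lfc_cutoff and q < padj_cutoff
    ]
```

      With made-up inputs such as `[("geneA", 2.0, 0.01), ("geneB", 1.5, 0.04), ("geneC", 0.3, 0.03), ("geneD", -1.2, 0.20)]`, only genes that clear both thresholds are returned, which shows why a gene with a significant raw p-value can still be excluded after adjustment.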

      - The field has been generating data on gene expression in ticks for decades. Yet, many of these studies are not referenced here. There is no discussion of how the data described here compares to what is known in the literature. For example, Venn diagrams or tables could be included for comparison with the data described lines 208-216. Extensive description and comparison of the data to the literature should be added in the discussion, and similarities/discrepancies should be discussed appropriately.

      We added additional comparisons to four different papers looking at global gene expression in Bb in the fed tick or tick-like culture conditions (Lines 231-252, Figure S4). This information as well as comparisons to transcriptional regulons (Figure S3) is available in Table S4. In addition to discussing some examples in the results, we added more information in the discussion regarding these comparisons (Lines 420-425). The majority of the genes that we see change over feeding have been previously noted to change expression during the enzootic cycle or be regulated by transcriptional programs active during this timeframe, and we have more clearly stated that. We focused on similarities here as these papers all ask slightly different questions in different contexts and use different technology which could all account for the many differences in individual genes between all of them and our work.

      - There is no discussion of the caveats of the study: for example, the authors are using an anti-OspA antibody, which could induce bias. The authors provide in-vitro pull down data supporting that this should not be an issue, but the pull down is performed from BSK-grown bacteria. This caveat should be discussed.

      We’ve added a paragraph to the discussion including this caveat and others (Lines 453-463).

      - Timing of RNA extraction: There is over 1h of delay between initial tick collection and RNA fixation. The effects of time on gene expression should be discussed.

      Although we were able to show that this timeframe did not affect cultured Bb gene expression, we added this to the discussion.

      - Gene expression is compared to Day 1. This introduces analysis bias as it does not allow identification of transcripts that first change upon initial feeding. This caveat should also be discussed.

      We added this caveat – that we may miss gene expression changes in the first 24 hours of feeding – to the discussion.

      - This study is performed with 1 strain of B. burgdorferi on one tick species. Please provide perspective on the impact of these findings on Lyme disease causing spirochetes and their vectors broadly.

      We believe this method could be easily adaptable to study gene expression in other spirochete/vector pairs to determine similarities and differences and we added a comment to the discussion.

      - The discussion should also include insights on how to build on this work and include additional areas of method development to increase the recovery of B. burgdorferi from ticks or other organisms and facilitate future transcriptomic studies.

      We added a few ideas to the discussion noting that this protocol could be modified for use in other timeframes, with other antibodies, or in other organisms. We also highlight the recent advent of TBDCapSeq by Grassmann et al. that may be used in conjunction with this type of protocol.

      Minor comments:

      - Consider re-wording the description of the methods and findings to the third person for coherence.

      The majority of the methods are now written in third person.

      - Over 90% of the reads did not map to B. burgdorferi: please provide additional information on what these reads mapped to (tick or mouse), and if the data reflects what is known in the literature

      We have updated the results and discussion with information about the reads that do not map to Bb (Lines 156-166). The majority of reads mapped to the tick genome, which is what we expected. While a large number of reads in our day 4 samples unexpectedly mapped to Pseudomonas fulva, we do not believe this affects the interpretation of our data as we were still able to get broad genome coverage of Bb in these samples.

      - Please be more clear in the result section on the life stage of the ticks used for these studies.

      We have updated the results to clarify throughout.

      - Indicate how many total reads were generated for each sample

      This information is present in Table S1.

      - Provide statistical analyses for Figures 1C and D.

      We added t tests to determine statistical differences for these panels.

      Reviewing Editor (Recommendations for The Authors):

      1. It is important to mention in the abstract (line 27) that 'upregulated genes' is in comparison to day 1. This is also true in the introduction (lines 92-93).

      We updated the results and introduction to state more clearly that day 1 is our baseline measurement.

      2. It is also important to discuss in the manuscript that because your 'controls' are day 1 samples, initial transcriptome changes in response to the tick environment might be missed.

      This has been added in the discussion as a caveat (Lines 460-463).

      3. As someone who does not work with Bb, I would like to have seen a clearer description of what the feeding event looks like. Although there is some text in the introduction that touches on that ('prolonged nature of I. scapularis feeding'), I would like to see something even clearer. Maybe stating that feeding may take from x-y days would clarify that for the non-specialist.

      We updated the results to more clearly state that the tick falls off of the mice by around 4 days after feeding, our last time point (Lines 113-115). Additional details of tick feeding are also in the Figure 1 legend.

      4. In Fig. 3 linear DNA molecules seem to be drawn to scale. Is that also the case for plasmids? This could be clarified in the legend.

      The genome is drawn approximately to scale. We noted this and updated the legend with more information about how linear and circular plasmid names denote their size.

      5. Figure 5C: Colors are a bit confusing here. The legend indicates that they refer to fold changes, but the scale in the panel shows expression levels, not fold changes. Please clarify. Also, is this really TPM or RPKM? If comparisons of relative levels between different genes are made, number of reads should be normalized by gene length.

      The heatmap in Figure 4C does show expression levels, and we updated the legend to more clearly state this. The highlighted gene names are meant to show which genes change two-fold during this time (those present in panel A). The data are presented as TPM (transcripts per million), which, like RPKM, is normalized by gene length (PMID: 20022975).
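      To illustrate the normalization point above, the following sketch (an illustration under stated assumptions, not the authors' code) computes TPM and RPKM from raw counts and gene lengths. Both divide by gene length; they differ in the order of the length and depth normalization steps. The function names and input dictionaries are hypothetical, and the sketch assumes at least one gene has a nonzero count.

```python
def tpm(counts, lengths_bp):
    """Transcripts per million: length-normalize first, then scale to 1e6."""
    # Reads per kilobase for each gene.
    rpk = {g: counts[g] / (lengths_bp[g] / 1000.0) for g in counts}
    scale = sum(rpk.values()) / 1e6
    return {g: value / scale for g, value in rpk.items()}

def rpkm(counts, lengths_bp):
    """Reads per kilobase per million mapped reads: depth-normalize by total reads."""
    total_millions = sum(counts.values()) / 1e6
    return {
        g: counts[g] / (lengths_bp[g] / 1000.0) / total_millions
        for g in counts
    }
```

      Because TPM values always sum to one million within a sample, they are somewhat easier to compare across samples than RPKM values, whose per-sample totals can differ.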

    1. Author Response:

      The following is the authors' response to the original reviews.

      We have now incorporated the changes recommended by the reviewers to improve the interpretations and clarity of the manuscript. We are grateful for their thoughtful comments and suggestions, which have significantly strengthened the manuscript.

      Reviewer #1 (Public Review):

      Park et al demonstrate that cells on either side of a BM-BM linkage strengthen their adhesion to that matrix using a positive feedback mechanism involving a discoidin domain receptor (DDR-2) and integrin (INA-1 + PAT-3). In response to its extracellular ligand (Collagen IV/EMB-9), DDR-2 is endocytosed and initiates signaling that in turn stabilizes integrin at the membrane. DDR-2 signaling operates via Ras/LET-60. This work's strength lies in its excellent in vivo imaging, especially of endogenously tagged proteins. For example, tagged DDR-2:mNG could be seen relocating from seam cell membranes to endosomes. I also think a second strength of this system is the ability to chart the development of BM-BM linkage over time based on the stages of worm larval development. This allows the authors to show DDR signaling is needed to establish linkage, rather than maintain it. It likely is relevant to many types of cells that use integrin to adhere to BM and left me pondering a number of interesting questions.

      We thank the reviewer for highlighting the strengths and impact of our work in expanding our understanding of tissue linkages and how DDR and integrins might work in other contexts.

      For example: (1) Does DDR-2 activation require integrin? Perhaps integrin gets the process started and DDR-2 positively reinforces that (conversely is DDR-2 at the top of a linear pathway)?

      DDR activation by receptor clustering upon exposure to its ligand collagen is well documented (Juskaite et al., 2017 eLife PMID: 28590245). Clustered DDR is rapidly internalized into endocytic vesicles, where full activation of tyrosine kinase activity is thought to occur (Fu et al., 2013 J Biol Chem PMID: 23335507). Supporting this model, we found that concentrated type IV collagen is required for vesicular DDR-2 localization in the utse and seam cells at the utse-seam connection. Whether DDR-2 activation requires integrin has not been fully established. However, one study using mouse and human cell lines showed that DDR1 activation occurs independent of integrin (Vogel et al., 2000 J Biol Chem PMID: 10681566), consistent with the latter possibility raised by the reviewer that DDR-2 is upstream of integrin.

      To test these hypotheses, we require an experimental condition where loss or near complete loss of INA-1 integrin is achieved by the mid-to-late L4 larval stage, when DDR-2 is activated by collagen and taken into endocytic vesicles. Currently, we can only partially deplete INA-1 by RNAi (Figure 5—figure supplement 2E), and strong loss of function mutations in ina-1 result in early larval arrest and lethality (Baum and Garriga, 1997 Neuron PMID: 9247263). To overcome these obstacles, we are adapting the new FLP-ON::TIR1 system developed for precise spatiotemporal protein degradation in worms (Xiao et al., 2023 Genetics PMID: 36722258). We hope to achieve a near complete knockdown of ina-1 with this timed depletion strategy. In the future, we will use this system to block DDR-2 and integrin function specifically in the utse or seam cells, to complement our current dominant negative mis-expression approach.

      (2) In ddr-2(qy64) mutants, projections seem to form from the central portion of the utse cell. Does this reveal a second function for DDR-2, regulating perhaps the cytoskeleton?

      We thank the reviewer for their observation and agree with their interpretation. We think it is important to comment on this and have stated in the results text, lines 208-212: “In addition, membrane projections emanating from the central body of the utse were detected in ddr-2(qy64) animals. These projections were first observed at the mid L4 stage and persisted to young adulthood (Figure 2C). These observations suggest that DDR-2 functions around the mid L4 to late L4 stages to promote utse-seam attachment, and that DDR-2 may also regulate utse morphology.”

      And (3) can you use the forward genetic tools available in C. elegans to find new genes connecting DDR-2 and integrin?

      This is an excellent suggestion. We found that loss of ddr-2 strongly enhanced the uterine prolapse (Rup) defect caused by RNAi mediated depletion of integrin. To find new genes connecting DDR-2 and integrin, a targeted screen for the Rup phenotype could be performed in an integrin reduction of function condition. As we cannot work with null or strong loss-of-function ina-1 alleles (described above), the screen could be conducted with either timed depletion of INA-1 with candidate RNAi treatments, or combinatorial ina-1 RNAi with candidate RNAi treatments.

      I do see two areas where the manuscript could be improved. First, the authors rely on imprecise genetic methods to reach their conclusions (i.e. systemic RNAi, or expression of dominant negative constructs.) I think their conclusion would be stronger if they used tissue specific degradation to block ddr-2 function specifically in the utse or seam cells. Methods to do this are now regularly used in C. elegans and the authors have already developed the necessary tissue-specific promoters.

      We agree with the reviewer that tissue specific degradation of DDR-2 in the utse and seam cells will complement and strengthen our evidence for the site of action of DDR-2. As described earlier, we are currently adapting the FLP-ON::TIR1 tissue degradation system to perform these experiments and will provide our findings in a follow-up manuscript.

      Second, the manuscript is presented in the introduction as a study on formation and function of BM-BM linkage. The authors start the discussion in a similar manner. But their results are about adhesion between cells and BM. In fact they show the BM-BM linkage forms normally in ddr-2 mutants. Thus it seems like what they have really uncovered is an adhesion mechanism that works in parallel to the BM-BM linkage. Since ddr-2 appears to function equally in both utse + seam cells (based on their dominant negative data), there are likely three layers of adhesion (utse-BM, BM-BM, BM-seam) and if any of those break down, you get a partially penetrant rupture phenotype.

      The reviewer raises an important and interesting point, and we agree that we did not articulate the organization of the utse-seam tissue connection clearly. The utse-seam connection is comprised of the utse and seam BMs, each ~50 nm thick, and a connecting matrix bridging the two BMs, which is ~100 nm thick (Vogel and Hedgecock, 2001 Development PMID: 11222143). Type IV collagen builds up to high levels within the connecting matrix and links the utse and seam BMs, and its concentration is required for DDR-2 vesiculation. An important point we did not highlight is that type IV collagen is approximately 400 nm long (Timpl et al. 1981, Eur J Biochem PMID: 6274634). Thus, collagen molecules within the connecting matrix could span the entire length of the utse-seam connection and project into the utse and seam BMs to interact with cell surface receptors. Consistent with this possibility, we found that buildup of type IV collagen that spans the utse-seam BM-BM linkage correlated with the timing of DDR-2 activation/vesiculation within utse and seam cells. In addition, super-resolution imaging of the mouse kidney glomerular basement membrane (GBM), a tissue connection between endothelial BM and epithelial (podocyte) BM, showed that type IV collagen, which spans the BMs, projects into the endothelial and podocyte BMs (Suleiman et al., 2013 eLife PMID: 24137544). We carefully considered these points to generate the schematics in Figure 1A and Figure 8, but failed to articulate this point in the manuscript. We are grateful to the reviewer for bringing up our error and have now stated these details in the text to address the reviewer’s concern as outlined below.

      In the introduction (lines 93-96): “A BM-BM tissue connection between the large, multinucleated uterine utse cell and epidermal seam cells stabilizes the uterus during egg laying. The utse-seam connection is formed by BMs of the utse and the seam cells, each ~50 nm thick, which are bridged by an ~100 nm connecting matrix (Vogel and Hedgecock 2001, Morrissey, Keeley et al. 2014, Gianakas, Keeley et al. 2023).”

      In the discussion (lines 507-520): “We also found that internalization of DDR-2 at the utse-seam connection correlated with the assembly of type IV collagen at the BM-BM linkage and was dependent on type IV collagen deposition. Type IV collagen is ~400 nm in length and the utse-seam connecting matrix spans ~100 nm, while the utse and seam BMs are each ~50 nm thick (Timpl, Wiedemann et al. 1981, Vogel and Hedgecock 2001). Thus, collagen molecules in the connecting matrix could project into the utse and seam BMs to interact with DDR-2 on cell surfaces. Consistent with this possibility, super-resolution imaging of the mouse kidney glomerular basement membrane (GBM), a tissue connection between podocytes and endothelial cells, showed type IV collagen within the GBM projecting into the podocyte and endothelial BMs (Suleiman, Zhang et al. 2013). As DDR-2 is activated by ligand-induced clustering of the receptor (Juskaite, Corcoran et al. 2017, Corcoran, Juskaite et al. 2019), it suggests that the BM-BM linking type IV collagen network, which is specifically assembled at high levels, clusters and activates DDR-2 in the utse and seam cells to coordinate cell-matrix adhesion at the tissue linkage site.”
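      The size argument in the quoted passages reduces to simple arithmetic. The worked comparison below uses only the approximate dimensions quoted above; the symbols are introduced here for illustration, and treating the collagen molecule as fully extended is an assumption that says nothing about its actual orientation within the network.

```latex
\[
  t_{\text{connection}} \approx \underbrace{50}_{\text{utse BM}}
    + \underbrace{100}_{\text{connecting matrix}}
    + \underbrace{50}_{\text{seam BM}} = 200\ \text{nm},
  \qquad
  \ell_{\text{collagen IV}} \approx 400\ \text{nm} > t_{\text{connection}}
\]
```

      On these figures a single extended molecule is about twice as long as the full thickness of the connection, which is why it could in principle reach receptors on both cell surfaces.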

      These concerns do not undercut the significance of this work, which identifies an interesting mechanism cells use to strengthen adhesion during BM linkage formation. In fact, I am excited to read future papers detailing the connection between DDR-2 and integrin. But before undertaking those experiments the authors should be certain which cells require DDR-2 activity, and that should not be determined based solely on mis expression of a dominant negative.

      We thank the reviewer for recognizing the significance of our work and reiterate that we will use tissue-specific degradation for site of action experiments in future studies on the biology of the utse-seam tissue linkage.

      Reviewer #2 (Public Review):

      This paper explores the mechanisms by which cells in tissues use the extracellular matrix (ECM) to reinforce and establish connections. This is a mechanistic and quantitative paper that uses imaging and genetics to establish that type IV collagen and its receptor DDR-2 (discoidin domain receptor 2) signal through Ras to strengthen an adhesion between two cell types in C. elegans. This connection needs to be strong and robust to withstand the pressure of the numerous eggs that pass through the uterus. The major strengths of this paper are in crisply designed and clear genetic experiments, beautiful imaging, and well supported conclusions. I find very few weaknesses, although perhaps the evidence that DDR-2 promotes utse-seam linkage through regulation of MMPs could be stronger. This work is impactful because it shows how cells in vivo make and strengthen a connection between tissues through ECM interactions involving collaboration between discoidin and integrin.

      We appreciate the reviewer’s assessment of the impact of our work in detailing a mechanism for how cells increase their adhesion to the ECM to establish connections between adjacent tissues. We have softened the interpretation of our MMP localization data to address the reviewer’s concern (detailed below).

      Reviewer #1 (Recommendations For The Authors):

      Regarding Figure 1D, is it possible to show when the BM forms on the cartoons more clearly (something like the 3rd section of Fig 3A)? I can see it in the timeline but it's hard to follow in the diagrams.

      We agree with the reviewer that we could show when the BM-BM connecting matrix forms more clearly in Figure 1D. Hemicentin and fibulin, the earliest components of the connecting matrix, are detected at very low levels at the utse-seam connection during the mid-L4 stage and are more prominently localized by the mid-to-late L4 stage (Gianakas et al., 2023 J Cell Biol PMID: 36282214). For this reason, we only show the connecting matrix in yellow from the mid-to-late L4 stages onward. We have now made the BM-BM connection more prominent in the figure 1D cartoons with boxed outlines (similar to Figure 3A as the reviewer suggested). We also added a label for the time window when the BM-BM connection forms.

      Regarding the RNAi induced prolapse phenotype, looking at 2B, it appears that between 5% and 10% of animals have uterine prolapse when fed control RNAi. Is this correct? It seems very high. This prolapse in control animals was not observed in other RNAi experiments such as Figure 5C.

      We thank the reviewer for pointing this out. For Figure 2B, the control used was wild-type N2 animals fed with OP50 E. coli bacteria, rather than HT115 bacteria carrying the L4440 empty vector (control RNAi). This is because the main comparisons were to five ddr-1 and ddr-2 mutant strains. We did notice a slightly higher baseline uterine prolapse frequency (5% on average, detailed in Figure 2—Source data 1) in wild-type animals fed OP50 bacteria, compared to HT115 bacteria fed animals (approximately 1-2% on average). It is possible this could be linked to the nutritional differences in the two bacterial strains. However, we are confident of our data in Figure 2B as we carried out 3 independent trials, and the uterine prolapse frequencies in ddr-1 mutant animals matched the baseline in wild-type animals, while the frequencies for ddr-2 mutants were all increased over the baseline in all trials (as detailed in Figure 2—Source data 1).

      Relating to the point above, in reading the methods to try to understand how they did the RNAi, I noticed that they measure prolapse continually over five days. I didn't realize it takes a long time to occur. I think they should explain this in the text and in the figures. Reading the manuscript I thought prolapse occurred as soon as mutant animals began laying eggs. In the text they should explain this when they first assay the phenotype (page 7), and for figures the Y axis on the graphs could say "% uterine prolapse after 5 days."

      We thank the reviewer for their suggestions. We did not articulate clearly that the utse-seam connection is able to withstand some mechanical stress, even when key components are lost. It’s only over time and repeated use that the connection breaks down. This is likely because a number of components contribute to the connection and, as we have shown previously, there is feedback, such that when one component, such as collagen, is reduced, hemicentin levels increase at the BM-BM connection. Since ruptures arising from utse-seam detachments typically occur sometime after the onset of egg-laying, we screened the entire egg-laying period (days two to five post-L1) as described in Gianakas et al. 2023. We have now incorporated these points in the text and figures as follows:

      In the introduction, we clarified that the utse-seam BM-BM connection breaks down over time, by adding (lines 99-105): “Hemicentin promotes the recruitment of type IV collagen, which accumulates at high levels at the BM-BM tissue connection and strengthens the adhesion, allowing it to resist the strong mechanical forces of egg-laying. The utse-seam connection is robust, with each component of the tissue-spanning matrix contributing to the BM-BM connection (Gianakas, Keeley et al. 2023). This likely accounts for the ability of the utse-seam connection to initially resist mechanical forces after loss of any one of these components, delaying the uterine prolapse phenotype until sometime after the initiation of egg-laying.”

      We expanded the results text when we first describe the Rup phenotype (lines 183-184): “We first screened for the Rup phenotype caused by uterine prolapse, observing animals every day during the egg-laying period, from its onset (48 h post-L1) to end (120 h) (Methods)”.

      We provided more detail in the Methods section (lines 784-790): “Uterine prolapse frequency was assessed as described previously (Gianakas et al 2023). Briefly, synchronized L1 larvae were plated (~20 animals per plate) and after 24 h, the exact number of worms on each plate was recorded. Plates were then visually screened for ruptured worms (uterine prolapse) every 24 h during egg-laying (between 48 h to 120 h post-L1). We chose to examine the entire egg-laying period as ruptures arising from utse-seam detachments do not usually occur at the onset of egg-laying, but after cycles of egg-laying that place repeated mechanical stress on the utse-seam connection (Gianakas et al 2023).”

      Finally, we modified the Y-axes of graphs in Figure 2B and 5C and the respective figure legends as suggested by the reviewer.

      Then I went back and compared to the previous publication (Gianakas, 2023). I would be interested to see a time course of how many animals prolapse after 1 day, 2 days, etc. Is this consistent with their data on hemicentin?

      We agree with the reviewer that a time course of uterine prolapse would be interesting as we saw ruptures occur throughout the egg-laying period. However, for the hemicentin knockdown experiments in Gianakas et al. 2023 as well as the experiments in this study, we recorded only the pooled number of animals with ruptures at the end of the experimental window. In future studies we will also record the uterine prolapse frequencies on each day to generate time courses that will provide more insight into the function of proteins at the utse-seam connection.

      Lines 183-184: I'm not sure what it means to say "trended towards displaying a significant Rup phenotype?" Since the difference was not statistically significant, it would be better to say something like "increased but not statistically significant."

      We agree with the reviewer and have now modified this sentence (lines 190-193): “Animals carrying the ddr-2(ok574) allele, which deletes a portion of the intracellular kinase domain (Unsoeld, Park et al. 2013), also showed an increased frequency of the Rup phenotype compared to wild-type animals, although this difference was not statistically significant (Figure 2A and B)”.

      Line 186: 'penetrant' needs a qualifier to indicate the magnitude of the proportion of individuals with the phenotype.

      As we provide the Rup frequency numbers in Figure 2—Source data 1, we modified the sentence as follows (lines 193-195): “We further generated a full-length ddr-2 deletion allele, ddr-2(qy64), and confirmed that complete loss of ddr-2 led to a significant uterine prolapse defect (Figure 2A and B).”

      Lines 206-208; could the mounting/imaging procedure (which I assume requires squeezing the worm between agarose pad and coverslip) alter the occurrence of prolapse? I would think prolapse would occur more frequently under these conditions as compared to worms laying eggs on a plate.

      The reviewer brings up an important concern. The mounting and imaging procedure does require placing the worm between an agarose pad and a coverslip. However, this did not alter the occurrence of uterine prolapse in this experiment. We were careful to perform the same procedure on both wild-type and ddr-2(qy64) animals to control for this. As detailed in the manuscript, none of the eight wild-type animals we mounted underwent uterine prolapse after recovery off the coverslip, and among the ddr-2(qy64) mutants we mounted, only the ones that exhibited utse-seam detachments went on to rupture later.

      We articulated these points more clearly by modifying lines 214-216 as follows: “Wild-type and ddr-2(qy64) animals were mounted and imaged at the L4 larval stage for utse-seam attachment defects, recovered, and tracked to the 72-hour adult stage, where they were examined for the Rup phenotype.”

      In seam cells you can see that DDR-2:mNG is present at membranes from early to mid L4, which makes sense. But I cannot see it on the membrane at any time point in the utse. Perhaps it is obscured by the yellow dotted line. Should it be visible on utse membranes before it is endocytosed?

      The reviewer raises an interesting question. We think it is likely that DDR-2 is initially on the membrane of the utse like it is on the seam cells. However, we have not observed this, possibly due to the complex shape and thin membrane extensions of the utse. We are unable even to detect clear membrane enrichment of membrane markers in the utse (for example, compare the utse and seam membrane markers in Figure 3B). Thus, we refrained from speculating on DDR-2 utse membrane localization in the manuscript, and instead focused on the pattern of vesicular DDR-2 peaking at the late L4 stage, which was clearly visible in both the utse and seam cells.

      Sup Fig 3A - please show quantification of seam cells not contacting utse at the same Y-axis scale as for regions that do contact utse.

      We have modified the Y-axis scale for the quantification of the seam region not contacting the utse.

      Figure 4A - I don't see a difference between WT and ok574 - what am I missing?

      In the representative ok574 animal shown, a portion of the utse arm on the top right is detached from the seam. To make this phenotype clearer, we have recropped the image panels, readjusted the brightness and contrast of the utse and the seam, and redrawn the outline of the detachment.

      Figure 4C+D, and lines 296-298: I'd bet that both are needed to recruit DDR-2 to membranes. But him-4 has a more severe phenotype because the RNAi knockdown is much more effective (perhaps b/c they are using the newer t444t vector).

      We agree with the reviewer that the him-4 knockdown phenotype is likely more severe than emb-9 knockdown. Type IV collagen at the utse-seam connection is very stable compared to hemicentin (Gianakas et al 2023, J Cell Biol PMID: 36282214, see Fig. 5C), which could explain the lower knockdown efficiency.

      We modified our interpretation of the data in the text as follows (lines 308-312): “In addition, we did not detect DDR-2 at the cell surface, suggesting that hemicentin has a role in recruiting DDR-2 to the site of utse-seam attachment. It is possible that collagen could also function in DDR-2 recruitment, but we could not assess this definitively due to the lower knockdown efficiency of emb-9 RNAi (Figure 4—figure supplement 1A).”

      Reviewer #2 (Recommendations For The Authors):

      Line 218 DDR-2 (typo)

      We have corrected this typo.

      Evidence (line 344-348) may not be strong enough to say whether or not DDR-2 promotes utse-seam linkage through regulation of MMPs.

      We agree with the reviewer and have softened our conclusions as follows (lines 356-363): “The C. elegans genome harbors six MMP genes, named zinc metalloproteinase 1-6 (zmp-1-6) (Altincicek, Fischer et al. 2010). We examined four available reporters of ZMP localization (ZMP-1::GFP, ZMP-2::GFP, ZMP-3::GFP, and ZMP-4::GFP) (Kelley, Chi et al. 2019). Only ZMP-4 was detected at the utse-seam connection and its localization was not altered by knockdown of ddr-2 (Figure 5—figure supplement 1F). These observations suggest that DDR-2 does not promote utse-seam linkage through regulation of MMPs, although we cannot rule out roles for DDR-2 in promoting the expression or localization of ZMP-5 or ZMP-6.”

      The authors show the critical period is in late L4, however, is the signaling needed later too? For example, is the linkage strengthening moderated by DDR-2 important as more eggs accumulate?

      The reviewer raises an interesting question. We observed that the vesicular localization of DDR-2 sharply declined before the onset of egg-laying. By young adulthood, very few punctate structures of DDR-2 were observed in the seam cells, and none in the utse (Figure 3B). Furthermore, the frequency of utse-seam detachments in ddr-2 mutant animals peaked by the late L4 stage and did not increase after this time, suggesting DDR-2 function is no longer required after the late L4 stage (Figure 2D). Thus, we believe that DDR-2 signaling strengthens tissue linkage only during the early formation of the utse-seam connection between the mid and late L4 stage.

      We incorporated these points in the discussion (lines 477-485): “Through analysis of genetic mutations in the C. elegans receptor tyrosine kinase (RTK) DDR-2, an ortholog to the two vertebrate DDR receptors (DDR1 & DDR2) (Unsoeld, Park et al. 2013), we discovered that loss of ddr-2 results in utse-seam detachment beginning at the mid L4 stage. The frequency of detachments in ddr-2 mutant animals peaked around the late L4 stage and did not increase after this time. This correlated with the levels of DDR-2::mNG at the utse-seam connection, which peaked at the late L4 stage and then sharply declined by adulthood. Together, these findings suggest that DDR-2 promotes utse-seam attachment in the early formation of the tissue connection between the mid and late L4 stage.”

      Fig. 3B is the fluorescence quantification normalized to the area?

      Yes, it is. We used mean fluorescence intensity for all fluorescence quantifications to normalize for the area where the signal was measured. We added a line in Methods to emphasize this (lines 739-740): “We measured mean fluorescence intensity for all quantifications in order to account for linescan area.”
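      To illustrate why a mean, rather than a summed, intensity accounts for linescan area, here is a minimal sketch with made-up pixel values. The function name and the two example scans are hypothetical and do not reproduce the authors' image-analysis pipeline.

```python
def mean_intensity(pixels: list[float]) -> float:
    """Mean fluorescence intensity: summed signal divided by pixel count."""
    if not pixels:
        raise ValueError("linescan contains no pixels")
    return sum(pixels) / len(pixels)


# Two hypothetical linescans with the same per-pixel brightness
# but different areas (50 versus 200 pixels).
short_scan = [100.0] * 50
long_scan = [100.0] * 200

# The summed signal scales with area; the mean does not.
print(sum(short_scan), sum(long_scan))
print(mean_intensity(short_scan), mean_intensity(long_scan))
```

      The summed values differ fourfold purely because of area, whereas the two means are identical, which is the property the Methods sentence relies on.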

      Fig. 4B a statistical assessment of the degree of co-localization of DDR-2::mNG and the endosomal markers might be a nice addition.

      We believe the reviewer is referring to Figure 3—figure supplement 1B. We have now added the statistical assessment of the degree of co-localization of DDR-2::mNG and the endosomal markers.

      We want to sincerely thank the two reviewers for their thoughtful comments and suggestions. The changes we have made in response to these comments have substantially improved the manuscript.

    1. Author Response:

      The following is the authors' response to the original reviews.

      We appreciate the in-depth review of our manuscript, and the excellent suggestions from the two reviewers. We have addressed all concerns as described in the point-by-point response below. We have also added all of these changes to a revised version posted to bioRxiv on May 23rd 2023 (BIORXIV/2023/536585).

      Reviewer #1 (Recommendations For The Authors):

      It is sometimes difficult to connect the rationalizations behind the transitions between NB7 binding interaction, the compare/contrast of p84 and p101 effectors, and the synergy with phosphorylation. More explanation of the rationalizations behind these transitions in the Results would be helpful.

      We agree that the manuscript would benefit from better transitions between the sections. We have added a new paragraph in the final section describing the nanobody structure before the helical domain phosphorylation that fully describes the rationale for how both inform on the critical role of helical domain dynamics in kinase activity. This paragraph is shown below.

      ‘The interface of NB7 with p110γ is distant from both the putative membrane binding surface, as well as the catalytic machinery of the kinase domain. To further understand how this nanobody could so potently inhibit PI3K activity we examined any other potential modulators of PI3K activity localised in this region. There are two regulatory phosphorylation sites in the helical (Walser et al., 2013) and kinase domain (Perino et al., 2011) localised at the NB7 interface. This is intriguing as helical domain phosphorylation is activating, and kinase domain phosphorylation is inhibitory. This suggested a critical role in the regulation of p110γ is the dynamics of this kinase-helical interface. To fully define the role of NB7 in altering the dynamics of the helical domain we needed to study other modulators of helical domain dynamics.’

      The Methods section would benefit from careful copy editing for clarity and consistency.

      We have gone through the methods section and edited for clarity and consistency throughout.

      There's a minor ambiguity throughout when referring to the phosphorylation of S594/S595. Although close inspection makes it clear that this refers to the monophosphorylation of either site S594 or S595, there are several references to "S594/S595" that could be interpreted as phosphorylation of both residues.

      We agree that this was ambiguous in the original text. We have added an explicit statement describing this as a single phosphorylation event.

      ‘The modification at this site results in a single phosphorylation event, but due to CID MS/MS fragmentation we cannot determine which site is modified, and will be described as S594/S595 throughout the manuscript.’

      In Figure 2B, the authors show the cryo-EM density map and the structural model based on this map. It would be helpful to also include an image of the structural model fit in the density map to allow readers to evaluate the quality of the map and model. The 2F panel provides an important view of this fit, but CD3 models are difficult to discern.

      We agree that this would help interpret model quality. We have added a new supplemental figure showing the fit of both p110 and NB7 into this Cryo EM density (see new Fig S2).

      Paragraph starting at line 258: The shift to monitoring ATPase activity is confusing here. ATPase activity indicates production of ADP + phosphate (rather than ADP + PIP3). However, an explanation is provided that states that measuring ADP production serves as a surrogate for measuring PIP3 production. The apparent absence of membrane PIP2 substrate in Figure 4E (left) suggests that there is a true ATPase background activity in this kinase. If so, does the increase in ADP production in Fig. 4E reflect the inclusion of PIP2 substrate, increased background ATPase activity, or both?

      We agree that this was worded confusingly in the original version. We have now clarified exactly what we are observing in these ATPase assays. The new paragraph is appended below.

      ‘To further explore the potential role of phosphorylation in mediating p110γ activity, we examined the kinase activity of p110γ under two conditions: basal ATP turnover, and with PIP2 containing lipid membranes. The experiments in the absence of PIP2 measure turnover of ATP into ADP and phosphate, and are a readout of basal catalytic competency. Experiments with PIP2 measured ATP consumed in the generation of PIP3, as well as in non-productive ATP turnover. The p110γ enzyme in the absence of stimulators is very weakly active towards PIP2 substrate with only ~2 fold increased ATP turnover compared to in the absence of membranes. This is consistent with very weak membrane recruitment of p110γ complexes in the absence of lipid activators (Rathinaswamy et al., 2023). PKCβ-mediated phosphorylation enhanced the ATPase activity of p110γ ~2-fold in both the absence and presence of membranes (Fig. 4E). This suggests that the effect of phosphorylation is to change the intrinsic catalytic efficiency of phosphorylated p110γ, with limited effect on membrane binding.’

      In the section "Nanobody blocks p110gamma phosphorylation," it's not entirely convincing that "the presence of NB7 showed even lower phosphorylation than p110gamma-p101." This does not seem to be subject to a significance test in Figure 5A/B. The follow-up point about "complete abrogation of phosphorylation," however, is readily apparent.

      We agree that we could have been more precise with our language: this is not a complete block of phosphorylation, but a significant decrease in phosphorylation. We have removed the comparison to p110-p101, and also removed the statement about complete abrogation of phosphorylation. This is now reworded to:

      ‘The presence of NB7 showed a significant decrease in p110γ phosphorylation at both sites (Fig. 5A-B).’

      Figure 1: Legend needs to include more detail to define the data. (1A) Representations of variance need to be clarified (e.g., replicates, error bar meaning). Consider "Normalized lipid kinase activity" as a y-axis label and expand on the activity measurement and normalization in the legend. (1B) How was error calculated? (1C) Mislabeled as 1B? Also, consider clarifying the first title highlighting the comparison to class IA PI3Ks. (1D) Typo: "Y647-p84/p110gamma." Also, would it not be more accurate to say "effect of nanobody NB7 on PI3K displacement..." for this experiment?

      We apologise for these oversights. See details on what has been changed in Fig 1A, 1B, 1C, and 1D.

      For Fig 1A we now show the data where each replicate is indicated in the graph in the absence of error bars, and have also more clearly expanded on this activity measurement in the figure legend and also stated the replicate number.

      For Fig 1B we now clearly state how the error was generated.

      For Fig 1C we have fixed the typo.

      For Fig 1D, we have fixed this typo and also changed the sub-heading as suggested.

      The new figure legend is also shown below.

      Figure 1. The inhibitory nanobody NB7 binds tightly to all p110γ complexes and inhibits kinase activity, but does not prevent membrane binding

      A. Cartoon schematic depicting nanobody inhibition of activation by lipidated Gβγ (1.5 µM final concentration). Lipid kinase assays show a potent inhibition of lipid kinase activity with increasing concentrations of NB7 (3-3000 nM) for the different complexes. Experiments are carried out in triplicate (n=3) with each replicate shown. The y-axis shows lipid kinase activity normalised for each complex activated by Gβγ in the absence of nanobody. Concentrations of each protein were selected to give a lipid kinase value in the detectable range of the ATPase transcreener assay. The protein concentrations of p110γ (300 nM), p110γ-p84 (330 nM) and p110γ-p101 (12 nM) differed because of intrinsic differences in how each complex is activated by lipidated Gβγ, and this likely accounts for most of the difference seen in NB7 response.

      B. Association and dissociation curves for the dose response of His-NB7 binding to p110γ, p110γ-p84 and p110γ-p101 (50 – 1.9 nM) are shown. A cartoon schematic of BLI analysis of the binding of immobilized His-NB7 to p110γ is shown on the left. Dissociation constants (KD) were calculated based on a global fit to a 1:1 model for the top three concentrations and averaged with error shown. Error was calculated from the association and dissociation value (n=3) with standard deviation shown. Full details are present in the source data.

      C. Association and dissociation curves for His-NB7 binding to p110γ, p110α-p85α, p110β-p85β, and p110δ-p85β. Experiments were performed in duplicate with a final concentration of 50 nM of each class I PI3K complex.

      D. Effect of NB7 on PI3K recruitment to supported lipid bilayers containing H-Ras(GTP) and farnesyl-Gβγ as measured by Total Internal Reflection Fluorescence Microscopy (TIRF-M). DY647-p84/p110γ displays rapid equilibration kinetics and is insensitive to the addition of 500 nM nanobody (black arrow, 250 sec) on supported lipid bilayers containing H-Ras(GTP) and farnesyl-Gβγ.

      E. Kinetics of 50 nM DY647-p84/p110γ membrane recruitment appear indistinguishable in the absence and presence of nanobody. Prior to sample injection, DY647-p84/p110γ was incubated for 10 minutes with 500 nM nanobody.

      F. Representative TIRF-M images showing the localization of 50 nM DY647-p84/p110γ visualized in the absence or presence of 500 nM nanobody (+NB7). Membrane composition for panels C-E: 93% DOPC, 5% DOPS, 2% MCC-PE, Ras(GTP) covalently attached to MCC-PE, and 200 nM farnesyl-Gβγ.

      Figure 2: (1A) For consistency with the rest of the paper, p110g can be updated with the Greek character. (1B) This may have been intentional to attract attention to subdomains interacting with NB7, but "colored according to the schematic" omits the purple RBD. (2F) the figure legend should specify whether p110gamma surfaces depicted are the cryo-EM density or a surface rendition of the structural model.

      We agree and have fixed the p110 typo, and have also colored the schematic the same as shown in the cartoon model.

      The data shown in Fig 1B is indeed the Cryo EM density and this is now clearly indicated in the legend.

      Figure 3: (3B) Specifying the [M+H] as [M+2H]2+ and [M+4H]4+ would help the reader understand the delta mass for monophosphorylation here. Given the broad readership of this journal, it would be useful to define 't' and 'e' as 'theoretical' and 'experimental' in the legend. It may also help to be explicit about the meaning of the red spectra and residues in the legend. (3C-E) autocorrect typo for "(C)" and an opportunity to update "b" for Greek character beta.

      We agree that clearly defining the charge state of each spectrum will make it more obvious that we are dealing with a mono-phosphorylation and have made this change as suggested in the figure. We have also clearly defined m/z t and m/z e in the figure legend, as well as the black and red lines, and characters. Finally, we have added PKCβ for all descriptions of PKC treatment in the figure, and fixed the incorrect PKC’b’ in the legend.
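      The reviewer's point about charge states can be made concrete with a short calculation: a single phosphorylation adds HPO3 (monoisotopic mass about 79.966 Da), so the observed m/z shift is that mass divided by the charge. The sketch below is illustrative only; the function name is hypothetical and the charge states are the two named in the reviewer's comment.

```python
# Monoisotopic mass added by one phosphorylation (HPO3), in daltons.
HPO3_MONOISOTOPIC_DA = 79.96633


def phospho_mz_shift(charge: int, n_sites: int = 1) -> float:
    """Expected m/z shift for n_sites phosphorylations at a given charge."""
    if charge <= 0:
        raise ValueError("charge must be a positive integer")
    return n_sites * HPO3_MONOISOTOPIC_DA / charge


for z in (2, 4):
    print(f"[M+{z}H]{z}+ : {phospho_mz_shift(z):.3f}")
# [M+2H]2+ : 39.983
# [M+4H]4+ : 19.992
```

      A doubly phosphorylated peptide would show twice these shifts at the same charge, which is why stating the charge state lets a reader confirm that the observed delta corresponds to one phosphate.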

      Figure 4: (4C) Given the common use of "ND" for other terms, it would be useful to spell out "no deuterium" or "undeuterated." (4E) the parenthetical "(concentration, 12nM to 1000nM)" could be clarified. How are the (presumably p100gamma) concentration ranges reflected in the three plotted data points per treatment? See also Figure 5E.

      We agree and have redefined ND as undeuterated. We apologise for the typo in the figure legend: the concentrations of p110 gamma were the same for both phosphorylated and non-phosphorylated enzyme (all concentrations of enzyme were 1000 nM). This has been changed here and in Fig 5E.

      Figure 5: (5A/B) Some clarification that we're looking at extracted ion chromatograms would be very useful in this figure legend. On a related note, the experimental details on the LC-MS methodology for this data appear to be split between two sections of the Methods: the "Phosphorylation analysis" paragraph (line 526) and the HDX-MS section. Some more explicit cross-referencing would clarify this experiment. (5E) Clarify inclusion of PIP2 membranes here.

      We have clearly described that we are looking at extracted ion chromatograms in both panel A and B. We also have normalized the experimental methods in the LC-MS as these used exactly the same procedure. Finally, we now clearly describe the assays shown in Fig 5E were performed in the absence of PIP2 membranes.

      Miscellaneous typos:

      Line 205: reference omitted for "Previous study.."

      We have added this reference.

      Line 196: "unambiguous"

      Fixed to unambiguously

      Reviewer #2 (Recommendations For The Authors):

      The only mistake I spotted was that on line 729 there is a reference to Fig 3C that should actually be Fig. 4C

      We have changed this to the correct Fig 4C.

    1. Author Response

      eLife assessment

      In this valuable study, the authors investigate the mechanism of amyloid nucleation in a cellular system using their novel ratiometric measurements and uncover interesting insights regarding the role of polyglutamine length and the sequence features of glutamine-rich regions on amyloid formation. Overall, the problem is significant and being able to assess nucleation in cells is of considerable relevance. The data, as presented and analyzed, are currently still incomplete. The specific claims would be stronger if based on in vitro measurements that avoid the intricacies of specific cellular systems and that are more suitable for assessing sequence-intrinsic properties.

      We are pleased that the editors find our study valuable. We find that the reviewers’ criticisms largely arise from misunderstandings inherent to the conceptually challenging nature of the topic, rather than fundamental flaws, as we will elaborate here. We are grateful for the opportunity afforded by eLife to engage reviewers in a constructive public dialogue.

      Reviewer #1 (Public Review):

      The authors take on the challenge of defining the core nucleus for amyloid formation by polyglutamine tracts. This rests on the assertion that polyQ forms amyloid structures to the exclusion of all other forms of solids. Using their unique assay, deployed in yeast, the authors attempt to infer the size of the nucleus that templates amyloid formation by polyQ. Further, through a series of sequence titrations, all studied using a single type of assay, the authors converge on an assertion stating that a single polyQ molecule is the nucleus for amyloid formation, that 12-residues make up the core of the nucleus, that it takes ca. 60 Qs in a row to unmask this nucleation potential, and that polyQ amyloid formation belongs to the same universality class as self-poisoned crystallization, which is the hallmark of crystallization from polymer melts formed by large, high molecular weight synthetic polymers. Unfortunately, the authors have decided to lean in hard on their assertions without a critical assessment of whether their findings stand up to scrutiny. If their findings are truly an intrinsic property of polyQ molecules, then their findings should be reconstituted in vitro. Unfortunately, careful and rigorous experiments in vitro show that there is a threshold concentration for forming fibrillar solids. This threshold concentration depends on the flanking sequence context on temperature and on solution conditions. The existence of a threshold concentration defies the expectation of a monomer nucleus. The findings disagree with in vitro data presented by Crick et al., and ignored by the authors. Please see: https://doi.org/10.1073/pnas.1320626110. These reports present data from very different assays, the importance of which was underscored first by Regina Murphy and colleagues. The work of Crick et al., provides a detailed thermodynamic framework - see the SI Appendix. 
      This framework dovetails with theory and simulations of Zhang and Muthukumar, which explain exactly how a system like polyQ might work (https://doi.org/10.1063/1.3050295). The picture one paints is radically different from what the authors converge upon. One is inclined to lean toward data that are gleaned using multiple methods in vitro because the test tube does not have all the confounding effects of a cellular milieu, especially when it comes to focusing on sequence-intrinsic conformational transitions of a protein. In addition to concerns about the limitations of the DAmFRET method, which, based on the work of the authors in their collaborative paper by Posey et al., is being stretched to the limit, there is the real possibility that the cellular milieu, unique to the system being studied, is enabling transitions that are not necessarily intrinsic to the sequence alone. A nod in this direction is the work of Marc Diamond, which showed that having stabilized the amyloid form of Tau through coacervation, there is a large barrier that limits the loss of amyloid-like structure for Tau. There may well be something similar going on with the polyQ system. If the authors could show that their data are achievable in vitro without anything but physiological buffers one would have more confidence in a model that appears to contradict basic physical principles of how homopolymers self-assemble. Absent such additional evidence, numerous statements seem to be too strong. There are also several claims that are difficult to understand or appreciate.

      Rebuttal to the perceived necessity of in vitro experiments

      The overarching concern of this reviewer and reviewing editor is whether in-cell assays can inform on sequence-intrinsic properties. We understand this concern. We believe however that the relative merit of in-cell assays is largely a matter of perspective. The truly sequence-intrinsic behavior of polyQ, i.e. in a vacuum, is less informative than the “sequence-intrinsic” behaviors of interest that emerge in the presence of extraneous molecules from the appropriate biological context. In vitro experiments typically include a tiny number of these -- water, ions, and sometimes a crowding agent meant to approximate everything else. Obviously missing are the myriad quinary interactions with other proteins that collectively round out the physiological solvent. The question is what experimental context best approximates that of a living human neuron under which the pathological sequence-dependent properties of polyQ manifest. We submit that a living yeast cell comes closer to that ideal than does buffer in a test tube.

      The reviewer’s statement that our findings must be validated in vitro ignores the fact -- stressed in our introduction -- that decades of in vitro work have not yet generated definitive evidence for or against any specific nucleus model. In addition to the above, one major problem concerns the large sizes of in vitro systems that obscure the effects of primary nucleation. For example, a typical in vitro experimental volume of e.g. 1.5 ml is over one billion-fold larger than the femtoliter volume of a cell. This means that any nucleation-limited kinetics of relevant amyloid formation are lost, and any alternative amyloid polymorphs that have a kinetic growth advantage -- even if they nucleate at only a fraction of the rate of relevant amyloid -- will tend to dominate the system (Buell, 2017). Novel approaches are clearly needed to address these problems. We present such an approach, stretch it to the limit (as the reviewer notes) across multiple complementary experiments, and arrive at a novel finding that is fully and uniquely consistent with all of our own data as well as the collective prior literature.
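      The volume comparison in the preceding paragraph is simple unit arithmetic, sketched here for readers who want to check it. The cell volumes used (1 fL and 100 fL) are assumptions chosen to bracket "femtoliter-scale"; the response itself does not give an exact figure, and the function name is hypothetical.

```python
def volume_fold_difference(in_vitro_ml: float, cell_fl: float) -> float:
    """Ratio of an in vitro reaction volume to a single-cell volume."""
    in_vitro_l = in_vitro_ml * 1e-3  # millilitres to litres
    cell_l = cell_fl * 1e-15         # femtolitres to litres
    return in_vitro_l / cell_l


# 1.5 ml reaction versus two assumed cell volumes.
one_fl = volume_fold_difference(1.5, 1.0)        # assumption: 1 fL cell
hundred_fl = volume_fold_difference(1.5, 100.0)  # assumption: 100 fL cell
print(f"{one_fl:.1e}", f"{hundred_fl:.1e}")  # prints "1.5e+12 1.5e+10"
```

      Under either assumption the ratio exceeds one billion (1e9), consistent with the "over one billion-fold" statement.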

      That the preceding considerations are collectively essential to understand relevant amyloid behavior is evident from recent cryoEM studies showing that in vitro-generated amyloid structures generally differ from those in patients (Arseni et al., 2022; Bansal et al., 2021; Radamaker et al., 2021; Schmidt et al., 2019; Schweighauser et al., 2020; Yang et al., 2022). This is highly relevant to the present discourse because each amyloid structure is thought to emanate from a different nucleating structure. This means that in vitro experiments have broadly missed the mark in terms of the relevant thermodynamic parameters that govern disease onset and progression. Note that the rules laid out via our studies are not only consistent with structural features of polyQ amyloid in cells, but also (as described in the discussion) explain why the endogenous structure of a physiologically relevant Q zipper amyloid differs from that of polyQ.

      A recent collaboration between the Morimoto and Knowles groups (Sinnige et al.) investigated the kinetics of aggregation by Q40-YFP expressed in C. elegans body wall muscle cells, using quantitative approaches that have been well established for in vitro amyloid-forming systems of the type favored by the reviewer. They calculate a reaction order of just 1.6, slightly higher than what would be expected for a monomeric nucleus but nevertheless fully consistent with our own conclusions when one accounts for the following two aspects of their approach. First, the polyQ tract in their construct is flanked by short poly-Histidine tracts on both sides. These charges very likely disfavor monomeric nucleation because all possible configurations of a four-stranded bundle position the beginning and end of the Q tract in close proximity, and Q40 is only just long enough to achieve monomeric nucleation in the absence of such destabilization. Second, the protein is fused to YFP, a weak homodimer (Landgraf et al., 2012; Snapp et al., 2003). With these two considerations, our model -- which was generated from polyQ tracts lacking flanking charges or an oligomeric fusion -- predicts that amyloid nucleation by their construct will occur more frequently as a dimer than a monomer. Indeed, their observed reaction order of 1.6 supports a predominantly dimeric nucleus. Like us and others, Sinnige et al. did not observe phase separation prior to amyloid formation. This is important because it not only argues against nucleation occurring in a condensate, it also suggests that the reaction order they calculated has not been limited by the concentration-buffering effect of phase separation.

      While we agree that our conclusions rest heavily on DAmFRET data (for good reason), we do provide supporting evidence from molecular dynamics simulations, SDD-AGE, and microscopy.

      To summarize, given the extreme limitations of in vitro experiments in this field, the breadth of our current study, and supporting findings from another lab using rigorous quantitative approaches, we feel that our claims are justified without in vitro data.

      Rebuttal to the perceived incompatibility of monomeric nucleation with the existence of a critical concentration for amyloid

      We appreciate that the concept of a monomeric nucleus can superficially appear inconsistent with the fact that crystalline solids such as polyQ amyloid have a saturating concentration, but this is only true if one neglects that polyQ amyloids are polymer crystals with intramolecular ordering. The perceived discrepancy is perhaps most easily dispelled by protein crystallography. Folded proteins form crystals. These crystals have critical concentrations, and the protein subunits within them each have intramolecular crystalline order (in the form of secondary structure). To extrapolate these familiar examples to our present finding with polyQ, one need only appreciate the now well-established phenomenon of secondary nucleation, whereby transient interactions of soluble species with the ordered species lead to their own ordering (Törnquist et al., 2018). Transience is important here because it implies that intramolecular ordering can in principle propagate even in solutions that are subsaturated with respect to bulk crystallization. This is possible in the present case because the pairing of sufficiently short beta strands (equivalent to “stems” in the polymer crystal literature) will be more stable intramolecularly than intermolecularly, due to the reduced entropic penalty of the former. Our elucidation that Q zipper ordering can occur with shorter strands intramolecularly than intermolecularly (Fig. S4C-D) demonstrates this fact. It is also evident from published descriptions of single molecule “crystals” formed in sufficiently dilute solutions of sufficiently long polymers (Hong et al., 2015; Keller, 1957; Lauritzen and Hoffman, 1960).

      In suggesting that a saturating concentration for amyloid rules out monomeric nucleation, the reviewer assumes that the Q zipper-containing monomer must be stable relative to the disordered ensemble. This is not inherent to our claim and in fact opposes the definition of a nucleus. The monomeric nucleating structure need not be more stable than the disordered state, and monomers may very well be disordered at equilibrium at low concentrations. To be clear, our claim requires that the Q zipper-containing monomer is both on pathway to amyloid and less stable than all subsequent species that are on pathway to amyloid. The former requirement is supported by our extensive mutational analysis. The latter requirement is supported by our atomistic simulations showing the Q zipper-containing monomer is stabilized by dimerization (see our 2021 preprint). Hence, requisite ordering in the nucleating monomer is stabilized by intermolecular interactions. We provide in Author response image 1 an illustration to clarify what we believe to be the discrepancy between our claim and the reviewer’s interpretation.

      Author response image 1.

      That the rate-limiting fluctuation for a crystalline phase can occur in a monomer can also be understood as a consequence of Ostwald’s rule of stages, which describes the general tendency of supersaturated solutes, including amyloid forming proteins (Chakraborty et al., 2023), to populate metastable phases en route to more stable phases (De Yoreo, 2022; Schmelzer and Abyzov, 2017). Our findings with polyQ are consistent with a general mechanism for Ostwald’s rule wherein the relative stabilities of competing polymorphs differ with the number of subunits (De Yoreo, 2022; Navrotsky, 2004). As illustrated in Fig. 6 of Navrotsky, a polymorph that is relatively stable at small particle sizes tends to give way to a polymorph that -- while initially unstable -- becomes more stable as the particles grow. The former is analogous to our early stage Q zipper composed of two short sheets with an intramolecular interface, while the latter is analogous to the later stage Q zipper composed of longer sheets with an intermolecular interface. Subunit addition stabilizes the latter more than the former, hence the initial Q zipper that is stabilized more by intra- than intermolecular interactions will mature with growth to one that is stabilized more by intermolecular interactions.

      We apologize to the Pappu group for neglecting to cite Crick et al. 2013 in the current preprint. Contrary to the reviewer’s assessment, however, we find that the conclusions of this valuable study do more to support than to refute our findings. Briefly, Crick et al. investigated the aggregation of synthetic Q30 and Q40 peptides in vitro, wherein fibrils assembled from high concentrations of peptide were demonstrated to have saturating concentrations in the low micromolar range. As explained above, this finding of a saturating concentration does not refute our results. More relevant to the present work are their findings that “oligomers” accumulated over an hours-long timespan in solutions that are subsaturated with respect to fibrils, and these oligomers themselves have (nanomolar) critical concentrations. The authors postulated that the oligomers result from liquid–liquid demixing of intrinsically disordered polyglutamine. However, phase separation by a peptide is expected to fix its concentration in both the solute and condensed phases, and, because disordered phase separation is inherently faster than amyloid formation, the postulated explanation removes the driving force for any amyloid phase with a critical solubility greater than that of the oligomers. In place of this interpretation that truly does appear to -- in the reviewer’s words -- “contradict basic physical principles of how homopolymers self-assemble”, we interpret these oligomers as evidence of our Q zipper-containing self-poisoned multimers, rounded as an inherent consequence of self-poisoning (Ungar et al., 2005), and likely akin to semicrystalline spherulites that have been observed in other polymer crystal and amyloid-forming systems (Crist and Schultz, 2016; Vetri and Foderà, 2015). That Crick et al. also observed the formation of a relatively labile amyloid phase when the reactions were started with 50 µM peptide is unsurprising in light of the aforementioned kinetic advantage that large reaction volumes can confer to labile polymorphs, and that high concentrations (in this case, orders of magnitude higher than the likely physiological concentration of polyQ (Wild et al., 2015)) can favor the formation of labile amyloid polymorphs (Ohhashi et al., 2010). Indeed, a contemporaneous study by the Wetzel group using very similar peptide constructs and polyQ lengths -- but beginning with lower concentrations -- found that the relevant saturating concentrations for amyloid lie below their limit of detection of 100 nM (Sahoo et al., 2014).

      Rebuttals to other critiques

      The reviewer states that we found nucleation potential to require 60 Qs in a row. Our data are collectively consistent with nucleation occurring at and above approximately 36 Qs, a point repeated in the paper. The reviewer may be referring to our statement, “Sixty residues proved to be the optimum length to observe both the pre- and post-nucleated states of polyQ in single experiments”. The purpose of this statement is simply to describe the practical consideration that led us to use 60 Qs for the bulk of our assays. We do appreciate that the fraction of AmFRET-positive cells is very low for lengths just above the threshold, especially Q40. These fractions are nevertheless highly significant (p = 0.004 in [PIN+] cells, one-tailed t-test), and we will modify the figure and text to clarify this.

      The reviewer characterizes self-poisoning as the hallmark of crystallization from polymer melts, which would be problematic for our conclusions if self-poisoning were limited to this non-physiological context. In fact the term was first used to describe crystallization from solution (Organ et al., 1989), wherein the phenomenon is more pronounced (Ungar et al., 2005).

      Reviewer #2 (Public Review):

      Numerous neurodegenerative diseases are thought to be driven by the aggregation of proteins into insoluble filaments known as "amyloids". Despite decades of research, the mechanism by which proteins convert from the soluble to insoluble state is poorly understood. In particular, the initial nucleation step has proven especially elusive to both experiments and simulation. This is because the critical nucleus is thermodynamically unstable, and therefore, occurs too infrequently to directly observe. Furthermore, after nucleation much faster processes like growth and secondary nucleation dominate the kinetics, which makes it difficult to isolate the effects of the initial nucleation event. In this work Kandola et al. attempt to surmount these obstacles using individual yeast cells as microscopic reaction vessels. The large number of cells, and their small size, provides the statistics to separate the cells into pre- and post-nucleation populations, allowing them to obtain nucleation rates under physiological conditions. By systematically introducing mutations into the amyloid-forming polyglutamine core of huntingtin protein, they deduce the probable structure of the amyloid nucleus. This work shows that, despite the complexity of the cellular environment, the seemingly random effects of mutations can be understood with a relatively simple physical model. Furthermore, their model shows how amyloid nucleation and growth differ in significant ways, which provides testable hypotheses for probing how different steps in the aggregation pathway may lead to neurotoxicity.

      In this study Kandola et al. probe the nucleation barrier by observing a bimodal distribution of cells that contain aggregates; the cells containing aggregates have had a stochastic fluctuation allowing the proteins to surmount the barrier, while those without aggregates have yet to have a fluctuation of suitable size. The authors confirm this interpretation with the selective manipulation of the PIN gene, which provides an amyloid template that allows the system to skip the nucleation event.
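
      The logic of this paragraph can be illustrated with a minimal sketch, assuming that each cell is an independent reaction vessel and that nucleation within a cell behaves as a Poisson process. The function names, the per-cell rate, and the duration are hypothetical and are not taken from the manuscript; the sketch only shows why a population splits into two classes and why a pre-existing template removes the split.

```python
import math
import random


def simulate_cells(n_cells, rate_per_cell, duration, templated=False, seed=0):
    """Return one boolean per cell: True if the cell contains amyloid.

    Assumption (illustrative only): nucleation in each cell is a Poisson
    process with a constant, hypothetical rate, so the chance of at least
    one event in `duration` is 1 - exp(-rate * duration).
    """
    rng = random.Random(seed)
    p_nucleated = 1.0 - math.exp(-rate_per_cell * duration)
    if templated:
        # A pre-existing amyloid template lets the system skip nucleation.
        p_nucleated = 1.0
    return [rng.random() < p_nucleated for _ in range(n_cells)]


def fraction_with_amyloid(cells):
    """Fraction of cells that have crossed the nucleation barrier."""
    return sum(cells) / len(cells)


if __name__ == "__main__":
    untemplated = simulate_cells(10_000, rate_per_cell=0.01, duration=10.0)
    templated = simulate_cells(10_000, rate_per_cell=0.01, duration=10.0,
                               templated=True)
    print(fraction_with_amyloid(untemplated))
    print(fraction_with_amyloid(templated))
```

      Under these assumptions only a minority of untemplated cells end up in the amyloid-containing class, whereas every templated cell does, which mirrors the two-population readout described above.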

      In simple systems lacking internal degrees of freedom (i.e., colloids or rigid molecules) the nucleation barrier comes from a significant entropic cost that comes from bringing molecules together. In large aggregates this entropic cost is balanced by attractive interactions between the particles, but small clusters are unable to form the extensive network of stabilizing contacts present in the larger aggregates. Therefore, the initial steps in nucleation incur an entropic cost without compensating attractive interactions (this imbalance can be described as a surface tension). When internal degrees of freedom are present, such as the conformational states of a polypeptide chain, there is an additional contribution to the barrier coming from the loss of conformational entropy required to adopt the aggregation-prone state(s). In such systems the clustering and conformational processes do not necessarily coincide, and a major challenge studying nucleation is to separate out these two contributions to the free energy barrier. Surprisingly, Kandola et al. find that the critical nucleus occurs within a single molecule. This means that the largest contribution to the barrier comes from the conformational entropy cost of adopting the beta-sheet state. Once this state is attained, additional molecules can be recruited with a much lower free energy barrier.

      There are several caveats that come with this result. First, the height of the nucleation barrier(s) comes from the relative strength of the entropic costs compared to the binding affinities. This balance determines how large a nascent nucleus must grow before it can form interactions comparable to a mature aggregate. In amyloid nuclei the first three beta strands form immature contacts consisting of either side chain or backbone contacts, whereas the fourth strand is the first that is able to form both kinds of contacts (as in a mature fibril). This study used relatively long polypeptides of 60 amino acids. This is greater than the 20-40 amino acids found in amyloid-forming molecules like ABeta or IAPP. As a result, Kandola et al.'s molecules are able to fold enough times to create four beta strands and generate mature contacts intramolecularly. The authors make the plausible claim that these intramolecular folds explain the well-known length threshold (L~35) observed in polyQ diseases. The intramolecular folds reduce the importance of clustering multiple molecules together and increase the importance of the conformational states. Similarly, manipulating the sequence or molecular concentrations will be expected to manipulate the relative magnitude of the binding affinities and the clustering entropy, which will shift the relative heights of the entropic barriers.

      The reviewer correctly notes that the majority of our manipulations were conducted with 60-residue long tracts (which corresponds to disease onset in early adulthood), and this length facilitates intramolecular nucleation. However, we also analyzed a length series of polyQ spanning the pathological threshold, as well as a synthetic sequence designed explicitly to test the model nucleus structure with a tract shorter than the pathological threshold, and both experiments corroborate our findings.

      The authors make an important point that the structure of the nucleus does not necessarily resemble that of the mature fibril. They find that the critical nucleus has a serpentine structure that is required by the need to form four beta strands to get the first mature contacts. However, this structure comes at a cost because residues in the hairpins cannot form strong backbone or zipper interactions. Mature fibrils offer a beta sheet template that allows incoming molecules to form mature contacts immediately. Thus, it is expected that the role of the serpentine nucleus is to template a more extended beta sheet structure that is found in mature fibrils.

      A second caveat of this work is the striking homogeneity of the nucleus structure they describe. This homogeneity is likely to be somewhat illusory. Homopolymers, like polyglutamine, have a discrete translational symmetry, which implies that the hairpins needed to form multiple beta sheets can occur at many places along the sequence. The asparagine residues introduced by the authors place limitations on where the hairpins can occur, and should be expected to increase structural homogeneity. Furthermore, the authors demonstrate that polyglutamine chains close to the minimum length of ~35 will have strict limitations on where the folds must occur in order to attain the required four beta strands.

      We are unsure how to interpret the above statements as a caveat. We agree that increasing sequence complexity will tend to increase homogeneity, but this is exactly the motivation of our approach. We explicitly set out to determine the minimal complexity sequence sufficient to specify the nucleating conformation, which we ultimately identified in terms of secondary and tertiary structure. We do not specify which parts of a long polyQ tract correspond to which parts of the structure, because, as the reviewer points out, they can occur at many places. Hence, depending on the length of the polyQ tract, the nucleus we describe may have any length of sequence connecting the strand elements. We do not think that the effects of N-residue placement can be interpreted as a confounding influence on hairpin position because the striking even-odd pattern we observe implicates the sides of beta strands rather than the lengths. Moreover, we observe this pattern regardless of the residue used (Gly, Ser, Ala, and His in addition to Asn).

      A novel result of this work is the observation of multiple concentration regimes in the nucleation rate. Specifically, they report a plateau-like regime at intermediate concentrations in which the nucleation rate is insensitive to protein concentration. The authors attribute this effect to the "self-poisoning" phenomenon observed in growth of some crystals. This is a valid comparison because the homogeneity observed in NMR and crystallography structures of mature fibrils resembles a one-dimensional crystal. Furthermore, the typical elongation rate of amyloid fibrils (on the order of one molecule per second) is many orders of magnitude slower than the molecular collision rate (by factors of 10^6 or more), implying that the search for the beta-sheet state is very slow. This slow conformational search implies the presence of deep kinetic traps that would be prone to poisoning phenomena. However, the observation of poisoning during nucleation is striking, particularly in consideration of the expected disorder and concentration sensitivity of the nucleus. Kandola et al.'s structural model of an ordered, intramolecular nucleus explains why the internal states responsible for poisoning are relevant in nucleation.

      We thank the reviewer for noting the novelty and plausibility of the self-poisoning connection. We would like to elaborate on our finding that self-poisoning inhibits nucleation (in addition to elongation), as this could prove confusing to some readers. While self-poisoning is claimed to inhibit primary nucleation in the polymer crystal literature (Ungar et al., 2005; Zhang et al., 2018), the semantics of “nucleation” in this context warrants clarification. Technically, the same structure can be considered a nucleus in one context but not in another. The Q zipper monomer, even if it is rate-limiting for amyloid formation at low concentrations (and is therefore the “nucleus”), is not necessarily rate-limiting when self-poisoned at high concentrations. Whether it comprises the nucleus in this case depends on the rates of Q zipper formation relative to subunit addition to the poisoned state. If the latter happens slower than Q zipper formation de novo, it can be said that self-poisoning inhibits nucleation, regardless of whether the Q zipper formed. We suspect this to be the mechanism by which preemptive oligomerization blocks nucleation in the case of polyQ, though other mechanisms may be possible.

      To achieve these results the authors used a novel approach involving a systematic series of simple sequences. This is significant because, while individual experiments showed seemingly random behavior, the randomness resolved into clear trends with the systematic approach. These trends provided clues to build a model and guide further experiments.

      Reviewer #3 (Public Review):

      Kandola et al. explore the important and difficult question regarding the initiating event that triggers (nucleates) amyloid fibril growth in glutamine-rich domains. The researchers use a fluorescence technique that they developed, dAMFRET, in a yeast system where they can manipulate the expression level over several orders of magnitude, and they can control the length of the polyglutamine domain as well as the insertion of interfering non-glutamine residues. Using flow cytometry, they can interrogate each of these yeast 'reactors' to test for self-assembly, as detected by FRET.

      In the introduction, the authors provide a fairly thorough yet succinct review of the relevant literature into the mechanisms of polyglutamine-mediated aggregation over the last two decades. The presentation as well as the illustrations in Figure 1A and 1B are difficult to understand, and unfortunately, there is no clear description of the experimental technique that would allow the reader to connect the hypothetical illustrations to the measurement outcomes. The authors do not explain what the FRET signal specifically indicates or what its intensity is correlated to. FRET measures distance between donor and acceptor, but can it be reliably taken as an indicator of a specific beta-sheet conformation and of amyloid? Does the signal increase with both nucleation and with elongation, and is the signal intensity the same if, e.g., there were 5 aggregates of 10 monomers each versus 50 monomeric nuclei? Is there a reason why the AmFRET signal intensity decreases at longer Q even though the number of cells with positive signal increases? Does the number of positive cells increase with time? The authors state later that 'non-amyloid containing cells lacked AmFRET altogether', but this seems to be a tautology - isn't the lack of AmFRET taken as a proof of lack of amyloid? Overall, a clearer description of the experimental method and what is actually measured (and validation of the quantitative interpretation of the FRET signal) would greatly assist the reader in understanding and interpreting the data.

      We believe the difficulty in understanding the illustrations in Figure 1A and 1B is inherent to the subject. We agree that elaborating how DAmFRET works would help the reader, and will add a few sentences to this end. Beyond this, we refer the reviewer and readers to our cited prior work describing the theory and interpretation of DAmFRET. Note that the y-axes of DAmFRET plots are not raw FRET but rather “AmFRET”, a ratio of FRET to total expression level. As explained thoroughly in our cited prior work, the discontinuity of AmFRET with expression level indicates that the high-AmFRET population formed via a disorder-to-order transition. When the query protein is predicted to be intrinsically disordered, the discontinuous transition to high AmFRET invariably (among hundreds of proteins tested in prior published and unpublished work) signifies amyloid formation as corroborated by SDD-AGE and tinctorial assays.
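
      As a minimal sketch of the quantity described above, the snippet below computes a per-cell ratio of FRET to total expression and the fraction of cells above a cutoff. The function names, the example intensities, and the cutoff are hypothetical illustrations; they do not reproduce the gating or analysis used in the cited DAmFRET work.

```python
def amfret(fret_signal, total_expression):
    """Ratio of FRET signal to total expression level for one cell."""
    if total_expression <= 0:
        raise ValueError("total expression must be positive")
    return fret_signal / total_expression


def fraction_amfret_positive(cells, cutoff):
    """Fraction of cells whose AmFRET exceeds a (hypothetical) cutoff.

    `cells` is a list of (fret_signal, total_expression) pairs, one per cell.
    """
    values = [amfret(fret, expression) for fret, expression in cells]
    return sum(value > cutoff for value in values) / len(values)


if __name__ == "__main__":
    # Made-up intensities in arbitrary units: two cells without appreciable
    # FRET and two cells with high FRET at matching expression levels.
    example_cells = [(0.0, 100.0), (50.0, 100.0), (1.0, 200.0), (120.0, 200.0)]
    print(fraction_amfret_positive(example_cells, cutoff=0.1))
```

      Normalizing by expression is what allows cells at the same expression level to fall into distinct low- and high-AmFRET populations, which is the discontinuity the response refers to.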

      When performed using standard flow cytometry as in the present study, every AmFRET measurement corresponds to a cell-wide average, and hence does not directly inform on the distribution of the protein between different stoichiometric species. As there is only one fluorophore per protein molecule, monomeric nuclei have no signal. DAmFRET can distinguish cells expressing monomers from stable dimers from higher order oligomers (see e.g. Venkatesan et al. 2019), and we are therefore quite confident that AmFRET values of zero correspond to cells in which a vast majority of the respective protein is not in homo-oligomeric species (i.e. is monomeric or in hetero-complexes with endogenous proteins). The exact value of AmFRET, even for species with the same stoichiometry, will depend both on the effect of their respective geometries on the proximity of mEos3.1 fluorophores, and on the fraction of protein molecules in the species. Hence, we only attempt to interpret the plateau values of AmFRET (where the fraction of protein in an assembled state approaches unity) as directly informing on structure, as we did in Fig. S3A.

      We believe that AmFRET decreases with longer polyQ because the mass fraction of fluorophore decreases in the aggregate, simply because the extra polypeptide takes up volume in the aggregate.

      Yes, the fraction of positive cells in a discontinuous DAmFRET plot does increase with time. However, given the more laborious data collection and derivation of nucleation kinetics in a system with ongoing translation, especially across hundreds of experiments with other variables, ours is a snapshot measurement to approximately derive the relative contributions of intra- and intermolecular fluctuations to the nucleation barrier, rather than the barrier’s magnitude.

      We will revise the tautological statement by removing “non-amyloid containing”.

      The authors demonstrate that their assay shows that the fraction of cells with AmFRET signal increases strongly with an increase in polyQ length, with a 'threshold' around 50-60 glutamines. This roughly correlates with the Q-length dependence of disease. The experiments in which asparagine or other amino acids are inserted at variable positions in the glutamine repeat are creative and thorough, and the data along with the simulations provide compelling support for the proposed Q zipper model. The experiments shown in Figure 5 are strongly supportive of a model where formation of the beta-sheet nucleus is within a monomer. This is a potentially important result, as there are conflicting data in the literature as to whether the nucleus in polyQ is monomer.

      We thank the reviewer for these comments. We wish to clarify one important point, however, concerning the correlation of our data with the pathological length threshold. As we state in the first results section, “Our data recapitulated the pathologic threshold -- Q lengths 35 and shorter lacked AmFRET, indicating a failure to aggregate or even appreciably oligomerize, while Q lengths 40 and longer did acquire AmFRET in a length and concentration-dependent manner”. Hence, most of our experiments were conducted with 60Q not because it resembles the pathological threshold, but rather because it was most convenient for DAmFRET experiments.

      I did not find the argument, that their data shows the Q zipper grows in two dimensions, compelling; there are more direct experimental methods to answer this question. I was also confused by the section that Q zippers poison themselves. It would be easier for the reader to follow if the authors first presented their results without interpretation. The data seem more consistent with an argument that, at high concentrations, non-structured polyQ oligomers form which interfere with elongation into structured amyloid assemblies - but such oligomers would not be zippers.

      Self-poisoning is a widely observed and heavily studied phenomenon in polymer crystal physics, though it seems not yet to have entered the lexicon of amyloid biologists. We were new to this concept before it emerged as an extremely parsimonious explanation for our results. As described in the text, two pieces of evidence exclude the alternative mechanism suggested by the reviewer -- that non-structured oligomers form and subsequently engage and inhibit the template. Specifically, 1) inhibition occurs without any detectable FRET, even at high total protein concentration, indicating the species do not form in a concentration-dependent manner that would be expected of disordered oligomers; and 2) inhibition itself has strict sequence requirements that match those of Q zippers. Hence our data collectively suggest that inhibition is a consequence of the deposition of partially ordered molecules onto the templating surface.

      Although some speculation or hypothesizing is perfectly appropriate in the discussion, overall the authors stretch this beyond what can be supported by the results. A couple of examples: The conclusion that toxicity arises from 'self-poisoned polymer crystals' is not warranted, as there is no relevant data presented in this manuscript. The authors refer to findings 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', but I cannot recall any evidence for this statement in the results section.

      We restricted any mention of toxicity to the introduction and a section in the discussion that is not worded as conclusive. Nevertheless, we will soften the subheading and text of the relevant section in the discussion to more clearly indicate the speculative nature of the statements.

      We stand by our statement 'that kinetically arrested aggregates emerge from the same nucleating event responsible for amyloid formation', as this follows directly from self-poisoning.

      Bibliography

      Arseni D, Hasegawa M, Murzin AG, Kametani F, Arai M, Yoshida M, Ryskeldi-Falcon B. 2022. Structure of pathological TDP-43 filaments from ALS with FTLD. Nature 601:139–143. doi:10.1038/s41586-021-04199-3

      Bansal A, Schmidt M, Rennegarbe M, Haupt C, Liberta F, Stecher S, Puscalau-Girtu I, Biedermann A, Fändrich M. 2021. AA amyloid fibrils from diseased tissue are structurally different from in vitro formed SAA fibrils. Nat Commun 12:1013. doi:10.1038/s41467-021-21129-z

      Buell AK. 2017. The Nucleation of Protein Aggregates - From Crystals to Amyloid Fibrils. Int Rev Cell Mol Biol 329:187–226. doi:10.1016/bs.ircmb.2016.08.014

      Chakraborty D, Straub JE, Thirumalai D. 2023. Energy landscapes of Aβ monomers are sculpted in accordance with Ostwald’s rule of stages. Sci Adv 9:eadd6921. doi:10.1126/sciadv.add6921

      Crist B, Schultz JM. 2016. Polymer spherulites: A critical review. Prog Polym Sci 56:1–63. doi:10.1016/j.progpolymsci.2015.11.006

      De Yoreo JJ. 2022. Casting a bright light on Ostwald’s rule of stages. Proc Natl Acad Sci USA 119. doi:10.1073/pnas.2121661119

      Hong Y, Yuan S, Li Z, Ke Y, Nozaki K, Miyoshi T. 2015. Three-Dimensional Conformation of Folded Polymers in Single Crystals. Phys Rev Lett 115:168301. doi:10.1103/PhysRevLett.115.168301

      Keller A. 1957. A note on single crystals in polymers: Evidence for a folded chain configuration. Philosophical Magazine 2:1171–1175. doi:10.1080/14786435708242746

      Landgraf D, Okumus B, Chien P, Baker TA, Paulsson J. 2012. Segregation of molecules at cell division reveals native protein localization. Nat Methods 9:480–482. doi:10.1038/nmeth.1955

      Lauritzen JI, Hoffman JD. 1960. Theory of Formation of Polymer Crystals with Folded Chains in Dilute Solution. J Res Natl Bur Stand A Phys Chem 64A:73–102. doi:10.6028/jres.064A.007

      Navrotsky A. 2004. Energetic clues to pathways to biomineralization: precursors, clusters, and nanoparticles. Proc Natl Acad Sci USA 101:12096–12101. doi:10.1073/pnas.0404778101

      Ohhashi Y, Ito K, Toyama BH, Weissman JS, Tanaka M. 2010. Differences in prion strain conformations result from non-native interactions in a nucleus. Nat Chem Biol 6:225–230. doi:10.1038/nchembio.306

      Organ SJ, Ungar G, Keller A. 1989. Rate minimum in solution crystallization of long paraffins. Macromolecules 22:1995–2000. doi:10.1021/ma00194a078

      Radamaker L, Baur J, Huhn S, Haupt C, Hegenbart U, Schönland S, Bansal A, Schmidt M, Fändrich M. 2021. Cryo-EM reveals structural breaks in a patient-derived amyloid fibril from systemic AL amyloidosis. Nat Commun 12:875. doi:10.1038/s41467-021-21126-2

      Sahoo B, Singer D, Kodali R, Zuchner T, Wetzel R. 2014. Aggregation behavior of chemically synthesized, full-length huntingtin exon1. Biochemistry 53:3897–3907. doi:10.1021/bi500300c

      Schmelzer JWP, Abyzov AS. 2017. How do crystals nucleate and grow: Ostwald’s rule of stages and beyond. In: Šesták J, Hubík P, Mareš JJ, editors. Thermal Physics and Thermal Analysis, Hot Topics in Thermal Analysis and Calorimetry. Cham: Springer International Publishing. pp. 195–211. doi:10.1007/978-3-319-45899-1_9

      Schmidt M, Wiese S, Adak V, Engler J, Agarwal S, Fritz G, Westermark P, Zacharias M, Fändrich M. 2019. Cryo-EM structure of a transthyretin-derived amyloid fibril from a patient with hereditary ATTR amyloidosis. Nat Commun 10:5008. doi:10.1038/s41467-019-13038-z

      Schweighauser M, Shi Y, Tarutani A, Kametani F, Murzin AG, Ghetti B, Matsubara T, Tomita T, Ando T, Hasegawa K, Murayama S, Yoshida M, Hasegawa M, Scheres SHW, Goedert M. 2020. Structures of α-synuclein filaments from multiple system atrophy. Nature 585:464–469. doi:10.1038/s41586-020-2317-6

      Snapp EL, Hegde RS, Francolini M, Lombardo F, Colombo S, Pedrazzini E, Borgese N, Lippincott-Schwartz J. 2003. Formation of stacked ER cisternae by low affinity protein interactions. J Cell Biol 163:257–269. doi:10.1083/jcb.200306020

      Törnquist M, Michaels TCT, Sanagavarapu K, Yang X, Meisl G, Cohen SIA, Knowles TPJ, Linse S. 2018. Secondary nucleation in amyloid formation. Chem Commun 54:8667–8684. doi:10.1039/c8cc02204f

      Ungar G, Putra EGR, de Silva DSM, Shcherbina MA, Waddon AJ. 2005. The Effect of Self-Poisoning on Crystal Morphology and Growth Rates. In: Allegra G, editor. Interphases and Mesophases in Polymer Crystallization I, Advances in Polymer Science. Berlin, Heidelberg: Springer Berlin Heidelberg. pp. 45–87. doi:10.1007/b107232

      Vetri V, Foderà V. 2015. The route to protein aggregate superstructures: Particulates and amyloid-like spherulites. FEBS Lett 589:2448–2463. doi:10.1016/j.febslet.2015.07.006

      Wild EJ, Boggio R, Langbehn D, Robertson N, Haider S, Miller JRC, Zetterberg H, Leavitt BR, Kuhn R, Tabrizi SJ, Macdonald D, Weiss A. 2015. Quantification of mutant huntingtin protein in cerebrospinal fluid from Huntington’s disease patients. The Journal of Clinical Investigation.

      Yang Y, Arseni D, Zhang W, Huang M, Lövestam S, Schweighauser M, Kotecha A, Murzin AG, Peak-Chew SY, Macdonald J, Lavenir I, Garringer HJ, Gelpi E, Newell KL, Kovacs GG, Vidal R, Ghetti B, Ryskeldi-Falcon B, Scheres SHW, Goedert M. 2022. Cryo-EM structures of amyloid-β 42 filaments from human brains. Science 375:167–172. doi:10.1126/science.abm7285

      Zhang X, Zhang W, Wagener KB, Boz E, Alamo RG. 2018. Effect of Self-Poisoning on Crystallization Kinetics of Dimorphic Precision Polyethylenes with Bromine. Macromolecules 51:1386–1397. doi:10.1021/acs.macromol.7b02745

    1. Author Response

      Reviewer #1 (Public Review):

      This study by Park et al. describes an interesting approach to disentangle gene-environment pathways to cognitive development and psychotic-like experiences in children. They have used data from the ABCD study and have included PGS of EA and cognition, environmental exposure data, cognitive performance data and self-reported PLEs. Although the study has several strengths, including its large sample size, interesting approach and comprehensive statistical model, I have several concerns:

      • The authors have included follow-up data from the ABCD Study. However, it is not very clear from the beginning that longitudinal paths are being explored. It would be very helpful if the authors would make their (analysis) approach clearer from the introduction. Now, they describe many different things, which makes the paper more difficult to read. It would be of great help to see the proposed path model in a Figure and refer to that in the Method.

      We clarified the specific longitudinal paths explored in our study at the end of the Introduction section (line 149~160). We also added a figure of the proposed path model (Figure 1) and refer to it in the Method section (line 232~239).

      • There is quite a lot of causal language in the paper, particularly in the Discussion. My advice would be to tone this down.

      We corrected and toned down all causal language used in our manuscript. Per your suggestion, we deleted statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead.

      • I feel that the limitation section is a bit brief, and can be developed further.

      We specified additional potential constraints of our study, including limited representativeness, limited periods of follow-up data, possible sample selection bias, and the use of non-randomized, observational data. These corrections can be found in line 518~538.

      • I like that the assessment of CP and self-reported PEs is of good quality. However, I was wondering which 4 items from the parent-reported CBCL were used and how did they correlate with the child-reported PEs? And how was distress taken into account in the child self-reported PEs measurement? Which PEs measures were used?

      We believe that Reviewer #1’s comment on the correlations between PLEs derived from the PQ-BC (total score and distress score PLEs) and from the CBCL (parent-rated PLEs) may refer to a prior version of our manuscript that was submitted to a different journal. We obtained Pearson’s correlation coefficients between these PLE measures (baseline year: r = 0.095~0.0989, p<0.0001; 1-year follow-up: r = 0.1322~0.1327, p<0.0001; 2-year follow-up: r = 0.1569~0.1632, p<0.0001) and added this information to the Method section for PLEs (line 198~201).

      • What was the correlation between CP and EA PGSs?

      We also added the Pearson’s correlation between the two PGSs (r =0.4331, p<0.0001) in the Methods section for PGS (line 214~215).

      • Regarding the PGS: why focus on cognitive performance and EA? It should be made clearer from the introduction that EA is not only measuring cognitive ability, but is also a (genetic) marker of social factors/inequalities. I'm guessing this is one of the reasons why the EA PGS was so much more strongly correlated with PEs than the CP PGS. See the work by Abdellaoui and the work by Nivard.

      We thank the reviewer for the feedback to clarify that educational attainment (EA) is not only a genetic marker of cognitive ability but also that of socioeconomic outcomes. Per your suggestion, we included the associations of EA PGS with multiple biological and socioeconomic outcomes found in prior studies (e.g., Abdellaoui et al., 2022) in the Introduction (line 131~142).

      Abdellaoui, A., Dolan, C. V., Verweij, K. J. H., & Nivard, M. G. (2022). Gene–environment correlations across geographic regions affect genome-wide association studies. Nature Genetics. doi:10.1038/s41588-022-01158-0

      • Considering previous work on this topic, including analyses in the ABCD Study, I'm not surprised that the correlation was not very high. Therefore, I don't think it makes a whole lot of sense to adjust for the schizophrenia PGS in the sensitivity analyses; in other words, it's not really 'a more direct genetic predictor of PLEs'.

      We conducted this adjustment considering that PLEs often precede the onset of schizophrenia. In addition, prior studies found that schizophrenia PGS is significantly associated with cognitive intelligence within psychosis patients (Shafee et al., 2018) and individuals at-risk of psychosis (He et al., 2021), and that significantly distressing psychotic-like experiences had a greater positive correlation with schizophrenia PGS than with PGS for psychotic-like experiences (Karcher et al., 2021).

      For these reasons, we considered it necessary to assess whether the effects of cognitive phenotypes PGS (i.e., CP PGS and EA PGS) in the linear mixed model remain significant after adjusting for schizophrenia PGS. We believe our results from the linear mixed model showed the sensitivity and specificity of the association between cognitive phenotype PGS and PLEs.

      He, Q., Jantac Mam-Lam-Fook, C., Chaignaud, J., Danset-Alexandre, C., Iftimovici, A., Gradels Hauguel, J., . . . Chaumette, B. (2021). Influence of polygenic risk scores for schizophrenia and resilience on the cognition of individuals at-risk for psychosis. Translational Psychiatry, 11(1). doi:10.1038/s41398-021-01624-z

      Karcher, N. R., Paul, S. E., Johnson, E. C., Hatoum, A. S., Baranger, D. A. A., Agrawal, A., . . . Bogdan, R. (2021). Psychotic-like Experiences and Polygenic Liability in the Adolescent Brain Cognitive Development Study. Biological Psychiatry: Cognitive Neuroscience and Neuroimaging. doi:https://doi.org/10.1016/j.bpsc.2021.06.012

      Shafee, R., Nanda, P., Padmanabhan, J. L., Tandon, N., Alliey-Rodriguez, N., Kalapurakkel, S., . . . Robinson, E. B. (2018). Polygenic risk for schizophrenia and measured domains of cognition in individuals with psychosis and controls. Translational Psychiatry, 8(1). doi:10.1038/s41398-018-0124-8

      • How did the FDR correction for multiple testing affect the results?

      For all analysis results presented in our study, False Discovery Rate (FDR) correction for multiple testing compared p-values of nine key study variables: PGS (cognitive performance or educational attainment), family income, parental education, family’s financial adversity, Area Deprivation Index, years of residence, proportion of population below 125% of the poverty line, positive parenting behavior, and positive school environment. An exception was the sensitivity analysis that included schizophrenia PGS in the linear mixed model for adjustment: with another PGS variable added, FDR correction compared p-values of ten key variables. Overall, the effects of FDR correction on the results were limited; i.e., the majority of associations between the key variables and the outcomes, which were deemed highly significant, remained unchanged after the FDR correction.
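
      As an illustration of how an FDR correction over nine key variables can be computed, the following is a minimal sketch. It assumes the Benjamini-Hochberg step-up procedure, which the response does not name explicitly, and the p-values shown are hypothetical placeholders rather than results from the study.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order.

    Assumption: the FDR correction is the standard BH step-up procedure;
    the response above does not state which FDR method was used.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical p-values for the nine key variables (illustration only).
variables = [
    "PGS", "family income", "parental education",
    "financial adversity", "Area Deprivation Index", "years of residence",
    "population below 125% of poverty line", "positive parenting",
    "positive school environment",
]
raw_p = [0.0001, 0.0004, 0.003, 0.02, 0.001, 0.04, 0.30, 0.0002, 0.008]

for name, p, q in zip(variables, raw_p, benjamini_hochberg(raw_p)):
    flag = "significant" if q < 0.05 else "not significant"
    print(f"{name}: raw p = {p:.4f}, adjusted p = {q:.4f} ({flag})")
```

      In the sensitivity analysis with schizophrenia PGS added, the same function would simply receive ten p-values instead of nine.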

      Overall, I feel that this paper has the potential to present some very interesting findings. However, at the moment the paper misses direction and a clear focus. It would be a great improvement if the readers would be guided through the steps and approach, as I think the authors have undertaken important work and conducted relevant analyses.

      We express our appreciation to the reviewer for the constructive feedback and guidance, which has significantly contributed to the improvement of our manuscript. As addressed in the preceding sections, we have implemented the necessary corrections and clarifications in response to the reviewer's suggestions. We remain open to making further amendments as needed, and thus invite any additional comments should any aspect of our revisions be deemed inadequate or inappropriate.

      Reviewer #2 (Public Review):

      This paper tried to assess the link between genetic and environmental factors on psychotic-like experiences, and the potential mediation through cognitive ability. This study was based on data from the ABCD cohort, including 6,602 children aged 9-10y. The authors report a mediating effect, suggesting that cognitive ability is a key mediating pathway in the link between several genetic and environmental (risk and protective) factors on psychotic-like experiences.

      While these findings could be potentially significant, a range of methodological unclarities and ambiguities make it difficult to assess the strength of evidence provided.

      Strengths of the methods:

      The authors use a wide range of validated (genetic, self- and parent-reported, as well as cognitive) measures in a large dataset with a 2-year follow-up period. The statistical methods have the potential to address key limitations of previous research.

      We sincerely thank the reviewer for recognizing these methodological strengths of our study. The reviewer’s positive comments are highly supportive and encouraging for us.

      Weaknesses of the methods:

      The rationale for the study is not completely clear. Cognitive ability is probably a more likely mediator of traits related to negative symptoms in schizophrenia, rather than positive symptoms (e.g., psychosis, psychotic-like symptom). The suggestion that cognitive ability might lead to psychotic-like symptoms in the general population needs further justification.

      We sincerely thank the reviewer and highly appreciate the concerns raised regarding our proposal that cognitive ability may serve as a mediator of psychotic-like experiences. To the best of our knowledge, it has been proposed that cognitive ability can be a mediator of positive symptoms in schizophrenia (including psychotic-like experiences), as well as negative symptoms. This mediating role of cognitive ability was proposed in several prior studies on cognitive models of schizophrenia/psychosis. Per your suggestion, we included further justification in the Introduction section of our study (line 104~107). Specifically, we highlighted that cognitive ability has been theoretically proposed as a potential mediator of genetic & environmental influence on positive symptoms of schizophrenia such as psychotic-like experiences. We refer to studies conducted by Howes & Murray (2014) and Garety et al. (2001).

      Howes, O. D., & Murray, R. M. (2014). Schizophrenia: an integrated sociodevelopmental-cognitive model. The Lancet, 383(9929), 1677-1687. doi:https://doi.org/10.1016/S0140-6736(13)62036-X

      Garety, P. A., Kuipers, E., Fowler, D., Freeman, D., & Bebbington, P. E. (2001). A cognitive model of the positive symptoms of psychosis. Psychological Medicine, 31(2), 189-195. doi:10.1017/S0033291701003312

      Terms are used inconsistently throughout (e.g., cognitive development, cognitive capacity, cognitive intelligence, intelligence, educational attainment...). It is overall not clear what construct exactly the authors investigated.

      Thank you for your comment. We corrected the term ‘cognitive capacity’ to ‘cognitive phenotypes’ throughout our manuscript. We also added in the Introduction (line 141~143) that we will collectively refer to these two PGSs of focus as ‘cognitive phenotypes PGSs’, which is similar to the terms used in prior research (Joo et al., 2022; Okbay et al., 2022; Selzam et al., 2019).

      Joo, Y. Y., Cha, J., Freese, J., & Hayes, M. G. (2022). Cognitive Capacity Genome-Wide Polygenic Scores Identify Individuals with Slower Cognitive Decline in Aging. Genes, 13(8), 1320. doi:10.3390/genes13081320

      Okbay, A., Wu, Y., Wang, N., Jayashankar, H., Bennett, M., Nehzati, S. M., . . . Young, A. I. (2022). Polygenic prediction of educational attainment within and between families from genome-wide association analyses in 3 million individuals. Nature Genetics, 54(4), 437-449. doi:10.1038/s41588-022-01016-z

      Selzam, S., Ritchie, S. J., Pingault, J.-B., Reynolds, C. A., O’Reilly, P. F., & Plomin, R. (2019). Comparing Within- and Between-Family Polygenic Score Prediction. The American Journal of Human Genetics, 105(2), 351-363. doi:https://doi.org/10.1016/j.ajhg.2019.06.006

      Not the largest or most recent GWASes were used to generate PGSes.

      Thank you for mentioning this point. We were not able to use the largest GWAS for cognitive intelligence, educational attainment and schizophrenia because, unfortunately, our study started before the GWAS by Okbay et al. (2022) and Trubetskoy et al. (2022) were published. We corrected the text to state that our study used ‘a GWAS of European-descent individuals for educational attainment and cognitive performance’ instead of the largest GWAS (line 206~208).

      It is not fully clear how neighbourhood SES was coded (higher or lower values = risk?). The rationale, strengths, and assumptions of the applied methods are not fully clear. It is also not clear how/if variables were combined into latent factors or summed (weighted by what). It is not always clear when genetic and when self-reported ethnicity was used. Some statements might be overly optimistic (e.g., providing unbiased estimates, free even of unmeasured confounding; use of representative data).

      Consistent with the description of neighborhood SES in the Methods section, higher values of neighborhood SES indicate risk. In the original Figure 2, higher values of neighborhood SES link to lower intelligence (direct effects: β=-0.1121) and higher PLEs (indirect effects: β=-0.0126~ -0.0162). We think such confusion might have been caused by the difference in coding between family SES (higher values = lower risk) and neighborhood SES (higher values = higher risk). Thus, we changed the terms to ‘High Family SES’ and ‘Low Neighborhood SES’ in the corrected figure (Figure 3) for clarification.

      Considering that shorter duration of residence may be associated with instability of residency, it may indicate neighborhood adversity (i.e., higher risk). This definition of the ‘years of residence’ variable is in line with the previous study by Karcher et al. (2021).

      We represented PGSs, family SES, neighborhood SES, positive family and school environment, and PLEs as composite indicators (derived from a weighted sum of relevant observed variables). To the best of our knowledge, prior studies suggest that these variables are less likely to share a common factor and have assessed them as composite indices during analyses. For instance, Judd et al. (2020) and Martin et al. (2015) analyze the genetic influences on educational attainment and ADHD as composite indicators. Also, as mentioned in Judd et al. (2020), socioenvironmental influences are often analyzed as composite indicators. Studies on the psychosis continuum (e.g., van Os et al., 2009) suggest that psychotic disorders are likely to have multiple background factors rather than a common factor, and note that numerous prior studies use composite indices to measure psychotic symptoms. These are the reasons why we used components for these constructs instead of generating latent factors (as is done in the standard SEM method). In contrast, we represented general intelligence as a common factor that determines the underlying covariance pattern of fluid and crystallized intelligence, based on the classical g theory of intelligence. We added this explanation in line 269~285.

      Moreover, during estimation, the IGSCA determines weights of each observed variable in such a way as to maximize the variances of all endogenous indicators and components. We added this explanation in the description about the IGSCA method (line 266~268).
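
      To make the notion of a composite indicator concrete, the following is a minimal sketch of a component formed as a weighted sum of standardized observed variables. It is a simplification under stated assumptions: IGSCA estimates the weights during model fitting, whereas the weights, variable values, and function names below are hypothetical and fixed by hand for illustration.

```python
from statistics import mean, pstdev


def standardize(values):
    """Convert raw observations to z-scores (population SD)."""
    mu = mean(values)
    sd = pstdev(values)
    return [(v - mu) / sd for v in values]


def composite_score(indicators, weights):
    """Weighted sum of standardized indicators, one score per participant.

    indicators: dict mapping variable name -> list of observations
    weights:    dict mapping variable name -> weight (hypothetical here;
                IGSCA would estimate these rather than take them as given)
    """
    z = {name: standardize(vals) for name, vals in indicators.items()}
    n = len(next(iter(z.values())))
    return [sum(weights[name] * z[name][i] for name in z) for i in range(n)]


# Hypothetical data for four participants (illustration only).
family_ses_indicators = {
    "family_income": [3.0, 7.0, 5.0, 9.0],
    "parental_education": [12.0, 16.0, 14.0, 18.0],
    "financial_adversity": [2.0, 0.0, 1.0, 0.0],
}
# Hypothetical weights; adversity is weighted negatively so that
# higher composite values correspond to higher family SES.
hypothetical_weights = {
    "family_income": 0.5,
    "parental_education": 0.4,
    "financial_adversity": -0.3,
}

print(composite_score(family_ses_indicators, hypothetical_weights))
```

      A latent factor, by contrast, would be modeled as an unobserved common cause of its indicators, which is how general intelligence is treated in the model described above.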

      We deleted overly optimistic statements like ‘unbiased estimates’ and used expressions such as ‘adjustment for observed/unobserved confounding’ instead, throughout our manuscript.

      Judd, N., Sauce, B., Wiedenhoeft, J., Tromp, J., Chaarani, B., Schliep, A., ... & Klingberg, T. (2020). Cognitive and brain development is independently influenced by socioeconomic status and polygenic scores for educational attainment. Proceedings of the National Academy of Sciences, 117(22), 12411-12418.

      Karcher, N. R., Schiffman, J., & Barch, D. M. (2021). Environmental Risk Factors and Psychotic-like Experiences in Children Aged 9–10. Journal of the American Academy of Child & Adolescent Psychiatry, 60(4), 490-500. doi:10.1016/j.jaac.2020.07.003

      Martin, J., Hamshere, M. L., Stergiakouli, E., O'Donovan, M. C., & Thapar, A. (2015). Neurocognitive abilities in the general population and composite genetic risk scores for attention‐deficit hyperactivity disorder. Journal of Child Psychology and Psychiatry, 56(6), 648-656.

      van Os, J., Linscott, R., Myin-Germeys, I., Delespaul, P., & Krabbendam, L. (2009). A systematic review and meta-analysis of the psychosis continuum: Evidence for a psychosis proneness–persistence–impairment model of psychotic disorder. Psychological Medicine, 39(2), 179-195. doi:10.1017/S0033291708003814

      It appears that citations and references are not always used correctly.

      We thoroughly checked all citations and specified the references for each statement. We deleted Plomin & von Stumm (2018) and Harden & Koellinger (2020) and cited relevant primary studies (e.g., Lee et al., 2018; Okbay et al., 2022; Abdellaoui et al., 2022) instead. We also specified the references supporting the statement that educational attainment PGS links to brain morphometry (Judd et al., 2020; Karcher et al., 2021). As Okbay et al. (2022) use PGS of cognitive intelligence (reporting the results of these analyses in their supplementary materials) as well as educational attainment, we decided to continue citing this reference. These corrections can be found in line 131~141.

      Strengths of the results:

      The authors included a comprehensive array of analyses.

      We thank the reviewer for the positive comment.

      Weaknesses of the results:

      Many results, which are presented in the supplemental materials, are not referenced in the main text and are so comprehensive that it can be difficult to match tables to results. Some of the methodological questions make it challenging to assess the strength of the evidence provided in the results.

      As you rightly identified, we inadvertently failed to reference Table S2 in the main text. We have since corrected this omission in the Results section for the IGSCA (SEM) analysis (line 375). The remainder of the supplementary tables (Table S1, S3~S7) have been appropriately cited in the main manuscript. We recognize that the quantity of tables provided in the supplementary materials is substantial. However, given the comprehensiveness and complexity of our analyses, which encompass a wide array of study variables, these tables offer intricate results from each analysis. We deem these results, which include valuable findings from sensitivity analyses and confound testing, too significant to exclude from the supplementary materials. That said, we are open to, and would greatly welcome, any further suggestions on how to present our supplementary results in a more accessible and digestible format. We are ready and willing to implement any necessary modifications to ensure clarity and ease of comprehension. Your guidance in this matter is highly valued.

      Appraisal:

      The authors suggest that their findings provide evidence for policy reforms (e.g., targeting residential environment, family SES, parenting, and schooling). While this is probably correct, a range of methodological unclarities and ambiguities make it difficult to assess whether the current study provides evidence for that claim.

      Impact:

      The immediate impact is limited given the short follow-up period (2y), possibly concerns for selection bias and attrition in the data, and some methodological concerns.

      We added as study limitations (line 518~538) that the impact of our findings for understanding cognitive and psychiatric development during later childhood may be limited due to the relatively short follow-up period, the possibility of sample selection bias, and the difficulty of interpreting results from an observational study as causal (despite the novel causal inference methods, designed for non-randomized, observational data, that we used).

      As described above, we made the necessary corrections and clarifications for the points raised by the reviewer. As we are willing to make additional revisions, please feel free to comment if you feel that our corrections are insufficient or inappropriate.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript reports new findings about the role of the glutamate transporter EAAC1 in controlling neural activity in the striatum. The significance is two-fold - it addresses gaps in knowledge about the functional significance of EAAC1, as well as provides a potential explanation for how EAAC1 mutations contribute to striatal hyperexcitability and OCD-associated behaviors. The manuscript is clearly presented, and the well-designed experiments are rigorously performed and analyzed. The main results showing that EAAC1 deletion increases the dendritic arbor of MSN D1 neurons and increases excitatory synaptic connectivity, as well as reduces D1-to-D1 mediated IPSCs are convincing. These results clearly demonstrate that EAAC1 deletion can alter excitatory and inhibitory synaptic function. Modelling the potential consequences for these changes on D1 MSN neural activity, and the behavior changes are interesting. Minor weaknesses include incomplete support for the conclusions about how EAAC1 regulates GABAergic transmission.

      We would like to take this opportunity to thank the reviewer. New sets of pharmacology experiments now address the minor concern about supporting the conclusions about the regulation of GABAergic transmission by EAAC1. The revised manuscript also includes new behavioral assays that allow us to examine in more depth the cell- and region-specificity of the effects of EAAC1.

      Reviewer #2 (Public Review):

      The manuscript by Petroccione et al., examines the modulatory role of the neuronal glutamate transporter EAAC1 on glutamatergic and GABAergic synaptic strength at D1- and D2-containing medium spiny neurons within the dorsolateral striatum. They find that pharmacological and genetic disruption of EAAC1 function increases glutamatergic synaptic strength specifically at D1-MSNs. They show that this is due to a structural change in release sites, not release probability. They also show that EAAC1 is critical in maintaining lateral inhibition specifically between D1-MSNs. Taken together, the authors conclude that EAAC1 functions to constrain D1-MSN excitation. Using a computational modeling technique, they posit that EAAC1's modulatory role at glutamatergic and GABAergic inputs onto D1-MSNs ultimately manifests as a reduction of gain of the input-output firing relationship and increases the offset. They go on to show that EAAC1 deletion leads to enhanced switching behavior in a probabilistic operant task. They speculate that this is due to a dysregulated E/I balance at D1-MSNs in the DLS. Overall, this is a very interesting study focused on an understudied glutamate transporter. Generally, the study is done in a very thorough and methodical manner and the manuscript is well written.

      We thank the reviewer for the thorough analysis and insightful comments on the manuscript. Our point-to-point responses to the concerns raised on the initial submission of this work are reported below:

      Major Comments/Concerns:

      Regional/Local manipulations in behavior study: The manuscript would be greatly improved if they provided data linking the ex vivo electrophysiological findings within the DLS with the behavior. Although they are using a DLS-dependent task, they are nonetheless, using a constitutive EAAC1 KO mouse. Thus, they cannot make a strong conclusion that the behavioral deficits are due to the EAAC1 dysfunction in the DLS (despite the strong expression levels in the DLS).

      Corrected - We concur with the reviewer. To address this concern, we performed new experiments to assess the cell- and regional-specificity of the effects of EAAC1 on task-switching behaviors.

      First, we repeated the behavioral assays described in Fig. 8 in two mouse lines (D1Cre/+:EAAC1f/f and A2ACre/+:EAAC1f/f) lacking EAAC1 expression in D1- or D2-MSNs, respectively (Supp. Fig. 8-1). As in the case of EAAC1+/+ and EAAC1-/- mice, when the switch time was short (<15 s), D1Cre/+:EAAC1f/f and A2ACre/+:EAAC1f/f mice collected a similar number of rewards (Supp. Fig. 8-1K, L) and performed a similar number of lever presses (Supp. Fig. 8-1M, N). As the switch time increased (30-75 s), D1Cre/+:EAAC1f/f mice collected more rewards than A2ACre/+:EAAC1f/f mice, at low and high reward probabilities (Supp. Fig. 8-1L, N). Overall, the task switching behavior of D1Cre/+:EAAC1f/f mice was similar to that of EAAC1-/- mice, whereas that of A2ACre/+:EAAC1f/f mice was similar to that of EAAC1+/+ mice (cf. Supp. Fig. 8 and Supp. Fig. 8-1). This suggests that loss of expression of EAAC1 from D1-MSNs is sufficient to reproduce the task switching behavior of EAAC1-/- mice. Because EAAC1 limits excitation onto D1-MSNs (Fig. 2, 3) and lateral inhibition between D1-MSNs (Fig. 4-6), these findings suggest that increased excitation onto D1-MSNs and reciprocal inhibition among D1-MSNs limit execution of reward-based behaviors with task-switching intervals >30s.

      Second, as noted by the reviewer, another potential limitation of the experiments performed on constitutive EAAC1-/- mice is that, on their own, they do not allow us to say whether the behavioral effects are due to changes in E/I onto D1-MSNs within a specific domain of the striatum like the DLS. Although the DLS is recruited during task-switching, reward-based flexibility in executive control relies on neuronal activity in the VMS (Wallis 2007; Gu et al. 2008). Therefore, we asked whether limiting excitation in D1-MSNs and strengthening D1-D1 lateral inhibition via EAAC1 in the VMS could also alter reward-based task-switching behaviors. To address this question, we repeated the task switching test in EAAC1f/f mice that received stereotaxic injections of a Cre-dependent viral construct (AAV-D1Cre) that we used to remove EAAC1 expression from D1-MSNs in the DLS or VMS, respectively (Supp. Fig. 8-2). The results showed that the task switching behaviors of EAAC1f/f mice receiving AAV-D1Cre injections in the DLS or VMS were similar to each other and to those of EAAC1-/- mice, while being statistically different from those of EAAC1+/+ mice. This finding is important, as it suggests that: (i) the DLS and VMS are both recruited for the execution of task switching behaviors; (ii) the modulation of E/I onto D1-MSNs by EAAC1 may not be limited to the DLS but could extend to the VMS.

      Third, we performed further tests to examine the regional-specificity of the effects of EAAC1 in D1-MSNs. D1 receptor expressing cells are present not only throughout the striatum, but also in the substantia nigra (pars compacta and reticulata; SN) and ventral tegmental area (VTA) (Cadet et al. 2010; Savasta, Dubois, and Scatton 1986; Boyson, McGonigle, and Molinoff 1986; Wamsley et al. 1989). To determine whether lack of EAAC1 in D1-expressing cells in the SN/VTA could also contribute to increased compulsivity, we repeated the task switching behavioral assays in EAAC1f/f mice that received injections of AAV-D1Cre in the SN/VTA (Supp. Fig. 8-3). The task switching behavior of these mice was similar to that of EAAC1+/+, not EAAC1-/-, mice, suggesting that altering EAAC1 expression in D1-MSNs of the DLS/VMS, but not the SN/VTA, is implicated in the control of task switching of reward-based behaviors in mice.

      The results of these new sets of experiments are included in the revised version of the manuscript and their implications are reported in the Discussion section of the paper.

      Statistics used in the study: There are some missing details regarding the precise stats used for the different comparisons. I am particularly concerned that the electrophysiology studies that were a priori designed as a 2-factor analysis did not have 2-way ANOVAs performed, but rather a series of t-tests. For example, in Figure 3b, the two factors are 1) cell type and 2) genotype. Was a 2-way ANOVA performed? It is hard for me to tell from the text.

      Corrected - We apologize for any potential confusion. The statistical analysis for the experiments included in this work includes paired and unpaired t-tests, one-way ANOVA, two-way ANOVA, and ANOVA for repeated measures tests followed by post hoc t-test comparisons (reported in the text). To ensure both accuracy and readability of the manuscript, we report the results of the statistical comparisons in the main text of the manuscript, but also provide a fully detailed statistical analysis across all datasets performed in the data repository for this manuscript deposited on Open Science Framework. We revised the methods section to clarify the use of different statistical tests and values reported in the manuscript.

      Moderate Concerns:

      Control mice: I am moderately concerned that littermates were not used for controls for the EAAC1 KO, but rather C57Bl/6NJ presumably ordered from a vendor. It has been shown that issues like transit and rearing conditions can have long term effects on behavior. Were the control mice reared in house? How long was the acclimation time before use?

      Corrected - We apologize for the potential confusion. The EAAC1-/- mice are bred in house and have been backcrossed with C57BL/6J for more than 10 generations. We perform backcrossing regularly and routinely in our animal colony. The C57BL/6J mice are also bred in house. They are replaced every 10 generations to avoid genetic drift. Therefore, there is no concern about transit from vendors and rearing affecting the results of our experiments. This information has been added to the Methods section of the paper.

      OCD framework: I generally find the OCD framework unnecessary, particularly in the Introduction. Compulsive behaviors are not restricted to OCD. Indeed, the link between the behavioral observations and OCD phenotype seems a bit tenuous. In addition, studying the mechanisms of behavioral flexibility in and of itself is interesting. I do not think such a strong link needs to be made to OCD throughout the entirety of the paper. The authors should consider tempering this language or restricting it to the discussion and end of the abstract.

      Corrected - We concur with the reviewer and have revised the manuscript accordingly. At the end of the Abstract, we refer only to behavioral flexibility. We have toned down our emphasis on OCD in the Introduction, broadening the genetic link between the gene encoding EAAC1 (SLC1A1) and neuropsychiatric diseases like OCD, ADHD and ASD. This is now limited to a single sentence. We also revised the Discussion section because we agree with the reviewer that compulsive behaviors are not limited to OCD.

    1. Author Response

      Reviewer #1 (Public Review):

      1) The model's cortical neurons had no contralateral encoding, unlike their neuroimaging data.

      This is a common point of confusion. In fact, this comment has prompted us to clarify our modeling decisions. For the CBGT pathways, we use a simplified model of isolated "action channels" that represent unique actions without specifying the true laterality of representations in the brain. As long as relatively distinct representations compete, which is what we observed in our human neuroimaging data, and as long as the populations representing the action are unique, regardless of hemisphere, our model assumptions are applicable despite the complicated lateralization of unimanual actions in reality.

      We now specify this in the main text:

      “It is important to note that, for the sake of parsimony, we adopt a simple and canonical model of CBGT pathways, with action channels that are agnostic as to the location of representations (e.g., lateralization), simply assuming that actions have unique population-level representations.”

      2) Another concern with this work is that it was unclear why the spiking neuronal network model with so many model parameters was used to account for coarse-scale fMRI data - a simple firing-rate neural population model would perhaps do the work.

      We see how using a complex, biologically realistic neural network has arguable scientific value when comparisons are coarse and made against macroscopic hemodynamic responses. However, it does have clear value for setting the stage for future work that can unravel the nuances of the mechanism involved.

      To explain our rationale, we take an upward mapping perspective, where implementation-level models at lower levels represent the detailed biophysical properties of neurons and synapses, and models at higher levels represent the emergent properties of neural networks. This approach facilitates prediction at various levels of abstraction, including molecular, cellular, behavioral, and cognitive, by leveraging lower-level models to inform higher-level ones. For example, in other work, we are testing our model in mice using D1 and D2 optogenetic stimulation. We plan to use the same neural network to inform our predictions about these results. So, the complexity of the model does have a clear purpose for informing ongoing and future work by acting as a theoretical bridge between experiments across levels of analysis and spatiotemporal resolution. In our paper, the fMRI findings are compared with predicted dynamics at a common level of abstraction. Given the difference in resolution between these two approaches, our comparison is coarse.

      To the reviewer’s concern about the number of parameters in the model, we make sure to address the dimensionality of our model in our analysis approach in the Results section:

      “To test whether these shifts in v are driven by competition within and between action channels, we predicted the network's decision on each trial using a LASSO-PCR trained on the pre-decision firing rates of the network (see Measuring neural action representations). The choice of LASSO-PCR was based on prior work building reliable classifiers from whole-brain evoked responses that maximizes inferential utility (see Wager et al. 2011). The method is used when models are over-parameterized, as when there are more voxels than observations, relying on a combination of dimensionality reduction and sparsity constraints to find the true, effective complexity of a given model. While these are not considerations with our network model, they are with the human validation experiment that we describe next. Thus, we used the same classifier on our model as on our human participants to directly compare theoretical predictions and empirical observations.”
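
      As a rough illustration of the LASSO-PCR idea described above (dimensionality reduction followed by a sparsity-constrained classifier), a minimal NumPy sketch is given here. It is not the authors' implementation: the function names, the proximal-gradient fitting loop, the hyperparameters, the synthetic data, and the uncertainty definition are all assumptions made for illustration.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrinks weights toward zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_lasso_pcr(X, y, n_components=5, l1=0.01, lr=0.1, n_iter=2000):
    """Hypothetical LASSO-PCR: PCA, then L1-penalized logistic regression.

    X: (n_trials, n_features) pre-decision activity; y: binary choices (0/1).
    """
    mean = X.mean(axis=0)
    Xc = X - mean
    # Principal components from the SVD of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                 # (n_features, n_components)
    Z = Xc @ V                              # component scores
    scale = Z.std(axis=0)
    scale[scale == 0] = 1.0
    Zs = Z / scale
    w = np.zeros(n_components)
    b = 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Zs @ w + b)))
        w = soft_threshold(w - lr * (Zs.T @ (p - y)) / len(y), lr * l1)
        b -= lr * np.mean(p - y)
    # Project the sparse component weights back to feature space.
    return {"mean": mean, "beta": V @ (w / scale), "intercept": b}

def predict_proba(model, X):
    logits = (X - model["mean"]) @ model["beta"] + model["intercept"]
    return 1.0 / (1.0 + np.exp(-logits))

# Synthetic demonstration data (assumed, not from the study).
rng = np.random.default_rng(0)
n_trials, n_features = 200, 50
y = rng.integers(0, 2, size=n_trials)
pattern = rng.normal(size=n_features)
X = rng.normal(size=(n_trials, n_features)) + 2.0 * np.outer(y - 0.5, pattern)

model = fit_lasso_pcr(X[:150], y[:150])     # fit on the first 150 trials
p_test = predict_proba(model, X[150:])      # evaluate on held-out trials
accuracy = np.mean((p_test > 0.5) == y[150:])
# One illustrative uncertainty measure: 1 at the decision boundary, 0 when certain.
uncertainty = 1.0 - 2.0 * np.abs(p_test - 0.5)
```

      Because the same function can be applied to simulated firing rates and to voxel-wise responses, a sketch of this kind makes the "same classifier on model and humans" comparison concrete, although the published analysis may differ in its details.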

      3) Moreover, the activity dynamics of the fMRI were not shown. It would have been more rigorous to show the fMRI (BOLD) signals in different (particularly CBGT) brain regions and compare that with the CBGT model simulations.

      The timing of the trials and the autocorrelational structure of the BOLD response make such fine-grained analysis unproductive, as the entire trial is subsumed under a single evoked response. While we sympathize with this question, the limitations of the fMRI signal restrict our resolution for evaluating intra-trial dynamics. Our follow-up work with neurophysiological recordings in rodents may help address this. Given these limitations, we now show averaged node-by-node comparisons for the simulated and human data in Fig. 3 - Fig. Supp. 5.

      4) The association between classifier uncertainty and drift rate (by participants) was an order of magnitude difference between the simulated and actual participants (compare Figure 2E with Figure 4B).

      You make a valid point about the difference in effect magnitude between the model and data. The greater effect observed in the simulated data is due to several factors: 1) simulated data is not affected by the same sources of noise as human data, 2) the model is not susceptible to non-task related variance, 3) the model was used to predict associations seen in humans, and fine-tuning the model using human data would result in circular inference, and 4) the simulations used only a single experimental condition with deterministic volatility, while human experiments varied the relative value of the two options and volatility, leading to increased variance in human responses. The goal was to compare the qualitative pattern of results, and the discrepancy in magnitude is addressed in the Discussion section of the revised manuscript:

      “Careful attention to the effect size of our correlations between channel competition and drift rate shows that the effect is substantially smaller in humans than in the model. This is not surprising and due to several factors. Firstly, the simulated data is not affected by the same sources of noise as the hemodynamic signal, whose responses can be greatly influenced by factors such as heterogeneity of cell populations and properties of underlying neurovascular coupling. Additionally, our model is not susceptible to non-task related variance, such as fatigue or lapses of attention, which the humans likely experienced. We could have fine tuned the model results based on the empirical human data, but that would contaminate the independence of our predictions. Finally, our simulations only used a single experimental condition, whereas human experiments varied the relative value of options and volatility, which led to more variance in human responses. Yet, despite these differences we see qualitative similarities in both the model and human results, providing confirmation of a key aspect of our theory.”

      5) There was also a weak effect on human reaction times (Supp. Fig. 2).

      Trial-by-trial reaction times are indeed noisy. However, our estimates rely on the distribution of reaction times, rather than trial-by-trial values.

      6) There were only 4 human participants that performed the experiment - the results would perhaps be better with more human participants.

      We see where this comment arises from and we are sympathetic to the initial thought, but we should point out that our experimental design mirrors the type used in non-human primate research: collect an entire experiment’s worth of data from a single participant and replicate the effects across new participants. We have a total of 2,700 trials per participant (for a total of 10,800 trials across all participants). Each participant has the equivalent number of trials as what would be conducted per experiment in typical single run or single session experiments with a sample of ~40 participants. Our statistical power was focused on within-subjects replication, meaning that each participant can be thought of as their own independent experiment, with sufficient statistical power to address our primary research hypothesis. Thus, in our experimental design, each run is an observation, as opposed to each participant as in typical fMRI experiments, and each participant is then considered a replication test of the other participants.

      We now emphasize the statistical power on a single-subject basis in the Results section:

      “Crucially, we designed this experiment such that each participant acted as an out-of-set replication test, having performed thousands of trials individually. Specifically, to ensure we had the statistical power to detect effects on a participant-by-participant basis, we collected an extensive data set comprising 2700 trials over 45 runs from nine separate imaging sessions for each of four participants. Consequently, we amassed a grand total of 36 hours of imaging data over all participants, which was used to evaluate the replicability of our findings at the participant-by-participant level. Therefore, our statistical analyses were able to estimate effects on a single-participant basis.”

      7) For such a complex biophysical computational model, there could perhaps have been more model predictions provided.

      Using a biologically realistic neural network may not be useful for finer-grained comparisons, but it can inform future work. By mapping upward from lower-level to higher-level models, we can predict emergent properties at different levels of abstraction. The model's complexity is necessary for informing ongoing and future work, such as testing the model in other organisms. While the comparison with fMRI findings is coarse, we address the dimensionality of our model in our analysis approach.

      Reviewer #2 (Public Review):

      1) In this paper, Bond et al. build on previous behavioral modeling of a reversal-learning task. They replicate some features of human behavior with a spiking neural network model of cortical basal ganglia thalamic circuits, and they link some of these same behavioral patterns to corresponding areas with BOLD fMRI. I applaud the authors for sharing this work as a preprint, and for publicly sharing the data and code.

      Thank you for your thoughtful comments on our work! We also appreciate your recognition of our efforts to openly share our data and code.

      2) While the spiking neural network model offers a helpful tool to complement behavior and neuroimaging, it is not very clear which predictions are specific to this model (and thus dissociate it from, or go beyond, previous work). Thus, the main strength of this work (combining behavior, brain, and in silico experiments) is not fully fleshed out and could be stronger in the conclusions we can draw from them.

      We agree that further exploration of the specific predictions that our spiking neural network model offers would be valuable in order to fully delineate its contribution to the field. In our current work, we link our simulated neural network dynamics with whole-brain hemodynamic data, which limits the temporal resolution and complexity of our comparisons. We recognize that a more detailed investigation of the unique contributions of our spiking neural network model would be an important next step in order to better understand the mechanisms underlying the observed behavioral patterns. Indeed – we are currently conducting follow-up work in mice to test finer-grained predictions of cellular dynamics.

      3) It would be helpful to know more about which features of the spiking NN model are crucial in precisely replicating the behavioral patterns of interest (and to be more precise in which behaviors are replicated from previous work with the same task, vs. which ones are newly acquired because the task has changed - or the spiking CBGT model has afforded new predictions for behavior). Throughout, I am wondering if the authors can compare their results to a reasonable 'null model' which can then be falsified (e.g. Palminteri et al. 2017, TICS); this would give more intuition about what it is about this new CBGT model that helps us predict behavior. The same question about model comparison holds for the behavior: beyond relying on DIC score differences, what features of behavior can and cannot be explained by the family of DDMs?

      You raise a crucial point. In our original manuscript, we only compared the single and pairwise variants of the HDDM model and a null model predicting no change in decision policy. The drift rate model best fit the data among these comparisons.

      However, our main claim relies on the link between neural data, behavior, and the underlying cognitive process. Previously, we did not test other variants of this central linking hypothesis. To address this, we tested an alternative linking hypothesis using boundary height instead of drift rate as our target variable. We found a null association with classifier uncertainty. This definitely provides a more rigorous test of our primary hypothesis, and we thank the reviewer for raising this point.

    1. Author Response

      Reviewer #2 (Public Review):

      1) The authors in reality do not analyze oscillations themselves in this manuscript but only the power of signals filtered at determined frequency bands. This is particularly misleading when the authors talk about "spindles". Spindles are classically defined as a thalamico-cortical phenomenon, not recorded from hippocampus LFPs. Thus, the fact that you filter the signal in the same frequency range matching cortical spindles does not mean you are analyzing spindles. The terminology, therefore, is misleading. I would recommend the authors to change spindles to "beta", which at least has been reported in the hippocampus, although in very particular behavioral circumstances. However, one must note that the presence of power in such bands does not guarantee one is recording from these oscillations. For example, the "fast gamma" band might be related to what is defined as fast gamma nested in theta, but it might also be related to ripples in sleep recordings. The increase of "spindle" power in sleep here is probably related to 1/f components arising from the large irregular activity of slow wave sleep local field potentials. The authors should avoid these conceptual confusions in the manuscript, or show that these band power time courses are in fact matching the oscillations they refer to (for example, their spindle band is in fact reflecting increased spindle occurrence).

      We thank the reviewer for allowing us to clarify this subject. We completely agree with concerns raised in the comments. To avoid any confusion, we have replaced throughout the manuscript the word ‘spindle’ with ‘beta’.

      2) The shuffling procedure to control for the occupancy difference between awake and sleep does not seem to be sufficient. From what I understand, this shuffling is not controlling for the autocorrelation of each band which would be the main source of bias to be accounted for in this instance. Thus, time shifts for each band would be more appropriate. Further, the controls for trial durations should be created using consecutive windows. If you randomly sample sleep bins from distant time points you are not effectively controlling for the difference in duration between trial types. Finally, it is not clear from the text if the UMAP is recomputed for each duration-matched control. This would be a rigorous control as it would remove the potential bias arising from the unbalance between awake and sleep data points, which could bias the subspace to be more detailed for the LFP sleep features. It is very likely the results will hold after these controls, given it is not surprising that sleep is a more diverse state than awake, but it would be good practice to have more rigorous controls to formalize these conclusions.

      We are grateful to the reviewer for suggesting these alternative analyses. Following this suggestion, we created surrogate datasets by time-shifting each band and obtained their respective UMAP projections (see modified Figure 2D). Additionally, as suggested, for the duration-matched controls we selected consecutive windows rather than random points (Figure 2 – figure supplement 1C). UMAP projections were obtained for each duration-matched control, and occupancy was computed. The text in the Methods section has been modified to describe this analysis. As expected, the results were identical.
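
      The two controls discussed here (per-band time shifts and consecutive duration-matched windows) can be sketched as follows. This is a minimal illustration under assumptions: the array layout, function names, and random-offset scheme are hypothetical and are not taken from the authors' code.

```python
import numpy as np

def time_shift_surrogate(band_power, rng, min_shift=1):
    """Circularly shift each band by an independent random offset.

    band_power: (n_bins, n_bands). Each band keeps its own values and
    (circular) autocorrelation, while cross-band alignment is destroyed.
    """
    n_bins, n_bands = band_power.shape
    surrogate = np.empty_like(band_power)
    for j in range(n_bands):
        # Offsets lie in [min_shift, n_bins - min_shift], so never a zero shift.
        shift = rng.integers(min_shift, n_bins - min_shift + 1)
        surrogate[:, j] = np.roll(band_power[:, j], shift)
    return surrogate

def consecutive_window(n_bins_total, window_len, rng):
    """Indices of one contiguous window, for duration-matched controls."""
    start = rng.integers(0, n_bins_total - window_len + 1)
    return np.arange(start, start + window_len)

rng = np.random.default_rng(1)
power = rng.random((1000, 4))            # assumed: 1000 time bins, 4 frequency bands
surrogate = time_shift_surrogate(power, rng)
window = consecutive_window(n_bins_total=1000, window_len=120, rng=rng)
sleep_control = power[window]            # a duration-matched block of sleep bins
```

      In such a scheme, the embedding would be recomputed on each surrogate or duration-matched block before occupancy is measured.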

      3) Lots of the observations made from the state space approach presented in this manuscript lack any physiological interpretation. For example, Figure 4F suggests a shift in the state space from Sleep1 to Sleep2. The authors comment there is a change in density but they do not make an effort to explain what the change means in terms of brain dynamics. It seems that the spectral patterns are shifting away from the Delta X Spindle region (concluding this by looking at Fig4B) which could be potentially interesting if analyzed in depth. What is the state space revealing about the brain here? It would be important to interpret the changes revealed by this method otherwise what are we learning about the brain from these analyses? This is similar to the results presented in Figure 5, which are merely descriptions of what is seen in the correlation matrix space. It seems potentially interesting that non-REM seems to be split into two clusters in the UMAP space. What does it mean for REM that delta band power in pyramidal and lm layers is anti-correlated to the power within the mid to fast gamma range? What do the transition probabilities shown in Figures 6B and C suggest about hippocampal functioning? The authors just state there are "changes" but they don't characterize these systematically in terms of biology. Overall, the abstract multivariate representation of the neural data shown here could potentially reveal novel dynamics across the awake-sleep cycle, but in the current form of this manuscript, the observations never leave the abstract level.

      We thank the reviewer for allowing us to clarify this aspect of the manuscript. We have now edited the main text to include considerations on the biological relevance of the findings of Figure 4, 5 and 6.

      Additions to Figure 4: In particular, non-REM states in sleep2 tended to concentrate in a region of increased power in the delta and beta bands, which could be the result of increased interactions with cortical activity modulated in the same range. It is also likely that this effect was induced by the exposure to relevant behavioral experience. In fact, changes in the density of individual oscillations after learning have been reported using traditional analytical methods and are thought to support memory consolidation (Bakker et al., 2015; Eschenko et al., 2008, 2006). Nevertheless, while traditional methods provide information about individual components, the novel approach used here provides additional information about the combinatorial shift in the dynamics of network oscillations after learning or exploration. Thus, it provides the basis for identifying how coordinated activity among different oscillations supports memory consolidation processes, such as those occurring during non-REM sleep after exploration, which cannot be elucidated using traditional analytical methods.

      Additions to Figure 5: Gamma segregation and delta decoupling offer a picture of hippocampal REM sleep as being more akin to awake locomotion (with the major difference of a stronger medium gamma presence) while also suggesting a substantial independence from cortical slow oscillations. On the other hand, the across-scale coherence of non-REM sleep is consistent with this sleep stage being dominated by brain-wide collective fluctuations engaging oscillations at every range. Distinct cross-frequency coupling among various individual pairs of oscillations, such as theta-gamma and delta-gamma, has already been reported (Bandarabadi et al., 2019; Clemens et al., 2009; Hammer et al., 2021; Scheffzük et al., 2011). However, computing cross-frequency coupling on the state space provides additional information on how multiple oscillations, obtained from distinct CA1 hippocampal layers (stratum pyramidale, stratum radiatum and stratum lacunosum moleculare), are coupled with each other during distinct states of sleep and wakefulness. Furthermore, projecting the correlation matrices onto a 2D plane provides a compact tool for visualizing the cross-frequency interactions among various hippocampal oscillations. Altogether, this approach reveals the complex nature of the coupling dynamics occurring in the hippocampus during distinct behavioral states.

      Additions to Figure 6: We found that transitions occurring from REM-to-REM sleep and non-REM-to-non-REM sleep (intra-state transitions) are more vulnerable to plasticity after exploration as compared to inter-state transitions (such as non-REM-to-REM, REM-to-intermediate etc.) (Fig 6E, F). These changes in intra-state transitions were observed to be beyond randomness (Fig S9 E, F), indicating a specificity in plastic changes in state transitions after exploration. In particular, while the average REM period duration is unaltered after exploration (Fig 4G), REM temporal structure is reorganized. In fact, the increased probability of REM-to-REM transitions indicates a significant prolongation of REM bout duration. Similarly, the increase in non-REM-to-non-REM transition probability reflects an increased duration of non-REM bouts. Therefore, environment exploration was accompanied by an increased separation between REM and non-REM periods, possibly as a response to increased computational demands. More generally, the network state space makes it possible to characterize the state transitions in the hippocampus and how they are affected by novel experience or learning. By observing the state transition patterns, this analytical framework makes it possible to detect and identify state-specific changes in the hippocampal oscillatory dynamics, beyond the possibilities offered by more traditional univariate and bivariate methods. We next investigated how fast the network flows on the state space and assessed whether the speed is uniform or exhibits specific region-dependent characteristics.

      Reviewer #3 (Public Review):

      1) My primary concern is to provide clear evidence that this approach will provide key insights of high physiological significance, especially for readers who may think the traditional approaches are advantageous (for example due to their simplicity). I think the authors' findings of distinct sleep state signatures or altered organization of the NLG3-KO mouse could serve this purpose. However, right now the physiological significance of these results is unclear. For example, do these sleep state signatures predict later behavior performance, or is altered organization related to other functional impairments in the disease model? Do neurons with distinct sleep state signatures form distinct ensembles and code for related information?

      We are thankful to the reviewer for raising a very interesting line of questioning regarding sleep signatures and distinct ensembles. In this study, we show that sleep state signatures can predict how individual cells may participate in information processing during open field exploration. However, further analyses exploring the recruitment of neuronal ensembles are in preparation for another manuscript and are beyond the scope of this article.

      We have further modified the description of the results (as also suggested by other reviewers) to highlight the key advantages of this approach over traditional methods.

      Regarding functional impairment: as described in the manuscript, the altered organization in the animal model of autism could possibly be due to alterations in cellular and synaptic mechanisms such as those described in previous reports (Modi et al 2019, Foldy et al 2013).

      2) For cells with different mean firing rates during exploration: is that because they are putative fast-spiking interneurons and pyramidal cells? From the reported mean firing rates, I think some of these cells are interneurons. Since mean firing rates are well known to vary with cell type, this should be addressed. For example, the sleep state signatures may be distinct for different putative pyramidal cells and interneurons. This would be somewhat expected considering prior work that has shown different cell types have different oscillatory coupling characteristics. I think it would be more interesting to determine if pyramidal cells had distinct sleep state signatures and, if so, whether pyramidal cells from the same sleep state signature have similar properties, such as coding for similar things or commonly firing together in an ensemble. The number of cells in Fig. 8 may be limited for this analysis. The authors could use the hc-11 data in addition, which was also tested in this work.

      We thank the reviewer for suggesting this additional analysis to better describe the data. To this end, we have added a figure to the supplementary data (analysis of the hc-11 dataset: Figure 8 – figure supplement 3) to demonstrate that interneurons and pyramidal cells have distinct sleep signatures. These findings are in agreement with the dataset presented in Figure 8D, E.

      As shown in the manuscript, the spatial firing (sparsity) has large variability for cells having similar network signatures (Fig 8E). Thus, additional parameters besides oscillations may be involved in the cells' encoding. Different network state spaces will need to be explored in future studies to understand this phenomenon in detail.

      We agree that investigating neuronal ensembles and the state space is an interesting direction to follow. In another study (in preparation), we are investigating in detail the recruitment of neuronal ensembles by the oscillatory state space. Thus, those findings are beyond the scope of this introductory article.

      3) Example traces are needed to show how LFPs change over the state-space. Example traces should be included for key parts of the state-space in Figures 2 and 3.

      We thank the reviewer for this key insight on data representation. Example traces of how LFP varies on the state space have been added (see Figure 4 – figure supplement 1).

      4) What is the primary rationale for 200ms time bins? Is this time scale sufficient to capture the slow dynamics of delta rhythm (1-5Hz) with a maximum of 1s duration?

      The time scale of binning depends on the scale of investigation. We also replicated the results with different time bins (such as 50 ms and 1 second), and the results are identical. For delta rhythms, with 200 ms time bins, the dynamics will be captured across multiple bins. Additionally, the binned power time series are smoothed before obtaining projections.
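
      To make the binning step concrete, the sketch below averages a band-power trace into 200 ms bins and smooths the result. It is illustrative only: the sampling rate, the moving-average kernel, and the function names are assumptions, since the response does not specify them.

```python
import numpy as np

def bin_power(power, fs, bin_ms=200):
    """Average a power time series into non-overlapping bins of bin_ms."""
    samples_per_bin = int(round(fs * bin_ms / 1000))
    n_bins = len(power) // samples_per_bin
    trimmed = power[: n_bins * samples_per_bin]   # drop the incomplete last bin
    return trimmed.reshape(n_bins, samples_per_bin).mean(axis=1)

def smooth(x, width=5):
    """Moving-average smoothing (assumed kernel; edges are zero-padded)."""
    kernel = np.ones(width) / width
    return np.convolve(x, kernel, mode="same")

fs = 1000                                   # assumed sampling rate in Hz
t = np.arange(0, 10, 1 / fs)                # 10 s of signal
delta_power = 1.0 + 0.5 * np.sin(2 * np.pi * 0.2 * t)  # slowly varying envelope
binned = bin_power(delta_power, fs)         # 50 bins of 200 ms each
smoothed = smooth(binned)
```

      With 200 ms bins, one cycle of a 1 Hz rhythm spans five bins, so slow fluctuations in delta power are followed over several consecutive bins rather than within a single one.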

      5) Since oscillatory frequency and power are highly associated with running speed, how does speed vary over the state space. Is the relationship between speed and state-space similar to the results of previous studies for theta (Slawinska and Kasicki, Brain Res 1998; Maurer et al, Hippocampus 2005) and gamma oscillations (Ahmed and Mehta J. Neurosci 2012; Kemere et al PLOS ONE 2013), or does it provide novel insights?

      We thank the reviewer for highlighting this crucial link between oscillations and locomotion. While various articles have focused on individual oscillations, the combinatorial effects of multiple oscillations from multiple brain areas in regulating the speed of the animal during exploration are definitely worth exploring with this novel approach. This set of results will be introduced in another study, currently in preparation.

      6) The separation of 9 states (Fig. 6ABC) seems arbitrary, where state 1 (bin 1) is never visited. I suggest plotting the density distribution of the data in Fig. 2A or Fig. 6A to better determine how many states there are within the state space. For example, five peaks in such a density plot might suggest five states. Alternatively, clustering methods could be useful to determine the number of states.

      We thank the reviewer for this useful suggestion. We agree that additional clustering methods can be used to identify non-canonical sleep states. These are currently being explored in our lab and will be part of future studies. As for this dataset, the density plots are available in Figure 4E, which shows how many states are in each part of the state space.

      7) The results in Fig. 4G are very interesting and suggest more variation of sub-states during non REM periods in sleep1 than in sleep2. What might explain this difference? Was it associated with more frequent ripple events occurring in sleep2?

      The reviewer is right to look for the source of the decreased state variability in sleep2. Considering the distribution of relative frequency power in the state space, the higher concentration in sleep2 corresponds to higher content in the slower delta and spindle frequency bands, rather than the higher frequencies of SWRs. This result can be interpreted in the light of enhanced cortical activity (which is known to heavily recruit those bands) and possibly of enhanced cortical-hippocampal communication following relevant behavioral experience. It is also necessary to mention that with our recording setup we cannot completely rule out the effects of volume conduction, and thus we cannot exclude that the increase in the delta and spindle bands in the hippocampus was a spurious effect of purely cortical frequency modulations.

      8) The state transition results in Fig. 6 are confusing because they include two fundamentally different timescales: fast transitions between oscillatory states and slow dynamics of sleep states. I recommend clarifying the description in the results and the figure caption. Furthermore, how can an animal transition between the same sleep state (Fig. 6EF)? Would they both be in a single sleep state?

      The transitions capture the fast oscillatory scales (as they are investigated over a timeframe of 1 second). The sleep stages (REM, non-REM etc.) are used as labels from which the states originate on the state space. This allows us to characterize fast oscillatory dynamics in various sleep stages.

      Regarding same-state transitions: an increase in same-state transition probability corresponds to a prolongation of that particular state, thereby altering the temporal structure of a given sleep state.
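
      The relationship between same-state transition probability and bout duration can be illustrated with a short sketch. It assumes a first-order Markov description of the state sequence, which is a simplification introduced here for illustration; the state labels and function names are hypothetical.

```python
import numpy as np

def transition_matrix(labels, n_states):
    """Row-normalized counts of transitions between consecutive time bins."""
    counts = np.zeros((n_states, n_states))
    for a, b in zip(labels[:-1], labels[1:]):
        counts[a, b] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    # Rows for states that are never left stay at zero instead of dividing by 0.
    return np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

def expected_bout_length(p_stay):
    """Mean bout length (in bins) if staying is a coin flip with probability p_stay."""
    return 1.0 / (1.0 - p_stay)

# Toy sequence: 0 = REM, 1 = non-REM (labels are illustrative only).
labels = [0, 0, 0, 1, 1, 0, 0, 0, 1, 1]
P = transition_matrix(labels, n_states=2)
rem_bout = expected_bout_length(P[0, 0])
```

      Under this assumption, raising the REM-to-REM probability from 0.8 to 0.9 would double the expected bout length from 5 to 10 bins, which is the sense in which a higher same-state transition probability reflects longer bouts.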

    1. Author Response

      Reviewer #1 (Public Review):

      The paper describes a robotic system that can be used for prolonged recording of forced activity in crawling Drosophila larvae. This is mostly intended to be a proof of principle description of a tool potentially useful for the community. The system - whose value lies completely in its reproducibility and adoption - is only superficially described in the paper, but a more detailed description is made available through Github, along with the software used for the collection and analysis of data.

      There is good, convincing evidence this can work as some sort of "larval conveyor belt", used to artificially prolong food crawling behaviour in the animals. More could be said about the ecological implications of the assay (for instance: how relevant is it to an animal's natural behaviour? Does the system introduce artifactual distortions in the analysis, driven by the fact that animals crawl greater distances than they would normally crawl in nature? Will this extensive activity affect their development to pupation or adulthood?).

      In addition to all our code being available on GitHub, we have added substantially to Materials and Methods in the manuscript (1-1.5 pages), detailing the analysis pipeline more thoroughly.

      We agree that a more thorough comparison of ecological vs. laboratory conditions was warranted here, and have addressed this in new Discussion section material (6th paragraph especially). The developmental effect due to prolonged locomotion is a very good point – with only a single animal measured for more than 24 hours, we do not yet know whether instar molting or pupation is delayed, but this could certainly be a concern in longer experiments moving forward.

      Reviewer #3 (Public Review):

      "Continuous, long-term crawling behavior characterized by a robotic transport system" by Yu et al. presents their new robotic device to track, reposition, and feed Drosophila larvae as they crawl on an arena. By using a water droplet (or if necessary, suction) to transport larvae from the edge of the arena to the middle, long behavior trajectories can be recorded without losing larvae from the arena or camera field of view. The picker robot is also able to dispense small amounts of apple juice at precise locations to keep larvae alive for extended periods although the food was not sufficient to trigger molting and the development to the next instar stage.

      The approach is interesting, but the authors could provide more details on why the approach is necessary for non-expert readers. For example, what are the advantages of using the robot picker compared to simply confining larvae in a closed arena? It's not obvious (to me) that being picked back to the center of the arena is a smaller perturbation compared to running into a chamber wall and changing direction.

      Thank you for this suggestion, it’s a very good point. We have expanded our Introduction considerably, and directly address this issue (4th paragraph in particular). We do quantify the perturbation due to robot pick-ups and drop-offs (Fig. 3D), but that only addresses the short term. We prefer not to use a closed arena for three reasons: (1) in a gradient navigation experiment, reaching the edge would effectively end “navigation” and we would be unable to study that behavior over longer times, (2) larvae can crawl up the sides of walls and will be lost to the tracker (they do this all the time in the Petri dishes they are raised in), and (3) larvae often do not bounce off walls and resume crawling, they tend to dwell near edges they find. To this last point, we have added a new Supplemental figure (Figure 1 – supplement 1) illustrating this effect with a representative example.

      The first paragraph of the introduction emphasizes the multiple time scales that are relevant for behavior from rapid stimulus response up to developmental times. This is to set the context of the authors' contribution but I'm not sure it's a fair representation of the state of the art. For example, the authors state that high-bandwidth measurement over long times is prohibitive and cite three Drosophila papers, but there are home-cage monitoring systems that allow continuous recording of mouse behavior over long times with high resolution. At the other end of the spectrum, there have been some long-term behaviour experiments done on worm behaviour with reasonably high time resolution (e.g., Stern et al. 10.1016/j.cell.2017.10.041).

      This is absolutely correct, the context needed to be much broader than our own prior larva results. We have overhauled that section and written a wider introduction that includes the C. elegans paper you mentioned, and also brings in other model systems like adult flies, mice, and rats. We frame our own work as (1) in a new animal, for long term measurements; (2) investigating non-confined free locomotion over a long time scale.

      The authors train a neural network to segment and track the larvae, however, little information is given on the training process and I don't think it would be possible to reproduce the model based on the description. More details of the network, hyperparameters, and training data would be required to evaluate it.

      Definitely! We have added a new section to Materials and Methods (1-1.5 pages in length), detailing our analysis pipeline, with sections for position tracking, postural analysis, and behavioral classification.

      The authors also state several times that larval identity is maintained throughout the recording, but this isn't quantified. It's not clear whether identity is maintained across collisions of two or more animals by the tracking algorithm or whether these collisions simply don't happen in their data because density is low.

      This has also been addressed and clarified in the same new part of the Materials and Methods section. We quantify collision rates and give the accuracy of maintaining identity after collisions.

      The environment is nominally isotropic, but once larvae have been crawling on the surface for hours, including periodic feeding, there will likely be multiple gradients the larvae may sense. This may not be observable in the data, but should perhaps be mentioned in the text.

      This is certainly true. Other than the single animal 30-hour experiment described in the manuscript, there is no food introduced to the larvae during our 6-hour experiments. Looking ahead, the presence of food remnants in the arena could become a serious confounding factor in nominally isotropic experiments, as the reviewer points out. We have added substantially to the Discussion section to discuss various limitations of the design and experiments, and directly talk about the odor/taste stimuli being introduced by food (second to last paragraph in Discussion).

      The authors show that the picking action results in a small but detectable increase in speed. The degree of perturbation overall depends on the picking frequency so some quantification of the inter-pick time interval would help to interpret whether this perturbation is relevant for a particular experiment. Is there a difference in excitation when larvae are picked successfully on the first try compared to when multiple tries or suction are required?

      We have now quantified the amount of time between pickups and added that in the Materials and Methods section directly (it’s 0.87 pick-ups per hour per animal). We do not have a sufficient amount of data to determine whether there is a statistically significant difference in behavior for multiple pickup attempts – this can also be confounded because sometimes an unsuccessful pickup is one that does not touch the larva at all (so would presumably not introduce additional perturbations).
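
      As a quick illustration of how the reported rate maps onto the inter-pick interval the reviewer asked about, the arithmetic can be sketched as follows. This is not code from the manuscript; the function name is hypothetical, and the calculation assumes pick-ups are spread evenly enough that the mean interval is simply the reciprocal of the rate.

```python
def mean_interval_minutes(pickups_per_hour: float) -> float:
    """Mean time between pick-ups (minutes) for a given per-animal rate."""
    if pickups_per_hour <= 0:
        raise ValueError("rate must be positive")
    return 60.0 / pickups_per_hour


# Rate quoted in the response: 0.87 pick-ups per hour per animal.
rate = 0.87
interval_min = mean_interval_minutes(rate)  # roughly 69 minutes

# Expected number of pick-ups per animal in a 6-hour experiment.
expected_pickups_6h = rate * 6.0  # roughly 5.2
```

      Under that assumption, each animal is picked up about once every 69 minutes on average, or about five times in a 6-hour recording.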

      From the reconstructed trajectory in Figure 4, this interval looks very long compared to speed increase after picking. When reconstructing the trajectory, how are the segments joined? Is it simply by resetting the xy position or also updating the rotation to match the previous direction of travel? (I'm guessing the larva can rotate during transport?)

      We have updated the Figure 4 caption to make it clear that the segments are only joined translationally, by resetting the xy position.
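
      A translation-only join of this kind can be sketched as below. This is a minimal illustration rather than the authors' analysis code: the function name, the data layout (lists of (x, y) tuples), and the choice to make each segment's first point coincide with the previous segment's last point are all assumptions.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def stitch_segments(segments: List[List[Point]]) -> List[Point]:
    """Join trajectory segments by translation only (no rotation).

    Each segment after the first is shifted so that its first point
    lands on the last point of the trajectory stitched so far.
    """
    stitched: List[Point] = []
    for seg in segments:
        if not seg:
            continue  # ignore empty segments
        if not stitched:
            stitched.extend(seg)
            continue
        last_x, last_y = stitched[-1]
        dx = last_x - seg[0][0]
        dy = last_y - seg[0][1]
        # The shifted first point duplicates the previous end point, so skip it.
        stitched.extend((x + dx, y + dy) for x, y in seg[1:])
    return stitched


# Two segments that each start at the arena centre (0, 0):
before_pickup = [(0, 0), (1, 0), (2, 0)]  # crawled along +x, then picked up
after_dropoff = [(0, 0), (0, 1), (0, 2)]  # dropped at the centre, crawled along +y
path = stitch_segments([before_pickup, after_dropoff])
```

      Because no rotation is applied, the heading after each drop-off is kept exactly as recorded; only the xy offset changes.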

      The authors present a simple model in Figure 6 to illustrate the differences between individuals that can be hidden when looking at population distributions. However, the differences they show in the simulation don't seem relevant to the differences they observe in the experiments. Specifically, Fig. 6A and B show a contrast between individuals with similar mean speeds compared to individuals with different (but still unimodal) mean speeds. In contrast, the experimental data in Fig. 6D shows individual distributions that are quite similar but that are bimodal. So, there is indeed a difference between the individual distributions that is obscured in the population distribution, but is there evidence of larval personality types (line 444)? Similarly, the sentence beginning line 381 doesn't seem right either.

      We are really glad this was brought up so that we could clarify better in the text, as it’s an important point. We have edited the text in the Results subsection related to Figure 6 and the Figure 6 caption to clear things up. The individual distributions in 6D are not bimodal, there are 38 traces shown that are all essentially unimodal. In addition to stating this directly in the text, we have quantified this by adding the average BC for individuals in both isotropic and thermal gradient contexts (they are essentially the same, i.e. equally unimodal in both cases).
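
      The response does not expand "BC"; assuming it denotes the bimodality coefficient, a moment-based version can be sketched as follows. This is an illustration only, not the authors' code. It uses the large-sample form, (skewness² + 1) / kurtosis, without finite-sample corrections, and the example data are made up.

```python
from typing import Sequence


def bimodality_coefficient(values: Sequence[float]) -> float:
    """Large-sample bimodality coefficient: (skewness^2 + 1) / kurtosis.

    Kurtosis here is the plain (non-excess) fourth standardized moment.
    Values above 5/9 (the value for a uniform distribution) are commonly
    read as a hint of bimodality; lower values suggest a unimodal shape.
    """
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return (skewness ** 2 + 1) / kurtosis


# Made-up examples: a single-peaked sample and a two-peaked sample.
unimodal = [0] * 1 + [1] * 4 + [2] * 6 + [3] * 4 + [4] * 1
bimodal = [-1] * 50 + [1] * 50
bc_unimodal = bimodality_coefficient(unimodal)
bc_bimodal = bimodality_coefficient(bimodal)
```

      In this toy case the single-peaked sample falls below 5/9 and the two-peaked sample well above it, which is the kind of comparison an average BC per individual would summarize.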

    1. Author Response

      Reviewer #1 Public Review:

      1) “…The authors make reasonable assertions, but all of these need to be validated by electrophysiological studies before they can be treated as fact. Instead, they should be treated as predictions. For example, in the conclusions from the model section, that endbulb size does not strictly predict synaptic efficacy should be modified from an assertion to a prediction.”

      The reviewer makes an important point. We realize that, despite describing the data as the output of a model, we needed to be clearer that the model output is in fact a set of predictions to be tested experimentally. In the reorganization of the results, we collect the model output explicitly in a section named “Model Predictions”, and list five classes of predictions that describe explorations of bushy cells. The fifth set of predictions was previously a separate section but should now be better appreciated as conveying hypotheses since it is incorporated into this newly named section. Please note that the hypotheses are constrained to varying extents by the high-resolution structural data we present, such as the estimation of synaptic weights from the counts of synapses. The compartmental models for each bushy cell also are constrained by the structural data and published biophysical and electrophysiological properties of the cells. The pipeline to create the models is described in its own section now using that terminology: “A pipeline for translating high-resolution neuron segmentation into compartmental models consistent with in vitro and in vivo data.”, which we hope conveys the notion that the modeling framework is indeed a template that can be applied to future experimental data. We explicitly make this latter point in the new Discussion section “Toward a complete computational model for globular bushy cells: strengths and limitations”.

      Reviewer #2 Public Review:

      1) “…While this is technically impressive (in regards to both the structure and modelling) there are significant weaknesses because this integration makes massive assumptions and lacks a means of validation; for example, by checking that the results of the structural modelling recapitulate the single-cell physiology of the neuron(s) under study. This would require the integration of in vivo recorded data, which would not be possible (unless combined with a third high throughput method such as calcium imaging) and is well beyond the present study.”

      We appreciate the support for our approach, and we now make explicit in the manuscript that the output of the models should be interpreted as predictions for eventual experimental testing. We also consider in the Discussion some experimental procedures that might be used to test the predictions. Ca2+ imaging is currently too slow a reporter for the rapid synaptic events and integration time constant for bushy cells, as the reviewer knows, and we think (and present in the Discussion, section 2) that focal optical stimulation simultaneous with recording from fast voltage sensors are potential avenues to achieve this goal.

      2) The authors need to be more open about the limitations of their observations and their interpretations and focus on the key conclusions that they can glean from this impressive data set.

      As indicated in response to a similar comment from Reviewer 1, we have collected the primary limitations and now discuss them in a new section within the Discussion, entitled “Toward a complete computational model for globular bushy cells: strengths and limitations”.

      3) The manuscript would be considerably improved by re-writing to focus the science on the most important results and provide clear declarations of limitations in interpretation.

      We have extensively re-organized and re-written the text to highlight the key structural observations (Figures 1-3, 7-8), the pipeline from structure to model (Figure 4) and interleave structural observations with the outputs of the model (Figures 5-6, 8). The latter are explicitly detailed in a new section called “Model Predictions”. These predictions are organized into five classes. We think that this new organization will improve communication of the key results, and further highlight the key discoveries from structural analysis and predicted functional mechanisms as explored in the compartmental models.

      Reviewer #3 Public Review:

      1) The authors extract here from the longer introductory commentary a one-sentence summary of the strengths of the manuscript, and thereafter focus on the weaknesses, since this document emphasizes our response to those critiques. To quote reviewer #3: “The strengths of this paper are that the authors obtained unprecedented high-resolution 3-D images of the AN-bushy cell circuit, and they implemented a biophysical model to simulate the neural processing of AN inputs based on these structural data. … The biophysical modeling, although lacking comparison with in vivo physiological data due to the chosen species (mice), is also solid and well documented.”

      We appreciate that the reviewer acknowledges the attention to detail that entered into the nanoscale imaging, cell reconstructions, building the modeling pipeline and constructing the compartmental models.

      2) Despite the high quality of the data, the paper is marred by the species they chose: there are very few published in vivo single-unit results from mouse bushy cells, so it is hard to evaluate how well the model predictions fit the real-world data, and how the structural findings address the “fundamental questions” in physiology. … No rationale (e.g. use of molecular tools or in vitro physiology) is given why the authors focus on the mouse. It seems that the analyses provided here could as well have been done on a species with good low-frequency hearing, which may have provided a much more interesting case for understanding the spectacular temporal transformation performed by bushy cells.

      We now report our reasons, in the first paragraph of the Results, for selecting the mouse. One reason for choosing mouse was that biophysical properties of bushy cells, which were important parameters to constrain the compartmental models, were collected from mice. These data are collected from dissociated cells and from brain slices, and these experiments continue to be more tractable in mice. The second reason is that mice are used in nanoscale and light microscopy connectomic studies because their neurons, cell groups and entire brain are smaller, so that a given volume of imaged brain will contain more cellular elements. These other connectomic studies provide a template for eventual comparisons among brain regions. Our overall goal is to image the entire cochlear nucleus, and the size of the mouse brain makes this goal tractable given current technology. Indeed, we are currently analyzing an image volume of the more rostral ventral cochlear nucleus that is about 5x larger than this image volume and collected with a much better signal to noise ratio. The third reason for choosing mouse was so that the current project could be augmented by genetic tools to further classify cochlear nucleus (CN) neurons and their extrinsic inputs, and potentially manipulate neural circuits in future studies. For example, the atoh7 (math5) and hhip gene products are markers for subsets of bushy cells, suggesting the presence of molecular subtypes of this cell class (Jing et al. 2023).

      3) If we look at data from other animals such as cats and gerbils, it is true that high-frequency (globular) bushy cells show envelope phase locking, but compared to ANs they are at best only moderately enhanced (gerbils: Frisina et al. 1990: Fig 7 and 10; cats: Joris and Yin 1998 Fig 4); the most prominent enhancement is actually to the temporal fine structures of low-frequency bushy cells (cells tuned to < 1 kHz), which mice lack. Furthermore, the temporal modulation transfer function (tMTF, i.e. the vector strengths vs modulation frequency plots in Fig 7O of the paper) of (globular) bushy cells are mostly low-pass filtered, with a cutoff frequency close to 1 kHz, and the highest vector strength rarely surpasses 0.9 (cats: Rhode 1994 Fig 9, 16, Rhode 2008 Fig 8G, Joris and Yin 1998 Fig 7; and there's one report from mice: Kopp-Scheinpflug et al 2003 Fig 8). Thus, the band-pass tMTFs tuned to 100-200 Hz with vector strengths > 0.9 or 0.95 in this paper (Fig 7O, Fig 8M) do not really match known physiology (in non-mouse species). Again, we know very little about in vivo physiology of mouse (globular) bushy cells and there is of course a possibility that responses in mice may be closer to the predictions of this paper.

      We agree that there are (unfortunately) few studies in mouse that can be compared with our simulations. With regard to the tMTFs, we can make a couple of points. First, we note that the stimuli used for all the panels except P2 in Figure 6 (previous Figure 7) were at 15 dB SPL, which is the level where maximal envelope phase-locking occurs in the low-threshold ANF inputs. This choice was based on previous experimental work that examined the intensity dependence for SAM stimuli in the auditory nerve (Smith and Brachman, 1980; Joris and Yin, 1992; Cooper et al, 1993; Dreyer and Delgutte, 2006, Figure 2B, Figure 3). Second, Figure 6, Supplemental Figure 1 confirms the behavior of the auditory nerve model used for input to the bushy cells (Rudnicki and Hemmert (2017) implementation), replicating Zilany et al., 2009, Figure 13D. These results show that phase-locking decreases at higher intensities as expected from the experimental work. Relevant to this topic, the lone report of responses to SAM stimuli in mice (Kopp-Scheinpflug et al. 2003) used 100% SAM at CF at 80 dB SPL. At this high intensity, it is expected that the envelope phase locking at CF will be less than at lower intensities because of rate saturation in the high and medium spontaneous rate ANFs (Carney, JARO 2019; Joris and Yin, 1998). In guinea pig, envelope phase locking is greater in low-SR fibers at 80 dB SPL than in medium and high SR fibers, but it is still lower than at its peak at about 50 dB SPL (Cooper et al., 1993). All of these experimental observations therefore lead to the prediction that the SAM envelope locking in Kopp-Scheinpflug et al. (2003) should be lower than in our simulations.

      In addition, Kopp-Scheinpflug et al. (2003) did not report from which VCN cell populations the cells were recorded. If the recorded cells were a heterogeneous mixture of bushy and multipolar cells, then their data are not directly comparable to our model predictions. The stimulus intensity also needs to be considered for comparison with the work of Rhode (1994), whose lowest stimulus level is 30 dB SPL (Figure 9), and who also used a different stimulus, 200% SAM, and with the work of Frisina et al. (1990), who used 50 dB SPL. Interestingly, Figure 14D in Rhode (1994) shows a synchrony coefficient ranging from 0.5 to 0.9 at 30 dB SPL at 300 Hz modulation, which is similar to what we predict in Figure 6P2. We also remind the reviewer that our simulations did not include the effects of feed-back inhibition at CF (Caspary and Palombi, 1994; Campagnola and Manis, 2014; Xie and Manis, 2014, Keine et al. eLife 2016), which may affect phase synchrony in complex ways (Gai and Carney, 2008). One important feedback pathway arises from the tuberculoventral cells of the DCN (Wickesberg and Oertel, 1991; Campagnola and Manis, 2014), but the envelope synchrony behavior of those cells is not known.

      Thus, we now emphasize in the revised manuscript (in the Discussion) considerations of stimulus intensity used across published studies, citing the works above, the relatively high vector strengths at low modulation frequency, and that these simulation results are currently predictive. The simulations are also limited in that we used only one configuration of ANF inputs (low-threshold, high SR). This ANF SR category was selected to be consistent with the suggestion by Liberman (1991) that the globular BCs receive input principally from the low-threshold high-SR fibers. Mixtures of input SR classes would be expected to change the envelope representation at higher intensities. Finally, the parameter space is quite large (intensity x frequency x [ANF distributions], x inhibition) and is better explored in a separate study once we are able to provide better or additional constraints to the modeling framework. Also, to put the selection of SAM stimuli in context, we indicate that mice can encode temporal fine structure although only as low as 1 kHz, but at similar VS to larger rodents such as guinea pig (Taberner and Liberman 2005; Palmer and Russell 1986).
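
      For readers unfamiliar with the vector strength (VS) values discussed above, the standard definition can be sketched in a few lines. This is a generic illustration, not the simulation code used in the study; the function name and the toy spike trains are assumptions. VS is the length of the mean resultant vector when each spike is placed on the unit circle at its phase within the modulation cycle.

```python
import cmath
import math
from typing import Sequence


def vector_strength(spike_times_s: Sequence[float], mod_freq_hz: float) -> float:
    """Vector strength of spike times relative to a modulation frequency.

    Returns a value between 0 (no phase locking) and 1 (every spike at
    the same phase of the modulation cycle).
    """
    if len(spike_times_s) == 0:
        raise ValueError("need at least one spike")
    resultant = sum(
        cmath.exp(2j * math.pi * mod_freq_hz * t) for t in spike_times_s
    )
    return abs(resultant) / len(spike_times_s)


# Toy spike trains for a 100 Hz modulation frequency:
# one spike per cycle at a fixed phase (perfect locking) ...
locked = [k / 100.0 for k in range(50)]
# ... and spikes alternating between two opposite phases (no net locking).
alternating = [k / 200.0 for k in range(50)]

vs_locked = vector_strength(locked, 100.0)
vs_alternating = vector_strength(alternating, 100.0)
```

      A temporal modulation transfer function is then this quantity evaluated across modulation frequencies; the values above 0.9 debated in the exchange correspond to spike trains close to the perfectly locked toy case.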

      Reviewer 4: Public comments

      1) The authors have collected an impressive array of physiological data and provided some beautiful 3D images of SBCs with dendrites. These are clearly strengths. The computational models for mechanisms of SBC responses, however, are made to fit what may be inadequate anatomical data. Instead of conclusions, perhaps they need to reword their discussions to refer to the anatomy as hypothetical substrates.

      It is true that the SBEM image volumes have strengths and limitations. We now collect these considerations in the second section of the Discussion, “Toward a complete computational model for globular bushy cells: strengths and limitations”. One limitation of this volume is that we do not have sufficient resolution to categorize synaptic vesicles by shape and must infer their excitatory or inhibitory nature. Note that tracing inputs to a source neuron, such as tracing the endbulbs to parent auditory nerve fibers, solves this problem, but the smaller terminals remain problematic in this regard. The goal is to not only assign excitatory or inhibitory phenotype, but also a cell type of origin, so that actual spike patterns, evoked by sound, can be provided as inputs to the model. The compartmental model is detailed, and amenable to mapping this information from other experiments as it becomes available. Nanoscale imaging does provide detailed structural information in terms of surface areas, volumes and process diameters that is important in constraining the compartmental models, and that is not attainable by standard light microscopy approaches. These points are now made in the Results and in the Discussion, as mentioned earlier in this paragraph. And, as indicated in the responses to other reviewers, we highlight the model outputs as predictions to be tested experimentally.

    1. Author Response

      Reviewer #1 (Public Review):

      Ichinose et al., utilize a mixture of cultured hippocampal neurons and non-neuronal cells to identify the role of the transmembrane protein teneurin-2 (TEN-2) in the formation of inhibitory synapses along the dendritic shaft. First, they identify distinct clusters of gephyrin that are either actin-rich, microtubule-rich or contain neither actin nor microtubules and find that TEN-2 is enriched in microtubule-rich gephyrin clusters. This leads the authors to hypothesize that TEN-2 recruits microtubules (MTs) through the plus end binding protein EB1 when successfully matched with a pre-synaptic partner, and perform a variety of experiments to test this hypothesis. The authors then extend this finding to state quite strongly throughout the paper, including in the title, that TEN-2 acts as a signpost for the unloading of cargo from motor proteins without providing any supporting evidence. They use previous work to justify this conclusion, but without actual experiments to back up the claim, it seems like a reach.

      The strength of the paper lies in the various lines of evidence that the authors employ to assess the role of TEN-2 in MT recruitment and synaptogenesis. They have also been very thorough in validating the expression and functionality of various knock-in constructs, knock-down vectors and antibodies that were generated during the study. However, there are some discrepancies in the findings that have not been addressed satisfactorily, as well as some instances where the data presented is not of sufficient quality to support the conclusions derived from them.

      Firstly, we would like to express our sincere appreciation to Reviewer #1 for providing valuable feedback. We have carefully considered Reviewer #1's suggestions and have made significant improvements to the manuscript in response. Additionally, we have conducted an additional experiment to address the missing aspects identified in the initial submission. Furthermore, we have refined the focus of our investigation by narrowing down the number of aspects we aimed to prove and instead increased the number of confirmatory experiments. Specifically, we decided to give up on two aspects: the relationship between kinesins and cargo, and the immobilization of TEN2 in synapses (i.e., extracellular binding of TEN2). Instead, we focused on emphasizing the role of TEN2 as a platform for exocytosis. These modifications have significantly enhanced the quality of our research.

      1) The emphasis placed on the clustering analysis presented in figure 1 and the two associated supplementary figures is puzzling, since the conclusion derived from the results presented would be that Neuroligin 2 (NLGN2) is the strongest candidate to test for a relationship to MT recruitment at inhibitory post synapses. Instead, the authors cite prior evidence to exclude NLGN2 from subsequent analysis and choose to focus on TEN2 instead.

      We fully agree on the importance of studying NLGN2, as highlighted in the DISCUSSION section (line 463-471). While the cluster analysis suggests a correlation between NLGN2 and microtubules, previous research has reported microtubule localization outside the NLGN2 region (Uchigashima et al., 2016). However, this interpretation is based on EM observations at a single time point, so it will be important to evaluate it over time. Conversely, we believed that further investigations were needed to explore the potential interactions between TEN2 and microtubules, because of its relatively limited characterization (line 156-161).

      2) It is difficult to reach the same conclusion as the authors from the images and intensity plot shown on Figure 2 E and F. While there seems to be an obvious reduction in expression levels between the TEN2N-L and TEN2TM constructs, neither seem to co-localize with EB1.

      As Reviewer #1 pointed out, the previous plots on Figure 2 were of very poor quality. Due to the dynamics of microtubules, evaluating interactions using fixed cells has limitations. Therefore, we decided to shift to live-imaging. Firstly, we observed a tendency for EB3 comets to pause at inhibitory postsynapses (Figures 1D-H). This suggests the presence of a microtubule recruiter at inhibitory synapses. Next, in dendrites expressing TEN2N-L, the velocity of EB3 comets was significantly faster compared to dendrites expressing TEN2TM or TEN2N-L2mut (Figures 7A-E). This suggests that the dominant-negative effect of TEN2N-L inhibits the function of endogenous microtubule recruiters. Additionally, the interaction between TEN2 and EB1/3 has been confirmed by GST pull-down (Figure 6A). Based on this reasoning, we propose that TEN2 present in inhibitory synapses plays a role as a microtubule recruiter through its interaction with EB1/3.

      3) The authors mimic the activity of TEN-2 at the inhibitory post synapse in non-neuronal cells by immobilizing HA-tagged TEN constructs in COS-7 cells as a proxy for synaptic partner matching. Using this model, they find that by immobilizing TEN2N-L, which contains EB1 binding motifs, MTs are excluded from the cell periphery (Figure 3D). This contradicts their conclusion that MTs are recruited through EB1 by TEN-2 on synaptic partner matching. Later in the paper, when they use the same TEN2N-L construct as a dominant negative in neuronal cells, they find that MTs are recruited to the membrane, even if TEN2N-L is not immobilized by synaptic partner matching (Figure 6C). Taken together, these findings call into question the sequence of events driven by TEN-2 during synaptogenesis.

      We believe that the differences in the results between the COS-7 and neuron experiments are influenced by variations in the intracellular protein composition and distribution between COS-7 cells and neurons. Therefore, we consider it inappropriate to directly apply the results from COS-7 to neurons. Additionally, we attempted to replicate the experiments in neurons; however, it is worth mentioning that the culture of neurons on antibodies led to a significant decrease in cell viability. As a result, we have decided to withdraw the experiment of immobilized TEN2 using antibodies.

      4) It is unclear how the authors could conclude that TEN-2 is at the semi-periphery (?) of inhibitory post synapses from the STORM data that is presented in the paper. Figure 4D and 4F show comparisons of Bassoon and TEN-2 localization vs TEN-2 and gephyrin, but the image quality is not sufficient to adequately portray a strong distinction in the distance of center of mass, which is also only depicted for the TEN2-Gephyrin pair and not the TEN2-Bassoon pair in Figure 4J.

      The quality limitations of attempting a three-color dSTORM of TEN2-bassoon-gephyrin were addressed by modifying it to a two-color dSTORM. To confirm this modification, a two-color STORM was performed using VGAT instead of Bassoon (Figure 3E). The statement that TEN2 localizes to half of the synapse is supported by the observation of TEN2-gephyrin in the postsynaptic area. This observation aligns with the localization of microtubules at the postsynapse as observed by electron microscopy (Gulley & Reese, 1981; Linsalata et al., 2014).

      5) The authors do not satisfactorily explain why gephyrin appears to have completely disappeared in the TEN2N-L condition (Figure 6A), instead of appearing uniformly distributed as one would expect if MTs are indiscriminately recruited to the membrane by the dominant negative construct that remains unanchored.

      As pointed out by Reviewer #1, it needed to be adequately proven, and we mistakenly conflated dominant-negative and gain-of-function effects. However, through the examination of live imaging of EB3, observation of the localization of gephyrin, and the additional investigation of GABAAR localization in neurons expressing partial domains of TEN2, we found that TEN2N-L functions as a dominant-negative, inhibiting the microtubule recruitment function of endogenous TEN2 (Figure 7). On the other hand, it does not exhibit a gain-of-function effect in inducing exocytosis of GABAAR because both gephyrin and GABAAR were found to be reduced in the neurons expressing TEN2N-L (Figure 7F-H). Therefore, we have corrected this point.

      6) In a similar critique to that of Figure 2E and F, the distinction that the authors wish to portray between the effect of TEN2TM and TEN2N-L constructs on EGFP-TEN-2 and MAP2 colocalization (Figure 6 E and F) appears to be driven by a difference in overall expression levels of EGFP-TEN2 rather than a true difference in localization of TEN-2 and MTs.

      Regarding the previous co-localization of TEN2 and microtubules after permeabilization with saponin, we have removed it from the analysis because it is not possible to perform accurate quantitative analysis in this case. We speculate that this is a combination of two factors: the variation in transfection efficiency and the inherent variability in permeabilization between neurons. Specifically, it is particularly challenging to standardize and quantify the variability in permeabilization. Instead, the current version proposes TEN2-MT interaction via EBs by live imaging of EB3 in neurons expressing each partial domain. As observed in COS-7 cells where EB was overexpressed, whether TEN2 engages in continuous binding with microtubules or if it is a transient interaction remains an interesting topic for future investigation. We have mentioned this in the DISCUSSION section as well (line 415-422).

      Reviewer #2 (Public Review):

      Maturation of inhibitory synapses requires multiple vital biological steps including, i) translocation of cargos containing GABAARs and scaffolds (e.g. gephyrin) through microtubules (MTs), ii) exocytosis of inhibitory synapse proteins from cargo followed by the incorporation to the plasma membrane for lateral diffusion, and iii) incorporation of proteins to inhibitory synaptic sites where gephyrin and GABAARs are associated with actin. A number of studies have elucidated the molecular mechanisms for GABAARs and gephyrin translocation in each step. However, the molecular mechanisms underlying the transition between steps, particularly from exocytosis to lateral diffusion of inhibitory proteins, still need to be elucidated. This manuscript successfully characterizes three stages of inhibitory synapses during maturation, cluster 1: an initial stage in which receptors are being brought in and out by the MT system; cluster 2: lateral diffusion stage; cluster 3: matured postsynapses anchored by gephyrin and actin, by quantifying the abundance of MAP2 or Actin in inhibitory synapses labeled by gephyrin. Importantly, the authors' findings suggest that TEN2, a trans-synaptic adhesion molecule that has two EB1 binding motifs, plays an important role in the transition from clusters 1 to 2, and inhibitory synapse maturation. The imaging results are impressive and compelling, and these data will provide new insights into the mechanisms of protein transport during synapse development. However, the present study contains several loose ends preventing convincing conclusions. Most importantly, (1) more characterization of TEN2 domains in inhibitory synapse maturation is needed, (2) further validation of the HA knock-in TEN2 mouse model is required, and (3) additional physiology data are required to complement the authors' findings.

      First, we would like to thank Reviewer #2 very much for the efforts and numerous suggestions. While it is highly appealing to comprehensively explain the function of a single synapse organizer in a step-by-step manner during synapse formation, we believe that it requires the identification of changing binding partners at each step, which is currently a challenging task. Therefore, in this paper, we have focused solely on the interaction between TEN2 and microtubules. As a result, we have discovered that TEN2 provides a platform for the exocytosis of GABAR, and this process relies on the interaction between TEN2 and microtubules. The analysis of the immobilization of TEN2, which was included in the previous version, will be part of a future publication. We also plan to continue detailed analysis of other domains. Thus, issues remain regarding the analysis of TEN2, but in order to confirm what is happening in just one specific step, we have made significant revisions in this revised manuscript, including analysis in HA knock-in neurons and electrophysiological analysis. We would greatly appreciate it if Reviewer #2 would reconsider the revised manuscript.

      Reviewer #3 (Public Review):

      In this paper, Ichinose et al. examine mechanisms that contribute to building inhibitory synapses through differential protein release from microtubules. They find that teneurin-2 plays a role in this process in cultured hippocampal neurons via EB1 using a variety of genetic and imaging methods. Overall, the experiments are generally designed well, but it is unclear whether their findings offer a significant advance. The experimental logic flow and rationale are difficult for readers to follow in the manuscript's current form.

      Strengths:

      1) The experiments are generally well designed overall, and appropriate to the questions posed.

      2) Several experimental methods are combined to validate key results.

      3) Use of cutting-edge technologies (i.e. STORM imaging) to help answer key questions in the paper.

      We thank Reviewer #3 for reviewing our manuscript. We sincerely appreciate the valuable feedback. The previous version of the manuscript contained numerous claims, some of which were not thoroughly validated, making it prone to reader misinterpretation. Based on the results of additional experiments, we have revised the manuscript by focusing solely on the findings that were adequately confirmed, specifically highlighting the role of TEN2 in providing a platform for GABAAR exocytosis. We are grateful for your time and effort in revisiting the revised manuscript, and we believe it meets the necessary requirements.

      Weakness:

      1) Simplifying the text and story line would go a long way to ensure the study results are more effectively communicated. Additional specific suggestions are provided in the recommendations for the authors.

      Thank you for providing valuable suggestions. Based on the results of additional experiments, we have revised our claims accordingly.

      2) The introduction overall would benefit from simplification so that the reader is given only the information they need to know to understand the question at hand.

      We selected essential information from previous studies that we believe readers should be aware of before reading our manuscript.

      3) MT dynamics are important for paper results, but the background in the paper does not appropriately introduce this topic.

      We have provided some information in lines 57-64 of the INTRODUCTION section.

      4) It is a bit unclear from the abstract and introduction how the findings of this paper have significantly advanced the field or taught something fundamentally new about how inhibitory synapses are regulated.

      Thank you for your valuable feedback. In the new version, we have thoroughly examined and emphasized the significance of our research findings.

      5) Figure 1 - Line 109, it is unclear why "it was found appropriate" to divide the data into three clusters. This section would be better justified by starting with cellular functions and then basing the clusters on these functions.

      As Reviewer #3 pointed out, we have revised the classification to be based on past knowledge rather than data-driven.

      6) The proteomic screen and candidate selection is not well justified and the logic steps for arriving at TEN2 are a bit weak. Again, less is more here.

      As Reviewer #3 mentioned, we have made revisions in the new version. We have not completely excluded NLGN2, but rather believe that further examination and consideration of NLGN2 are necessary going forward (lines 463-471).

      7) Fig. 2 - The authors should consider whether EB1 overexpression would have functional consequences that alter the results and colocalization.

      The previous Figure 2, which is now Figure 6, is intended to demonstrate protein-protein interactions rather than provide functional implications. It is likely that the original function of EB1, which should be located at the plus ends of MTs, is compromised by its presence in the MT lattice. As an alternative method to demonstrate protein-protein interactions, we have also conducted GST pull-down assays (Figure 6A). From these two experimental results, we infer that the intracellular domain of TEN2 interacts with EB1. However, we have not discussed the functional implications of the TEN2-EB1 complex based on these experimental findings. The function is discussed on the basis of the results shown in Figure 7.

      8) Fig. 3 - Is immobilization of COS cells using HA tag antibodies a relevant system for study of these questions?

      We agree with this suggestion regarding the replication of the experimental systems in neurons, as the results have been successful in COS-7 cells. However, when we attempted to culture neurons on antibody-coated cover glass, the survival rate was significantly reduced, and we were unable to directly replicate these systems in neurons. Therefore, we have decided to withdraw this claim from the publication.

      9) Fig. 4 - The authors should confirm post-synaptic localization in vivo (brain).

      We agree with this suggestion. Currently, our research group does not have an effective immunolabeling method for synaptic proteins in the brain. This is a future challenge that we should address.

      10) Figure 4D-E - The way the STORM results are presented is confusing. The authors state it shows that TEN2 is postsynaptic but before this say that the Abs are the same size as the synaptic cleft so that the results cannot be considered conclusive. This issue should be resolved.

      To improve the quality of our dSTORM experiments, we abandoned three-color dSTORM and instead focused on two-color dSTORM to draw conclusions (Figure 3E). We utilized VGAT to detect presynaptic sites. VGAT is an inhibitory presynaptic-specific molecule that is present at the center of presynaptic terminals, eliminating concerns about the size of the antibodies used.

      11) Figure 5 - The authors should examine the levels of gephyrin relative to the levels of knockdown given the knockdown variability.

      Thank you for your suggestion. As shown in Figure 4D of the current version, we were able to simultaneously quantify the knockdown efficiency and synaptic density. We obtained results indicating a decrease in synaptic density associated with a decrease in TEN2 expression levels.

      12) Functional validation of a reduction in inhibition following TEN2 manipulation would elevate the paper.

      We conducted live imaging of EBs to measure the changes when introducing the partial domain of TEN2 (Figures 7A-E). By observing the decrease in synaptic density and the impaired MT recruitment function of endogenous TEN2 due to the dominant-negative effect of TEN2N-L, we concluded that the TEN2-MT interaction serves as the platform for GABAR exocytosis.

      13) Figure 6E - The expression levels of TEN2TM and TEN2NL are important to the outcome of these experiments. How did the authors ensure that the levels of two proteins were the same to begin with?

      As this point was also raised by Reviewer #1, we give the same answer here: Regarding the previous co-localization of TEN2 and microtubules after permeabilization with saponin, we have removed it from the analysis because it is not possible to perform accurate quantitative analysis in this case. We speculate that this is due to a combination of two factors: the variation in transfection efficiency and the inherent variability in permeabilization between neurons. The variability in permeabilization is particularly challenging to standardize and quantify. Instead, the current version proposes a TEN2-MT interaction via EBs, based on live imaging of EB3 in neurons expressing each partial domain. As observed in COS-7 cells where EB was overexpressed, whether TEN2 engages in continuous binding with microtubules or only in a transient interaction remains an interesting topic for future investigation. We have mentioned this in the DISCUSSION section as well (lines 415-422).

    1. Author Response

      Reviewer #2 (Public Review):

      In this manuscript, the authors have proposed that the suppression of hepatic GPR110, known as a tumorigenic gene, could improve non-alcoholic fatty liver disease (NAFLD). With AAV-mediated GPR110 overexpression or a GalNAc-siGPR110 experiment, they have suggested that GPR110 could increase hepatic lipids through SCD1.

      Major comments

      1) Although the authors claimed that GPR110 could enhance SCD1-mediated hepatic de novo lipogenesis, the level of GPR110 expression was decreased in obese mice (Figure 1E-F). However, it has been reported that the levels of de novo lipogenic genes, including SCD1, are upregulated in HFD-fed mice (PMID: 18249166, PMID: 31676768). Thus, they should show the levels of hepatic lipids and lipogenic gene expression, including SCD-1, in liver tissues from NCD vs. HFD-fed mice, which will provide insights into the relationship between GPR110 level and hepatic lipogenic activity.

      Thank you for the comment. The levels of hepatic lipids and lipogenic gene expression, including SCD-1, in liver tissues from NCD vs. HFD-fed mice are summarized in Supplementary Table 4 on page 63. Additionally, we measured the de novo lipogenic activity of primary hepatocytes with varying levels of GPR110 using stable isotopes 3H-acetate. The data are presented in Figure 5D on page 36 of the revised manuscript. These findings suggest that the HFD diet may affect hepatic lipid metabolism through changes in gene expression and lipid accumulation.

      2) In Figure 2, the authors have characterized metabolic phenotypes of hepatic GPR110 overexpression upon HFD, exhibiting significant phenotypes (including GTT, ITT, HOMA-IR, serum lipids, and hepatic lipid level). However, it is likely that these phenotypes could stem from increased body weight gain. Since they cannot explain how hepatic GPR110 overexpression could increase body weight, it is hard to conclude that the increased hepatic lipid level would be a direct consequence of GPR110 overexpression. Also, given the increased fat mass in GPR110 overexpressed mice, they should test whether GPR110 overexpression would affect adipose tissue. Along the same line, they have to carefully investigate the reason of increased body weight gain in GPR110 overexpressed mice (ex., food intake, and energy expenditure).

      Thank you for the comment. First, we checked the expression of GPR110 in the adipose tissues of rAAV-GPR110 mice. We did not observe any change in the mRNA expression level of GPR110 in adipose tissues, including SWAT, EWAT and BAT, compared with their controls (Supplementary Figure 3A on page 50). All the Ct values for adipose GPR110 mRNA were over 40. As suggested, we used a metabolic cage system to explore whether the metabolic phenotypic differences between rAAV-GFP and rAAV-GPR110 mice were due to other factors. We did not observe any difference between the two groups in locomotion, distance travelled in the cage, energy expenditure, daily food intake, daily water intake or respiratory exchange ratio (Supplementary Figure 3B-G on page 50). Therefore, these factors are unlikely to be the root cause of the increased body weight gain in GPR110-overexpressing mice.

      3) GPR110 enhances hepatic lipogenesis via SCD1 expression (Figures 5 and 6). To verify whether GPR110 would specifically regulate the SCD1 transcript, they have to provide the expression levels of other lipogenic genes, including Srebf1, Chrebp, Acaca, and Fasn.

      Thank you for the comment. As suggested, we added the expression levels of these lipogenic genes in Figure 5B-C on page 36 of the revised manuscript. In addition, we also measured the de novo lipogenic activity using primary hepatocytes with either overexpression or knockdown of GPR110 to confirm that GPR110 enhances hepatic lipogenesis.

      4) In Figure 6, the author should provide the molecular mechanisms how GPR110 signaling could enhance SCD-1 transcription.

      Thank you for the comment. SREBP1 is a key transcription factor that regulates the expression levels of the SCD1 gene [21]. A study published in March (at the time of revising this manuscript) showed that GPR110 plays a role in mediating the activation of SREBP1 pathways by palmitic acid. This ultimately promotes the synthesis of fats in mammary gland tissues [10]. In our RNA sequencing analysis, we also found that the expression of hepatic SREBP1 was correlated with the expression of GPR110. To further investigate this relationship, we added the mRNA levels of SREBP1 in our experiments, as shown in Figure 5B-C on page 36 of the revised manuscript. Specifically, we found that the expression level of SREBP1 was increased in the GPR110 overexpression group and decreased after using ASOs to knock down hepatic GPR110 levels. These findings suggest that GPR110 regulates hepatic lipid metabolism through the SREBP1-SCD1 pathway.

      5) Figure 9C shows the increased level of GPR110 with NAFLD severity. They should test whether the levels of hepatic GPR110 and SCD-1 might be elevated in a severe NAFLD mouse model. If it is the case, it would be better to show the beneficial effects of GPR110 suppression against NAFLD progression using a severe NAFLD (ex., NASH) mouse model.

      Thank you for the comment. To further explore the expression pattern of GPR110 in a more severe NAFLD mouse model, we injected either CCl4 or STZ to increase NAFLD severity in HFD-fed mice. We found that after treatment with CCl4 or STZ, the expression levels of GPR110 and SCD1 mRNAs were significantly increased compared to the untreated control group (please see Figure 9F-G). We attempted to knock down the expression of hepatic GPR110 in the CCl4- or STZ-treated HFD-fed mice. However, our ASOs were only effective in knocking down high levels of GPR110 mRNA in the virus-mediated GPR110 expression systems (please see Figures 5 and 6). The expression level of hepatic GPR110 mRNA in HFD-fed mice after CCl4 or STZ treatment was too low to be effectively knocked down by ASOs. However, a previous study demonstrated that Gpr110-/- mice were resistant to liver tumorigenesis induced by DEN plus CCl4 injection [22]. We believe that GPR110 suppression can also prevent the progression of NAFLD in these severe NAFLD mouse models.

      Reviewer #3 (Public Review):

      In this study, the authors examined the expression of GPR110 in a HFD-fed mouse model and validated their findings in human samples. They then performed both gain- and loss-of-function studies on the cellular and systemic metabolic effects of manipulating the levels of GPR110. They further demonstrated that SCD-1 was a downstream effector of GPR110, and the effects of GPR110 could be mediated by SCD-1. This study provides a novel target in NAFLD. Overall, the data and analyses are well performed and convincing. As the GPR110-SCD1-lipid metabolic phenotype axis is a central theme of the study, I would suggest that the authors further discuss the connection between GPR110 and SCD1, especially the persistent upregulation of SCD1 at the late stage of HFD-fed mice (obese mouse model) when GPR110 is very low, for example, whether another regulator plays a more relevant role at this time point.

      Thank you for the comment. As SCD1 is the rate-limiting enzyme catalysing the biosynthesis of monounsaturated fatty acids, a very tight and complex regulation of SCD1 gene expression in response to various parameters, including hormonal and nutrient factors, has been reported [23]. HFD treatment itself can induce the expression of hepatic SCD1 [21, 23, 24], and our study demonstrated that the expression of SCD1 can be further increased by overexpressing GPR110 in the liver of HFD-fed mice (Fig. 9F and G on page 44), which will contribute to the acceleration and aggravation of NAFLD. The discussion of the connection between GPR110 and SCD1 was presented on page 21, lines 455-464.

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript by Huang, Li, et al. describes the identification of variants in the gene coding for p31 comet, a protein required for silencing the spindle assembly checkpoint or SAC, in women with recurrent pregnancy loss upon IVF. In three families mutations affecting splicing or expression of full-length protein were identified. The authors show that oocytes of the patients arrest in meiosis I, are most likely to fail to inactivate the SAC without a fully functional p31 comet. Indeed, the metaphase I arrest occurring in mouse oocytes upon overexpression of Mad2 can be rescued by overexpression of wild-type p31 comet, but not a truncated version. Injection of wt p31 comet into 6 human oocytes from one patient rescued the meiosis I arrest.

      Main points:

      The fact that inactivation of the SAC is required for anaphase I onset in human oocytes is not novel. Biallelic mutations of TRIP13 were shown to lead to the same phenotype (Zhang et al. Am J. Hum Gen., 2020).

      As pointed out by the editors and both other reviewers, the strength of this study lies in the identification of genetic variants responsible for oocyte meiosis I arrest in human patients. In fact, very few genetic variants that cause oocyte meiotic failure have been identified (Ref: Qing Sang, et al. Understanding the genetics of human infertility. Science. 2023). In this study, we report for the first time novel deleterious p31comet variants causing human oocyte MI arrest. Without exploring the etiological landscape of human genetic variants, it is impossible to comprehensively develop diagnostic and therapeutic approaches for female patients.

      No new mechanistic insights are obtained.

      To gain insight into the molecular mechanism, we have optimized and performed a modified Smart-seq2 protocol using frozen single human oocytes (Page 11 and Figure 4-figure supplement 1). These data were in good agreement with the reported phenotypes.

      The authors propose a role for female fertility, however, also a male patient with a p31 comet variant is sterile.

      This manuscript focuses on screening the genetic variants responsible for oocyte failure in female patients, rather than male patients. In addition, we had difficulty collecting more detailed information from this male patient because he declined to provide consent. Despite every effort to get in touch with him, we currently have only limited information. We have added more discussion in the manuscript. Certainly, further exploration of the roles of MAD2L1BP variants in male meiosis, for example by collecting samples from a cohort of male patients with meiotic defects, would be an interesting direction in the future, but this is beyond the scope of this study.

      The fact that the C-terminus of p31 comet is required for interaction with Mad2 and hence, turning off the SAC, is already known.

      The interaction between p31comet and Mad2 is known in somatic cells, but not in oocytes. Oocytes are widely known to differ from somatic cells in that their SAC is less effective: oocytes can proceed to anaphase I in the presence of even one unattached kinetochore. We provided evidence that the arrest caused by overexpression of Mad2 can be rescued by overexpression of wild-type p31comet, but not the truncated p31comet variant, in both mouse and human oocytes (Fig. 3 and 4), which sufficiently characterizes the causative role of p31comet variants in female infertility.

      Reviewer #2 (Public Review):

      In this manuscript by Huang et al., the authors explore the genetic underpinnings that may cause human oocyte meiotic arrest. The meiotic arrest of oocytes can cause female infertility, leading patients to seek treatment at IVF clinics to assist in having genetically related babies. However, because oocytes fail to develop to MII, oocytes from these patients cannot be fertilized, leaving no current options for genetically related babies for patients with this pathology. Huang et al. identified 50 IVF patients with this phenotype, and after whole-exome sequencing, 3 patients had mutations in a spindle assembly checkpoint regulator, Mad2l1bp. This study describes these mutations in detail, shows how these mutations affect Mad2l1bp expression, evaluates gross function in mouse oocytes, and explores therapeutic treatment in human oocytes. Overall, this is an important translational study that adds to the growing body of literature showing that genetic mutations impact oocyte quality and fertility.

      Thank you for your favorable comments.

      In its current form, I find that the strengths exist in the analysis of the patients' genomes and pedigree information. This is unique data and is important for the field. The expression in oocytes, structure modeling, and conservation in evolution, while not essential for this study, add interesting information for the reader to consider. I sometimes find these distracting in manuscripts, but appreciate them here in this context. The conclusion using human oocytes to propose possible treatment takes the study to completion and is not an easy approach to carry out.

      Thank you for your positive comments on this manuscript.

      I do find some weaknesses that weaken the conclusions. The conclusion described is that the SAC is not satisfied in oocytes from these patients. The authors attempt to show this by analysis of mouse oocytes using polar body extrusion and its timing as an assay. There could be many reasons contributing to arrest, therefore a singular assay is not ideal to justify the conclusions. While I do suspect the authors are correct, an intact SAC should be shown at the molecular level to fully justify this conclusion. There are many assays routinely performed in mouse oocytes that the authors can consider (check papers by authors from Wassmann, FitzHarris, and Schindler labs for example).

      Thanks for your good comments. Following your advice, we have performed the immunofluorescence assay to evaluate the SAC integrity using mouse oocytes by microinjection of WT and Mut Mad2l1bp cRNA, which clearly validated the intact SAC activation with Mut Mad2l1bp cRNA injection. Please see the reply as detailed below.

      Reviewer #3 (Public Review):

      The spindle checkpoint ensures the accuracy of chromosome segregation by sensing unattached kinetochores during mitosis and meiosis and delays the onset of anaphase. Unattached kinetochores catalyze the conformational activation of the latent open MAD2 (O-MAD2) to the active closed MAD2 (C-MAD2). C-MAD2 is then incorporated into the mitotic checkpoint complex (MCC), which inhibits the anaphase-promoting complex or cyclosome (APC/C) to delay anaphase. When all kinetochores are properly attached, the MAD2-binding protein p31comet and the ATPase TRIP13 extract C-MAD2 from the MCC, leading to MCC disassembly and the conversion of C-MAD2 back to O-MAD2. This action turns off the spindle checkpoint, resulting in APC/C activation and anaphase onset. Cells deficient in p31comet exhibit mitotic delays.

      In the current study, Huang et al. have linked p31comet mutations to female infertility. Biallelic loss-of-function alleles of p31comet cause delays in exiting metaphase of meiosis I and in polar body extrusion. The p31comet mutant proteins contain C-terminal truncations and fail to bind to MAD2. Reintroducing full-length p31comet into patient oocytes can bypass the metaphase arrest. Together with a previous study that showed biallelic mutations of TRIP13 caused female infertility, this work established a critical role of the p31comet-TRIP13 module in regulating meiotic progression during oogenesis. As such, this is a significant study.

      Thank you for the very positive comments on this manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      This work reports an important demonstration of how to predict the mutational pathways to antimicrobial resistance (AMR) emergence, particularly in the enzyme DHFR (dihydrofolate reductase). Epistasis, or non-additive effects of mutations due to their background dependence, is a major confounding factor in the predictability of protein evolution, including proteins that confer antimicrobial resistance. In the first approach, they used Rosetta to predict the mutant DHFR-drug binding affinity and the resulting selection coefficient, which then became inputs to a population genetics model. In the second approach, they use the observed clinical/environmental frequency of the variants to estimate the selection coefficient. Overall, this work is a compelling demonstration that a mechanistic model of the fitness landscape could recapitulate AMR evolution; however, considering that the number of mutations and pathways is small, a more compelling description of the robustness of the results and/or limitations of the model is needed.

      Major strengths:

      1) This is a compelling multi-disciplinary work that combines a mechanistic fitness landscape of DHFR (previously articulated in literature and cited by the authors), Rosetta to determine the biophysical effects of mutations, and a population genetics model.

      2) The study takes advantage of extensive data on the clinical/environmental prevalence of DHFR mutations.

      3) Provides a careful review of the surrounding literature.

      Major weakness:

      1) Considering that the number of mutations and pathways being recapitulated is rather small, I would suggest a more detailed description of the robustness of the results. For example:

      a) Please report the P-value for the correlation of the predicted DDG_{binding, theory} and DDG_{binding, experimental}.

      We thank the reviewer for the suggestion. We agree the available experimental dataset is small, limiting the statistical power of the Pearson's correlation test to determine how well Flex ddG predicts binding free energy change. However, as highlighted in the manuscript, two earlier studies by Aldeghi et al. 2018 & 2019 considered much larger datasets and found a correlation in a similar range to the one we found here. Furthermore, as suggested by the Reviewer, we carried out a one-sided t-test with the alternative hypothesis that the correlation is greater than 0 and found a p-value of 0.040, suggesting the correlation we observed is significant. We have added this test and p-value to the Results section.
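
      The one-sided test on a Pearson correlation described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names are hypothetical, the tail probability of the t distribution is obtained by simple numerical integration, and the input vectors in the usage lines are made-up numbers rather than the study's data.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_sf(t, df, steps=20000):
    """Upper-tail probability P(T > t) of Student's t with `df` degrees of
    freedom, integrating the density on [0, |t|] with Simpson's rule."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(v):
        return c * (1 + v * v / df) ** (-(df + 1) / 2)

    b = abs(t)
    h = b / steps  # steps is even, as Simpson's rule requires
    total = pdf(0.0) + pdf(b)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    upper = 0.5 - total * h / 3
    return upper if t >= 0 else 1 - upper

def one_sided_correlation_test(x, y):
    """Test H0: rho = 0 against H1: rho > 0. Returns (r, t, p)."""
    n = len(x)
    r = pearson_r(x, y)
    if abs(r) == 1:
        raise ValueError("perfect correlation: t statistic is undefined")
    t = r * math.sqrt((n - 2) / (1 - r * r))
    return r, t, t_sf(t, n - 2)

# Hypothetical predicted vs. experimental values, for illustration only.
predicted = [1, 2, 3, 4]
measured = [1, 3, 2, 4]
r, t, p = one_sided_correlation_test(predicted, measured)
print(f"r = {r:.2f}, t = {t:.3f}, one-sided p = {p:.3f}")
```

      With few data points the degrees of freedom (n - 2) are small, so even a fairly high correlation can give a modest p-value, which is the limitation acknowledged in the response.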

      If interested in showing the correct assignment of mutational effects, perhaps use a contingency matrix to derive a P-value.

      As suggested by the Reviewer, we used a contingency matrix known as a confusion matrix to determine how accurate Flex ddG is at classifying mutations as stabilising or destabilising. This gave an accuracy of 0.89, sensitivity of 0.83 and a specificity of 1. The p-value associated with this contingency table was 0.14, despite the high accuracy, sensitivity and specificity. This is likely due to the small sample size making it difficult to determine significance. This analysis has been included in the Results section.
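
      To make the confusion-matrix figures concrete, the sketch below computes accuracy, sensitivity and specificity from four cell counts, plus a one-sided exact binomial test of the accuracy against the no-information rate (the frequency of the larger class). Everything specific here is an assumption: the response does not give the cell counts or say which test produced its p-value. The counts used (5, 1, 3, 0) are hypothetical, chosen only because they reproduce the reported 0.89 / 0.83 / 1, and the function names are illustrative.

```python
from math import comb

def classification_metrics(tp, fn, tn, fp):
    """Accuracy, sensitivity and specificity from confusion-matrix counts."""
    n = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def accuracy_vs_nir_pvalue(tp, fn, tn, fp):
    """One-sided exact binomial p-value: probability of at least the observed
    number of correct calls if the true accuracy were only the
    no-information rate (always predicting the larger class)."""
    n = tp + fn + tn + fp
    correct = tp + tn
    nir = max(tp + fn, tn + fp) / n
    return sum(
        comb(n, k) * nir ** k * (1 - nir) ** (n - k)
        for k in range(correct, n + 1)
    )

# Hypothetical counts (assumption, not taken from the manuscript).
counts = dict(tp=5, fn=1, tn=3, fp=0)
print(classification_metrics(**counts))
print(accuracy_vs_nir_pvalue(**counts))
```

      Under these assumed counts only nine mutations are classified, so a single misclassification moves accuracy by about 0.11, which illustrates why high point estimates can coexist with a non-significant p-value.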

      b) Although the DDG_binding calculation in Rosetta seems to converge (Appendix figures 3 and 4), I do not think the DDG values before equilibration should be included in the final DDG estimate. In practice, there is a "burn in" number of runs where the force field optimizes the calculation to account for potential clashes in the structure, etc. This is particularly important since the starting structures are modeled from homology. Consequently, the distributions of DDG that include the equilibration runs are multimodal (Appendix figure 2), which means that calculating an average may be inappropriate.

      Each Flex ddG prediction is independent (see Figure 1 of Barlow et al. 2018 for a summary of the Flex ddG method), i.e. the distribution of values does not represent a MCMC process in which there is a burn-in in order to equilibrate. The structures of both the wild-type and mutant are equilibrated in each run using the backrub algorithm. The reason so many runs are required is because each prediction is from a distribution of possible ddG values associated with that specific mutation and the authors of Flex ddG suggest running 35 runs or more and taking the average of the distribution. Therefore, in order to get an accurate prediction, enough simulations must be run per mutation to adequately characterise the distribution so that the average converges to a constant value.
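
      The idea of averaging independent runs until the mean settles can be sketched as below. This is only an illustration of a running-mean convergence check, not part of Flex ddG or Rosetta: the function names, the window of 10 runs and the tolerance of 0.1 are arbitrary assumptions, and the ddG values are simulated random numbers.

```python
import random

def running_mean(values):
    """Mean of the first 1, 2, ..., n values."""
    out, total = [], 0.0
    for i, v in enumerate(values, start=1):
        total += v
        out.append(total / i)
    return out

def has_converged(values, window=10, tol=0.1):
    """True if the running mean varied by less than `tol` over the last
    `window` additional runs. `window` and `tol` are illustrative choices."""
    rm = running_mean(values)
    if len(rm) <= window:
        return False
    tail = rm[-(window + 1):]
    return max(tail) - min(tail) < tol

# Simulated, independent per-run ddG predictions (hypothetical numbers).
random.seed(0)
runs = [random.gauss(1.5, 0.8) for _ in range(250)]
for n in (35, 100, 250):
    subset = runs[:n]
    print(n, round(running_mean(subset)[-1], 3), has_converged(subset))
```

      Because each run is an independent draw, no initial runs need to be discarded; adding runs only narrows the uncertainty of the average.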

      2) The geographical areas over which the mutational pathways are independently estimated are not isolated, allowing for the potential that an AMR variant in one region arose due to "migration" from another area. For example, the S58R-S117N is the most frequent double mutant of PvDHFR in geographically proximate Southern/Southeastern Asia (Fig. 4). To a certain extent, similar mutational patterns occur for PfDHFR in Southern/Southeastern Asia (Fig. 3). Although accounting for mutant migration in the model may be beyond the scope of the study, a clear argument for the validity of the "isolated island" assumption is needed.

      The Reviewer is correct that some variants in one region may have arisen due to “migration” from another area. This would impact the method for inferring mutational pathways from regional isolate frequency data but not when considering the worldwide population. If this occurred, we would expect to see a multiple mutant appearing in a region without the precursor (single, double etc) mutations, even in the case of large sample size. However, this does not seem to have been an issue for the pathways we have been predicting here. If it were the case that a variant migrated, and the precursor mutations could not be found in that region, we could look to mutations from neighbouring regions to infer the pathway, under the assumption of migration.

      We have added some discussion on this between lines 517-523:

      “When inferring pathways at a regional level, it is possible we may encounter instances where genotypes with multiple mutations are observed in a specific region, but the precursor mutations in the pathway are absent. This could happen either due to insufficient sampling of the region or due to "migration" of the variant from a neighbouring region. To infer pathways in the former case more samples would be required, whereas in the latter case we can look to the data from neighbouring regions where the variant is present and use the frequency data of the precursor mutations.”
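
      The situation described in this passage, a multiple mutant observed in a region without any of its one-step precursors, can be detected with a simple set check. The sketch below is an illustration under stated assumptions, not the authors' method: genotypes are represented as sets of mutation labels, the function name is hypothetical, and the example regions are invented.

```python
def missing_precursors(observed):
    """Return genotypes with two or more mutations for which no one-step
    precursor (the genotype minus a single mutation) was observed."""
    seen = {frozenset(g) for g in observed}
    flagged = []
    for g in seen:
        if len(g) < 2:
            continue  # single mutants derive directly from the wild type
        if not any((g - {m}) in seen for m in g):
            flagged.append(g)
    return sorted(flagged, key=lambda g: sorted(g))

# Invented regional samples, for illustration only.
region_a = [{"S117N"}, {"S58R", "S117N"}]
region_b = [{"S58R", "S117N"}]
print(missing_precursors(region_a))  # precursor S117N is present
print(missing_precursors(region_b))  # double mutant has no observed precursor
```

      A flagged genotype does not by itself distinguish under-sampling from migration; as the quoted text notes, that requires more samples or frequency data from neighbouring regions.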

    1. Author Response

      Reviewer #2 (Public Review):

      1) Analytical approaches are in the current form preliminary and not enough to draw firm biological conclusions. While the datasets are large (which is highly appreciated), they represent a relatively early stage of ENS development and possible differences between vagal and sacral-derived populations could partially be attributed to difference in maturity. Maturity will surely not explain the whole difference observed but needs to be factored into the interpretation. As scRNA-seq datasets from the mature chicken ENS are lacking (as well as detailed IHC-based neural classification system) the inference made in the paper between molecular classes and functional types are premature.

      We appreciate this comment and think it is an excellent suggestion that we definitely plan to do. This made us realize that we failed to clarify in the text why we chose this particular time point for our study, which is two-fold.

      First, we are particularly interested in how neural crest cells choose their prospective fates. E10 is a time when the post-umbilical gut has been completely populated by both vagal and sacral neural crest cells for 2 days, so cells are in the process of differentiation but there still exists a large precursor pool. For this reason, we can capture both precursors and some differentiated neuronal subtypes. We have clarified this point in the revised manuscript and now focus much more on the precursor population to identify both genes that are common to vagal and sacral neural crest cells as well as those that are distinct. This enables us to formulate testable hypotheses for the potential role of particular transcription factors in allocation of cell fate. Of particular interest, we find that at E10, the sacral neuronal precursor pool is largely depleted whereas the vagal crest has a substantial neuronal precursor pool. Thus, we believe this is the perfect time point for initial analysis.

      Second and perhaps even more important, in the US, chick embryos are not considered vertebrates until after E10. Thus, E10 represents the last timepoint we can raise embryos without animal approvals which are not currently in hand. We completely agree that performing experiments at later timepoints will be incredibly valuable and therefore are now applying for approvals. But realistically, these take several months and thus would delay publication of our datasets (already delayed due to Covid restrictions) for at least another year. Therefore, we propose to publish the mature dataset as a Research Advance that would focus on differences between mature neuronal subtypes between preumbilical vagal, post-umbilical vagal and sacral datasets that would nicely complement the current work. Instead, we have refocused this paper on the precursor to differentiated neuron transition.

      I should mention that this refocusing seems particularly important given that our original aim was to explore differences between vagal and sacral neural crest contributions to the gut. However, the single cell data reveals strong overlap between sacral and vagal neural crest contributions to the postumbilical gut, suggesting a strong environmental influence on cell fate decisions.

      Specific concerns:

      1) Analysis of scRNA-sequenced sacral- versus vagal-derived ENS reveals clusters consistent with a non-ENS identity (endothelial, muscle, vascular and more). Previous studies in mouse using the neural crest tracing line Wnt1-Cre have not demonstrated such diverse progenies of neural crest from any region, with the exception of a small population of mesenchymal-like cells (Ling and Sauka-Spengler, Nat Cell Biol. 2019; Zeisel et al., Cell 2018; Morarach et al., 2021; Soldatov et al., Science 2019). Therefore, the claimed broad potential of 6 of 13 neural crest giving rise to diverse gut cell populations warrants more validating experiments.

      We thank the reviewer for this comment. We clarify that hematopoietic clusters have dropped out upon reanalysis. We believe the other clusters are real based on gene markers used in previous studies to identify cell types, such as Mlana, Dct, and Mitf for neural crest-derived melanocytes.

      2) Several earlier studies have revealed that parts of the ENS are derived from neural crest cells that attach to nerve bundles, obtain a Schwann cell precursor-like identity and thereafter migrate into the gut (Uesaka et al. J Neurosci 2015 and Espinosa-Medina et al, PNAS 2017). The current work in chicken needs to be interpreted in the light of these findings, and the publications should be discussed in relevant sections of the introduction and discussion.

      Thank you for this suggestion. We agree and indeed our data cannot differentiate between SCPs, which are neural crest-derived, versus early migrating neural crest cells. We have added this point to the discussion and also discuss these papers in more detail.

      3) The analysis indicates the presence of melanocytes. It is not clear why they are part of the GI-tract preparations. Could they correspond to another cell type, with partially overlapping gene expression profile as melanocytes?

      We have assigned these as melanocytes based on expression of Mlana, Mitf, and Dct as highly upregulated genes. These have been used in previous studies to identify neural crest-derived melanocytes in the heart (Chen et al., 2021).

      4) As evident, the sacral- and vagal-derived ENS are not clonally related. To decipher differentiation paths and relations between clusters, individual analyses of the different datasets are needed. With only one UMAP representing the merged datasets combined with little information on markers, it is hard to evaluate the soundness of the conclusions regarding cell identities of clusters and lineage differentiation.

      This is an excellent suggestion and we apologize for not including this previously. We have now added individual pre-umbilical vagal, post-umbilical vagal and sacral neural crest datasets as well as trajectory analysis for each.

      5) E10 is a relatively early stage in chicken ENS development. Around E7, the intestines do not even contain differentiated neurons. The relatively high expression of Hes5 (marking mature enteric glia in the mouse; Morarach et al., 2021) in the vagal neural crest population might be explained by the more mature state of vagal versus sacral ENS. As also outlined below, Th/Dbh are known to be transiently expressed in the developing ENS, so they could indicate the relative immaturity of sacral neural crest rather than differential neural identities. These issues need to be taken into account when interpreting biology from scRNA-seq data.

      We completely agree. We now clarify that we are particularly interested in how neural crest cells choose their prospective fates. We chose the E10 time point because this reflects a time point when the post-umbilical gut has been completely populated by both vagal and sacral neural crest cells for 2 days so cells are in the process of differentiation but there still exists a large precursor pool. For this reason, we can capture both precursors and some differentiated neuronal subtypes. Notably, the sacral derived precursors seem to be glial in flavor whereas neuronal precursors appear to be absent. We have clarified this point in the revised manuscript.

      6) Unlike the guinea pig, and to some extent pig and murine ENS, the physiology of chicken enteric neurons has not been well characterized yet. Therefore, it is highly advisable to refrain from a nomenclature of clusters designating functions. Several key molecular markers are known to differ between murine, guinea pig, rat and human systems. IPANs are a good example where differential expression is seen (SST in human but not mice; CGRP labels some IPANs in mouse, but not in guinea pig, where Tac1 instead is expressed). IPANs are not defined very well in the chicken, and molecular markers found in other species may not be valid. Adrenergic and noradrenergic neurons have not been validated in the ENS (although TH and Dbh have been observed, especially in the submucosal ENS). Cholinergic neurons are also mentioned in the text, but do not appear in the figures as a defined group.

      Another reason to refrain from functional nomenclature is that a rather early stage is analysed in the present study, without possibilities to compare with scRNA-seq data from the mature chicken ENS (which was performed in Morarach et al, 2021 for the mouse). Recent data suggest that considerable differentiation may occur even in postmitotic neurons, and several markers are known to display a transient expression pattern (TH, DBH and NOS1; Baetge and Gershon 1990; Bergner et al., 2014; Morarach et al., 2021), so caution should be taken in assigning neuronal identities to clusters.

      This is an excellent point and we thank the reviewer for this valuable input. Accordingly, we have now renamed the clusters based on prominent gene expression rather than neuronal or precursor subtype. Indeed we struggled with finding appropriate names making this comment all the more useful.

      7) The immunohistochemical analysis (Figure 5,6) is an essential complementary addition and validation of scRNA-seq. However, it is very difficult to discern staining when magenta and red are combined to display coexpression.

      Good point. This has been changed to be more readily discernible and higher magnification views have been added.

      8) To give more information to the field and body of evidence for claims made, quantifications relating to the analysis in Figures 5 and 6 are warranted as well as an expanded set of marker genes that align with the scRNA-seq results.

      Good point. We have added additional markers as suggested. In terms of quantitation, we can include numbers of labeled cells in a particular region but this may give a false impression of degree of contribution since we are using different viruses for vagal vs sacral that may have different titers making it a bit like comparing apples and oranges. We now emphasize that our labeling approach does not mark the entire population and that the degree of labeling can be variable.

      9) Correlations between genes and functions/neuron class are in many cases wrong (including Grm3, Gad1, Nts, Gfra3, Myo9d, Cck and more).

      Good point. We have toned this down.

      10) Attempts to subcluster neuronal populations are needed (Figure 7). However, to understand the biology, it is important to address which cells are sacral versus vagal-derived. Additionally, related to the previous comment, as the vagal and sacral neurons are not clonally related, it would be important to perform separate analyses of neurons relating to each region.

      Good point. We have added additional analysis to address this important point in what is now Fig 6 and in particular validated sacral contributions to glial cells (new Fig 8).

    1. Author Response

      Reviewer #2 (Public Review):

      In this study, Yang et al. used single-cell technology to construct the cell profiles of normal and pathological ligaments and identified the critical cell subpopulations and signaling pathways involved in ligament degeneration. The authors identified four major cell types: fibroblasts, endothelial cells, pericytes, and immune cells from four normal and four pathological human ligament samples. They further revealed the increased number of fibroblast subpopulations associated with ECM remodelling and inflammation in pathological ligaments. In addition, the authors further resolved the heterogeneity of endothelial and immune cells and identified an increase in pericyte subpopulations with muscle cell characteristics and macrophages in pathological ACL. Ligand-receptor interaction analysis revealed the involvement of FGF7 and TGFB signaling in interactions between pathological tendon subpopulations. Spatial transcriptome data analysis also validated the spatial proximity of disease-specific fibroblast subpopulations to endothelial and macrophages, suggesting their interactions in pathological ligaments. This study offers a comprehensive atlas of normal and pathological cells in human ligaments, providing valuable data for understanding the cellular composition of ligaments and screening for critical pathological targets. However, more in-depth analyses and experimental validation are needed to enhance the study.

      1) In this study, the authors performed deconvolution analysis between bulk RNA sequencing results and scRNA-seq results (L204-L208). However, the analysis of this section is not sufficiently in-depth and the authors failed to present the proportion of different cell subpopulations of the bulk sequencing samples to further increase the reliability of the results of the single cell data analysis.

      Thank you for the suggestion. We selected the top 50 DEGs in each subpopulation of the scRNA-seq data and scored these gene sets in the bulk RNA sequencing data using the GSVA method, so as to approximate the proportion of different cell subpopulations in the bulk sequencing samples. The results showed that, in the bulk RNA-seq data, fibroblast subpopulations (fibroblast 1,2,8,9) scored higher in the diseased group than in the normal group and fibroblast subpopulations (fibroblast 3,4) scored higher in the normal group than in the diseased group, which is consistent with the results of scRNA-seq.
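
      The idea of scoring a marker gene set in bulk samples can be illustrated with a much simpler stand-in than GSVA. The sketch below is not GSVA and is not the authors' pipeline: it averages per-gene z-scores across a gene set, and every gene name, sample name and expression value in it is a made-up toy input chosen only to show the mechanics.

```python
from statistics import mean, pstdev


def gene_set_scores(expression, gene_set):
    """Average per-gene z-scores over the gene set, giving one score per sample.

    This is a simplified stand-in for a gene-set scoring method, not GSVA.
    """
    z_rows = []
    for gene in gene_set:
        values = expression[gene]
        spread = pstdev(values)
        if spread == 0:
            continue  # a constant gene cannot separate samples, so skip it
        centre = mean(values)
        z_rows.append([(v - centre) / spread for v in values])
    # Transpose so that each column holds one sample's z-scores.
    return [mean(column) for column in zip(*z_rows)]


# Hypothetical toy data: two "normal" and two "diseased" bulk samples.
samples = ["normal_1", "normal_2", "diseased_1", "diseased_2"]
expression = {
    "GENE_A": [1, 1, 3, 3],
    "GENE_B": [2, 2, 6, 6],
    "GENE_C": [5, 5, 5, 5],  # constant gene, ignored by the scorer
}
marker_set = ["GENE_A", "GENE_B", "GENE_C"]  # stands in for a top-DEG list

scores = gene_set_scores(expression, marker_set)
for name, score in zip(samples, scores):
    print(name, score)
```

      In this toy input the marker set scores higher in the two diseased samples than in the two normal ones, which is the kind of group difference the response describes for fibroblast 1,2,8,9.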

      2) In results 5, the authors should clearly describe whether the analysis is based only on pathological subpopulations of ligament cells or includes a mixture of normal and pathological subpopulations; the corresponding description should also be indicated in Figure 5. Besides, although the authors claimed that "the TGF-β pathway was involved in many cell-cell interactions among fibroblasts subpopulations and macrophages", Figure 5C displayed that the CD8+NKT-like cells displayed the most TGFB signaling interactions with fibroblasts subpopulations.

      Thank you for your great questions. In results 5, our analysis is based on the mixture of normal and diseased subpopulations. We have also added a description of the data sample in the corresponding position in our manuscript.

      As for the question of the TGF-β pathway in cell-cell interaction analysis, we claimed that “the TGF-β pathway was involved in many cell-cell interactions among fibroblasts subpopulations and macrophages”, because we took into account the proportion of each subpopulation of immune cells. Macrophages are the largest subpopulation of immune cells, and the number of macrophages is significantly increased in the degenerative group, suggesting that they are closely related to disease progression. However, the proportion of CD8+NKT-like cells among immune cells was very small, and their number was essentially unchanged between the normal and diseased groups. Macrophages are therefore the focus of our attention, and after comprehensive analysis, we did not mention the strength of the TGFB signaling interactions of CD8+NKT-like cells.

      3) In result 6, the authors performed spatial transcriptome sequencing, however, the sample numbers were relatively limited, with only one sample from each group; in addition, the results of this part failed to correlate and correspond well with the single-cell results. The subgroups labelled in L382 and L384 should be carefully checked. Besides, expression data of FGF7 and TGFB ligand and receptor molecules based on the spatial transcriptomes should be added to further confirm the critical signalling pathway in regulating the cellular interactions in pathological ACL.

      Thank you for the reminder. The purpose of our spatial transcriptome sequencing (spRNA-seq) was to verify the scRNA-seq results, so only one representative sample from each group was selected for spRNA-seq. We believe that the spRNA-seq results correlated and corresponded well with the scRNA-seq results. The scRNA-seq results were validated on the spRNA-seq data using marker transfer and spotlight methods, respectively. The enrichment of fibroblast4 in the normal group and of fibroblast9 in the diseased group seen in the scRNA-seq data was also reflected in the distribution across the spRNA-seq samples. As shown in the spotlight plots, the fibroblast subsets that were more abundant in the diseased group in the scRNA-seq data (fibroblast1,2,8,9) were more widely distributed in the spRNA-seq sample of the diseased group, and were spatially closer to endothelial cells and immune cells. We have revised the subgroups labelled in L382 and L384.

      According to your suggestions, FGF7- and TGFB-related ligand and receptor genes were mapped onto the spRNA-seq data, and the results were consistent with the results of the CellChat analysis of the scRNA-seq data.

    1. Author Response

      Reviewer #1 (Public Review):

      It has been previously shown that defective autophagy and a disorganized microtubule network contribute to the pathogenesis of Duchenne muscular dystrophy (DMD). The authors previously reported that NADPH oxidase 2 (NOX2) regulates these alterations. It was also shown that acetylated tubulin facilitates autophagosome-lysosome fusion and thus autophagy. In the present study, the authors showed that autophagy is differentially regulated by redox and acetylation modifications in dystrophic mdx mice. The ablation of Nox2 in mdx mice activated autophagosome maturation but not its fusion with the lysosome. On the other hand, the inhibition of histone deacetylase 6 (HDAC6) restored microtubule acetylation, promoted autophagosome-lysosome fusion, and improved muscle function in mdx mice. The strength of this paper is the combination of different approaches to decipher the mechanism, including the evaluation of the level and interaction of several proteins involved in the maturation of autophagosomes and in the fusion between autophagosomes and lysosomes.

      This study reveals an important molecular mechanism by which increasing microtubule acetylation improves autophagy and muscle function in dystrophic mice. This has a translational impact on several diseases in which autophagy is impaired. The improvement of autophagosome-lysosome fusion with HDAC6 inhibitor is supported by several data, but some parts merit further analysis:

      1) To add appropriate controls (e.g. without antibodies) to support protein-protein interaction for all co-immunoprecipitation assays.

      Thank you for your valuable suggestion. We appreciate your input and have taken it into consideration. Based on your recommendation, we have conducted an experiment by including IP-IgG as a negative control to support the protein-protein interaction results obtained from the co-immunoprecipitation assays. The results of the negative control have been included in the respective figures. Additionally, to ensure the accuracy of the negative control, we ran the positive controls on the same blot. We have immunoprecipitated the same amount of samples for the negative control as we did for the actual IP samples presented in the manuscript. We believe that the inclusion of the negative control has strengthened the validity of our results and the conclusion drawn from our study.

      2) The simple evaluation of the protein levels of p62 and LC3-II is not sufficient to claim autophagy improvement after HDAC6 inhibition. It would be good to evaluate the autophagic flux in vivo in all groups of mice (to treat the mice with or without autophagy inhibitor and evaluate whether the difference in the level of LC3-II between the two conditions is higher with HDAC6 inhibitor than without in the mdx mice).

      Thank you for your suggestion to further evaluate the role of TubA on autophagic flux in vivo. We have included data using chloroquine to test the effect of TubA on autophagic flux in vivo. We found that chloroquine increased LC3 and p62 in skeletal muscle from mdx and mdx + TubA mice, suggesting. We have now included this information in the revised manuscript.
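
      The flux comparison the reviewer asks for in point 2 reduces to a difference of differences: LC3-II with the autophagy inhibitor minus LC3-II without it, computed per group and then compared between groups. The sketch below shows only that arithmetic. The group names follow the text, but every number is an arbitrary placeholder in arbitrary densitometry units, not data from the study, and the function name is hypothetical.

```python
def autophagic_flux(lc3_with_inhibitor, lc3_without_inhibitor):
    """Flux estimate: LC3-II accumulated while lysosomal degradation is blocked."""
    return lc3_with_inhibitor - lc3_without_inhibitor


# Placeholder LC3-II levels (arbitrary units), invented purely for illustration.
lc3_levels = {
    "mdx": {"without": 1.0, "with": 1.5},
    "mdx + TubA": {"without": 1.0, "with": 2.5},
}

flux = {
    group: autophagic_flux(levels["with"], levels["without"])
    for group, levels in lc3_levels.items()
}

# A larger inhibitor-induced increase in the treated group would indicate higher flux.
flux_gain = flux["mdx + TubA"] - flux["mdx"]
print(flux, flux_gain)
```

      With these placeholder inputs the treated group shows the larger inhibitor-induced increase, which is the pattern the reviewer describes as evidence of improved flux.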

      Reviewer #2 (Public Review):

      Agrawal et al. propose an interesting model in which the autophagy pathway in adult mouse skeletal muscle fibers is orchestrated by two independent mechanisms: a) the activity of the NADPH oxidase (Nox) 2 enzyme necessary for autophagosome biogenesis and maturation and b) the level of acetylation of the microtubule (MT) network more selectively responsible for the fusion of the autophagosomes to the lysosomes. Using the well-known mdx mouse, a model for Duchenne muscular dystrophy, the authors perform a quite impressive (but rather traditional) biochemical characterization of the autophagy pathway and found that biogenesis and maturation of the autophagosomes are impaired in mdx mice muscle fibers by means of altered expression of components of the class III phosphatidylinositol 3-kinase complex (PI3K) such as Beclin, VPS15 (both upregulated in mdx mice), ATG14L and VPS34 (both downregulated), and by the reduced expression of JNK and JIP-1, required for the formation of the heterodimer between Beclin and ATG14L-VPS34. In mdx mice, defective nucleation of the phagophore appears to be coupled to altered elongation and expansion as confirmed by decreased expression of WIPI-1, an early marker of autophagosome formation, required for the assembly of the ATG5-12 complex. Clearance of sequestered cytosolic components necessitates the fusion of the autophagosome with the lysosome, a process that the authors found impaired in mdx mice due to altered formation of the SNARE tertiary complex (STX17-SNAP29-VAMP8), as a result of the marked reduction of STX17 expression.

      In a previous work (Pal et al., Nat Commun 2014), the same group described the generation of an mdx-based mouse model where Nox2 activity was abolished by genetic ablation of the p47phox component. These mice presented with a better outcome in terms of dystrophic pathophysiology by means of reduced oxidative stress and improved autophagy. Further characterization of these mice in the present study reveals that in p47-/-/mdx mice abolishment of Nox2 activity restores autophagosome nucleation and maturation thanks to the increased expression of p-JNK, JIP-1 and improved stability of the Beclin-ATG14L complex, but no amelioration is observed on the formation of the SNARE tertiary complex indicating that the biogenesis of autophagosomes is dependent on Nox2 activity but not the fusion between autophagosomes and lysosomes. Given the existing body of evidence in non-muscle cells pointing at alpha-tubulin acetylation as a regulator of MT activity facilitating the fusion of autophagosomes to lysosomes, the authors thought to investigate the level of MT acetylation in mdx mice muscle fibers and found that acetylation is reduced but can be restored by inhibiting the HDAC6 enzyme via the FDA-approved, highly selective pharmacological inhibitor Tubastatin A (Tub A). Treatment of mdx mice at 3 weeks of age (before the onset of pathological manifestations) with Tub A not only restored the normal level of alpha-tubulin acetylation (without altering the organization and density of the MT network) but also curbed the intracellular redox status and improved the autophagic flux by stabilizing the SNARE tertiary complex. 
      Interestingly, treatment of dystrophic mice with Tub A results in substantial improvement of the dystrophic phenotype as confirmed by a reduced level of apoptosis, diminished tissue inflammation, improved sarcolemma integrity, and superior force generation capacity in ex vivo experiments using the diaphragm and Extensor Digitorum Longus (EDL) muscle fibers of Tub A-treated mdx mice compared to untreated mdx and healthy counterparts.

      The in-depth characterization of the steps orchestrating the autophagy pathway in the mdx mouse model on the one hand, and the comprehensive evaluation of the phenotype of the mdx mice treated with the HDAC6 inhibitor Tubastatin A on the other, support the conclusions proposed by the authors. Nonetheless, some aspects deserve consideration.

      1) The effect of increased alpha-tubulin acetylation by means of genetic and pharmacological strategies (i.e., in vivo overexpression of alpha-tubulin acetyltransferase-aTAT1 and treatment with Tubacin or Tubastatin A, respectively) has been previously explored in isolated cardiomyocytes and skeletal muscle fibers and revealed that augmented MT acetylation, due to selective inhibition of HDAC6, increases cytoskeletal stiffness and favors Nox2 activation (Coleman et al., J Gen Physiol 2021).

      We have added a discussion of the work by Coleman and colleagues. In brief, that work was in wild-type cardiac and skeletal muscle and showed that MT acetylation controlled stiffness in control muscle cells. Interestingly, while they did not quantify MT organization, their data suggest that HDAC6 inhibition does not alter organization. Here, we are assessing the role of MT acetylation in a diseased model, mdx. Taken together, our data along with that from Ward and colleagues highlight the importance of a proper balance of tubulin acetylation in order to maintain cellular signaling, which is different between non-diseased and diseased skeletal muscle.

      2) Altered organization and density of the MT network in mdx FDB muscle fibers with loss of vertical directionality is not a novelty either, and it has been reported by others (see Randazzo et al., Hum Mol Genet 2019), who also observed that overexpression of a single beta-tubulin (tubb6) in normal Flexor Digitorum Brevis (FDB) muscle fibers mimics the disruption to the MT network of mdx FDB fibers, increases the level of detyrosinated tubulin and increases Nox2 activity (through elevated expression of gp91phox). Conversely, downregulation of the same beta-tubulin restores normal MT organization in mdx FDB. Previous work from the authors (Loehr et al., eLife 2018) reported that in p47-/-/mdx mice MT organization in diaphragm muscle fibers is normalized and autophagy improved. Accordingly, it is puzzling that increased alpha-tubulin acetylation determines such a wide range of ameliorations in terms of physiological and morphological aspects in dystrophic skeletal muscle fibers treated with Tubastatin A whereas no improvement in the overall MT organization is observed, as reported by Agrawal and colleagues.

      Our findings are also supported by Coleman et al., who show that HDAC6 inhibition did not alter levels of DT-tubulin. Although that group did not specifically measure MT organization, viewing and analyzing their representative images of alpha-tubulin (Figure 1D, control and tubacin) shows that HDAC6 inhibition does not alter MT organization in wild-type FDBs.

      3) Given that p47-/-/mdx mice present with levels of acetylated alpha-tubulin and HDAC6 expression comparable to mdx while showing significant improvement of the dystrophic phenotype despite partial rescue of the autophagic flux (as reported in Loehr et al., eLife 2018), it would have been of great interest to investigate the effect of HDAC6 inhibition in p47-/-/mdx mice as well.

      We would like to thank the reviewer for acknowledging our in-depth characterization of the steps orchestrating the autophagy pathway in the mdx mouse model and the comprehensive evaluation of the phenotype of the mdx mice treated with the HDAC6 inhibitor Tubastatin A. While we believe these experiments are of interest, we think that they merit a detailed investigation that is beyond the scope of the current work.

    1. Author Response

      Reviewer #1 (Public Review):

      This study provided evidence to interpret and understand the aging and developmental processes in children. The main strength of the study is that it measures a set of biological age measures and a set of developmental measures, thus providing multi-faceted evidence to explain the associations between aging and development in children. The main weakness of this study is that how to measure and test the aging hypotheses of "a buildup of biological capital model" and "wear and tear" is not well-explained. Why would the observed associations between biological age measures and developmental measures support the aforementioned aging theories?

      Thank you. On reflection we agree that how to test the aging hypotheses of "a buildup of biological capital model" and "wear and tear" is not well-explained in the manuscript. We have addressed this issue in the point-by-point responses below:

      1) Abstract - conclusion: The aging hypothesis of "a buildup of biological capital model" and "wear and tear" were mentioned in the conclusion without an explanation of these theories in the previous section. Readers who are not experts in the field may not understand the logic.

      We have replaced these phrases in the abstract with the following interpretation, which we hope will be more readily understood:

      “Patterns of associations suggested that accelerated immunometabolic age may be beneficial for some aspects of child development while accelerated DNA methylation age and telomere attrition may reflect early detrimental aspects of biological ageing, apparent even in children.”

      2) Result - Biological age marker performance: the correlation between transcriptome age and chronological age is very strong (r =0.94). I am afraid that very little age-independent information could be captured by the transcriptome age. Is it possible to down-regulate the age dependency of the transcriptome age in the training process?

      Thank you for this important comment: We agree the high accuracy of this clock may in fact reduce its relevance as a biological age marker and note that this is a concern generally in the field. We have explored the possibility of using a less accurate transcriptome age model as follows: Instead of elastic net modelling we tested using the lasso penalisation only, which will result in more parsimonious (sparse) models as less important features are dropped as the strength of the lambda parameter is increased. Plotting the correlation in the test set against number of features in models, as the lambda is sequentially increased, we can see (as shown in Author response image 1 by the blue line) that after the inclusion of around 200 features, the gain in accuracy becomes less steep.

      Author response image 1.

      We then tested the sensitivity of a model optimised for sparsity at the expense of some prediction accuracy, selected based on visual inspection (blue line, r in test set =0.87, number of features= 187) of the above plot, against developmental measures, compared to the most accurate model as presently included in the manuscript:

      Author response image 2.

      We find that, across all outcomes tested, the less accurate model, based on only the most important features, does not provide an improvement in sensitivity to developmental outcomes compared to the currently used model.

      We therefore prefer to keep the more accurate model in this study. Especially as it is consistent with the methodology used in the Horvath and Immunometabolic age models and generally in the field, and otherwise it is not obvious how the biological clock should be trained (especially for children without mortality data) without altering the whole approach of the study. We have acknowledged and discussed this issue on page 15.
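
      The sparsity trade-off described in this response (a stronger lasso penalty drops more features at some cost in accuracy) can be shown on a toy problem. The sketch below is a minimal pure-Python coordinate-descent lasso, assuming the objective (1/(2n))·||y − Xb||² + λ·||b||₁; it is not the authors' modelling code, and the design matrix, outcome and penalty values are invented so that the result is easy to check by hand.

```python
def soft_threshold(value, penalty):
    """Shrink value toward zero by penalty; return 0.0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0


def lasso_coordinate_descent(X, y, penalty, sweeps=50):
    """Minimise (1/(2n))*||y - Xb||^2 + penalty*||b||_1 by cyclic coordinate descent."""
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    for _ in range(sweeps):
        for j in range(p):
            # Partial residual: the fit with feature j left out.
            partial = [
                y[i] - sum(X[i][k] * coef[k] for k in range(p) if k != j)
                for i in range(n)
            ]
            rho = sum(X[i][j] * partial[i] for i in range(n)) / n
            scale = sum(X[i][j] ** 2 for i in range(n)) / n
            coef[j] = soft_threshold(rho, penalty) / scale
    return coef


def count_selected(coef, tol=1e-12):
    """Number of features with a non-zero coefficient."""
    return sum(1 for b in coef if abs(b) > tol)


# Toy data: three orthogonal features; y = 3*x1 + 0.5*x2 + 0*x3.
X = [[1, 1, 1], [-1, 1, -1], [1, -1, -1], [-1, -1, 1]]
y = [3.5, -2.5, 2.5, -3.5]

# Number of retained features at each (hypothetical) penalty strength.
path = {
    penalty: count_selected(lasso_coordinate_descent(X, y, penalty))
    for penalty in (0.1, 1.0, 5.0)
}
print(path)
```

      In this toy case the weak feature is dropped first and the strong one last as the penalty grows. Plotting held-out correlation against the number of retained features, as in Author response image 1, is then a matter of evaluating each fitted model on a test set.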

      3) The study population comes from several cohorts, which might influence the results. How were the cohort effects controlled for in the analyses?

      The possible influence of cohort is a limitation of the study which we have discussed on page 16. We did not include cohort as a predictor in any of the candidate biological clocks since this may reduce detection of some age-related features. Instead, we include a variable for cohort as a fixed effect in all analyses with risk factors and developmental outcomes and examined the performance of candidate biological clocks in predicting chronological age within each cohort. As a further check, we have added an additional sensitivity analysis (Figure 4-figure supplement 6), against developmental outcomes significant in the main analysis, stratified by cohort. We find generally consistent effects across cohorts.
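
      What a cohort fixed effect does to an association estimate can be seen in a small worked example. The sketch below assumes a single predictor and uses the fact that, for the slope, adding cohort indicator variables is equivalent to centring predictor and outcome within each cohort. It is an illustration of the general technique, not the authors' model specification, and all values and cohort labels are invented.

```python
from collections import defaultdict


def pooled_slope(xs, ys):
    """Ordinary least-squares slope ignoring cohort membership."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = sum((x - mean_x) ** 2 for x in xs)
    return numerator / denominator


def within_cohort_slope(xs, ys, cohorts):
    """Slope after centring x and y within each cohort (cohort fixed effects)."""
    groups = defaultdict(list)
    for x, y, cohort in zip(xs, ys, cohorts):
        groups[cohort].append((x, y))
    numerator = 0.0
    denominator = 0.0
    for members in groups.values():
        mean_x = sum(x for x, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        for x, y in members:
            numerator += (x - mean_x) * (y - mean_y)
            denominator += (x - mean_x) ** 2
    return numerator / denominator


# Invented data: within each cohort the outcome rises by 2 per unit of x,
# but cohort B has higher x and a much lower baseline outcome.
x_values = [0, 1, 2, 5, 6, 7]
y_values = [10, 12, 14, 0, 2, 4]
cohort_labels = ["A", "A", "A", "B", "B", "B"]

naive = pooled_slope(x_values, y_values)
adjusted = within_cohort_slope(x_values, y_values, cohort_labels)
print(naive, adjusted)
```

      Here the pooled slope is negative because of the baseline difference between cohorts, while the cohort-adjusted slope recovers the within-cohort value of 2. Stratifying by cohort, as in the added sensitivity analysis, checks whether that within-cohort slope is similar in each cohort separately.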

      4) Figure 3 only showed the number of p values. Can the author also provide the number of point estimates and 95% confidence intervals, perhaps in the supplemental table?

      This information was originally provided in supplemental table 5 (now Supplementary file 7), combined with the sensitivity analyses. To make this information easier to find, we have made this a stand-alone table (table 3). We now direct readers to this information within the caption of Figure 4 (previously figure 2).

      Reviewer #2 (Public Review):

      The study had an especially relevant aim for aging research and utilized various data types in an especially interesting human population. Multi-omics perspective adds great value to the work. The researchers aimed to evaluate how different indicators of biological age (BA) behave in children during their developmental stage. In the analysis, relationships between indicators of BA, health risk factors, and developmental factors were assessed in cross-sectional data comprising children aged 5-12 years. The manuscript is well-written and easy to follow. The methodology is good. The authors largely succeeded in reaching their aim.

      In the study, previously known and unknown biological age indicators were used. Known indicators included telomere length and Horvath's epigenetic age. Unknown (novel) indicators, transcriptomic and immunometabolic clocks, were developed in the present study and they showed a strong correlation with calendar age in this population, also in the validation data set. Although the transcriptomic and immunometabolic clocks have the potential of being true indicators of biological age, they are still lacking scientific evidence of being such indicators in adults. That is, their associations with age-related diseases and mortality are yet to be shown. Thus, the major remark of the study relates to the phrasing: these novel transcriptomic and immunometabolic clocks should be presented as BA indicator candidates waiting for the needed evidence.

      Thank you for this important observation. However, we still find that “biological age indicator” is a useful umbrella term in this manuscript and there is not an obvious alternative. We therefore have added the following sentence on page 8, and highlighted the difference between the markers at key points in the abstract, introduction, results and discussion.

      “We note that since a common definition of markers of biological age is that they should be associated with age-related disease and mortality [69] these new clocks may only currently be considered “candidate” biological age markers. However, we have referred to both the established and candidate markers as biological age markers throughout to simplify presentation.”

    1. Author Response

      Reviewer #1 (Public Review):

      In the manuscript, titled "Comparative single-cell profiling reveals distinct cardiac resident macrophages essential for zebrafish heart regeneration," Wei et al. perform bulk and single-cell RNA-sequencing on uninjured and injured zebrafish hearts with or without prior macrophage depletion by clodronate. For the single-cell RNA sequencing, the authors sort macrophages and neutrophils prior to sequencing by using fluorescent reporters for each of the two lineages. The authors characterize the differential gene expression between injured and uninjured hearts with and without prior macrophage depletion. The single-cell analyses allow the characterization of nine discrete subpopulations of macrophages and two distinct neutrophil types. The manuscript is largely descriptive with lots of discussion of specific differentially expressed genes. The authors conclude that tissue-resident macrophages are important for heart regeneration through the remodeling of the microenvironment and by promoting revascularization. Circulating monocyte-derived macrophages cannot adequately replace the resident macrophages even after recovery from clodronate depletion.

      The manuscript presents a very large catalog of useful gene expression data and further characterizes the diversity of macrophages and neutrophils in the heart following injury. Although the conclusions that resident macrophages are important for regeneration and that circulating macrophages cannot adequately substitute for them are not particularly novel, this manuscript provides additional support for those ideas and extends that work by providing a wealth of gene expression data from the different macrophage sub-populations in the zebrafish and how they respond to and promote regeneration. The authors also present a nice analysis supporting the interactions of macrophages with neutrophils via comparing receptors and ligands (from gene expression data) on the two populations - this should be a useful resource.

      We appreciate how reviewer #1 recognizes the work we have put into sample preparation, data collection, and all the bioinformatic analyses to delineate and characterize the inflammatory cells during zebrafish heart regeneration.

      Reviewer #2 (Public Review):

      Wei et al. analysed the composition of immune cells, mostly macrophages and neutrophils, in the context of zebrafish cardiac injury while utilizing clodronate liposomes (CL) to inhibit regeneration via alteration of the immune response. This work is a direct continuation of Shih-Lei et al. which compared the regenerative outcomes of zebrafish vs the non-cardiac regenerative medaka. In that work, the authors used CL to pre-deplete macrophages and showed significant effects on neutrophil clearance, revascularization, and cardiomyocyte proliferation. In this work, the authors used the same pre-depletion method to study the dynamics, composition, and transcriptomic state of macrophages and neutrophils, to overall assess the effect on cardiac regeneration. Using bulk RNA-seq of CL- vs PBS-treated hearts at 7 and 21 days post cryoinjury (dpci), a delayed/altered immune response was evident. Single-cell analysis at 1, 3, and 7 dpci showed a wide range of immune populations, of which the macrophage populations are the most diverse. Pre-depletion using CL altered the composition of immune cells, resulting in the complete removal of a single resident macrophage population (M2) or dramatically reducing the overall numbers of other resident populations, while other populations were retained. Looking at the injury time course and distribution of macrophage populations, the authors identified several macrophage populations and neutrophil population 1 as pro-regenerative, as their presence compared to CL-treated hearts correlates with regeneration. CL-treated hearts also show a marked sustained neutrophil retention, suggesting that interaction with depleted macrophage populations is required for neutrophil clearance. As the marked reduction in populations 2 and 3 occurs after CL treatment, the authors tested whether early CL treatment (8 days or 1 month prior to injury) could reduce the non-recoverable populations and affect regenerative outcomes, and indeed they observed a reduction in key genes characterizing M2 and M3 which caused marked reduction in revascularization, CM proliferation, neutrophil retention, and overall higher scarring of the heart.

      The findings of this paper could be broadly separated into the characterization of myeloid cells after injury and in non-regenerating animals and assessing the effects of early pre-depletion of macrophages on various cardiac functions involved in regeneration. Both parts draw conclusions that are supported by the facts; however, several questions remain to be clarified.

      We thank the reviewer for recognizing that the conclusions we drew were supported by the data we presented and further replied to the specific suggestions below.

      1) In figures 2 and 3 the main claim is that the main resident macrophage populations, M2 and M3 are depleted and are largely unable to replenish after injury, similar to resident macrophages in mice 1. However, as the identification of this population is made solely using scRNA-seq, an alternative explanation would be that these cell populations do replenish but are sufficiently changed due to CL treatment (directly or indirectly) and thus would be a part of another cluster. To address this, we suggest:

      A. Run trajectory analysis to ascertain whether the different cell clusters are due to differentiating states of the cells

      B. Create a reporter line for M2 and M3 macrophages and assess whether they are indeed depleted or changing.

      We followed the reviewer’s suggestion and performed trajectory analyses (Figure 6). The results suggest that Mac 2 and Mac 3 form a unique trajectory, which was not shifted by -1d_CL treatment but only diminished in number. Conditionally-enriched gene ontology analysis (Figure 4) also suggests that Mac 2 and 3 do not change their properties under the -1d_CL condition (unlike monocyte-derived Mac 1 and some other clusters). When we examined homx1a expression (Mac2) and timp4.3 expression (Mac3) in -8d_CL treated hearts, we again observed diminished cell numbers (Figure 8C and Figure 7-figure supplement 1D). These results support the interpretation that the resident macrophages Mac 2 and Mac 3 are non-recoverable, rather than changed in their properties so much that they are grouped into other subsets.

      We also agree with the reviewer that specific reporter and CreER driver lines for a lineage tracing experiment would provide the most concrete answer to this question. We have now generated an endogenous Tg(mpeg1-2A-CreERT2) line in the lab (collaborative work with McGrail lab) and reporter lines using Mac2/3 enriched genes. Unfortunately, this work will take much longer and might not fit into the scope of the current study.

      2) One of the major findings of this paper is that some macrophage populations can persist throughout injury and promote the regenerative response. Considering that macrophages have a half-life of less than a day in tissue 2 (although could be different in zebrafish and in this population), we estimate that the resident populations should be proliferative. As there is only a single proliferating macrophage population (M5) we speculate that it is a combination of several populations which are clustered together due to the high expression of cell cycle genes. To verify whether the resident populations are proliferating we suggest:

      A. Perform cell-cycle scoring and regression (found in Seurat package) and assess whether after regressing out cell cycle genes there are contributions of M5 to other clusters.

      B. Perform EDU labelling experiments with cell cycle identifiers (staining for hbaa1, Timp4.3) and assess their proliferative dynamics.

      We followed the reviewer’s suggestion and performed cell-cycle scoring and regression (Figure 2-figure supplement 4). Cell cycle scoring suggests there are cells in both Mac 2 and 3 in the G2/M phase and presumably proliferative. Cell-cycle regression results suggest that most macrophage subsets, including Mac 5, still stand as unique clusters after regression (Figure 2-figure supplement 4). These results suggest that Mac 5 might not consist of proliferating cells from other clusters.
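
      The idea behind cell-cycle scoring and regression can be shown with a deliberately simplified sketch. It is not Seurat's implementation: the score here is the mean expression of a phase gene set minus the mean of all remaining genes, and regression is an ordinary least-squares fit per gene. The expression matrix, the gene indices, and the function names are hypothetical.

```python
import numpy as np

def module_score(expr, gene_idx):
    """Per-cell score: mean of the gene set minus mean of all other genes.

    expr is a cells x genes matrix. Using every other gene as the control
    set is a simplification made for this sketch.
    """
    mask = np.zeros(expr.shape[1], dtype=bool)
    mask[gene_idx] = True
    return expr[:, mask].mean(axis=1) - expr[:, ~mask].mean(axis=1)

def regress_out(expr, covariates):
    """Return residuals of each gene after a linear fit on the covariates."""
    X = np.column_stack([np.ones(expr.shape[0]), covariates])
    beta, *_ = np.linalg.lstsq(X, expr, rcond=None)
    return expr - X @ beta

# Toy data (hypothetical): 200 cells, 12 genes; genes 0-2 stand in for
# S-phase markers and genes 3-5 for G2/M markers, raised in cycling cells.
rng = np.random.default_rng(1)
expr = rng.normal(size=(200, 12))
cycling = rng.random(200) < 0.3
s_genes, g2m_genes = [0, 1, 2], [3, 4, 5]
expr[cycling, 0:6] += 2.0

s_score = module_score(expr, s_genes)
g2m_score = module_score(expr, g2m_genes)
corrected = regress_out(expr, np.column_stack([s_score, g2m_score]))
```

      Clustering would then be repeated on the corrected matrix; a cluster that persists afterwards is unlikely to be defined by cell-cycle genes alone.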

      On the other hand, we also tried to double-stain the proliferating resident macrophages by EdU and ISH of hbaa1 and timp4.3. Unfortunately, these methods were not compatible in our hands, and we failed to confirm their proliferative dynamics. We did show proliferating macrophages residing in the untouched hearts and will further check their identity once we have the cluster-specific reporter lines ready.

      Last but not least, using the Tg(mpeg1-2A-CreERT2) line to label embryonic macrophages under the Tg(ubi:loxP-EGFP-loxP-mCherry)cz1701 background before 7 dpf, we observed mCherry+ macrophages in juvenile fish at 50 dpf, suggesting some embryonically derived macrophages can last more than a week in the system presumably by self-renewing. As replied previously, these results might not be included in this study.

      3) In connection to the previous point if indeed these resident macrophage populations are proliferative, even a smaller portion of remaining cells should be sufficient to partly replenish given sufficient time after CL 1. However as seen in Fig. 3B, the M2 population has a similar proportion of cells on days 1 and 3 after CL treatment and by day 7 it declines in numbers. Given that CL should not be present anymore, we expect this population to increase in numbers over time.

      We thank the reviewer for pointing out that Figure 3B might be misleading, as it shows the proportions of the macrophage subsets rather than their absolute numbers. The persistence of the Mac 2 proportion at 1 and 3 dpci might be due to the overall depletion of both resident and recruited macrophages after CL treatment. CL treatment still has profound effects on total macrophage numbers 2 days later (Figure 7-figure supplement 1A and Lai et al., 2017), and the overall macrophage numbers only recovered to the same level as those in untouched or PBS-treated injured hearts by 7 days (Figure 7-figure supplement 1A and Lai et al., 2017). We have also confirmed that Mac 2 diminished in CL-treated hearts by both qPCR and ISH/IHC of homx1a in Figure 7-figure supplement 1C and Figure 8B.
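
      A short arithmetic sketch shows why a stable proportion can hide a loss of cells. The counts below are hypothetical and chosen only for illustration; they are not data from the study.

```python
def subset_proportion(subset_count, total_count):
    """Fraction of all macrophages that belong to one subset."""
    return subset_count / total_count

# Hypothetical counts: CL depletes the subset and the total pool
# to the same relative extent.
pbs = {"mac2": 200, "total": 1000}
cl = {"mac2": 40, "total": 200}

pbs_prop = subset_proportion(pbs["mac2"], pbs["total"])
cl_prop = subset_proportion(cl["mac2"], cl["total"])
remaining_fraction = cl["mac2"] / pbs["mac2"]

print(pbs_prop, cl_prop, remaining_fraction)
```

      Here the subset proportion is identical in both conditions even though only a fifth of the subset's cells remain, which is why absolute measures such as qPCR or ISH/IHC are needed alongside proportions.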

      4) In Figure 6 the authors show a reduction in mpeg+ population however a persistent, large population (±70% of the original mpeg+) is retained. The authors suggest that this population is comprised of other, non-macrophage, cell types however as this method is the very core of the paper and the persistence of macrophages could alter our understanding of the results, it must be verified.

      Dick, S. A. et al. Self-renewing resident cardiac macrophages limit adverse remodeling following myocardial infarction. Nature Immunology 20, 29-39, doi:10.1038/s41590-018-0272-2 (2019).

      Leuschner, F. et al. Rapid monocyte kinetics in acute myocardial infarction are sustained by extramedullary monocytopoiesis. J Exp Med 209, 123-137, doi:10.1084/jem.20111009 (2012).

      We acknowledge that mpeg1 might not be the perfect marker for pan-macrophage labeling, as shown by the work of Ferrero et al., J Leukoc Biol. 2020, which was published while our profiling work was already under way. Fortunately, scRNAseq profiling is an unbiased method to reveal gene expression/cell identity, and our results indeed identified non-macrophage/non-neutrophil populations out of the clustering and found mpeg1+ B-cells consistent with the literature. Thus, the mixed input from the mpeg1 reporter does not affect the property of Mac 2 and 3 being both mpeg1-positive macrophages, which diminished after both -1d_CL and -8d_CL treatment. Following the reviewer’s suggestion, we further verified this point by both qPCR of hbaa1 and timp4.3 and ISH/IHC of homx1a and timp4.3 in the CL-treated hearts in Figure 7-figure supplement 1C and D and Figure 8B and C.

      Reviewer #3 (Public Review):

      Macrophages play an important role during heart regeneration. This has been shown in the mouse and zebrafish for example by treating the animals with clodronate liposomes to eliminate phagocytic cells.

      The manuscript follows up on a previous observation by the authors performing these experiments in the zebrafish (Lai et al eLife 2017). When comparing regenerative vs non-regenerative teleosts (zebrafish and medaka, respectively), they found that macrophages and neutrophils were the cell types responding most differently to a cardiac injury in these two species.

      Here the authors analyze in extenso neutrophil and macrophage populations using single-cell RNA-seq at different stages of regeneration. They perform FAC sorting of the two populations using specific reporter lines. They also assess the change in these populations upon clodronate treatment. They find that clodronate treatment affects the gene expression profiles of different subsets of macrophages and neutrophils as well as their abundance.

      They also show that clodronate treatment performed several days before cryoinjury depleted macrophages from the heart but after injury overall macrophage number recovers. However, heart regeneration does not. Cardiomyocyte is the only parameter that is not affected, but vasculogenesis and scar resolution are impaired.

      The authors conclude that (1) there are different subsets of macrophages and neutrophils, (2) that they interact with each other during regeneration through specific ligand and receptor pairs, and (3) that a cardiac resident population rather than a circulating macrophage population is important for heart regeneration.

      The transcriptomic characterization of the two immune cell populations is very exhaustive and rigorous. No functional validation of subpopulation marker genes was performed, but the data as it stands will already be of great value to the community. The figure quality is outstanding.

      We thank the reviewer for recognizing the value of our study and the quality of the data presented. We further examined the subpopulation markers and their functional relevance in the revised manuscript, as suggested.

    1. Author Response

      Reviewer #1 (Public Review):

      In the current work, the authors aimed to investigate the genetic and non-genetic factors that impact structural asymmetry.

      A major strength is the number of data samples included in the study to assess brain structural asymmetry. A consequence of the inclusion of many samples is then also the sample size.

      We thank the reviewer for their supportive and insightful comments that have helped improve our paper.

      Comment #1: Given that the authors also work with longitudinal data, it would be nice to be able to appreciate the individual effects across time points, this is now a little unclear.

      Our lifespan analysis incorporated both single and repeat measures over time in the trajectory estimation, and hence these will be an intermediate estimate of cross-sectional and longitudinal trajectories. We have clarified this in the Methods (see 1). A comprehensive analysis of the individual-specific asymmetry change effects in the current paper is thus hindered by many properties of the data, including that many participants contribute a single measure, that participants vary in their number of repeat-measures (1-6 timepoints), that the number of repeat-measures is dependent on age, and that the degree of asymmetry change differs between cortical metrics, clusters, and along the age variable. Most importantly, the average degree of asymmetry change is small; Fig. 3 indicates thickness asymmetry typically corresponds to a ~0.1 - 0.2mm difference, such that changes therein will be smaller and thus likely unclear at the individual level. Nevertheless, we have modified the average plots in Figures 2 and 3 to allow better visualization of the individual hemispheric measures across timepoints, as well as an appreciation of the density of our longitudinal data.

      1 – (line 646) “GAMMs incorporate both single and repeat measures over time to capture nonlinearity of the mean level trajectories across persons, resulting in population estimates that are intermediate between cross-sectional and longitudinal trajectories”

      Comment #2: A possible less well-developed approach is the genetic basis, as this was stated as the main question, here the investigations are not that deep and may only touch upon the question.

      We agree the previous formulation of our Abstract did convey this impression, and have thus made the following important amendment:

      (Abstract) “Cortical asymmetry is a ubiquitous feature of brain organization that is subtly altered in some neurodevelopmental disorders, yet we lack knowledge of how its development proceeds across life in health. Achieving consensus on the precise cortical asymmetries in humans is necessary to uncover the developmental timing of asymmetry and extent to which it arises through genetic or later influences in childhood.”

      Our paper aims to serve as a critical reference for the normative childhood development and lifespan change of cortical asymmetry. We performed heritability analyses as they are informative regarding development and shed light on the timing of influences shaping cortical asymmetry (also possibly prior to age ~4 at which our sample starts). Similarly, genetic correlation analysis sheds light on whether the replicable interregional correlations are underpinned by genetic differences, indicative of coordinated genetic development of asymmetries. We apologize the rationale behind these analyses was not well-specified, and have clarified this (see response #4). Thus, we respectfully disagree the genetic aspect represented the main research question, but rather lends support to our developmental perspective.

      Given the density of analyses already included and that these are well-specified within the context of our overarching question, we do not see how adding more genetic analyses will be beneficial for our paper. However, we agree with the Reviewer’s subsequent comment (#8) that the genetic correlations in HCP data should also have been reported, and now incorporate these (see response #8).

      Comment #3: Moreover, the association with cognition, handedness, sex, and ICV is somewhat interesting yet seems also a bit minimal to fully grasp its implications.

      In the asymmetry field it has been commonplace to assume these factors are strongly related to asymmetry, particularly sex. Here, despite optimizing the delineation of asymmetries, associations with factors purportedly related to it were all very small. We believe this is an important message that may help reorient the field away from entrenched views; unless we show it is not the case, researchers may think the effects of these factors are larger than they are. Further, because questions pertaining to sex and handedness differences will certainly arise for many, we chose to address them by quantifying the average effects in big data, because our lifespan trajectory analysis was not well-suited to assessing e.g. sex differences in asymmetry trajectories (i.e. 3-way non-linear interactions; sex × age × hemisphere). We have strengthened the reasoning for this analysis in the Introduction (see 1):

      1 – (line 118) “Therefore, as a final step, we reasoned that combining an optimal delineation of population-level cortical asymmetries with big data would optimize detection and quantification of the effects of factors commonly assumed important for asymmetry, namely general cognitive ability, handedness and sex.”

      Contrary to approaches that often place emphasis on p-values (e.g. pheWAS), our targeted approach using variables long considered important for asymmetry enabled transparent reporting of the effect sizes and directions. We hope the Reviewer agrees we have taken care in this regard, and are careful to communicate the found effects are small. The small effects seem typical of structural brain associations in big data, as may be expected when relating complex phenotypes to any single structural measure. For these reasons, we opt not to extend the analysis beyond our initial targeted approach, arguing instead that the size of the effects is reason enough to report them.

      Despite being small, however, we argue they are not negligible (see 2-4). Of note, though it may appear so in Fig. 7, the p-value for the cognitive association was far from just surviving Bonferroni correction (it would survive >13,000 comparisons at our alpha level [⍺=.01], whereas we corrected for our 136). Note we did not accept a 5% false positive rate. We have clarified this in the Results (see 5):

      2 – (line 485) “Other factors commonly espoused to be important for asymmetry were associated with only small average effects in adults. For example, we found one region – SMG/perisylvian – wherein higher leftward areal asymmetry related to subtly higher cognitive ability. Since interhemispheric anatomy here is likely related to brain torque 2,3, this may agree with work suggesting torque relates to cognitive outcomes 4,5. Interestingly, that ~94% of humans exhibit leftward asymmetry in this region (Figure 1G) suggests tightly regulated genetic-developmental programs control its lateralized direction in humans (see Figure 6). This result may therefore suggest disruptions in areal lateralization early in life are associated with cognitive deficits detectable in later life as small effects in big data 6. While speculative, this may also agree with evidence that differences in general cognitive ability that show high lifespan stability 6 relate primarily to areal phenotypes formed early in life 7–9.”

      3 – (line 461) “We also found areal asymmetry in anterior insula is, to our knowledge, the most heritable asymmetry yet reported with genomic methods 10–14, with common SNPs explaining ~19% variance. This is notably higher than in our recent report (< 5%) 14, illustrating a benefit of our approach. As we reported recently 14, we confirm asymmetry here associates with handedness.”

      4 - (line 495) “Consistent with our recent analysis in UKB 14, we confirmed leftward areal asymmetry of anterior insula, and leftward somatosensory thickness asymmetry is subtly reduced in left-handers. Sha et al. 14 reported shared genetic influences upon handedness and asymmetry in anterior insula and other more focal regions. Anterior insula lies within a left-lateralized functional language network 15, and its structural asymmetry may relate to language lateralization 16–18 in which left-handers show increased atypicality 19–21. Since asymmetry here emerges early in utero 22 and is by far the most heritable (Figure 6), we agree with others 16 that this ontogenetically foundational region of cortex may be fruitful for understanding genetic-developmental mechanisms influencing laterality 23,24. Less leftward somatosensory thickness asymmetry in left-handers also echoes our recent report 14 and fits a scenario whereby thickness asymmetries may be partly shaped through use-dependent plasticity and detectable through group-level hemispheric specializations of function. Still, the small effects show cortical asymmetry cannot predict individual handedness. Associations with other factors typically assumed important were similarly small, and mostly compatible with the ENIGMA report 25 and elsewhere 26,27.”

      5 - (line 3221) “Although small, we note this association was far from only just surviving correction at our predefined alpha level (⍺ = .01; corrected for 136 tests; Methods).”

      6 - (line 348) “we … uncover novel and confirm previously-reported associations with factors purportedly related to asymmetry – all with small effects”
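
      The Bonferroni arithmetic behind the cognitive-ability association discussed above can be sketched as follows. The alpha level (.01) and the number of tests (136) come from the response; the p-value used in the example is hypothetical, chosen only so that it would survive more than 13,000 comparisons, and the function names are illustrative.

```python
import math

def bonferroni_threshold(alpha, n_tests):
    """Per-test significance threshold under Bonferroni correction."""
    return alpha / n_tests

def survives_bonferroni(p_value, n_tests, alpha=0.01):
    """True if p_value stays significant after correcting for n_tests."""
    return p_value < bonferroni_threshold(alpha, n_tests)

def max_tests_survived(p_value, alpha=0.01):
    """Largest whole number of tests for which alpha / n_tests is still >= p_value."""
    return math.floor(alpha / p_value)

threshold_136 = bonferroni_threshold(0.01, 136)  # roughly 7.35e-5
hypothetical_p = 7.5e-7  # assumption for illustration, not a reported value

print(threshold_136)
print(survives_bonferroni(hypothetical_p, 136))
print(max_tests_survived(hypothetical_p))
```

      Under these assumptions, a p-value of this size sits about two orders of magnitude below the threshold for 136 tests, which is the sense in which the association is "far from only just surviving" correction.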

      Thus, in quantifying effects we could not include in our lifespan analysis we preempt the questions likely to arise for many researchers, provide a sobering account of the effect sizes of factors typically assumed important for asymmetry, and find results that fit the developmental framework we lay out in the paper. We therefore opt to keep these together with the lifespan and heritability results in the current paper.

      Comment #4: To some extent, the aim of the study could still be written with more clarity. However, the authors have in part achieved their aims, assuming the aim was to find a consensus on the brain asymmetry patterns in humans, as is stated in the abstract.

      Alongside the amendment to the Abstract that better clarifies our aims (response #2), we have restated the aims in the Introduction:

      1 - (line 121) Here, we first aimed to delineate population-level cortical areal and thickness asymmetries using vertex-wise analyses and their overlap in 7 international datasets. With a view to gaining insight into cortical asymmetry development, we then aimed to trace a series of lifespan and genetic analyses. Specifically, we chart the developmental and lifespan trajectories of cortical asymmetry for the first time longitudinally across the lifespan. Next, we examine phenotypic interregional asymmetry correlations, under the assumption correlations indicate coordinated development of left-right asymmetries through genes or lifespan influences. To shed light on the extent to which differences in asymmetry are genetic, we test heritability of asymmetry using genome-wide single nucleotide polymorphism (SNP) and extended twin data, and examine whether or not phenotypic associations are underpinned by genetic correlations suggestive of coordinated development through genes. Finally, we screen our set of robust, population-level asymmetries for association with general cognitive ability and factors purportedly related to asymmetry in UK Biobank (UKB). 28

      Comment #5: Overall the results support the conclusions, yet the strong interpretation of early life factors in particular is not empirically investigated as far as I gather.

      The reviewer is correct that we do not have data on neonates to directly support interpretations of prenatal factors. We have therefore tempered strong interpretations pertaining to prenatal accounts accordingly, have added text at the start of the Discussion to address this (see 1), and qualified all discussion of prenatal factors:

      1 – (line 366) “Tracing their lifespan development, we show the trajectories of areal asymmetry primarily suggest this form of asymmetry is developmentally stable at least from age ~4, maintained throughout life, and formed early on – possibly in utero 13,29,30 (while we cannot extrapolate to ages before our sample begins, we note this agrees with findings in neonates 29,30). One interpretation of lifespan stability combined with low heritability may be stochastic early-life developmental influences determine individual differences in areal asymmetry more than later developmental change, but work linking prenatal and childhood trajectories is needed to affirm this”

      2 – (Abstract) “Results suggest areal asymmetry is developmentally stable and arises early in life through genetic but mainly subject-specific stochastic effects”

      We have also added argumentation regarding a just-published study suggesting the average pattern of neonatal areal asymmetry is largely similar to adults 1. In addition, we reiterate what our data can and cannot say about the developmental timing of asymmetry in several places in the Discussion (see 3 & 5). In other places, we have removed reference to prenatal factors (see 4). Still, while we agree we previously used the terms “prenatal” and “early life factors” interchangeably, we note the latter often encompasses periods of early childhood covered here and is not necessarily restricted to factors present at birth 2,3. Thus, we have amended the Discussion to qualify the age-range the interpretation pertains to (see 5), and then retain the conclusion as follows (see 6).

      3 - (line 383) “For areal asymmetry, adult-like patterns of lateralization were strongly established before age ~4, indicating areal asymmetry traces back further and does not primarily emerge through later cortical expansion 33. Rather, the lifespan trajectories predominantly show stability from childhood to old age, as asymmetry was maintained through periods of developmental expansion and aging-related change that were region-specific and bilateral. This may align with evidence indicating areal asymmetry may be primarily determined in utero 29,30, including evidence suggesting little change in areal asymmetry from birth to 2 years 29,33,34, and little difference between maps derived from neonates and adults 29,30. It may also fit with the principle that the primary microstructural basis of cortical area 8 – the number of and spacing between cortical minicolumns – is determined in prenatal life 8,9, and agree with work suggesting asymmetry at this microstructural level may underly hemispheric differences in surface area 35. The developmental trajectories agree with studies indicating areal asymmetry is established and strongly directional early in life 29,36. That change in surface area later in development follows embryonic gene expression gradients may also agree with a prenatal account for areal asymmetry 9”

      4 - (line 439) “The strongest relationships all pertained to asymmetries that were proximal in cortex but opposite in direction. Several of these were underpinned by high asymmetry-asymmetry SNP-based genetic correlations, illustrating some lateralizations in surface area exhibit coordinated genetic development.”

      5 - (line 481) “Regardless, these results support a differentiation between early-life (i.e. before age ~4) and later developmental factors in shaping areal and thickness asymmetry, respectively.”

      6 - (Conclusion) “Developmental and lifespan trajectories, interregional correlations and heritability analyses converge upon a differentiation between early-life and later-developmental factors underlying the formation of areal and thickness asymmetries, respectively. By revealing hitherto unknown principles of developmental stability and change underlying diverse aspects of cortical asymmetry, we here advance knowledge of normal human brain development.”

      Overall this is a nice and thorough work on asymmetry that may inform further work on brain asymmetry, its genetic basis, development, environmentally induced change, and link to behavioural variation.

    1. Author Response

      Reviewer #1 (Public Review):

      Bacterial carboxysomes are compartments that enable the efficient fixation of carbon dioxide in certain types of bacteria. A focus of the current work is on two protein components that provide spatial regulation over carboxysomes. The McdA system is an ATPase that drives the positioning of carboxysomes. The McdB system is essential for maintaining carboxysome homeostasis, although how this role is achieved is unclear. Previous studies, by the lead author's lab, showed that the McdB system is a driver of phase separation in vitro and in cells. They proposed a putative connection between McdB phase separation and carboxysome homeostasis. The central premise of the current work is as follows: In order to understand if and how phase separation of McdB impacts carboxysome homeostasis, it is important to know how the driving forces for phase separation are encoded in the sequence and architecture of McdB. This is the central focus of the current work. The picture that emerges is of a protein that forms hexamers, which appear to be trimers of dimers. The domains that drive dimerization and trimerization appear to be essential for driving phase separation under the conditions interrogated by the authors. The N-terminal disordered region regulates the driving forces for phase separation - referred to as the solubility of McdB by the authors. To converge upon the molecular dissections, the authors use a combination of computational and biophysical methods. The work highlights the connection between oligomerization via specific interactions and emergent phase behavior that presumably derives from the concentration (and solution condition) dependent networking transitions of oligomerized McdB molecules.

      Having failed to obtain specific structural resolution for the full-length McdB as a monomer or oligomer, the authors leverage a combination of computational tools, the primary one being iTASSER. This, in conjunction with disorder predictors, is used to identify / predict the domain structure of McdB. The domain structure predictions are tested using a limited proteolysis approach and, for the most part, the predictions stand up to scrutiny affirming the PONDR predictions. SEC-MALS data are used to pin down the oligomerization states of McdB and the consensus that emerges, through the investigations that are targeted toward a series of deletion constructs, is the picture summarized above.

      Is the characterization of the oligomerization landscape complete and likely perfect? Quite possibly, the answer is no. Deletion constructs pose numerous challenges because they delete interactions and inevitably impose a modularity to the interpretation of the totality of the data.

      This is a good point and always a possibility with truncations – the protein McdB may not be as modular in nature as it seems in our tripartite model. But the deletion constructs were more so intended to be tools for identifying key regions of oligomerization and condensate formation as others have done, and for this, they were indeed useful. Additionally, we were able to strategically aim our substitution mutations based on data from the deletion constructs. These substitutions provided data consistent with the deletions, but in the context of the full-length protein (see Fig. 5 vs. Figs. 2, 4). However, we ultimately agree with the reviewer that this is always a possibility with truncations, and we have therefore mentioned this caveat in the discussion.

      Line 415 “Truncated proteins have been useful in the study of biomolecular condensates. But it is important to note that using truncation data alone to dissect modes of condensate formation can lead to erroneous models since entire regions of the protein are missing. However, data from our truncation and substitution mutants were entirely congruent. For example, deletion of the CTD or substitutions to this region caused destabilization of the hexamer to a dimer, and deletion of the IDR or substitutions to this region caused solubilization of condensates without affecting hexamer formation.”

      Accordingly, we are led to believe that the N-terminal IDR plays no role whatsoever in the oligomerization.

      Our updated data still strongly support this interpretation. Both truncation of the IDR (Fig. 2) and the six-Q-substitution mutant in the IDR (Fig. 5) form a monodispersed hexamer in solution via SEC-MALS, as does wild-type McdB.

      Close scrutiny, driven by the puzzling choice of nomenclature and the Lys to Gln titrations in the N-terminal IDR, raises certain unresolved issues. First, the central dimerization domain is referred to as being Q-rich. This does not square with the compositional biases of this region. If anything, it is Q/L-rich or just L-rich. This in fact makes more sense because the region does have the architecture of canonical Leu-zippers, which do often feature Gln residues. However, there is nothing about the sequence features that mandates the designation of being Q-rich, nor are there any meaningful connections to proteins with Q-rich or polyQ tracts. This aspect of the analysis and discussion is a serious and erroneous distraction.

      We changed the language here, and no longer refer to the central region as “Q-rich”. However, we would like to note that the second half of the McdB central domain is indeed enriched in glutamines (14/53 = 26.4%) to a comparable extent as the region of FUS, which has been shown to help drive condensate formation via glutamine H-bonding (14/44 = 31.8%; Murthy et al 2019). We were simply proposing that, at a molecular level, there was some insight to be gained from this comparison. We agree, however, that there is no functionally meaningful comparison between McdB and polyQ-tract proteins, as we may have previously alluded to in our discussion, and that text has been removed.

      Back to the middle region that drives dimerization, the missing piece of the puzzle is the orientation of the dimers. One presumes these are canonical, antiparallel dimers. However, this issue is not addressed even though it is directly relevant to the topic of how the trimer of dimers is assembled.

      Indeed, we were unable to resolve the orientation issue, despite much effort. The story we present is not a complete and final model of McdB structure, nor of its molecular modes of oligomerization or condensate formation. However, we now provide a discussion section “McdB homologs have polyampholytic properties between their N- and C-termini” that highlights this issue. We also mention the remaining dimer orientation issue at the end of the results section “Se7942 McdB forms a trimer-of-dimers hexamer”. Nevertheless, we believe the data presented still provide useful initial models, which, for example, allowed us to create a series of substitutions that tune McdB condensate solubility and verify that they do not affect oligomerization. We would like to further add that for other condensate-forming proteins in bacteria, like the PopZ protein we mention in the text, there remains no detailed structural model beyond the resolution we provide here for McdB, despite PopZ being first identified in 2008. Over 40 publications on PopZ have progressively provided useful and more detailed models that are only now being used to develop PopZ as a tool for condensate technologies that are furthering our understanding of the biological implications of condensate formation across all cell types. The intention with our current report is therefore not to generate a finalized molecular model of this entirely unstudied class of McdB proteins, but instead to generate useful insight into McdB biochemistry that can advance our understanding of this class of protein’s function in vivo. To this end, we now add in vivo data based on these initial models where we specifically link cellular phenotypes to McdB condensate solubility (Fig. 8). Of course, there are several follow-up studies that come from the current report, but we believe that speaks to the value of the presented research in advancing this field.

      If the trimer is such that all binding sites are fully satisfied (with the binding sites presumably being on the C-terminal pseudo-IDR), then the hexamer should be a network terminating structure, which it does not seem to be based on the data. Instead, we find that only the full-length protein can undergo phase separation (albeit at rather high concentrations) in the absence of crowder. We also find that the driving forces for phase separation are pH dependent, with pH values above 8.5 being sufficient to dissolve condensates. Substitution of Lys to Gln in the N-terminal IDR leads to a graded weakening of the driving forces for phase separation. The totality of these data suggest a more complex interplay of the regions than is being advocated by the authors.

      Thank you and we agree. As we discuss above in response #4 and below in response #7, we have changed the focus and tone of our report to say that, while the models we have generated are useful, we are aware they are incomplete at a molecular level. Furthermore, as we describe in response #6, we have added several new McdB mutants to investigate more deeply the role of the CTD, but this region was not amenable to mutagenesis as these mutants affected McdB oligomerization. Lastly, while network forming interactions are certainly important for condensate formation as the reviewer describes, so are solvent interactions. We have added new text and data related to Figs. 3, 4 that address these issues.

      Almost certainly, there are complementary electrostatic interactions among the N-terminal IDR and C-terminal pseudo IDR that are important and responsible for the networking transition that drives phase separation, even if these interactions do not contribute to hexamer formation. The net charge per residue of the 18-residue N-terminal IDR is +0.22 and the NCPR of the remainder is ≈ -0.1. To understand how the N-terminal IDR is essential, in the context of the full-length protein, to enable phase separation (in the absence of crowder), it is imperative that a model be constructed for the topology of the hexamer. It is also likely that the oligomer does not have a fixed stoichiometry.
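      The net charge per residue (NCPR) quoted above can be illustrated with a minimal sketch. It assumes the common convention of counting Lys/Arg as +1 and Asp/Glu as -1 and ignoring His and the termini. The 18-residue sequence is hypothetical, chosen only so that it carries four net positive charges; it is not the McdB IDR sequence.

```python
def net_charge_per_residue(sequence: str) -> float:
    """Return (number of K/R - number of D/E) / length for a protein sequence.

    Assumption: His and the N-/C-terminal charges are ignored.
    """
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    positives = sum(sequence.count(aa) for aa in "KR")
    negatives = sum(sequence.count(aa) for aa in "DE")
    return (positives - negatives) / len(sequence)


# Hypothetical 18-residue segment with four basic and no acidic residues.
hypothetical_idr = "MKSAKTPRLGSAKNTGSA"
print(round(net_charge_per_residue(hypothetical_idr), 2))  # 0.22
```

      Under this convention, four net positive charges over 18 residues give 4/18 ≈ +0.22, matching the magnitude the reviewer reports for the N-terminal IDR.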

      We agree and thank the reviewer for these comments. We have added several new substitution mutants aimed at addressing this (Figs. 5, S6). However, the C-terminus was not amenable to substitutions as the trimer-of-dimers was significantly destabilized in these mutants (Figs. 5, S7). Therefore, in this report we were unable to determine specifically how the basic residues in the IDR contribute to condensate formation. However, with the addition of new data in Fig. 8, we think we adequately show that the IDR mutants can be used to investigate McdB condensate formation in vivo, and that follow-up studies will be aimed at investigating these details. We have also added a new discussion section “McdB homologs have polyampholytic properties between their N- and C-termini” that highlights this very likely possibility suggested by the reviewer.

      Therefore, the central weakness of the current work is that it is too preliminary. A set of interesting findings are emerging but by fixating on Lys to Gln titrations within the N-terminal IDR and referring to these titrations as impacting solubility, a premature modular and confused picture emerges from the narrative that leaves too many questions unanswered.

      The work itself is very important given the growing interest in bacterial condensates. However, given that the focus is on understanding the molecular interactions that govern McdB phase behavior - a necessary prerequisite in the authors' minds for understanding if and how phase separation impacts carboxysome homeostasis - it becomes imperative that the model that emerges be reasonably robust and complete. At this juncture, the model raises far too many questions.

      We agree that our previous report was focused mainly on the molecular basis of McdB condensate biochemistry, and in that report we left the model short. In this revised version, we have added several pieces of new data that strengthen the model (Figs. 3-5), although it is still incomplete. However, in this revised version, we have also shifted the focus from a complete biochemical understanding of McdB condensates to a study that links McdB condensate formation in vitro to phenotypes in vivo. In this regard, we have added the in vivo data in Fig. 8 and somewhat changed the focus in the text.

      The MoRF analysis is a distraction from the central focus.

      The MoRF analysis has been removed.

      The problem, as I see it, is that the authors have gone down the wrong road in terms of how they have interpreted the preliminary set of results. Further, the methods used do not have the resolution to answer all the questions that need to be answered. Another issue is that a lot of standard tropes are erected and they become a distraction. For example, it is simply not true that in a protein featuring folded domains and IDRs it almost always is the case that the IDR is the driver of phase transitions. This depends on the context, the sequence details of the IDRs, and whether the interactions that contribute to the driving forces for phase separation are localized within the IDR or distributed throughout the sequence. In McdB it appears to be the latter, and much of the nuance is lost through the use of specific types of deletion constructs.

      Thank you. We have removed much of this and changed the diction on how our current model of McdB condensate formation fits into the literature in the discussion.

      Overall, the work represents a good beginning but the data do not permit a clear denouement that allows one to connect the molecular and mesoscales to fully describe McdB phase behavior. Significantly more work needs to be done for such a picture to emerge.

      Reviewer #2 (Public Review):

      In this work, Basalla et al. study the biochemical properties of the carboxysome positioning protein, McdB. Using in vitro experiments, the authors characterize McdB oligomeric states and the domains driving and modulating its phase separation. Based on bioinformatics analysis, the authors identify a putative binding recognition motif between McdB and its two-component system counterpart McdA. As McdAB-like systems emerge as spatial regulators of bacterial compartments, the data presented here may be of general interest. The study is well executed and provides exciting hypotheses to be tested in vivo.

      The authors found that McdB from S. elongatus PCC 7942 consists of three domains: an N-terminal 18 aa disordered region, a Q-rich helical domain, and a helical C-terminal domain (CTD). Analyzing these domains, the authors present three key results: (i) The Q-rich domains form dimers, and the CTD drives the formation of trimers of dimers (ii) Phase separation is pH sensitive, driven by the Q-rich domain, and modulated by basic residues in the IDR, (iii) The IDR contains a putative recognition motif that binds McdA. While these three sets of results are rich in data, they are disjointed. Relating the three datasets (oligomeric states of the protein, its phase separation behavior, and its ability to bind McdA) is required to provide a complete picture of the molecular mechanism driving McdB condensation.

      Specific comments:

      1) The main limitation of this manuscript is the lack of integration between the three areas of results. In particular: how do the IDR basic residues disrupt phase separation? Is that through interference with either the dimer or trimer interface? Does the McdB IDR regulate phase separation behavior when bound to McdA? Or, in other words, is the MoRF acting both as a binding interface and as a solubility regulator, and if so, can both functions be achieved simultaneously? It seems like the MoRF includes at least three basic residues.

      Indeed, we were unable to fully resolve the specific molecular interactions that give rise to condensates versus those that give rise to oligomers, and how these two modes of self-association contribute to one another. One limitation was that, as shown in our new data, the CTD was not amenable to mutagenesis, as it caused destabilization of the trimer-of-dimers (Fig. 5, Fig. S7). Therefore, we could not dissect how the CTD contributes to oligomerization versus driving condensates. However, we did include in vivo data showing how the IDR mutations allowed us to specifically link phenotypes to McdB condensate solubility (Fig. 8). As we discuss above in responses #4, #6, and #7, we changed the focus of the revised manuscript from the molecular basis of McdB condensate formation to linking McdB condensate formation in vitro and its functionality in vivo. To this end, we think the IDR mutation set has been useful, and follow-up studies will be done to further the molecular model of McdB condensate formation. Reviewers 1 and 3 deemed the MoRF section a distraction. Therefore, MoRF analysis and discussions of McdA interactions with this potential MoRF have been removed.

      Finally, what is the effective concentration of McdB in cells, and how does that translate to the in vitro studies?

      In our previous version, we used McdB concentrations between 50-100 µM. We do not know the in vivo concentration of McdB. We have tried several antibodies against McdB, and a few were good enough to detect the presence of McdB, but not quantifiably. We therefore believe in vivo McdB levels are low (sub-micromolar), and definitely lower than the range we previously used in our in vitro studies. In our revised manuscript, we include a titration of McdB at lower concentrations, and see condensates at McdB concentrations lower than 2 µM.

      2) How general are the conclusions made here to other McdBs? The authors have published nice work surveying the commonalities and differences between homologous McdB proteins. Can you comment on the applicability of your findings to other McdB proteins?

      This is a great point, which we have added to a new discussion section titled “McdB homologs have polyampholytic properties between their N- and C-termini”.

      Additional issues:

      3) Using SEC and SEC-MALS, the authors demonstrated that the Q-rich domain forms a stable dimer and that the full-length protein forms hexamers, suggesting trimers of dimers assembly. The authors also suggest that the CTD is responsible for forming those trimers of dimers based on SEC-MALS measurements. However, Figure 2D shows that while the full length runs at 6.6x the monomer, the Q-rich+CTD runs at 5.4x the monomer. First, I could not find SEC-MALS of the full-length protein, and it is not clear whether SEC-MALS was used for all or a fraction of the constructs discussed in Figure 2D. Second, could it be that the Q-rich domain+CTD is an ensemble of hexamers and dimers? Perhaps the IDR is playing a secondary role in stabilizing the hexamer?

      We have repeated the SEC-MALS experiments and included the full-length protein (Fig. 2). Furthermore, we have included SEC-MALS for some of the key substitution mutants (Figs. 5, S7). With the additional findings, our conclusions remain the same as in our previous version of the manuscript.

      4) The analysis of the phase separation results needs some extra quantification. The authors show that at 100 µM protein with 10% PEG the full-length protein phase separates, as does IDR+Q-rich. Lines 176-178: "The CTD, on the other hand, has no effect on the Q-rich domain condensates; Q-rich+CTD condensates formed at the same protein concentration and with identical droplet morphologies as the Q-rich domain alone." It is hard to draw this conclusion solely based on the data presented in Figure 3. An alternative interpretation might be that Q-rich+CTD reduces csat. I suggest the authors include turbidity assays (as shown for pH effect) to quantitatively determine csat for these different constructs and perhaps perform FRAP to determine the mobility of these different constructs. In addition, how long after the addition of PEG were these droplets imaged?

      We now include an additional figure where we characterize condensates for full-length McdB (Fig. 3), including FRAP as suggested by the reviewer. We also include additional experiments for the truncations as requested (Fig. 4), and relate the truncation data to the model we propose for the full-length protein. All condensate samples were incubated for 30 mins prior to imaging unless otherwise stated, which we have added to the methods section “Microscopy of protein condensates”.

      5) Solubility assays shown in Figures 4A, B, D, and 5C are missing error bars. Without replicates, it is difficult to assess, for example, the effect of KCl.

      We have included replicates and error bars. Apologies for the omission.

      Also, please indicate the physiological ranges of KCl and pH in Figure 6. The phase separation sensitivity to pH is intriguing. By changing basic residues to glutamines, the authors conclude that the positive charge of the IDR modulates solubility. The Q-rich domain, however, is negatively charged. Can the authors comment on the role of acidic residues in the Q-rich domain? Are they required for phase separation? Also - based on your previous bioinformatics analysis, are the charges of the IDR and the Q-rich domains conserved across McdB homologs?

      Data from this report, and as described by reviewer #1, suggest that charge in the CTD, and not the central region, may be important. Our previous report (MacCready et al., Mol Biol Evol. 2020) touches on the conservation of charge in the NTD and CTD, which we have now added to the discussion section titled “McdB homologs have polyampholytic properties between their N- and C-termini”. However, we were unable to experimentally verify electrostatic associations between the NTD and CTD because the CTD was not amenable to mutagenesis, as shown in our new data added to the manuscript (Figs. 5, S7).

      6) In previous work, the authors showed that an RKR segment in the IDR is highly conserved but missing in S. elongatus PCC 7942 (MacCready et al., Mol Biol Evol. 2020). Given the current finding, it would be important to understand whether the RKR deletion carries functional implications for phase separation behavior.

      The RKR segment is not missing, but likely relates to the KKR residues from S. elongatus PCC 7942. We describe this in more detail elsewhere (MacCready et al., Mol Biol Evol. 2020). However, as we show here, these specific residue locations do not seem to be especially important for condensate formation, but instead the overall net charge of the IDR mediates condensate solubility regardless of the specific residues mutated (Fig. 6).

      7) McdB proteins with 2Q left mutated vs. 2Q middle and 2Q right seem to result in condensates with different material properties (e.g., DIC pictures show different droplet morphologies for the different constructs). Is that the case? And if so, can you comment on that?

      We have included a brief mention of this in the text. However, the overall interpretation of these results remains that regardless of the residues mutated, there is a comparable degree of condensate solubilization for constructs with the same IDR net charge (Fig. 6).

      Reviewer #3 (Public Review):

      Through a series of rigorous in vitro studies, the authors determined McdB's domain architecture, its oligomerization domains, the regions required for phase separation, and how to fine-tune its phase separation activity. The SEC-MALS study provides clear evidence that the α-helical domains of McdB form a trimer-of-dimers hexamer. Through analysis of a small library of domain deletions by microscopy and SDS-PAGE gels of soluble and pellet fractions, the authors conclude that the Q-rich domain of McdB drives phase separation while the N-terminal IDR modulates solubility. A nicely executed study in Figure 4 demonstrated that McdB phase separation is highly sensitive to pH and is influenced by basic residues in the N terminal IDR. The study demonstrates that net charge, as opposed to specific residues, is critical for phase separation at 100 micromolar. In addition, the experimental design included analysis of McdB constructs that lack fluorescent proteins or organic dyes that may influence phase separation. Therefore, the observed material properties have full dependence on the McdB sequence.

      Thank you for the kind words and this perspective. We have added a brief mention of it in the discussion section titled “McdB condensate formation follows a nuanced, multi-domain mechanism”: “Furthermore, it should be noted that the McdB constructs used in our in vitro assays were free from fluorescent proteins, organic dyes, or other modifications that may influence phase separation. Therefore, the observed material properties of these condensates have full dependence on the McdB sequence.”

      Studies of proteins often neglect short, disordered segments at the N- or C- terminus due to unclear models for their potential role. This study was interesting because it revealed a short IDR as a critical regulator of phase separation. This includes experiments that remove the IDR (Fig 2 & 3) and mutate the basic residues to show their importance towards McdB phase separation. In a nice set of SDS-PAGE experiments, the authors showed that as the net charge of the IDR decreased the construct became more soluble.

      One challenge in experimental design when mutating residues is assessing their impact on phase separation. The authors avoided substitutions to alanine, as alanine substitutions have synthetically stimulated phase separation in other systems. The authors, therefore, have a good rationale for selecting potentially milder mutations of lysine/arginine to glutamine. A potential caveat of mutation to glutamine is that stretches of glutamines have been associated with amyloid/prion formation. So, the introduction of glutamines into the IDR may also have unexpected effects on material properties. Despite these caveats, the authors show that mutation of six basic residues in the short IDR abolished phase separation at 100 µM.

      Thank you for the thoughtful consideration, and appreciation of our work! Reviewer 1 had reservations about the Gln substitutions as well. We also used alanine in new data added to the manuscript. But as the reviewer notes, the alanine mutations artificially drove further phase separation activity, and even aggregation. We show that mutants with the introduction of glutamines, however, remain soluble in vitro and in E. coli even at very high concentrations. Furthermore, we now include SEC-MALS of the McdB variant with 6 glutamines introduced in the IDR and show that there is no impact on oligomeric state. Together, the data show no amyloidogenic properties of these glutamine-enriched mutants.

      We have added a note on this potential caveat in the discussion section “McdB condensate formation follows a nuanced, multi-domain mechanism”: “Glutamine-rich regions are known to be involved in stable protein-protein interactions such as in coiled-coils and amyloids (52, 53), and expansion of glutamine-rich regions in some proteins leads to amyloidogenesis and disease (54, 55). However, when we introduced glutamines into the IDR of McdB, solubility was increased both in vitro and in vivo, and without any impact on hexamerization. Together, the data show that increasing the glutamine content in the IDR of McdB did not lead to amyloidogenesis, but rather increased solubility. Our findings therefore underpin the importance of positive charge in the IDR specifically for stabilizing McdB condensates.”

      Computational studies (Fig 7) also suggest that this short N-IDR region may play a role as a MoRF upon potential binding to a second protein, McdA. The formulation of this hypothesis is strengthened by the fact that for other ParA/MinD-family ATPases, the associated partner proteins have also been shown to interact with their cognate ATPase via positively charged and disordered N-termini. This aspect of understanding McdB's N-IDR as a MoRF is at a very early stage. This study lacks experimental evidence for an N-IDR:McdA interaction and experimental data showing conformational change upon McdA binding. However, the computational study sets up future work to consider whether and how the phase separation activity of McdB is related to its structural dynamics and interactions with McdA.

      Based on these comments and those from Reviewer 1, we have removed the MoRF analyses entirely. The MoRF analysis will be coupled to another study in the lab focused on McdB interactions with McdA.

      In summary, this study provides a strong foundation for the contribution of domains to McdB's in vitro phase separation. This knowledge will inform and impact future studies on how McdB regulates carboxysomes and on the related ParA/MinD-family ATPases and their cognate regulatory proteins. For example, it is unknown if and how McdB's phase separation is utilized in vivo for carboxysome regulation. However, the revealed roles of the Q-rich domain and N-IDR will provide valuable knowledge in developing future research. In addition, the systematic domain analysis of McdB can be combined with a similar analysis of a broad range of other biomolecular condensates in bacteria and eukaryotes to understand the design principles of phase separating proteins.

    1. Author Response

      Reviewer #1 (Public Review):

      When we tilt our heads, we do not perceive objects to be tilted or rotated. In this study, the authors investigate the neural underpinnings by characterizing how neurons in monkey IT respond to objects when the entire body is tilted. They performed two experiments. In the first experiment, the authors record single neuron responses to objects rotating in the image plane, under two conditions - when the animals were tilted +20° or -20° relative to the gravitational vertical. Their main finding is that neural tuning curves for object orientation were highly correlated under these conditions. This high correlation is interpreted by the authors as indicative of encoding of object orientations relative to an absolute gravitational reference frame. To control for the possibility that the whole-body tilt could have induced compensatory torsional rotations of the eyes, the authors estimated the eye torsional rotation between the ±20° whole-body tilt to be only ±6°. In the second experiment, the authors recorded neural responses to objects rotated in the image plane with no whole-body tilt but with a visual horizon that could be tilted by the same ±20° relative to the gravitational vertical. Here too they find many neurons whose tuning curves were correlated between the two horizon tilt conditions. Based on these results, the authors argue that IT neurons represent objects relative to the gravitational or absolute vertical.
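      The logic of comparing tuning curves across the two tilt conditions can be sketched as follows. This is an illustrative sketch only, not the authors' analysis code: the 20° orientation sampling, the synthetic tuning function, and all function names are assumptions, and eye torsion is ignored. Tuning curves are indexed by object orientation in the gravitational frame. A gravitationally tuned neuron should give highly correlated curves as they stand, whereas a retinally tuned neuron should give curves that align only after compensating for the 40° difference between the +20° and -20° tilts.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def circular_shift(curve, steps):
    """Rotate a circular tuning curve so that result[i] == curve[i + steps]."""
    steps %= len(curve)
    return curve[steps:] + curve[:steps]


def frame_correlations(curve_a, curve_b, shift_steps):
    """Correlate two tuning curves in gravitational and retinal alignment.

    Both curves are indexed by gravitational object orientation.
    shift_steps is the tilt difference expressed in sampling steps.
    """
    gravitational = pearson(curve_a, curve_b)
    retinal = pearson(curve_a, circular_shift(curve_b, shift_steps))
    return gravitational, retinal


STEP_DEG = 20  # hypothetical orientation sampling interval
ORIENTATIONS = list(range(0, 360, STEP_DEG))


def tuned(preferred_deg):
    """Synthetic von Mises-like tuning curve peaking at preferred_deg."""
    return [math.exp(2 * math.cos(math.radians(o - preferred_deg)))
            for o in ORIENTATIONS]


# Gravitationally tuned neuron: same peak in both tilt conditions.
grav_neuron = frame_correlations(tuned(90), tuned(90), -2)

# Retinally tuned neuron: peak moves by +20 and -20 deg with body tilt.
retinal_neuron = frame_correlations(tuned(110), tuned(70), -2)
```

      In this toy setting the gravitationally tuned neuron has a higher correlation in gravitational alignment, and the retinally tuned neuron has a higher correlation after the 40° (two-step) compensation.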

      The question of whether the visual system encodes objects relative to the gravitational vertical is an interesting and basic one, and I commend the authors for attempting this question through systematic testing of object selectivity under conditions of whole-body tilt. However, I found this manuscript extremely difficult to read, with important analyses and controls described in a very cursory fashion. I also have several major concerns about these results.

      First, the high tuning correlation in the ±20° whole-body tilt conditions could also occur if IT neurons encoded object orientation relative to other fixed contextual cues in the surrounding, such as the frame of the computer monitor. The authors ideally should have some experiment or analysis to address this potential confound, or else acknowledge that their findings can also be interpreted as the encoding of object orientation relative to contextual cues, which would dilute their overall conclusions.

      We think there are three possible interpretations of this comment. First, that visible edges, including the horizon and ground plane (in the scene stimuli), and the screen edges and other gravitationally aligned edges in the room, could serve as visual cues for the orientation of gravity. We agree with this wholeheartedly, and in fact showed a strong degree of gravitational alignment based purely on visual scene cues in Figures 3 and 4. This is consistent with our previous results suggesting computation of gravity’s direction in the middle channel of IT (Vaziri et al., Neuron 2014; Vaziri and Connor, Current Biology 2016). Our findings would not be diluted by the fact that multiple cues, not just vestibular/somatosensory but also visual, could help in computing the direction of gravity.

      Second, that overlap between objects and horizon could produce a shape-configuration interaction that changes with object orientation and produces a tuning effect that remains consistent across monkey tilts. We agree this was a possibility, and that is why we tested neurons in the isolated object condition. We have added text to better explain this concern and the control importance of the isolated object condition in the discussion of Fig. 1: “The Fig. 1 example neuron was tested with both full scene stimuli (Fig. 1a), which included a textured ground surface and horizon, providing visual cues for the orientation of gravity, and isolated objects (Fig. 1b), presented on a gray background, so that primarily vestibular and somatosensory cues indicated the orientation of gravity. The contrast between the two conditions helps to elucidate the additional effects of visual cues on top of vestibular/somatosensory cues. In addition, the isolated object condition controls for the possibility that tuning is affected by a shape-configuration (i.e. overlapping orientation) interaction between the object and the horizon or by differential occlusion of the object fragment buried in the ground (which was done to make the scene condition physically realistic for the wide variety of object orientations that would otherwise appear improbably balanced on a hard ground surface).”

      The comparable results in the isolated object condition address the reasonable concern about the horizon/object shape configuration interaction: “Similar results were obtained for a partially overlapping sample of 99 IT neurons tested with isolated object stimuli with no background (i.e. no horizon or ground plane) (Fig. 2b). In this case, 60% of neurons (32/53) showed significant correlation in the gravitational reference frame, 26% (14/53) significant correlation in the retinal reference frame, and within these groups 13% (7/53) were significant in both reference frames. The population tendency toward positive correlation was again significant in this experiment along both gravitational (p = 3.63 × 10⁻²²) and retinal axes (p = 1.63 × 10⁻⁷). This suggests that gravitational tuning can depend primarily on vestibular/somatosensory cues for self-orientation.”

      Third, that the object and screen edges in the isolated object condition have an orientation interaction that influences tuning in a way that remains consistent across monkey tilt. If this was intended, we do not think this is a reasonable concern that needs mentioning in the paper itself. The closest screen edges on our large display were 28° in the periphery, and there is no reason to suspect that IT encodes orientation relationships between distant, disconnected visual elements. Screen edges have been present in all or most studies of IT, and no such interactions have been reported. We will discuss this point in online responses.

      Second, I do not fully understand torsional eye movements myself, but it is not clear to me whether this is a fixed or dynamic compensation. For instance, have the authors measured torsional eye rotations on every trial? Is it fixed always at ±6° or does it change from trial to trial? If it changes, then could the high tuning correlation between the whole-body rotations be simply driven by trials in which the eyes compensated more? The authors must provide more data or analyses to address this important control.

      We now clarify that we could only measure ocular rotation outside the experiment, using high-resolution closeup color photography, which was not possible on individual trials. The extensive literature on ocular counter-rotation gives no indication that the degree of rotation is changed by any conditions other than tilt. Our measurements were consistent with previous reports showing that counterroll is limited to 20% of tilt. Moreover, they are consistent with our analyses showing that maximum correlation in retinal coordinates is obtained with a 6° correction for counterroll, indicating equivalent counterroll during experiments. Our analytical compensation for counterroll was based on this value, which optimized results in the retinal reference frame, so our measurements of counterroll are used only to confirm this value. Ocular rotation would need to be five times greater than any previous observations to completely compensate for tilt and mimic the gravitational tuning we observed. For these reasons, counterroll is not a reasonable explanation for our results:

      “Compensatory ocular counter-rolling was measured to be 6° based on iris landmarks visible in high-resolution photographs, consistent with previous measurements in humans (refs. 6, 7), and larger than previous measurements in monkeys (ref. 41), making it unlikely that we failed to adequately account for the effects of counterroll. Eye rotation would need to be five times greater than previously observed to mimic gravitational tuning. Our rotation measurements required detailed color photographs that could only be obtained with full lighting and closeup photography. This was not possible within the experiments themselves, where only low-resolution monochromatic infrared images were available. Importantly, our analytical compensation for counter-rotation did not depend on our measurement of ocular rotation. Instead, we tested our data for correlation in retinal coordinates across a wide range of rotational compensation values. The fact that maximum correspondence was observed at a compensation value of 6° (Figure 1–figure supplement 1) indicates that counterrotation during the experiments was consistent with our measurements outside the experiments.”
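
      The scan over rotational compensation values described in this passage can be illustrated with a minimal sketch. It is not the authors' analysis code: the tuning curve is synthetic and noise-free, orientation is assumed to be sampled in 1° steps, and the names `rotate`, `retinal_correlation`, `TILT`, and `TRUE_COUNTERROLL` are hypothetical. The tilt magnitude is a placeholder taken from the reviewer's summary. Under these assumptions, a purely retinal neuron shows its highest between-tilt correlation at the compensation value that matches the counter-roll built into the data.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def rotate(curve, shift):
    """Circularly shift a curve: result[i] == curve[(i - shift) % len(curve)]."""
    n = len(curve)
    return [curve[(i - shift) % n] for i in range(n)]

def retinal_correlation(resp_plus, resp_minus, tilt_deg, comp_deg):
    """Correlate the two tilt conditions after re-expressing both tuning
    curves in retinal coordinates, assuming comp_deg of counter-roll."""
    net = tilt_deg - comp_deg  # assumed net rotation of the retina relative to gravity
    return pearson(rotate(resp_plus, -net), rotate(resp_minus, net))

# Synthetic, noise-free example; every value below is illustrative.
TILT = 20              # placeholder tilt magnitude in degrees
TRUE_COUNTERROLL = 6   # counter-roll built into the synthetic data, in degrees
PREF = 90              # hypothetical preferred retinal orientation

# A purely retinal neuron with a smooth, single-peaked tuning curve (1-degree steps).
tuning = [math.exp(2.0 * (math.cos(math.radians(a - PREF)) - 1.0)) for a in range(360)]

# Responses as a function of on-screen orientation at the two tilts.
net_true = TILT - TRUE_COUNTERROLL
resp_plus = rotate(tuning, net_true)
resp_minus = rotate(tuning, -net_true)

# Scan candidate compensation values and keep the one with the highest correlation.
scan = {c: retinal_correlation(resp_plus, resp_minus, TILT, c) for c in range(0, 13)}
best_comp = max(scan, key=scan.get)
print(best_comp)  # recovers the built-in counter-roll, 6
```

      With real, noisy responses the peak would be broader, which is why the sketch should be read only as an illustration of the logic of the compensation scan.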

      Third, I find that when the objects were presented against a visual horizon, different object features are occluded at each orientation. This could reduce the correlation between the neural response in the retinal reference frame, thereby biasing all results away from purely retinal encoding. The authors should address this either through additional analyses or acknowledge this issue appropriately throughout.

      This idea of a shape interaction between object and horizon/ground is essentially the same concern discussed as the second interpretation of the first point, above. As outlined there, we addressed this concern in the best way possible, by removing the horizon/background (in the isolated object condition) and showing that the same results obtained. This comment raises the related point (also cured by the isolated object condition) of differential partial occlusion at the bottom of the object, 15% (by virtual mass) of which was buried below ground to provide a realistic physical interpretation for unbalanced orientations.

      We make both concerns explicit in the revised manuscript: “The Fig. 1 example neuron was tested with both full scene stimuli (Fig. 1a), which included a textured ground surface and horizon, providing visual cues for the orientation of gravity, and isolated objects (Fig. 1b), presented on a gray background, so that primarily vestibular and somatosensory cues indicated the orientation of gravity. The contrast between the two conditions helps to elucidate the additional effects of visual cues on top of vestibular/somatosensory cues. In addition, the isolated object condition controls for the possibility that tuning is affected by a shape-configuration (i.e. overlapping orientation) interaction between the object and the horizon or by differential occlusion of the object fragment buried in the ground (which was done to make the scene condition physically realistic for the wide variety of object orientations that would otherwise appear improbably balanced on a hard ground surface).”

      And we report that the control produces similar results in the absence of horizon/background: “Similar results were obtained for a partially overlapping sample of 99 IT neurons tested with isolated object stimuli with no background (i.e. no horizon or ground plane) (Fig. 2b). In this case, 60% of neurons (32/53) showed significant correlation in the gravitational reference frame, 26% (14/53) significant correlation in the retinal reference frame, and within these groups 13% (7/53) were significant in both reference frames. The population tendency toward positive correlation was again significant in this experiment along both gravitational (p = 3.63 × 10⁻²²) and retinal axes (p = 1.63 × 10⁻⁷). This suggests that gravitational tuning can depend primarily on vestibular/somatosensory cues for self-orientation.”

      Reviewer #3 (Public Review):

      This is a very interesting study examining for the first time the influence of lateral tilt of the whole body on orientation tuning in macaque IT. They employed two types of displays: one in which the object was embedded in a scene that had a horizon and textured ground surface, and a second one with only the object. For the first type, they examined the orientation tuning with and without tilting the subject. However, the effect of tilt for the scene stimuli is difficult to interpret in terms of gravitational reference frame since varying the orientation of the object relative to the horizon leads to changes in visual features between the horizon and object. If neurons show tolerance for the global orientation of the scene (within the 50° manipulation range) then the consistent orientation tuning across tilts may just reflect tuning for the object-horizon features (like the angle between the object and the horizon line/surface) that is tolerant for the orientation of the whole scene. Thus, the effects of tilt can be purely visually-driven in this case and may reflect feature selectivity unrelated to gravitation. The difference between retinal and gravitational effects can just reflect neurons that do not care about the scene/horizon background but only about the object and neurons that respond to the features of the object relative to the background. Thus, I feel that the data using scenes cannot be used unambiguously as evidence for a gravitational reference frame. The authors also tested neurons with an object without a scene, and these data provide evidence for a gravitational reference frame. The authors should concentrate on these data and downplay the difficult-to-interpret results using scenes.

      We still believe it is important to present these two experimental conditions in parallel, because we believe that visual driving of gravitational tuning by environmental cues is important in real life, and this is substantiated by the effects of visual cues alone. But, we have tried in this revision, in response to these comments and to comments from other reviewers, to clarify the potential concerns about visual effects in the full scene experiment, the importance and meaning of the isolated object condition as a control for concerns about other kinds of tuning, and the relationships between the two experimental conditions:

      Concerns about full scene experiment and the control importance of the isolated object condition: “The Fig. 1 example neuron was tested with both full scene stimuli (Fig. 1a), which included a textured ground surface and horizon, providing visual cues for the orientation of gravity, and isolated objects (Fig. 1b), presented on a gray background, so that primarily vestibular and somatosensory cues indicated the orientation of gravity. The contrast between the two conditions helps to elucidate the additional effects of visual cues on top of vestibular/somatosensory cues. In addition, the isolated object condition controls for the possibility that tuning is affected by a shape-configuration (i.e. overlapping orientation) interaction between the object and the horizon or by differential occlusion of the object fragment buried in the ground (which was done to make the scene condition physically realistic for the wide variety of object orientations that would otherwise appear improbably balanced on a hard ground surface) …

      Similar results were obtained for a partially overlapping sample of 99 IT neurons tested with isolated object stimuli with no background (i.e. no horizon or ground plane) (Fig. 2b). In this case, 60% of neurons (32/53) showed significant correlation in the gravitational reference frame, 26% (14/53) significant correlation in the retinal reference frame, and within these groups 13% (7/53) were significant in both reference frames. The population tendency toward positive correlation was again significant in this experiment along both gravitational (p = 3.63 × 10⁻²²) and retinal axes (p = 1.63 × 10⁻⁷). This suggests that gravitational tuning can depend primarily on vestibular/somatosensory cues for self-orientation. However, we cannot rule out a contribution of visual cues for gravity in the visual periphery, including screen edges and other horizontal and vertical edges and planes, which in the real world are almost uniformly aligned with gravity and thus strong cues for its orientation (but see Figure 2–figure supplement 1). Nonetheless, the Fig. 2b result confirms that gravitational tuning did not depend on the horizon or ground surface in the background condition.”

      Cell-by-cell comparisons of scene and isolated stimuli, for those cells tested with both, are shown in Figure 2–figure supplement 6. This figure shows 8 neurons with significant gravitational tuning only in the floating object condition, 11 neurons with tuning only in the gravitational condition, and 23 neurons with significant tuning in both. Thus, a majority of significantly tuned neurons were tuned in both conditions. A two-tailed paired t-test across all 79 neurons tested in this way showed that there was no significant tendency toward stronger tuning in the scene condition. The 11 neurons with tuning only in the gravitational condition by themselves might suggest a critical role for visual cues in some neurons. However, the converse result for 8 cells, with tuning only in the floating condition, suggests a more complex dependence on cues or a conflicting effect of interaction with the background scene for a minority of cells.
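
      The claim that a majority of significantly tuned neurons were tuned in both conditions follows directly from the counts quoted above, as the short check below illustrates. Only the four counts come from the text; the variable names are hypothetical labels chosen for this sketch.

```python
# Counts quoted in the response for neurons tested with both stimulus types.
only_floating = 8          # significant tuning only in the floating (isolated) object condition
only_other_condition = 11  # described in the text as tuned "only in the gravitational condition"
tuned_in_both = 23         # significant tuning in both conditions
tested = 79                # all neurons tested with both stimulus types

# Neurons with significant tuning in at least one condition.
tuned_any = only_floating + only_other_condition + tuned_in_both

# Share of significantly tuned neurons that were tuned in both conditions.
fraction_both = tuned_in_both / tuned_any

# Neurons with no significant tuning in either condition.
untuned = tested - tuned_any

print(f"{tuned_any} of {tested} neurons were significantly tuned in at least one condition")
print(f"{fraction_both:.1%} of those were tuned in both conditions")
```

      The resulting fraction is a little over one half, so the majority is a modest one.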

      Main text: “This is further confirmed through cell-by-cell comparison between scene and isolated for those cells tested with both (Figure 2–figure supplement 6).”

      Furthermore, the analysis of the single object data should be improved and clarified.

      We have added Figure 1–figure supplement 3–10 that expand the analysis of example cells and additional cells to include all stimuli shown and smoothed tuning curves for individual repetitions of the orientation range.

      We also now present results for individual monkeys in Figure 2–supplements 2,3, and the anatomical locations of individual neurons in Figure 2–supplements 4,5.

    1. Author Response

      Reviewer #1 (Public Review):

      Comment 1:

      The pharmacological tools used in this study are highly non-selective. Gd3+, used here to block NALCN is actually more commonly used to block TRP channels. 2-APB inhibits not only TRPC channels, but also TRPM and IP3 receptors while stimulating TRPV channels (Bon and Beech, 2013), while FFA actually stimulates TRPC6 channels while inhibiting other TRPCs (Foster et al., 2009).

      We agree with the reviewer that the substances mentioned are not specific. Although we performed shRNA experiments against NALCN and TRPC6, we also plan to use more specific pharmacological modulators for these two channels; for this, L703,606 (the antagonist of NALCN) [1] and larixyl acetate (a potent TRPC6 inhibitor) [2] will be used. In fact, we have already completed the experiments using larixyl acetate, and the results are shown in Author response image 1.

      Author response image 1.

      Example time-course (A), traces (B) and the summarized data (C) for the effect of larixyl acetate (LA), the antagonist of the TRPC6 channel, on the spontaneous firing activity of VTA DA neurons. Paired-sample t-test, ** P < 0.01. n is the number of neurons recorded and N is the number of mice used.

      Comment 2:

      The multimodal approach including shRNA knockdown experiments alleviates much of the concern about the non-specific pharmacological agents. Therefore, the authors' claim that NALCN is involved in VTA dopaminergic neuron pacemaking is well-supported.

      However, the claim that TRPC6 is the key TRPC channel in VTA spontaneous firing is somewhat, but not completely supported. As with NALCN above, the pharmacology alone is much too non-specific to support the claim that TRPC6 is the TRP channel responsible for pacemaking. However, unlike the NALCN condition, there is an issue with interpreting the shRNA knockdown experiments. The issue is that TRPC channels often form heteromers with TRPC channels of other types (Goel, Sinkins and Schilling, 2002; Strübing et al., 2003). Therefore, it is possible that knocking down TRPC6 is interfering with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Following your advice, we plan to perform single-cell qPCR experiments to check the expression levels of other TRPC channels after selective knockdown of TRPC6 in VTA DAT+ neurons; the results will be shown in the revised version. In our single-cell RNA-seq results, TRPC7 and TRPC4 were not found to be as broadly present as TRPC6 in the VTA DA neurons; it is therefore possible that knocking down TRPC6 does not interfere with the normal function of another TRPC channel, such as TRPC7 or TRPC4.

      Comment 3:

      The claim that TRPC6 channels in the VTA are involved in the depressive-like symptoms of CMUS is supported.

      However, the connection between the mPFC-projecting VTA neurons, TRPC6 channels, and the chronic unpredictable stress model (CMUS) of depression is not well supported. In Figure 2, it appears that the mPFC-projecting VTA neurons have very low TRPC6 expression compared to VTA neurons projecting to other targets. However, in figure 6, the authors focus on the mPFC-projecting neurons in their CMUS model and show that it is these neurons that are no longer sensitive to pharmacological agents non-specifically blocking TRPC channels (2-APB, see above comment). Finally, in figure 7, the authors show that shRNA knockdown of TRPC6 channels (in all VTA dopaminergic neurons) results in depressive-like symptoms in CMUS mice. Due to the low expression of TRPC6 in mPFC-projecting VTA neurons, the authors' claim of "broad and strong expression of TRPC6 channels across VTA DA neurons" is not fully supported. Because of the messy pharmacological tools used, it cannot be claimed that TRPC6 in the mPFC-projecting VTA neurons is altered after CMUS. And because the knockdown experiments are not specific to mPFC-projecting VTA neurons, it cannot be claimed that reducing TRPC6 in these specific neurons is causing depressive symptoms.

      The reason we focused on the mPFC-projecting VTA DA neurons is that this pathway is implicated in the depressive-like behaviors of the CMUS model [3-5]. Although mPFC-projecting VTA DA neurons seem to have a lower level of TRPC6, we reason that the channels are still functional there. However, we do agree with the reviewer that the statement “broad and strong expression of TRPC6 channels across VTA DA neurons” is not fully supported. We have changed the statements based on the reviewer's suggestion. Furthermore, we also plan to selectively knock down TRPC6 in the mPFC-projecting VTA DA neurons and then study the behavior.

      Comment 4:

      It is important to note that the experiments presented in Figure 1 have all been previously performed in VTA dopaminergic neurons (Khaliq and Bean, 2010) including showing that low calcium increases VTA neuron spontaneous firing frequency and that replacement of sodium with NMDG hyperpolarizes the membrane potential.

      We agree with the reviewer that similar experiments have been performed previously [6]; we included them for the flow of our manuscript and for general readers.

      Comment 5:

      The authors' explanation for the increase in firing frequency in 0 calcium conditions is that calcium-activated potassium channels would no longer be activated. However, there is a highly relevant finding that low calcium enhances the NALCN conductance through the calcium sensing receptor from Dejian Ren's lab (Lu et al., 2010) which is not cited in this paper. This increase in NALCN conductance with low calcium has been shown in SNc dopaminergic neurons (Philippart and Khaliq, 2018), and is likely a factor contributing to the low-calcium-mediated increase in spontaneous VTA neuron firing.

      We agree with the reviewer and are grateful for the suggestion. A discussion of this point has been added.

      Comment 6:

      One of the only demonstrations of the expression and physiological significance of TRPCs in VTA DA neurons was published by (Rasmus et al., 2011; Klipec et al., 2016) which are not cited in this paper. In their study, TRPC4 expression was detected in a uniformly distributed subset of VTA DA neurons, and TRPC4 KO rats showed decreased VTA DA neuron tonic firing and deficits in cocaine reward and social behaviors.

      We thank the reviewer for the suggestion. The references and a discussion of this point have been added.

      Comment 7:

      Out of all seven TRPCs, TRPC5 is the only one reported to have basal/constitutive activity in heterologous expression systems (Schaefer et al., 2000; Jeon et al., 2012). Other TRPCs such as TRPC6 are typically activated by Gq-coupled GPCRs. Why would TRPC6 be spontaneously/constitutively active in VTA DA neurons?

      In the complex neuronal environment where VTA DA neurons are located, multiple modulatory factors, including GPCRs, could be dynamically active; this could lead to the activation of TRP channels, including TRPC6.

      Comment 8:

      A new paper from the group of Myoung Kyu Park (Hahn et al., 2023) shows in great detail the interactions between NALCN and TRPC3 channels in pacemaking of SNc DA neurons.

      The reference mentioned has been added. We thank the reviewer.

      Reviewer #2 (Public Review):

      Comment 1:

      These results do not show that TRPC6 mediates stress effects on depression-like behavior. As stated by the authors in the first sentence of the final paragraph, "downregulation of TRPC6 proteins was correlated with reduced firing activity of the VTA DA neurons, the depression-like behaviors, and that knocking down of TRPC6 in the VTA DA neurons confer the mice with depression behaviors." Therefore, the results show associations between TRPC6 downregulation and stress effects on behavior, occlusion of the effects of one by the other on some outcome measures, and cell manipulation effects that resemble stress effects. There is no experiment that shows reversal of stress effects with cell/circuit-specific TRPC6 manipulations. Please adjust the title, abstract and interpretation accordingly.

      We agree with the reviewer’s suggestion. The title was changed to “The cation channel mechanisms of subthreshold inward depolarizing currents in the VTA dopaminergic neurons and their roles in the chronic stress-induced depression-like behavior” and the abstract and interpretation were also adjusted accordingly.

      Comment 2:

      Statistical tests and results are unclear throughout. For all analyses, please report specific tests used, factors/groups, test statistic and p-value for all data analyses reported. In some cases, the chosen test is not appropriate. For example, in Figure 6E, it is not clear how an experiment with 2 factors (stress and drug) can be analyzed with a 1-way RM ANOVA. The potential impact of inappropriate statistical tests on results makes it difficult to assess the accuracy of data interpretation.

      We have redone the statistical analysis as suggested by the reviewer and added specific tests used, factors/groups, test statistic and p-value for all data analyses into the revised manuscript.

      Comment 3:

      Why were only male mice used? Please justify and discuss in the manuscript. Also, change the title to reflect this.

      Although most similar previous studies used male mice or rats [7, 8], we do agree with the reviewer that female animals should also be tested, in consideration of the possible role of sex hormones. We therefore plan to repeat some key experiments in female mice.

      Comment 4:

      Number of recorded cells is very low in Figure 1. Where in VTA did recordings occur? Given the heterogeneity in this brain region, this n may be insufficient. Additional information (e.g., location within VTA, criteria used to identify neurons) should be included. Report the number of mice (i.e., n = 6 cells from X mice) in all figures.

      Indeed, the number here is not high. More experiments will be performed to increase the N/n numbers. The locations of the recorded cells in the VTA and the number of mice used are now shown in all figures; the criteria used to identify neurons are stated in the Methods section “Identification of DA neurons and electrophysiological recordings”. At the end of the electrophysiological recordings, the recorded VTA neurons were collected for single-cell PCR. VTA DA neurons were identified by single-cell PCR for the presence of TH and DAT.

      Comment 5:

      Authors refer to VTA DA neurons as those that are DAT+ in line 276, although TH expression is considered the standard of DAergic identity, and studies (e.g., Lammel et al, 2008) have shown that a subset of VTA DA neurons have low levels of DAT expression. Authors should reword/clarify that these are DAT-expressing VTA DA neurons.

      The study published by Lammel [9] in 2015 showed the low dopamine specificity of transgene expression in the ventral midbrain of TH-Cre mice; on the other hand, DAT-Cre mice exhibit dopamine-specific Cre expression patterns, although DAT-Cre mice are likely to suffer from their own limitations (for example, low DAT expression in mesocortical DA neurons may make it difficult to target this subpopulation; see Lammel et al., 2008 [10]). Hence, in our study, DAT was used as the criterion to identify DA neurons. In addition, both TH and DAT were tested by single-cell PCR to confirm that the recorded cells were DA neurons.

      Comment 6:

      Neuronal subtype proportions should be quantified and reported (Fig. 1Aii).

      Neuronal subtype proportions are now quantified and reported in Fig. 1Aii.

      Comment 7:

      In addition to reporting projection specificity of neurons expressing specific channels, it would be ideal to report these data according to spatial location in VTA.

      The spatial locations of recorded cells in the VTA are now shown in all figures.

      Comment 8:

      The authors state that there are a small number of Glut neurons in VTA, then they state that a "significant proportion" of VTA neurons are glutamatergic.

      Thanks. “A significant proportion of neurons” has been changed to “less than half of sequenced DA neurons”.

      Comment 9:

      It is an overstatement that VTA DA neurons are the key determinant of abnormal behaviors in affective disorders.

      Thanks. We have amended the statement to “Dopaminergic (DA) neurons in the ventral tegmental area (VTA) play an important role in mood, reward and emotion-related behaviors”.

      Reviewer #3 (Public Review):

      Comment 1:

      The authors of this study have examined which cation channels specifically confer to ventral tegmental area dopaminergic neurons their autonomic (spontaneous) firing properties. Having brought evidence for the key role played by NALCN and TRPC6 channels therein, the authors aimed at measuring whether these channels play some role in so-called depression-like (but see below) behaviors triggered by chronic exposure to different stressors. Following evidence for a down-regulation of TRPC6 protein expression in ventral tegmental area dopaminergic cells of stressed animals, the authors provide evidence through viral expression protocols for a causal link between such a down-regulation and so-called depression-like behaviors. The main strength of this study lies on a comprehensive bottom-up approach ranging from patch-clamp recordings to behavioral tasks. However, the interpretation of the results gathered from these behavioral tasks might also be considered one main weakness of the abovementioned approach. Thus, the authors make a confusion (widely observed in numerous publications) with regard to the use of paradigms (forced swim test, tail suspension test) initially aimed (and hence validated) at detecting the antidepressant effects of drugs and which by no means provide clues on "depression" in their subjects. Indeed, in their hands, the authors report that stress elicits changes in these tests which are opposed to those theoretically seen after antidepressant medication. However, these results do not imply that these changes reflect "depression" but rather that the individuals under scrutiny simply show different responses from those seen in nonstressed animals. These limits are even more valid in nonstressed animals injected with TRPC6 shRNAs (how can 5-min tests be compared to a complex and chronic pathological state such as depression?). 
With regard to anxiety, as investigated with the elevated plus-maze and the open field, the data, as reported, do not allow to check the authors' interpretation as anxiety indices are either not correctly provided (e.g. absolute open arm data instead of percents of open arm visits without mention of closed arm behaviors) or subjected to possible biases (lack of distinction between central and peripheral components of the apparatus).

      We agree with the reviewer that it is debatable whether the behavioral tests we used represent a real depression state; this is an open question that can be discussed from different perspectives. Since these tests (forced swimming and tail suspension), as the reviewer noted, are “widely observed in numerous publications”, we used what seemed to be the only available options to reflect a “depression-like” state. One could argue that, since these tests were initially used for testing antidepressants (“validated”), with decreased immobility time taken as an indication of antidepressant effects, an increased immobility time could reflect a “depression-like” state. As for the anxiety tests, absolute times in both the open and closed arms are now provided.
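
      As a minimal sketch of how the open- and closed-arm times now provided could be turned into the relative index the reviewer asks for, the function below expresses open-arm time as a percentage of total arm time. The function name and the example durations are hypothetical and are not data from the study; other normalizations, such as percent of open-arm entries, are equally common.

```python
def percent_open_arm_time(open_s: float, closed_s: float) -> float:
    """Open-arm time as a percentage of total time spent in any arm.

    Both arguments are durations in seconds. Time in the central
    platform is deliberately left out of the denominator here.
    """
    if open_s < 0 or closed_s < 0:
        raise ValueError("durations must be non-negative")
    total = open_s + closed_s
    if total == 0:
        raise ValueError("the animal spent no time in either arm type")
    return 100.0 * open_s / total

# Hypothetical animal: 45 s in the open arms and 135 s in the closed arms.
example_index = percent_open_arm_time(45.0, 135.0)
print(example_index)  # 25.0
```

      Reporting such a ratio alongside the absolute times lets a reader separate a change in arm preference from a change in overall locomotor activity.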

    1. Author Response

      Reviewer #1 (Public Review):

      This study optimized a protocol for analyzing microplastics (MPs) in bovine and human follicular fluid. The authors identified the most common plastic polymers in the follicular fluid and assessed the impact of polystyrene beads on bovine oocyte maturation based on the concentration of MPs in follicular fluid. The authors found a decrease in maturation rate in the presence of polystyrene beads and conducted proteomic analysis of oocytes treated with and without MPs, revealing protein alterations.

      Strengths:

      • The optimization of the protocol for analyzing MPs in follicular fluid, which is important for future research in this area.

      • Investigating the effects of MPs on oocyte maturation and proteomic profiles is significant.

      Thank you for the summary and for highlighting our manuscript’s strengths.

      Weaknesses:

      • The effects of polystyrene beads on oocyte maturation and proteomic profiles are not directly demonstrated, and insufficient analysis is performed to support the claims made in the manuscript.

      We disagree with this statement. We have shown that oocyte maturation is affected by the PS beads, which clearly have some effect on the zona pellucida as well, and these findings are supported by carefully designed experimental analyses. Regarding the proteomics data, and as reviewer 3 suggested we emphasize, the PS exposure in the oocyte maturation experiment was performed on cumulus-oocyte complexes, and we believe that the cumulus cells might protect the oocyte to a certain extent. We initially tried several methods to check for incorporation of PS beads into the oocyte and cumulus cells but, unfortunately, could not validate a protocol for this. Therefore, although we observed some changes in the proteome, we were indeed unable to demonstrate directly which pathways could be responsible for the decreased oocyte maturation and increased zona pellucida fragility.

      • The use of polystyrene beads does not fully mimic the concentration and interaction of MPs in follicular fluid, which warrants careful interpretation and discussion.

      We are aware that the polystyrene (PS) concentrations used in our experiments (0.01 µg/mL and 0.1 µg/mL) do not fully represent the PS concentrations found in human and bovine follicular fluid (FF) (0.0013 and 0.0043 µg/mL). We note, though, that PS is not the only MP detected in FF, and in this study we selected PS concentrations in the range of the total MPs found in FF (0.102 and 0.025 µg/mL for human and bovine, respectively). We will carefully re-read and revise the manuscript to ensure that we do not risk misleading readers about the environmental relevance of the chosen experimental concentrations. Nevertheless, we firmly believe that our study used substantially more realistic concentrations than the overwhelming majority of existing studies, which tend to use hundreds of thousands of times more plastic than occurs naturally (as described by Mills et al. - https://doi.org/10.1186/s43591-023-00059-1).
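
      To make the comparison above concrete, the following minimal sketch computes how many times larger each experimental PS dose is than the follicular fluid values quoted in this response. The pairing of 0.0013 and 0.0043 µg/mL with human and bovine, in that order, is an assumption based on the order of the sentence, and the function and variable names are illustrative only.

```python
# Concentrations quoted in the response above, all in ug/mL.
# Assumption: the PS-only values are listed in the order human, bovine.
MEASURED_PS = {"human": 0.0013, "bovine": 0.0043}    # PS alone in follicular fluid
MEASURED_TOTAL = {"human": 0.102, "bovine": 0.025}   # all MPs in follicular fluid
EXPERIMENTAL_PS = [0.01, 0.1]                        # PS doses used in the experiments


def fold_over(dose, reference):
    """Return how many times larger `dose` is than `reference`."""
    return dose / reference


def fold_table():
    """Build rows of (dose, species, fold over PS alone, fold over total MPs)."""
    rows = []
    for dose in EXPERIMENTAL_PS:
        for species in ("human", "bovine"):
            rows.append((
                dose,
                species,
                fold_over(dose, MEASURED_PS[species]),
                fold_over(dose, MEASURED_TOTAL[species]),
            ))
    return rows


if __name__ == "__main__":
    for dose, species, vs_ps, vs_total in fold_table():
        print(f"{dose} ug/mL vs {species}: {vs_ps:.1f}x PS alone, {vs_total:.2f}x total MPs")
```

      Under these assumptions, the doses are roughly 2 to 77 times the measured PS-only values and roughly 0.1 to 4 times the measured total MP values.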

      • A major weakness is the lack of mechanism. Determining the cause of meiotic arrest (decreased maturation rate) would be needed to strengthen the paper. Are spindle morphology, chromosome morphology/alignment and/or spindle assembly checkpoint mechanism perturbed in MPs-treated oocytes?

      • Functional assays to validate one or more of the pathways suggested by the proteomic analysis would be necessary to strengthen the paper.

      We appreciate that understanding the mechanisms underlying the observed changes is important; however, prior to this work, little was known about the effects of MPs on reproductive health. As such, the experimental plan for this work focused on assessing the extent to which MPs occur in reproductive systems, and the effect of these MPs on general metrics of oocyte health and function. Only with this baseline knowledge can experiments aimed at the underlying mechanisms be properly designed, and we will certainly consider such experiments in future research.

      • The analysis of broken zona pellucida is not sufficiently convincing. Definitely the breakage of zona pellucida is most likely a result of oocyte denudation. However, this may indicate increased fragility of polystyrene beads-treated oocytes. Investigating cytoskeletal components in oocytes treated with or without polystyrene beads would strengthen this paper.

      Indeed, the reviewer is correct that the breakage of the zona pellucida happened during denudation. Yet, because all groups were processed in the exact same way, the differences we observed between our experimental and control groups clearly indicate that the PS beads are causing some form of damage to the zona pellucida, or indirect effects through cumulus-oocyte interactions, irrespective of the initial breakage. This is a question we want to answer in future experiments.

      • The percentage of degenerated oocytes in the control group is abnormally high which raises concern that the oocytes are not healthy.

      The reviewer is correct in noting that the baseline number of degenerated oocytes is high. This is unlikely to be due to oocyte health, and is more likely attributed to the fact that the students that were working on this experiment had a period of adaptation to learn to work with these cellular types. In this regard, it is important to mention that we designed the experiment such that this effect was evenly distributed throughout all of the groups. In other words, the technique refinement did not introduce any systematic bias into the data. Thus, while the baseline number of degenerated oocytes is high, we are confident that the effects of MPs are robust.

      • The small font size of the figures (such as Fig. 1C) affects the quality of the manuscript.

      Thank you for pointing this out. We will improve readability of all our figures for a resubmission.

      • Finally, the authors should cite previous publications on the effects of MPs on female reproduction, as this is not a novel area of research, despite the use of different concentrations. For example, "Polystyrene microplastics lead to pyroptosis and apoptosis of ovarian granulosa cells via NLRP3/Caspase-1 signaling pathway in rats (DOI: 10.1016/j.ecoenv.2021.112012)".

      Yes, absolutely. We will include this interesting and relevant work in our revised manuscript.

      Reviewer #2 (Public Review):

      This study presents valuable findings including the use of an improved method of Raman spectroscopy to measure accumulation of microplastics in ovarian follicular fluid obtained from cows and women and demonstration that experimental direct exposure of bovine eggs to biologically relevant levels of polystyrene, a microplastic found in both cows and women's follicular fluid, negatively influenced ova maturation status and the abundance of proteins involved in oxidative stress, DNA damage, apoptosis, and oocyte maturation.

      Thank you for the summary and for highlighting our manuscript’s strengths.

      The evidence supporting the claims of the authors is solid but inclusion of human population from which the follicular fluid was obtained (e.g., demographics, reason for assisted reproduction),

      Agreed. We will include all information regarding the reason for IVF, age, BMI, and IVF outcomes in the revised manuscript.

      and details about quality control for proteome profiling experiments (i.e., peptide count cut-off for significant proteins) would have strengthened the study. The work will be of interest to exposure scientists, reproductive toxicologists, regulatory scientists, and reproductive health clinicians.

      For protein identification, the default settings of MaxQuant were used. In brief, proteins are only considered as identified with at least one unique or razor peptide. Razor peptides are non-unique and assigned to a single protein to ensure that they are only used once for identification. Additionally, a false discovery rate of 1% was applied using a decoy sequence database approach. Quantification was performed on proteins with at least two different peptides. We will include this information in the revised manuscript.
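
      The criteria described above can be sketched as a simple filter. This is not MaxQuant code or its API: the record fields, the function names, and the use of a per-protein q-value as a stand-in for the decoy-based 1% false discovery rate are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ProteinGroup:
    """Hypothetical record for one protein group; field names are illustrative."""
    name: str
    razor_unique_peptides: int  # unique plus razor peptides assigned to the group
    distinct_peptides: int      # number of different peptides observed
    q_value: float              # stand-in for the decoy-based FDR estimate


def is_identified(protein, fdr=0.01):
    """Identified: at least one unique or razor peptide, within the FDR cut-off."""
    return protein.razor_unique_peptides >= 1 and protein.q_value <= fdr


def is_quantifiable(protein, fdr=0.01):
    """Quantified: identified and supported by at least two different peptides."""
    return is_identified(protein, fdr) and protein.distinct_peptides >= 2


def split_proteins(proteins, fdr=0.01):
    """Return (identified names, quantifiable names) for a list of protein groups."""
    identified = [p.name for p in proteins if is_identified(p, fdr)]
    quantifiable = [p.name for p in proteins if is_quantifiable(p, fdr)]
    return identified, quantifiable
```

      For example, a group with one razor peptide and a q-value of 0.005 would count as identified but would not be quantified under the two-peptide rule.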

      Reviewer #3 (Public Review):

      The study from Grechi et al showed that emerging environmental microplastics (MPs) are present in both human and bovine follicular fluid. Moreover, based on the characterization and quantification data, authors treated bovine oocytes with environmentally relevant levels of polystyrene (PS) MPs and found that PS MPs interfered with oocyte maturation in vitro. This study is novel, particularly the first part of MP characterization and quantification, and for the first time confirms the presence of MPs in follicular fluid of humans and large farm animals. These results provide a possible mechanism by which the female infertility rate has been increasing in both humans and large farm animals.

      Thank you for the summary and for highlighting our manuscript’s novelty.

      The session of exposing MPs to bovine and related oocyte health evaluation can be further improved. For example, authors examined the morphology of the oocyte zona pellucida (ZP) and degeneration and stained oocyte DNA to determine the meiotic maturation status. However, a much more comprehensive oocyte health evaluation can be performed including but not limited to the examination of oocyte spindle morphology, meiotic division, fertilization, early embryo development, mitochondria, and accumulation of ROS. These additional endpoints can provide more robust evidence to determine the impact of MPs on oocyte health.

      We agree with the reviewer that a more comprehensive oocyte health evaluation can be performed. Doing so, however, is beyond the scope of any single study as there are many different pathways and mechanisms by which MPs may be affecting oocytes and attempting to include all of these experiments in a single study is simply not feasible. Indeed, we plan on continuing along this line of work in future experiments.

      While the oocyte proteomic analysis identified altered proteins, more functional studies and causation experiments can be performed.

      As noted in our reply to reviewer 1, we appreciate that understanding the mechanisms underlying the observed changes is important; however, prior to this work, little was known about the effects of MPs on reproductive health. As such, the experimental plan for this work focused on assessing the extent to which MPs occur in reproductive systems, and the effect of these MPs on general metrics of oocyte health and function. Only with this baseline knowledge can experiments aimed at the underlying mechanisms be properly designed, and we will certainly consider such experiments in future research.

      In addition, authors exposed cumulus-oocyte-complexes (COCs) but not denuded oocytes with MPs, it is crucial to determine whether MPs accumulate in cumulus cells or oocytes or both as well as the compromised oocyte quality is caused by the direct effect of MPs or the indirect impact on somatic cumulus cells to cause a secondary effect on the oocytes.

      As stated previously, we initially tried several methods to check for incorporation of PS beads into the oocyte and cumulus cells but, unfortunately, could not validate a protocol for this. Therefore, although we observed some changes in the proteome, we were indeed unable to demonstrate directly which pathways could be responsible for the decreased oocyte maturation and increased zona pellucida fragility, or what role the cumulus cells might play in these effects.

    1. Author Response

      Reviewer #1 (Public Review):

      [...] This study brings a lot of new information on the regulation of flagellar genes, from the identification of novel sigma 28-dependent sRNAs to their effects on flagella production and motility. It represents a considerable amount of work; the experimental data are clear and solid and support the conclusions of the paper. Even though mechanistic details underlying the observed regulations by MotR or FliX sRNAs are lacking, the effect of these sRNAs on fliC, several rps/rpl genes, and flagellar genes and motility is convincing.

      The connection between r-protein genes regulation and flagellar operons is exciting and raises a few questions. First, from the RILseq data, chimeric reads with mRNA for r-proteins (including rpsJ) are not restricted to the sigma 28-dependent sRNAs (e.g. rpsJ-sucD3'UTR, rpsF-DicF, rplN-DicF, rplK-ChiX, rplU-CyaR, rpsT-CyaR, rpsK-CyaR, rpsF-MicA...), suggesting that regulation of r-protein synthesis by sRNAs is not necessarily related to flagella/motility. Second, it would be interesting to know if the flagellar operons are more sensitive than other long operons to antitermination following MotR overexpression? In other words, does pMotR similarly affect antitermination in rrn or other long operons?

      The general effect of pMotR or pFliX on the expression of multiple middle and late flagellar genes is also interesting even though the mechanism is not clear. While it may be difficult to fully address it, testing whether some of these regulatory events depend on the control of fliC and/or the S10 operon could be relevant (by analyzing the effects in strains deleted for fliC or nusB for instance).

      We also think the connection between r-protein genes regulation and flagellar operons is exciting and raises some intriguing questions. While there are other RIL-seq chimeras for r-protein genes, the highest numbers are found for MotR and FliX. Nevertheless, understanding the impact of these other sRNAs on the r-protein operons and elucidating which long operons are most sensitive to antitermination following MotR overexpression are important directions for further studies.

      Reviewer #2 (Public Review):

      [...] This is a very interesting study that shows how sRNA-mediated regulation can create a complex network regulating flagella synthesis. The information is new and gives a fresh outlook at cellular mechanisms of flagellar synthesis. The presented work could benefit from additional experiments to confirm the effect of endogenous sRNAs expressed at natural level.

      We agree that experiments addressing the effects of endogenous sRNAs are important. We provide such data in Figures 8 and S14 for MotR and FliX in a variety of assays: flagella numbers by electron microscopy, motility and competition assays, and expression of flagellar genes by RT-qPCR and western analysis. We went to the trouble of constructing strains carrying point mutations in the chromosomal copies of these genes, rather than deletions, to avoid interfering with expression of motA and fliC, given that MotR and FliX encompass the 5’ and 3’ UTRs, respectively.

      Reviewer #3 (Public Review):

      [...] Overall, this comprehensive study expands the repertoire of characterized UTR derived sRNAs and integrate new layers of post-transcriptional regulation into the highly complex flagellar regulatory cascade. Moreover, these new flagella regulators (MotR, FliX) act non-canonically, and impact protein expression of their target genes by base-pairing with the CDS of the transcripts. Their findings directly connect flagella biosynthesis and motility, highly energy consuming processes, to ribosome production (MotR and FliX) and possibly to carbon metabolism (UhpU).

      Specific points to be considered:

      • The authors use a crl- hyper-motile strain as WT strain for the study and sometimes also a crl+ strain is used. Can the authors comment on potential reasons why some phenotypes (e.g., UhpU and MotR effects on motility) are only detectable in the crl+ strain or vice versa? Is σS regulation important for the function of these sRNAs?

      • In several experiments, a variant of MotR sRNA, MotR*, that harbors a 3 nt mutation upstream of the seed sequence is used and seems to mediate stronger phenotypes (impact on flagellar number) upon overexpression compared to WT, or phenotypes not retrieved for WT MotR (increased flagellin expression). It would be helpful to have some more clarification throughout the text of why this variant was used, even when OE of WT MotR already has an impact on the target, and how these three mutated nucleotides impact target regulation. For example, does MotR* show increased RNA stability or Hfq binding compared to MotR? Does the mutation in MotR* impact MotR structure (e.g., based on secondary structure predictions) or increase the complementarity with selected targets at potential secondary binding sites (e.g., based on target predictions)? For example, Fig. S7 shows additional regions of interaction between MotR and fliC mRNA beside the seed sequence. It is also suggested that MotR might have multiple interaction sites on rpsJ mRNA. Additional structure probing or biocomputational predictions could clarify these points.

      • It is suggested that UhpU impacts motility via regulation of LrhA, which represses transcription of flhDC, and therefore the flagellar cascade. While LrhA-mediated regulation by UhpU is validated based on reporter genes, the effect of UhpU OE on FlhDC levels is not directly examined (Fig. 3). Furthermore, as deletion of LrhA de-represses the flagellar cascade and UhpU was also shown to increase motility, the conclusions could be further strengthened by examining flhDC levels and/or the effect of ∆UhpU (if the sRNA part can be deleted) on motility (reduction) due to relieved down-regulation of LrhA.

      • This study provides many opportunities for future follow-up work. Now that the four sRNAs and some of their targets and opposing effects on flagella biogenesis have been identified, it will be interesting to see how the sRNAs themselves are temporally regulated throughout the flagella biogenesis cascade and which other targets are regulated by them. Future studies could also provide insights into the mechanism and function of FlgO sRNA, which seems to act via a different mechanism than base-pairing to target RNAs, as well as the global effects of regulation of ribosomal genes via FliX and MotR.

      We thank the reviewer for the constructive comments about the variation between the crl- and crl+ strains, and about the use of MotR versus MotR*, and will address these points in a revised version of the manuscript. Regarding the UhpU-mediated regulation, we agree that assays of flhDC expression will strengthen our conclusions. We share the reviewer's opinion regarding the many opportunities for future follow-up work.

    1. Author Response

      Reviewer #1 (Public Review):

      This article describes the development and refinement of an open-source software framework that is used to track how the COVID-19 pandemic impacted healthcare use in England over a range of key healthcare use indicators.

      Important strengths of this study include the high coverage of 99% of practices in England, the development of health care indicators with the input of a clinical advisory group, extensive online documentation, and rigorous safeguards for the protection of patient confidentiality.

      Perhaps the largest limitation is that only high-level descriptive data on the monthly volume of health outcomes are presented. It is not clear whether the system could be used to generate more fine-grained or stratified information, ex. weekly or daily data, or data stratified by important characteristics of practices or of patient characteristics. As such, the utility of the system for answering new scientific questions is unclear, and also what the utility and long-term potential uses of this system will be past the COVID-19 pandemic.

      OpenSAFELY allows access to the full primary care record for patients registered with a TPP or EMIS practice in England. This includes medical diagnoses, clinical tests, prescriptions, as well as demographic details such as age, sex, ethnicity. Dates attached to these records allow for daily analyses to be performed. This data is updated weekly. Through linkage of other data sources, it also provides information such as hospital admissions, registered deaths or COVID-19 testing data. Detailed subgroup analysis is possible; OpenSAFELY has already been used to understand disease risk [1], monitor vaccination coverage [2,3] and novel treatments [4], assess patient safety [5], inform public health guidance and policy and much more [6]. These are all widely applicable beyond the COVID-19 pandemic.

      Reviewer #3 (Public Review):

      This manuscript by Fisher and colleagues documents the change in clinical activity in English general practices during the COVID-19 pandemic according to a set of indicators of clinical activity. The indicators include measures of clinical reviews (e.g. blood pressure, asthma, chronic obstructive pulmonary disease, medication, and cardiovascular risk reviews), blood tests (e.g. cholesterol, liver function, thyroid function, full blood counts, diabetes monitoring blood tests, and kidney function). All these measures saw a drop during the pandemic, to a varying degree, and some recovered afterwards but others did not.

      Clinical activity was measured using SNOMED CT codes, which are standard codes used for recording clinical events in UK GP records.

      Strengths:

      This is a large and comprehensive study including data from 99% of general practices in England. The indicators are clinically relevant, cover a broad range of disease areas, and have been chosen in a sensible manner, involving relevant stakeholders such as GPs, pharmacists, and pathologists.

      The OpenSAFELY platform has the ability to enable federated analyses to be run on raw coded data of almost all patients registered with a GP in England.

      The study demonstrates the value of OpenSAFELY in being able to monitor clinical activity in general practice at a detailed level, which is essential for planning and improving health services. The statistical methodology is broadly sound.

      Weaknesses:

      The measures are all related to chronic physical diseases in adults, with a particular focus on cardiometabolic and respiratory conditions. There are no measures related to mental health, maternal or child health.

      Results from preliminary analyses of a wider range of clinical conditions can be found in our previous work [7]. This includes mental health and female and reproductive health with details on why these were not covered by the initial key measures described.

      The description of the measures does not distinguish between different types of clinical activity e.g. lab tests, clinical measurements, or diagnoses, and all are lumped together as 'codes'. This is a peculiarity of the way that information is recorded in GP systems - many different types of clinical information (such as diagnoses and lab tests) are recorded using a SNOMED CT 'code', and only the exact code differentiates what type of information is in the record.

      Multiple codes of different types can arise from a single encounter, all of which could be indicative of a clinical event of interest. The codelists for each key measure, available at opencodelists.org, show the type of clinical activity (e.g. procedure or observable entity) captured by each code within the codelist (see e.g. https://www.opencodelists.org/codelist/opensafely/red-blood-cell-rbc-tests/576a859e/#tree).

      The codelists were broad and comprehensive, but it is unclear how necessary this is because for some measures e.g. lab tests, laboratories typically record a particular type of test using a single standardised code. Instead of using a broad set of codes in the analysis, the authors could have initially verified which codes are associated with the clinical activity being measured (e.g. a numerical value of a blood pressure measurement) in all practices, as I would expect the same single or small number of codes would be used in all practices. This would have provided a smaller and simpler final codelist.

      Supplementary table 1 shows up to 5 of the most common codes for each key measure across the two electronic health record (EHR) systems used in this analysis. This shows that whilst a single code is often used for many of the clinical activities assessed here, there are exceptions and there can be variation in coded activity between different EHR systems. We have previously described how design features of EHR systems can impact clinical practice [8]. Broad codelists allow us to capture activity across multiple EHR systems.
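
      The kind of tabulation described above can be sketched as follows. This is an illustrative sketch only, not the OpenSAFELY implementation: the event format, the function name, and the placeholder codes are assumptions, and real analyses would use SNOMED CT codes from the published codelists.

```python
from collections import Counter, defaultdict


def top_codes_by_system(events, n=5):
    """Return up to `n` most frequently recorded codes for each EHR system.

    `events` is an iterable of (system, code) pairs, one per recorded event.
    """
    counts = defaultdict(Counter)
    for system, code in events:
        counts[system][code] += 1
    return {
        system: [code for code, _ in counter.most_common(n)]
        for system, counter in counts.items()
    }


# Placeholder events: the codes are made up and stand in for real SNOMED CT codes.
EXAMPLE_EVENTS = [
    ("TPP", "code_A"), ("TPP", "code_A"), ("TPP", "code_A"), ("TPP", "code_B"),
    ("EMIS", "code_C"), ("EMIS", "code_C"), ("EMIS", "code_A"),
]

if __name__ == "__main__":
    for system, codes in top_codes_by_system(EXAMPLE_EVENTS).items():
        print(system, codes)
```

      In this toy example the most common code differs between the two systems, which is the situation a broad codelist is meant to handle.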

      1. Williamson, E. J. et al. Factors associated with COVID-19-related death using OpenSAFELY. Nature 584, 430–436 (2020).
      2. Trends and clinical characteristics of 57.9 million COVID-19 vaccine recipients: a federated analysis of patients’ primary care records in situ using OpenSAFELY. British Journal of General Practice. https://bjgp.org/content/early/2021/11/08/BJGP.2021.0376.
      3. Parker, E. P. et al. Factors associated with COVID-19 vaccine uptake in people with kidney disease: an OpenSAFELY cohort study. BMJ Open 13, e066164 (2023).
      4. Green, A. C. A. et al. Trends, variation, and clinical characteristics of recipients of antiviral drugs and neutralising monoclonal antibodies for covid-19 in community settings: retrospective, descriptive cohort study of 23.4 million people in OpenSAFELY. BMJ Med. 2, (2023).
      5. Collaborative, T. O. et al. Potentially inappropriate prescribing of DOACs to people with mechanical heart valves: a federated analysis of 57.9 million patients’ primary care records in situ using OpenSAFELY. 2021.07.27.21261136 https://www.medrxiv.org/content/10.1101/2021.07.27.21261136v1 (2021) doi:10.1101/2021.07.27.21261136.
      6. OpenSAFELY Pubmed search results. PubMed https://pubmed.ncbi.nlm.nih.gov/?term=OpenSAFELY.
      7. OpenSAFELY NHS Service Restoration Observatory 2: changes in primary care activity across six clinical areas during the COVID-19 pandemic. medRxiv. https://www.medrxiv.org/content/10.1101/2022.06.01.22275674v1.
      8. Suboptimal prescribing behaviour associated with clinical software design features: a retrospective cohort study in English NHS primary care. British Journal of General Practice. https://bjgp.org/content/70/698/e636.

    1. Author Response

      eLife assessment:

      This important study represents a comprehensive computational analysis of Plasmodium falciparum gene expression, with a focus on var gene expression, in parasites isolated from patients; it assesses changes that occur as the parasites adapt to short-term in vitro culture conditions. The work provides technical advances to update a previously developed computational pipeline. Although the findings of the shifts in the expression of particular var genes have theoretical or practical implications beyond a single subfield, the results are incomplete and the main claims are only partially supported.

      The authors would like to thank the reviewers and editors for their insightful and constructive assessment. We are particularly glad to read of the technical advances of the methods developed here. We will rephrase parts of the manuscript and move some analysis to the supplementary materials. This will improve the clarity of the results and ensure the main claims are supported.

      Reviewer #1 (Public Review):

      The authors took advantage of a large dataset of transcriptomic information obtained from parasites recovered from 35 patients. In addition, parasites from 13 of these patients were reared for 1 generation in vitro, 10 for 2 generations, and 1 for a third generation. This provided the authors with a remarkable resource for monitoring how parasites initially adapt to the environmental change of being grown in culture. They focused initially on var gene expression due to the importance of this gene family for parasite virulence, then subsequently assessed changes in the entire transcriptome. Their goal was to develop a more accurate and informative computational pipeline for assessing var gene expression and secondly, to document the adaptation process at the whole transcriptome level.

      Overall, the authors were largely successful in their aims. They provide convincing evidence that their new computational pipeline is better able to assemble var transcripts and assess the structure of the encoded PfEMP1s. They can also assess var gene switching as a tool for examining antigenic variation. They also documented potentially important changes in the overall transcriptome that will be important for researchers who employ ex vivo samples for assessing things like drug sensitivity profiles or metabolic states. These are likely to be important tools and insights for researchers working on field samples.

      One concern is that the abstract highlights "Unpredictable var gene switching..." and states that "Our results cast doubt on the validity of the common practice of using short-term cultured parasites...". This seems somewhat overly pessimistic with regard to var gene expression profiling and does not reflect the data described in the paper. In contrast, the main text of the paper repeatedly refers to "modest changes in var gene expression repertoire upon culture" or "relatively small changes in var expression from ex vivo to culture", and many additional similar assessments. On balance, it seems that transition to culture conditions causes relatively minor changes in var gene expression, at least in the initial generations. The authors do highlight that a few individuals in their analysis showed more pronounced and unpredictable changes, which certainly warrants caution for future studies but should not obscure the interesting observation that var gene expression remained relatively stable during transition to culture.

      Thank you for the suggestion and we are happy to modify the wording to ensure the correct results are presented. We will reword the abstract and emphasise the main change was observed in the core transcriptome. We will also add clarity to the different var transcriptome results presented.

      It is important to note this study was in a unique position to assess changes at the individual patient level as we had successive parasite generations. This is not done in most cross-sectional studies and therefore these small changes in the var transcriptome would have been missed.

      Reviewer #2 (Public Review):

      In this study, the authors describe a pipeline to sequence expressed var genes from RNA sequencing that improves on a previous one that they had developed. Importantly, they use this approach to determine how var gene expression changes with short-term culture. Their finding of shifts in the expression of particular var genes is compelling and casts some doubt on the comparability of gene expression in short-term culture versus var expression at the time of participant sampling. The authors appear to overstate the novelty of their pipeline, which should be better situated within the context of existing pipelines described in the literature.

      Other studies have relied on short-term culture to understand var gene expression in clinical malaria studies. This study indicates the need for caution in over-interpreting findings from these studies.

      The novel method of var gene assembly described by the authors needs to be appropriately situated within the context of previous studies. They neglect to mention several recent studies that present transcript-level novel assembly of var genes from clinical samples. It is important for them to situate their work within this context and compare and contrast it accordingly. A table comparing all existing methods in terms of pros and cons would be helpful to evaluate their method.

      We are grateful for this suggestion and agree that a table comparing the pros and cons of all existing methods would be helpful for the reader, not just malaria researchers. This will also highlight the key benefits of our new approach. This will be included in the updated manuscript as a supplementary table.

      Reviewer #3 (Public Review):

      This work focuses on the important problem of how to access the highly polymorphic var gene family using short-read sequence data. The approach that was most successful, and utilized for all subsequent analyses, employed a different assembler from their prior pipeline, and impressively, more than doubles the N50 metric.
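
      For readers unfamiliar with the N50 metric mentioned above, the following is a minimal sketch of how it is commonly computed from a list of assembled contig or transcript lengths. The function name and example lengths are illustrative assumptions and are not taken from the authors' pipeline.

```python
def n50(lengths):
    """Return the N50 of a collection of contig/transcript lengths.

    N50 is the length L such that sequences of length >= L together
    account for at least half of the total assembled bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical lengths, for illustration only.
example_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
example_n50 = n50(example_lengths)
```

      A larger N50 means that more of the assembled sequence sits in long contigs, which is why it is used here as a summary of assembly contiguity.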

      The authors then endeavor to utilize these improved assemblies to assess differential RNA expression of ex vivo and short-term cultured samples, and conclude that their results "cast doubt on the validity" of using short-term cultured parasites to infer in vivo characteristics. Readers should be aware that the various approaches to assess differential expression lack statistical clarity and appear to be contradictory. Unfortunately there is no attempt to describe the rationale for the different approaches and how they might inform one another.

      It is unclear whether adjusting for life-cycle stage as reported is appropriate for the var-only expression models. The methods do not appear to describe what type of correction variable (continuous/categorical) was used in each model, and there is no discussion of the impact on var vs. core transcriptome results.

      The reviewer raises a fair point, and we agree the different methods and results of the var transcriptome analysis are difficult to interpret together without further clarification. Var transcript differential expression analysis has been used several times previously and hence was used here. As mentioned above, this study was in a unique position to perform a more focussed analysis of var transcriptional changes across paired samples. This allowed for changes in the var transcriptome to be identified that would have gone unnoticed in the "traditional" differential expression analysis. To address this point, we will add further explanation to the results and move the var differential expression analysis to the supplementary, to allow for comparison with previous studies.

      We thank the reviewer for this highly important comment about adjusting for life cycle stage. Var gene expression is highly stage dependent, so any quantitative comparison between samples does need adjustment for developmental stage. Var gene expression was adjusted for in the differential expression analysis by using the mixture model determined proportions as covariates in the design matrix. The var group level analysis and the global var gene expression analysis was also adjusted for life cycle stage using the same proportions, by including them as an independent variable. The rank-expression analysis did not have adjustment for life cycle stage as the values were determined as a percentage contribution to the total var transcriptome.

      We will update the methods section to ensure this is clearer.
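
      The two kinds of quantity described in this response can be sketched as follows: a design matrix that carries estimated life-cycle stage proportions as covariates, and a per-sample percentage contribution that is computed within the var transcriptome and therefore needs no stage covariate. This is a hedged illustration only; the function names, the two-stage layout and all numbers are assumptions, and the authors' actual models are not reproduced here.

```python
def build_design(is_cultured, stage_props):
    """Build rows of [intercept, condition, stage covariates...].

    stage_props holds, per sample, the estimated proportion of each
    life-cycle stage (rows sum to 1). The last stage is dropped so the
    covariates are not collinear with the intercept.
    """
    if len(is_cultured) != len(stage_props):
        raise ValueError("one stage-proportion row is needed per sample")
    return [
        [1.0, float(cond)] + [float(p) for p in props[:-1]]
        for cond, props in zip(is_cultured, stage_props)
    ]


def percent_contribution(var_counts):
    """Express each var transcript as a percentage of the sample's total var signal."""
    total = sum(var_counts)
    if total <= 0:
        raise ValueError("total var expression must be positive")
    return [100.0 * c / total for c in var_counts]


# Hypothetical paired samples: 0 = ex vivo, 1 = cultured.
is_cultured = [0, 1, 0, 1]
# Hypothetical [ring, later-stage] proportions per sample.
stage_props = [[0.8, 0.2], [0.5, 0.5], [0.6, 0.4], [0.3, 0.7]]
design = build_design(is_cultured, stage_props)
shares = percent_contribution([10, 10, 20])
```

      The design rows would then be passed to whichever model-fitting routine is used, whereas the percentage shares are already normalised within each sample.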

    1. Author Response

      eLife assessment

      This important study addresses both the native role of the Plasmodium falciparum protein PfFKBP35 and whether this protein is the target of FK506, an immunosuppressant with antiplasmodial activity. The genetic evidence for the essentiality of FKBP35 in parasite growth is compelling. However, the conclusion that the role of FKBP35 is to secure ribosome homeostasis and the claim that FK506 exerts its antimalarial activity independently of FKBP35 rely on incomplete evidence.

      We thank the Reviewers and Editors for their careful evaluation of our manuscript and the constructive criticism. We realized that some of our conclusions may be regarded/misunderstood as overstatements. This was by no means our intention and we apologize for the unnecessary inconvenience. The phenotype of FKBP35 knock-out parasites clearly centers on failing ribosomes and protein synthesis, which, in our opinion, provides an important leap towards understanding the role of this drug target in P. falciparum biology. It is however correct that, at this point, we can only make evidence-based hypotheses about direct interaction partners and we will emphasize this more clearly in a revised version of the manuscript. In order to prevent misinterpretation of our work, and as detailed in the point-by-point responses to the reviewer comments, we propose changing the manuscript title to “Genetic validation of PfFKBP35 as an antimalarial drug target”. To address the criticism regarding the effects of FK506, we will perform specific additional experiments. We are convinced that this new data set will resolve any remaining ambiguities and allow for a conclusive assessment of FK506 drug activity in P. falciparum.

      Reviewer #1 (Public Review):

      In this study, the authors investigate the biological function of the FK506-binding protein FKBP35 in the malaria-causing parasite Plasmodium falciparum. Like its homologs in other organisms, PfFKBP35 harbors peptidyl-prolyl isomerase (PPIase) and chaperoning activities, and has been considered a promising drug target due to its high affinity to the macrolide compound FK506. However, PfFKBP35 has not been validated as a drug target using reverse genetics, and the link between PfFKBP35-interacting drugs and their antimalarial activity remains elusive. The manuscript is structured in two parts addressing the biological function of PfFKBP35 and the antimalarial activity of FK506, respectively.

      The first part combines conditional genome editing, proteomics and transcriptomics analysis to investigate the effects of FKBP35 depletion in P. falciparum. The work is very well performed and clearly described. The data provide definitive evidence that FKBP35 is essential for P. falciparum blood stage growth. Conditional knockout of PfFKBP35 leads to a delayed death phenotype, associated with defects in ribosome maturation as detected by quantitative proteomics and stalling of protein synthesis in the parasite. The authors propose that FKBP35 regulates ribosome homeostasis but an alternative explanation could be that changes in the ribosome proteome are downstream consequences of the abrogation of FKBP35 essential activities as chaperone and/or PPIase. It is unclear whether FKBP35 has a specific function in P. falciparum as compared to other organisms. The knockdown of PfFKBP35 has no phenotypic consequence, showing that very low amounts of FKBP35 are sufficient for parasite survival and growth. In the absence of quantification of the protein during the course of the experiments, it remains unclear whether the delayed death phenotype in the knockout is due to the delayed depletion of the protein or to a delayed consequence of early protein depletion. This limitation also impacts the interpretation of the drug assays.

      We thank the Reviewer for the compliments regarding our experimental setup and the clarity of our manuscript. We agree that the link between FKBP35 knock-out and ribosome homeostasis is indirect and we now emphasize this more clearly in the revised manuscript. To prevent a general misinterpretation of our manuscript, we will adapt the title accordingly.

      We would still like to reiterate that the phenotype of FKBP35 knock-out parasites is best described by their defects in maintaining functional ribosomes. It is for several reasons that we believe the links between FKBP35 and ribosome function are purely evidence driven: First, pre-ribosomal and nucleolar factors are the first proteins (in generation 1 schizonts) to be affected upon knock-out of fkbp35 (Figure 2A, Table S1). We realized that Figure 2A falls short in showing this observation, which is why we will update the figure accordingly. Second, the dysregulation of ribosomal factors and the general stall in protein synthesis dominate the phenotype of FKBP35 knock-out parasites in generation 2. We thus believe it is appropriate to say that knock-out cells are most likely killed in response to defective ribosome maintenance, which is a consequence of reduced FKBP35 levels. We are aware that our experiments (and possibly any other reverse genetics approach) cannot rule out that FKBP35 affects ribosomal factors indirectly. Clearly, more work is required to disentangle this question in more detail in the future.

      We agree with the Reviewer that it is not possible to tell if the delayed death-like phenotype is due to a “delayed protein depletion”. We would however like to note that the DiCre/loxP approach allows for an immediate knock-out at the genome level and is thus as precise as possible. Further, in addition to the substantial depletion of FKBP35 in knock-out cells during the phenotypically silent generation, knocking out of fkbp35 at earlier time points (TPs 24-30 and 34-40 hpi in the preceding generation) resulted in the very same phenotype cycle (Figure 1). Here, parasite death was delayed substantially longer, i.e. more than one complete cycle. Together with the dysregulation of early ribosome maturation in generation 1, these findings point towards a delayed death phenotype. It is of course still possible to explain the delayed death-like phenotype by remnant activity of proteins synthetized prior to the genomic knock-out. We address this possibility and describe the two scenarios mentioned by the Reviewer in lines 141-144. Disentangling the two possibilities in future experiments will be difficult, not only with regards to FKBP35, but regarding “delayed death” phenotypes in general.

      In the second part, the authors investigate the activity of FK506 on P. falciparum, and conclude that FK506 exerts its antimalarial effects independently of FKBP35. This conclusion is based on the observation that FK506 has the same activity on FKBP35 wild type and knock-out parasites, suggesting that FK506 activity is independent of FKBP35 levels, and on the fact that FK506 kills the parasite rapidly whereas inducible gene knockout results in delayed death phenotype. However, there are alternative explanations for these observations. As mentioned above, the delayed death phenotype could be due to delayed depletion of the protein upon induction of gene knockout. FK506 could have a similar activity on WT and mutant parasites when added before sufficient depletion of FKBP35 protein. In some experiments, the authors exposed KO parasites to FK506 later, presumably when the KO is effective, and obtained similar results. However, in these conditions, the death induced by the knockout could be a confounding factor when measuring the effects of the drug. Furthermore, the authors show that FK506 binds to FKBP35, and propose that the FK506-FKBP35 complex interferes with ribosome maturation, which would point towards a role of FKBP35 in FK506 action. In summary, the study does not provide sufficient evidence to rule out that FK506 exerts its effects via FKBP35.

      Notably, we were also very much surprised by data indicating that the antimalarial activity of FK506 is independent of FKBP35. It is for this reason that we conducted a comprehensive set of experiments to disprove our initial observations, but couldn't find any evidence for an FKBP35-dependent mode of action of FK506:

      We were not able to see altered FK506 sensitivity in (i) inducible knock-down parasites, (ii) inducible overexpression parasites and (iii) inducible knock-out parasites. Parasites with altered FKBP35 levels (as assessed by Western blot and quantitative proteomics at 36-42 hpi, respectively) were equally sensitive to FK506. Importantly, at no sub-lethal FK506 concentration did lower FKBP35 levels lead to an altered response of FKBP35KO compared to the wild-type control population. Furthermore, (iv) induction of the knock-out in the cycle preceding FK506 exposure also had no effect on parasite sensitivity. As mentioned by the Reviewer, we also exposed the parasites to FK506 at 30-36 hpi and (v) did not see any effect, even though we measured a 19-fold difference in FKBP35 protein levels between the parasite populations at 36-42 hpi. At this point, parasite death induced by the knock-out cannot be a confounding factor (as it was mentioned by the Reviewer), because the FKBP35 knock-out has no effect on parasite survival in generation 1 in the absence of FK506 (Figure 1F). This demonstrates that the observed effect is only due to drug-mediated killing and not due to the FKBP35 knock-out.

      To account for a scenario in which the drop in FKBP35 levels only occurs after 36 hpi, we will perform an additional set of experiments, in which we induce the knock-out at 0-6 hpi and treat the parasites at 36-42 hpi (i.e. the time point at which the 19-fold difference in protein levels was measured by quantitative proteomics). This setup will allow determining whether or not the parasite killing activity of FK506 depends on FKBP35 levels.

      So far, our experiments cannot support any scenario in which FK506 kills P. falciparum parasites via inhibiting the essential role of FKBP35 and we would therefore want to insist that this statement is based on highly solid evidence. In this context, it is important to note that our conclusion includes two scenarios: “This indicates that either the binding of FK506 does not interfere with the essential role of PfFKBP35, or that PfFKBP35 is inhibited only at high FK506 concentrations that also inhibit other essential factors.” While this phrase is already present in our initial submission, we will emphasize this point more clearly in the revised manuscript. We are convinced that this information is of high importance for ongoing and future drug development.

      Reviewer #2 (Public Review):

      The manuscript by Thommen et al., "FKBP secures ribosome homeostasis in Plasmodium falciparum", focuses on the importance of the PfFKBP35 protein, its interaction with the FK506 compound, and the role of PfFKBP35 in ribosome biogenesis. The authors showed the interaction of PfFKBP35 with FK506, but the part played by FK506 and PfFKBP35 in ribosome biogenesis is unclear from the data.

      The introduction is built around two parallel stories about PfFKBP35 and FK506, with ribosome biogenesis as the central question at the end. In its current form, the manuscript suffers from two stories that are not entirely interconnected, unfinished, and somewhat confusing. Both stories need additional experiments to make the manuscript(s) more complete. The results from PfFKBP35 need more evidence for the proposed ribosome biogenesis pathway control. On the other hand, the results from the drug FK506 point to different targets with lower EC50, and other follow-up experiments are needed to substantiate the authors' claims.

      The strengths of the manuscript are the figures and experimental design. The combination of omics methods is informative and gives an opportunity for follow-up experiments.

      We thank the Reviewer for the evaluation of the manuscript. We apologize for the fact that the Reviewer found the manuscript to be inaccessible. We will use the comments as an incentive to restructure the manuscript and do our best to clarify the presentation, interpretation and conclusion of the presented data in the revised version. We believe that the FKBP35 data are strongly interlinked with the findings on FK506. We will emphasize these links more clearly and are convinced that the complementary nature of the datasets are a particular strength of the presented work.

      Reviewer #3 (Public Review):

      The study by Thommen et al. sought to identify the native role of the Plasmodium falciparum FKBP35 protein, which has been identified as a potential drug target due to the antiplasmodial activity of the immunosuppressant FK506. This compound has multiple binding proteins in many organisms; however, only one FKBP exists in P. falciparum (FKBP35). Using genetically-modified parasites and mass spectrometry-based cellular thermal shift assays (CETSA), the authors suggest that this protein is involved in ribosome homeostasis and that the antiplasmodial activity of FK506 is separate from its activity on the FKBP35 protein. The authors first created a conditional knockdown using the destruction domain/shield system, which demonstrated no change in asexual blood stage parasites. A conditional knockout was then generated using the DiCre system. FKBP35KO parasites survived the first generation but died in the second generation. The authors called this "a delayed death phenotype", although it was not secondary to drug treatment, so this may be a misnomer. This slow death was unrelated to apicoplast dysfunction, as demonstrated by lack of alterations in sensitivity to apicoplast inhibitors. Quantitative proteomics on the FKBP35KO vs FKBP35WT parasites demonstrated enrichment of proteins involved in pre-ribosome development and the nucleolus. Interestingly, the KO parasites were not more susceptible to cycloheximide, a translation inhibitor, in the first generation (G1), suggesting that mature ribosomes still exist at this point. The SunSET technique, which incorporates puromycin into nascent peptide chains, also showed that in G1 the FKBP35KO parasites were still able to synthesize proteins. But in the second generation (G2), there was a significant decrease in protein synthesis. Transcriptomics were also performed at multiple time points. The effects of knockout of FKBP35 were transcriptionally silent in G1, and the parasites then slowed their cell cycles as compared to the FKBP35WT parasites.

      The authors next sought to evaluate whether killing by FK506 was dependent upon the inhibition of PfFKBP35. Interestingly, both FKBP35KO and FKBP35WT parasites were equally susceptible to FK506. This suggested that the antiplasmodial activity of FK506 was related to activity targeting essential functions in the parasite separate from binding to FKBP35. To identify these potential targets, the authors used MS-CETSA on lysates to test for thermal stabilization of proteins after exposure to drug, which suggests drug-protein interactions. As expected, FK506 bound FKBP35 at low nM concentrations. However, given that the parasite IC50 of this compound is in the uM range, the authors searched for proteins stabilized at these concentrations as putative secondary targets. Using live cell MS-CETSA, FK506 bound FKBP35 at low nM concentrations; however, in these experiments over 50 ribosomal proteins were stabilized by the drug at higher concentrations. Of note, there was also an increase in soluble ribosomal factors in the absence of denaturing conditions. The authors suggested that the drug itself led to these smaller factors disengaging from a larger ribosomal complex, leading to an increase in soluble factors. Ultimately, the authors conclude that the native function of FKBP35 is involved in ribosome homeostasis and that the antiplasmodial activity of FK506 is not related to the binding of FKBP35, but instead results from inhibition of essential functions of secondary targets.

      Strengths:

      This study has many strengths. It addresses an important gap in parasite biology and drug development, by addressing the native role of the potential antiplasmodial drug target FKBP35 and whether the compound FK506 works through inhibition of that putative target. The knockout data provide compelling evidence that the FKBP35 protein is essential for asexual parasite growth after one growth cycle. Analysis of the FKBP35KO line also provides evidence that the effects of FK506 are likely not solely due to inhibition of that protein, but instead must have secondary targets whose function is essential. These data are important in the field of drug development as they may guide development away from structure-based FK506 analogs that bind more specifically to the FKBP35 protein.

      Weaknesses:

      There are also a few notable weaknesses in the evidence that call into question the conclusion in the article title that FKBP35 is definitely involved in ribosomal homeostasis. While the proteomics supports alterations in ribosome biogenesis factors, it is unclear whether this is a direct role of the loss of the FKBP35 protein or is more related to non-specific downstream effects of knocking down the protein. The CETSA data clearly demonstrate that FK506 binds PfFKBP35 at low nM concentrations, which is different than the IC50 noted in the parasite; however, the evidence that the proteins stabilized by uM concentrations of drug are actual targets is not completely convincing, especially given the high uM amounts of drug required to stabilize these proteins. This section of the manuscript would benefit from validation of at least one or two of the putative candidates noted in the text. In the live cell CETSA, it is noted that >50 ribosomal components are stabilized in drug treated but not lysate controls. Similarly, the authors show that the soluble fraction of ribosomal components increases in drug-exposed parasites even at 37°C and suggest that this is likely from smaller ribosomal proteins disengaging from larger ribosomal complexes. While the evidence is convincing that this protein may play a role in ribosome homeostasis in some capacity, it is not certain that the title of the paper "FKBP secures ribosome homeostasis" holds true given the lack of mechanistic data. A minor weakness, but one that should nonetheless be addressed, is the use of the term "delayed death phenotype" with regards to the knockout parasite killing. This term is most frequently used in a very specific setting of apicoplast drugs that inhibit apicoplast ribosomes, so the term is misleading. It is also possible that the parasites are able to go through a normal cycle because of the kinetics of the knockout and the time needed for the protein to be cleared in the parasite to a level that is lethal.

      Overall, the authors set out to identify the native role of FKBP35 in the P. falciparum parasites and to identify whether this is, in fact, the target of FK506. The data clearly demonstrate that FKBP35 is essential for parasite growth and provide evidence that alterations in its levels have proteomic but not transcriptional changes. However, the conclusion that FKBP35 actually stabilizes ribosomal complexes remains intermediate. The data are also very compelling that FK506 has secondary targets in the parasite aside from FKBP35; however, the high uM concentrations of the drug needed to attain results and the lack of biological validation of the CETSA hits make it difficult to know whether any of these are actually the target of the compound or instead are nonspecific downstream consequences of treatment.

      We appreciate the detailed and valuable suggestions to improve the manuscript. We agree that CETSA could only identify potential targets of FK506 in the micromolar range, while FK506 showed a high affinity for FKBP35, consistent with earlier reports (2). We would however like to point out that FK506 kills P. falciparum at exactly these relatively high concentrations and not at those presumed from the high affinity interactions between FK506 and FKBP35. The relatively high FK506 concentration required to stabilize potential off target proteins is therefore not a concerning observation, but rather corroborates our conclusion that FK506 fails to inhibit the essential function of FKBP35 at concentrations that leave off targets unaffected. As mentioned in response to Reviewer 1, we will describe and discuss these data more clearly in the revised manuscript.

      We thank the Reviewer for pointing out the potential issues regarding the use of the term “delayed death phenotype”. We now refer to the FKBP35 phenotype as “delayed death-like” in the revised manuscript.

      We believe that follow-up work on specific FK506 CETSA hits is out of scope of the current and already quite complex manuscript.

      As mentioned in the response to Reviewer 1, we realize that the short title of the manuscript can be regarded as an overstatement. Again, this was clearly not our intention and we apologize that the Reviewers had to indicate this issue. While we believe that the message of the title holds true (see response to Reviewer 1), we recognize the misconception that might arise from it, which is why we propose the new title: “Genetic validation of PfFKBP35 as an antimalarial drug target”.

      1. Kennedy K, Cobbold SA, Hanssen E, Birnbaum J, Spillman NJ, McHugh E, et al. Delayed death in the malaria parasite Plasmodium falciparum is caused by disruption of prenylation-dependent intracellular trafficking. PLoS Biol. 2019;17(7):e3000376.
      2. Kotaka M, Ye H, Alag R, Hu G, Bozdech Z, Preiser PR, et al. Crystal structure of the FK506 binding domain of Plasmodium falciparum FKBP35 in complex with FK506. Biochemistry. 2008;47(22):5951-61.
      3. Kasahara K, Nakayama R, Shiwa Y, Kanesaki Y, Ishige T, Yoshikawa H, et al. Fpr1, a primary target of rapamycin, functions as a transcription factor for ribosomal protein genes cooperatively with Hmo1 in Saccharomyces cerevisiae. PLoS Genet. 2020;16(6):e1008865.
    1. Author Response:

      The following is the authors' response to the original reviews.

      eLife assessment

      This study presents important findings regarding the quantification of dynamics in fish communities in changing ecosystems by combining a large-scale environmental DNA metabarcoding time series with novel statistical approaches. The methods are convincing, with controlled experiments, thorough statistical analyses, and a substantial dataset covering two years of detailed observation, which can provide sufficient power to detect fine-scale ecological interactions. This work is relevant for informing future research on assessing community stability under climate change.

      Thank you so much for your careful evaluation of our manuscript. We are very pleased to hear that you found our study important. We have revised our manuscript according to the helpful comments to further improve our manuscript.

      Reviewer #1 (Public Review):

      […] Their work provides a highly relevant approach to perform species-interaction strength analysis based on eDNA biodiversity assessments, and as such provides a research framework to study marine community dynamics by eDNA, which is highly relevant in the study of ecosystem dynamics. The models and analytical methods used are clearly described and made available, enabling application of these methods by anyone interested in applying it to their own site and species group of interest.

      Thank you so much for your time and effort to evaluate our manuscript. We are very pleased to hear that you found our study interesting. We have further revised the manuscript according to your comments and hope that the revised manuscript is now better than the original one.

      Strengths: The authors have a study setup that is suitable to measure the effects of temperature on eDNA diversity, and have taken a large number of samples and all appropriate controls to be able to accurately measure and describe these dynamics. The applied internal spike-in to enable relative eDNA copy number quantification is convincing.

      We are happy to hear that you found the study design and the method to estimate eDNA copy number are suitable and convincing.

      Weaknesses: The authors aim to study the relationship between species interaction strength and ecosystem complexity, and how temperature will influence this. However, there is only limited ecological context discussed explaining their results, and the link with climate change scenarios is also limited. A further discussion of this would have strengthened the manuscript.

      Thank you so much for the comment. We have added discussion about how our study contributes to understanding the fish community assembly process and predicting the community-level response under ongoing climate change. We have added one subsection, "Implications for fish community assembly and the effect of global climate change", at L679. As for the ecological discussion for each specific fish-fish interaction, we provided this in Supplementary file 1c.

      The authors were able to find a correlation between water temperature and interaction strengths observed. However, since water temperature is dependent on many environmental variables that are either directly or indirectly influencing ecosystem dynamics, it is hard to prove a direct correlation between the observed changes in community dynamics and the temperature alone.

      Thank you for pointing this out. We have discussed the possibility of the effects of other environmental variables (e.g., oxygen) and how we could overcome this issue at L661. Some of the sentences were originally in the subsection "Interaction strengths and environmental variables", but were moved to the subsection "Potential limitations of the present study and future perspectives".

      Reviewer #2 (Public Review):

      In this work Ushio et al. combine environmental DNA metabarcoding with novel statistical approaches to demonstrate how fish communities respond to changing sea temperatures over a seasonal cycle. These findings are important due to the need for new techniques that can better measure community stability under climate change. The eDNA metabarcoding dataset of 550 water samples over two years is, I feel, of sufficient scale to provide power to detect fine-scale ecological interactions, the experiments are well controlled, and the statistical analysis is thorough.

      Thank you so much for your time and effort to evaluate our manuscript. We are happy to hear that you found our study technically sound and important. We have revised the manuscript according to your comments to improve our manuscript further.

      The major strengths of the manuscript are: (1) the magnitude of the dataset, which provides densely replicated sampling that can overcome some of the noise associated with eDNA metabarcoding data and scale up the number of data points to make unique inferences; (2) the novel method of transforming the metabarcode reads using endogenous qPCR "spike-in" data from a common reference species to obtain estimates of DNA concentration across other species; and (3) the statistical analysis of time-series and network data and translating it into interaction strengths between species provides a cross-disciplinary dimension to the work.

      Thank you for your positive comments. We are very pleased to hear that (1) our intensive and extensive water sampling, (2) our method for using the common fish species eDNA as "spike-in," and (3) our nonlinear time series analysis were positively evaluated.

      I feel like this kind of study showcases the power of eDNA metabarcoding to answer some really interesting questions that were previously unobtainable due to the complexities and cost of such an exercise. Notwithstanding the problems associated with PCR primer bias and PCR stochasticity, the qPCR "spike-in" method is easy to implement and will likely become a standardised technique in the field. Further studies will examine and improve on it.

      We must admit that our endogenous "spike-in" method does not overcome all problems associated with PCR. However, we agree with you and believe that we are heading in the right direction. The method does not require the addition of externally supplied internal standard DNAs and enables post-hoc evaluation of absolute eDNA concentrations. Although this approach requires an additional experiment (qPCR), the method may be an alternative for quantifying eDNA concentrations.
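
      The arithmetic behind such an endogenous "spike-in" conversion might look like the sketch below: the qPCR-measured copy number of a common reference species is divided by that species' read count in the same sample, and the resulting copies-per-read factor is applied to every other species. This is an assumption-laden illustration; the function name, species labels and numbers are hypothetical, and the manuscript's actual procedure may differ in its details.

```python
def reads_to_copies(reads, reference, reference_copies):
    """Scale per-species read counts to estimated eDNA copy numbers.

    reads            : dict mapping species -> sequence reads in one sample
    reference        : species whose copy number was measured by qPCR
    reference_copies : qPCR-measured copies of the reference in that sample
    """
    ref_reads = reads[reference]
    if ref_reads <= 0:
        raise ValueError("reference species has no reads in this sample")
    copies_per_read = reference_copies / ref_reads
    return {species: n * copies_per_read for species, n in reads.items()}


# Hypothetical sample, for illustration only.
sample_reads = {"reference_fish": 200, "species_a": 50, "species_b": 0}
estimated = reads_to_copies(sample_reads, "reference_fish", 1000.0)
```

      Because the factor is computed per sample, the conversion can be applied after sequencing, which matches the post-hoc character described above; it fails by design when the reference species is absent from a sample.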

      Overall I found the manuscript to be clear and easy to follow for the most part. I did not identify any serious weaknesses or concerns with the study, although I am not able to comment on the more complex statistical procedures such as the "unified information-theoretic causality" method devised by the authors. The section on limitations of the study is important and acknowledges some issues with interpretation that need to be explained. The methods, while brief in parts, are clear. The code used to generate the results has been made available via a GitHub repository. The figures are clear and attractive.

      We are very happy to hear that you found our manuscript clear and not containing any serious weakness.

      Reviewer #1 (Recommendations For The Authors):

      This is a very nice manuscript discussing highly relevant methods to use eDNA analysis to study interactions in marine ecosystems. There are some minor concerns that we will address below:

      - As already mentioned above, based on the statements in the introduction we expected a very elaborate discussion section concerning the ecological interaction observed between species. This is however missing, and a more extensive general discussion of the biological interactions would be appreciated, either based on existing literature, or by suggesting further experiments. Alternatively, the claims made in e.g. line 124-128 (Overcoming these difficulties....) could be amended so this expectation is not raised.

      Thank you so much for the comment. As answered in the response above, we have added discussion about how our study contributes to the fish community assembly process and predicting the community-level response under ongoing climate change at L679.

      Specifically, we argued that our study provides a piece of evidence that temperature influences fish-fish interactions under field conditions at a relatively short time scale (weeks to months). We suggested that temperature effects on fish community assembly involve effects at different time scales, and thus, integrating results from different temporal (and spatial) scales is necessary to understand the fish community assembly process in nature. As stated above, we provided the detailed ecological discussion for each specific fish-fish interaction in the Supporting Information.

      - A lot of negative controls were taken and described in the material & methods. However, there is no clear mention of what was done with the outcome of these negative controls. How did the results of the negative controls influence your analysis? Or were they all completely negative?

      Thank you for pointing this out. The negative controls produced negligible reads (177 ± 665 reads [mean ± S.D.]), which accounted for ca. 0.1% of the positive sample reads. Moreover, all the reads were assigned to non-target taxa, such as fish species that had never been observed in the study region and freshwater fish species. Therefore, we conclude that any contaminations in our experiments were negligible, and we discarded the sequence reads from the negative control samples. We have explained this in L533–L539 in the main text.

      - Line 423 states: "..suggesting that weak interactions are key to the maintenance of species-rich communities." We are wondering if this can be stated like this, as it seems the other way around would also be true, since in a species rich community it can be expected that most interactions are weak?

      Thank you for pointing this out. We agree that there is a possibility that the high species diversity could be a cause of weak interactions. To clarify this, we have revised the sentence as follows in L568: " ...suggesting that understanding the causes and effects of weak interactions is key to understanding the maintenance of species-rich communities. "

      - There is a correlation between DNA concentration and temperature (e.g. shown in fig. S2b). We are wondering what could be an argument to not correct for this temperature effect on eDNA concentrations (as now described) or if it would be better to apply a correction factor for this, as it is also shown that there is a correlation between DNA concentration and interaction strengths.

      In the unified information theoretic (UIC) analysis, we took the effect of temperature into account if temperature had statistically clear influence on eDNA dynamics of a particular fish species (L439). This means that temperature was included as a conditional variable in the calculation of TE (i.e., Zt in Eqn. [1]). Other environmental variables were also included if they had statistically clear influence. Similarly, in the MDR S-map, we included temperature or other environmental variables as conditional variables if they had statistically clear influence on eDNA dynamics of a particular fish species. We explained this in L479.

      - The models used for the interaction dynamics calculations are extensively discussed in this manuscript, although these details are also present in the original papers describing these models, and therefore the manuscript could be shortened by removing some of this explanation.

      Thank you for your suggestion. As you understood, the details of the method (S-map and MDR S-map) are available in Sugihara (1994), Chang et al. (2021), and elsewhere. However, we have kept the explanation so that readers who are not familiar with the methods can briefly understand them without needing to read the details of the previous studies.

      Reviewer #2 (Recommendations For The Authors):

      L50-L72: I feel like the abstract could be snappier, i.e. quicker to read with less detail. Consider reducing it a little.

      Thank you for your suggestion. We have deleted some redundant phrases and shortened the abstract a little.

      L173-L176: I don't understand exactly what is suggested here. Perhaps rephrase?

      We have revised the sentence as follows (L165): " As our eDNA time series was taken twice a month, the interactions detected should also have the same time scale (e.g., the interactions detected may cause changes in the population size at the same time scale), which means that we tend to focus on behavior-level interactions (e.g., schooling) rather than birth-death process in the present study (except for predation)."

      L228: How many PCR replicate reactions were undertaken per sample?

      We performed eight technical replicates for the same eDNA template. This information is described in the third paragraph of the section "Paired-end library preparation and MiSeq sequencing." This section has been moved from the previous supplementary methods to the main text in the revision.

      L236: There is no mention later of how these blanks are used to clean up or filter the dataset from the effects of contamination. Consider adding this information.

      Thank you for pointing this out. As in the responses above, we have described the negative controls in L533–L539 in the main text. The negative controls generated negligible reads, so we simply discarded the sequence reads.

      L252-L253: "Primer sequences were removed from merged reads and reads without the primer sequences underwent quality filtering"? Wouldn't all of the reads not have primers after the primers were trimmed off? Or is something else intended here?

      All primer sequences were removed after merging the paired-end reads (see "Sequence analysis"). There is no specific reason for this process, and we think that the primer removal before merging the paired-end reads will generate the same results.

      L264-L265: "To refine the above taxon assignments". I assume because there were lots of assignments to species that were not known from the study area? Explain why this was done.

      At present, the reference sequences are available for about 70% of 4,500 fish species in Japan. However, due to the unknown degree of intraspecific variation, using a uniform threshold of 98.5% to delineate species can result in over-splitting or over-clustering MOTUs. To solve this issue, the manual refinement of the taxon assignments was performed based on the phylogenetic tree. This has been explained in L335.

      L274: More details of the qPCR assay are required, or a citation of previous study or supporting information.

      The details of the qPCR assay are provided in the section "Quantitative PCR and estimation of DNA copy numbers." This section has been moved from the previous supplementary methods to the main text in the revision.

      L327: Explain further how seasonality was treated here? This is an important part of the study, so deserves further attention.

      We included water temperature (if it had statistically clear influence on fish eDNA dynamics) as a conditional variable z(t) in the calculation of TE, and this took the effect of the seasonality in detecting causation into account. We have described this in L436–444.

      L407: Consider giving the code repository a DOI to cite.

      We have archived the analysis codes at Zenodo and provided the DOI in L39 and L521.

      L411: How many MiSeq runs exactly?

      We performed 21 MiSeq runs (often with other eDNA samples). We have described this in the main text (L299).

      L411: What proportion of your total sequencing data were assigned to fishes? This is a useful statistic to compare methods between studies.

      About 98% of the total sequence reads were assigned to fish. We have described this in the main text (L528).

      Figure 2: There does not appear to be a key to the color-coded species ecologies.

      We have added a legend for the fish ecology in Figure 2.

    1. Author Response:

      The following is the authors' response to the original reviews.

      We thank the editor and reviewers for their careful consideration of our manuscript and very helpful feedback, which guided us in improving our manuscript. We would like to highlight three main areas of improvement in this version:

      • Statistical rigor: we have added more detail to justify our 2% cutoff for GLM variable coding, implemented stricter shuffling and cutoffs for value and history coding, and provided more information on the statistical significance of our pairwise comparisons across regions and groups. These go well beyond the field standard for identifying and comparing neural encoding of task features.

      • Identification of value coding: we have implemented reviewer suggestions about kernel regression and value coding shuffles, providing even stronger evidence that value signaling among cue neurons is more prevalent than expected by chance, more prevalent than any other cue coding patterns, and present in all recorded regions. The rigor of this analysis is only possible due to our unique task design with 6 cues across two stimulus sets, and our consideration of 153 possible coding models exceeds standard practice for identifying value signals. We now implement population decoding, as well, providing additional support for a robust and widely-distributed value code.

      • Stability of value code: we have updated our terminology to better highlight that the value signals in our imaging dataset are indeed identified across days, and we add new analysis to show conservation of value-like signals across training days.

      Thanks to the reviewers’ suggestions, our manuscript now has substantially stronger support for the presence of stable and distributed cue value signaling. We address the specific points below.

      Excerpts from the Consensus Public Reviews:

      One limitation is the lack of focus on population-level dynamics from the perspective of decoding, with the analysis focusing primarily on encoding analyses within individual neurons.

      To address this limitation, we now include population-level decoding analysis (new panels, Figs. 3G-H, 4E). This new analysis reveals that, although value neurons can be used to decode cue identity on par with other cue cells, value neurons are more accurate at predicting the value of held out cues (never seen by the model), highlighting the utility of a value signal as a way to consistently represent the value of different stimulus sets.

      Moreover, we find comparable value prediction performance when using value neurons from each region (Fig. 4E), adding more support for the similarity of this signal across regions.

      The authors use reduced-rank kernel regression to characterize the 5332 recorded neurons on a cell-by-cell basis in terms of their responses to cues, licks, and reward, with a cell characterized as encoding one of these parameters if it accounts for at least 2% of the observed variance. At least 50% of cells met this inclusion criterion in each recorded area. 2% feels like a lenient cutoff, and it is unclear how sensitive the results are to this cutoff, though the authors argue that this cutoff should still only allow a false positive rate of 0.02% (determined by randomly shuffling the onset time of each trial.)

      We have provided more information about the 2% cutoff in a new figure, Figure 2-figure supplement 3. We reanalyzed the false positive rate and found that at a cutoff of 2% (but not 0.5% or 1%) there were no false positives (Figure 2-figure supplement 3B). Thus, we are confident that all neurons contain true task-related signals. Moreover, we found that the pattern of results remains largely unchanged as we change the cutoff over a range from 0.5% to 5%. With more stringent cutoffs, we begin to lose neurons with robust task-related responses (Figure 2-figure supplement 3E), so we continue to use the 2% cutoff in this version of the manuscript.
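
      To make the cutoff concrete, the sketch below computes the unique variance explained by a predictor as the drop in R^2 when that predictor is removed from the model, and then applies a 2% threshold. It is only an illustration under stated assumptions: ordinary least squares stands in for the kernel regression used in the manuscript, R^2 is computed in-sample for brevity, and the simulated neuron, variable names, and data are hypothetical.

```python
import numpy as np


def r_squared(X, y):
    """Fraction of variance in y explained by an ordinary least-squares fit on X."""
    X1 = np.column_stack([np.ones(len(y)), X])  # add an intercept column
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    residual = y - X1 @ beta
    return 1.0 - residual.var() / y.var()


def unique_variance(X, y, columns):
    """Drop in R^2 when the predictor columns in `columns` are removed."""
    reduced = np.delete(X, columns, axis=1)
    return r_squared(X, y) - r_squared(reduced, y)


rng = np.random.default_rng(0)
n = 2000
cue = rng.normal(size=n)    # hypothetical cue predictor
lick = rng.normal(size=n)   # hypothetical lick predictor
X = np.column_stack([cue, lick])
y = 1.0 * cue + rng.normal(size=n)  # simulated neuron driven by the cue only

CUTOFF = 0.02  # 2% of variance, as in the response above
cue_unique = unique_variance(X, y, [0])
lick_unique = unique_variance(X, y, [1])
is_cue_cell = cue_unique >= CUTOFF and lick_unique < CUTOFF
```

      In this toy case the cue accounts for a large share of the variance while the lick predictor contributes almost nothing, so the simulated neuron is labeled a cue cell. Repeating the classification on data with shuffled trial onsets would give an empirical false positive rate for a given cutoff.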

      First, they show that the correlation between cell responses on all periods except for the start of day 1 is more correlated with day 3 responses than expected by chance (although the correlation is still quite low, for example, 0.2 on day 2).

      We agree that a correlation of 0.2 does not seem like a large effect; however, the variability in neuronal responses and the noise level of the measurement enforce a ceiling that we can estimate by predicting data from the same session that the model was trained on. We have replotted these data (new panel Fig. 7G) with the correlation normalized to the cross-validated performance on the training day’s data. This shows that the models do about half as well in session 1 and session 2 compared to session 3. The original plot is in a new supplementary figure, Figure 7-figure supplement 1B.

      To further emphasize the similarity across days, we have added new panels (Fig. 7E and Figure 7-figure supplement 1A) showing that, across mice, a typical neuron was more correlated with its own activity on the subsequent day than with ~90% of the other neurons (shuffle controls, 50%).
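
      The comparison described above can be sketched as follows: each neuron's activity on one day is correlated with every neuron's activity on the next day, and the neuron's correlation with itself is ranked against its correlations with all other neurons. This is a hedged illustration with simulated data; the function name, array shapes, and noise level are hypothetical and are not taken from the manuscript.

```python
import numpy as np


def self_match_percentile(day1, day2):
    """Percent of other neurons that are less correlated with a neuron than itself.

    day1, day2: arrays of shape (n_neurons, n_timepoints) with matched rows.
    """
    n = day1.shape[0]
    # Block of the full correlation matrix: day-1 neurons (rows) vs day-2 neurons.
    corr = np.corrcoef(day1, day2)[:n, n:]
    self_corr = np.diag(corr)
    percentiles = np.empty(n)
    for i in range(n):
        others = np.delete(corr[i], i)  # correlations with all other neurons
        percentiles[i] = 100.0 * np.mean(others < self_corr[i])
    return percentiles


rng = np.random.default_rng(1)
tuning = rng.normal(size=(50, 200))  # hypothetical stable response profiles
day1 = tuning + 0.5 * rng.normal(size=tuning.shape)
day2 = tuning + 0.5 * rng.normal(size=tuning.shape)
pct = self_match_percentile(day1, day2)
```

      If neuron identities on the second day are shuffled, the expected percentile is about 50%, which is the chance level quoted above.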

      Second, they show that cue identity is able to capture the highest unique fraction of variance (around 8%) in day 3 cue cells across three days of imaging, and similarly for lick behavior in lick cells and cue+lick in cue+lick cells. Nonetheless, their sample rasters for all imaged cells also indicate that representations are not perfectly stable, and it will be interesting to see what *does* change across the three days of imaging.

      We agree that the representations are not perfectly stable and that is an interesting point of further investigation. One difference we did observe is increased cue coding across training (Figs. 6H, 7H).

      Importantly, the authors do not present evidence that value itself is stably encoded across days, despite the paper's title. The more conservative claim in the Discussion seems more appropriate: "these results demonstrate a lack of regional specialization in value coding and the stability of cue and lick [(not value)] codes in PFC."

      Due to confusing terminology on our part, the reviewers were mistaken about the timing of the experiment where we assess the stability of value coding. In the imaging sessions, odor sets were always presented on separate days. Thus, when we identify value coding in our imaged population, it is across two consecutive days with different odor sets, which is in itself evidence of a stable value code. We have updated our terminology and the text to make this clearer. We also added a new set of plots (Fig. 8H-I) showing the conservation of value-like signaling in cells we tracked across the first three sessions of odor set A, and, as above, that the correlation of these neurons across days is greater than expected by chance. These analyses lend further support to the stability of the value signal.

      Additional technical comments:

      1) The "shuffle #33" in figure 3B is confusing. The fit kernel in this shuffle shows that the "high" and "medium" responses increase above the pre-stimulus baseline. The "high" response is a combination of set 2 CS+ and set 1 CS50, both of which strongly suppressed the cell's firing over the 2.5-second window shown. Why then does the cue kernel fit these two trials predict an increase in firing rate above baseline at the 2.5-second time point? Is it a consequence of the reduced rank regression process, and if so, how? This strange-looking fit that does not well capture the response of the original cell makes me worry that the high fraction of identified "value" cells may be due to some constraint on the shuffle fits that leads them to often perform poorly.

      To address this concern, we refit the value shuffle and its models using a full kernel regression model (rather than reduced ranks). It does improve the appearance of the kernel fits (updated Fig. 3B), and we now use this new approach when fitting cue coding models in the revised manuscript. The regularization inherent in reduced rank constrains the shape of the cue kernel somewhat, which contributed to the shape of the fits (although this did not negatively impact the variance explained); however, because of the importance of the shape of these alternative cue coding models to the interpretation of the analysis, we agree with the reviewers that this was worth improving. The main constraint on the value model and its shuffles, however, is that all cues must use the same template, scaled according to particular values assigned to each cue in each shuffle, which will doubtless lead to compromised (and strange-looking) fits when the shuffled values do not match the ranking of the neuron’s cue activity. Critically, this constraint is applied equally to the value model and all the shuffles and would not bias the fits of any one model.

      2) The "shuffle" condition when testing for value cells always assumes two high responses, two medium responses, and two low responses. This strategy doesn't account for cells that respond to only a subset of cues, as one might expect in a sparse-coding olfactory region. We suggest adding a set of shuffles where responses are split into two groups, with either 3 conditions per group or 2 in one group and 4 in the other.

      We appreciate this valuable suggestion. We added all permutations of models with high responses to 6, 5, 4, 3, 2, or 1 odor cue to the analysis. We still find that the value model is the most frequent best model, displayed in new panels Fig. 3C-D and Figure 3-figure supplement 1A-B. The additional models allowed us to identify other neurons with cue activity best fit by models highly correlated with the ranked value model, which we term “value-like” neurons, including most neurons previously described as “trial-type” neurons. All 153 models and the fraction of neurons best fit by each one are depicted in Figure 3-figure supplement 1.
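
      The response does not spell out how the 153 models break down. One decomposition that reproduces the number, offered here as an assumption, is 90 ranked models (every distinct assignment of two high, two medium, and two low responses to the six cues, which includes the true value ranking) plus 63 models with a high response to any non-empty subset of the six cues. The short enumeration below checks that arithmetic; the labels are hypothetical.

```python
from itertools import combinations, permutations

cues = range(6)  # six odor cues across the two stimulus sets

# Ranked models: two cues each at high, medium, and low.
# Using a set removes duplicate orderings, leaving 6! / (2! * 2! * 2!) assignments.
ranked = set(permutations(["high", "high", "medium", "medium", "low", "low"]))

# Subset models: a high response to any non-empty subset of 1 to 6 cues.
binary = [subset for k in range(1, 7) for subset in combinations(cues, k)]

total = len(ranked) + len(binary)
```

      Under this reading, the 90 ranked models correspond to the value model plus the 89 permutations mentioned in point 8 of the technical comments.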

      After implementing the changes to both the method of model fitting (full kernel regression, as noted above) and the possible alternative models, the distribution of value cells has changed slightly. All regions contain value cells, supporting our original conclusion that the value signal is distributed, but there is slight enrichment in PFC when combining these five regions together (Fig. 4A).

      We have updated the conclusions of the paper accordingly:

      Introduction: “Unexpectedly, in contrast to the graded cue and lick coding across these regions, the proportion of neurons encoding cue value was more consistent across regions, with a slight enrichment in PFC but with similar value decoding performance across all regions.”

      Results: “Interestingly, the frequency of value cells was similar across the recorded regions (Fig. 4A). Indeed, despite the regional variability in number of cue cells broadly (Fig. 2F-G), there were very few regions that statistically differed in their proportions of value cells (Fig. 4A, Figure 4-figure supplement 1). Overall, though, there were slightly more value cells across all of PFC than in motor and olfactory cortex (Figs. 4A, Figure 4-figure supplement 1). Although there were the most cue neurons in olfactory cortex, these were less likely to encode value than cue neurons in other regions (Figure 4-figure supplement 2). Value-like cells were also widespread; they were less frequent in motor cortex as a fraction of all neurons, but they were equivalently distributed in all regions as a fraction of cue neurons (Fig. 4B, Figure 4-figure supplement 1, Figure 4-figure supplement 2).”

      Discussion: “In contrast to regional differences in the proportion of cue-responsive neurons, cue value cells were present in all regions and could be used to decode value with similar accuracy regardless of region.” AND “The distribution of cue cells with linear coding of value was mostly even across regions, with slight enrichment overall in PFC compared to motor and olfactory cortex, but no subregional differences in PFC. Importantly, cue value could be decoded from the value cells in all regions with similar accuracy.”

      3) On pages 11-12, the authors write "value coding is similarly represented across the regions we sampled." I feel this isn't quite what was shown: the authors have shown that all recorded regions contain a roughly comparable number of individual cells that are modulated by value, i.e. "value cells". However, the authors also showed that some recorded cells have mixed selectivity for value and other factors- it is possible that these mixed selectivity cells do vary between brain regions in their quantity or degree of value coding. Regions could potentially also vary in the dynamics of their value response, or in the trial-to-trial variability in the activity of value cells. I suggest the authors revise their original statement, for example by writing "we find a similar proportion of value-specific cells across the regions we sampled."

      We thank the reviewer for carefully reviewing our claims. In addition to showing similar proportions of value cells, we also show that the value-related activity is similar (by plotting the first principal component of value and value-like cells, Fig. 4C-D) and that cue value could be decoded from the value cells in all regions with similar accuracy (new panel, Fig. 4E). We have updated the text to more accurately reflect these observations:

      “In contrast to regional differences in the proportion of cue-responsive neurons, cue value cells were present in all regions and value could be decoded from them with similar accuracy regardless of region.”

      4) We appreciate the authors' idea to introduce a history term to their value cell model but worry that the distinction between history-dependent value cells and lick/cue+lick cells in Figure 4 has gotten fuzzy. At this point, history-dependent value cells are the product of a set of steps: 1) they are identified as "cue" neurons because the cue type accounts for at least 2% of the variance, while the lick rate does not, then 2) among the cue neurons, a subset are identified as "value" neurons because their activity scales with the cue type across both odor sets, and then 3) among value neurons, the "history-dependent" value neurons show a response rate that scales with a model that predicts anticipatory licking. Our concern comes down to this: your conclusion that these cells are not licking cells hinges on the initial point that licking does not account for 2% of the observed variance in cell activity. But if you had dedicated an equal number of model parameters and selection steps to your licking model, might it still not turn out that a licking model predicts their activity as well as the history-dependent cue value model?

      What would bolster our confidence here would be a comparison of variance explained: if you compare the predictions of the history-dependent value-encoding cue neuron model to the predictions of a simple lick neuron model, how much better does the former predict what the cells are doing? Are all those extra parameters and selection steps really contributing to an improved description of how neurons will respond?

      First, we would like to emphasize that “cue” neurons, as a population, have no discernible modulation by licks, which can be seen when comparing their activity on CS50 trials with and without reward, when licking clearly varies (Figure 2-figure supplement 2D). A new panel, Figure 5E now depicts the improvement in variance explained by the history model over a lick only model. The improvement is robust and universal. This is because even though the number of anticipatory licks per trial is used to fit the weights of our trial value model, these cue neurons have temporal dynamics that are more consistent with cue presentation than the presence of licks. We explain more below in our response to point 7.

      5) The paper's title claims that the coding of cue value is both stable and distributed. While the point for value coding being distributed is well supported with analysis, the claim that cue value coding is "stable" is weaker. The authors show in Figure 6 that cue identity best accounts for unique variance among cue cells across three days of imaging, but it does not follow that cue value is similarly stable. Figure 7 shows that on day 3 of imaging, the two odor sets have similar encoding- but this analysis is only performed within day 3, not across days. Why not examine unique variance among value cells over days, as was done for a cue, lick, and both cells in Figure 6G? That seems to be an important missing piece and a logical next step. The Discussion is more conservative in its claims- "these results demonstrate a lack of regional specialization in value coding and the stability of cue and lick [(not value)] codes in PFC." But this subtlety is missing from the paper's title and introduction.

      First, an important correction. “This analysis is only performed within day 3, not across days,” is a misunderstanding of our experiment brought on by our confusing terminology, which we have updated. This figure (now Figure 8) analyzes two sessions performed on consecutive days: Odor Set A day 3 (A3) and Odor Set B day 3 (B3), which constitute days 5 and 6 of our experiment (see updated panels Fig. 1B, 6A). This is why identifying value signaling across both of these sessions is justification for a stable code; by definition, it was present on two consecutive days.

      A limitation of our imaging experiment prevents us from evaluating value signaling in each individual session (like we did for cues and licks). For the imaging, we only presented one odor set per session (unlike the electrophysiology, where odor sets were presented in blocks). Our method of identifying value signals relies on two odor sets, so we cannot quantify it on a per session basis in the imaging. However, to address this as best we could, we identified CS+-preferring cue cells in session A3 (odor set A day 3) and plotted them for sessions A1-A3 (Fig. 8H), which reveals a conserved value-like signal across days. We also found that the correlation of the activity of these neurons across days was higher than expected by chance (Fig. 8I).

      We have edited the discussion text about coding stability, adding in more detail and caveats:

      “Previous reports have observed drifting representations in PFC across time (Hyman et al., 2012; Malagon-Vina et al., 2018), and there is compelling evidence that odor representations in piriform drift over weeks when odors are experienced infrequently (Schoonover et al., 2021). On the other hand, it has been shown that coding for odor association is stable in ORB and PL, and that coding for odor identity is stable in piriform (Wang et al., 2020a), with similar findings for auditory Pavlovian cue encoding in PL (Grant et al., 2021; Otis et al., 2017) and ORB (Namboodiri et al., 2019). We were able to expand upon these data in PL by identifying both cue and lick coding and showing separable, stable coding of cues and licks across days and across sets of odors trained on separate days. We were also able to detect value coding common to two stimulus sets presented on separate days, and conserved value features across the three training sessions. Notably, the model with responses only to CS+ cues best fit a larger fraction of imaged PL neurons than the ranked value model, a departure from the electrophysiology results. It would be interesting to know if this is due to a bias introduced by the imaging approach, the slightly reduced CS50 licking relative to CS+ licking in the imaging cohort, or the shorter imaging experimental timeline.

      The consistency in cue and lick representations we observed indicates that PL serves as a reliable source of information about cue associations and licking during reward seeking tasks, perhaps contrasting with other representations in PFC (Hyman et al., 2012; Malagon-Vina et al., 2018). Interestingly, the presence of lick, but not cue coding at the very beginning of the first session of training suggests that lick cells in PL are not specific to the task but that cue cells are specific to the learned cue-reward associations. Future work could expand upon these findings by examining stimulus-independent value coding within session across many consecutive days.”

      6) Considering licking as the readout of value has pros and cons. Anticipatory licking may be correlated with subjective value, but certainly nonlinearly. After all, licking has a ceiling and floor (bounded rate from 0->10 Hz). Are results consistent with the objective value of the cues (which are 0, .5, 1)? Which measure better explained the data?

      Thanks to this important suggestion, we tried fitting another set of models with 0, 0.5, 1 as the cue values. We found the same pattern of results. Overall, the fits were slightly better with 0, 0.5, 1, with 50.6% of potential value neurons (found with either version of the model) better fit by 0, 0.5, 1, and with mean variance explained of 0.265 with 0, 0.5, 1 (compared to 0.264 with the anticipatory lick values). Without strong evidence to choose one model over the other, we decided to use 0, 0.5, 1 because it exactly reflects reward probability, and is more objective as the reviewer notes, whereas before we relied on a noisier estimate of subjective value. We have changed the text accordingly.

      7) How can a neuron encode "Cue" in a value-dependent manner and not also encode licking, given they are correlated? If the kernel window includes anticipatory licking, and anticipatory licking is by definition related to value, then how could a licking kernel not at least explain some of that neuron's variance?

      The trial estimates of value from the lick linear regression are derived from typical licking patterns across all sessions and do not incorporate the particular number of licks on a given trial or the latency of licking relative to cue onset. Although the trial value model is predicting the number of licks on each trial, it only uses cue identity and reward history to make its prediction, so it is not tightly correlated with the stochastic licks on a given trial. And, importantly, we input the trial value as a cue kernel spanning the entire cue period, whereas lick kernels, per our definition, are restricted to a window around when licking occurs, which generously encompasses neural signals relating to both lick initiation and feedback. Licking can explain some of the variance of value (and history) neurons, which you can see in our new panel Fig. 5E, but it does not contribute any unique variance to the model. That is, with or without licks, the model performs just as well, so the activity of the neuron does not track any of the unique features of licks over cues (like whether or not the mouse licked on a trial, or when the mouse started licking on a given trial). Without cues, however, the model does worse, which means that the neuron’s activity is modulated by cues separately from when the mouse is licking. Thus, we can conclude the neuron encodes cues, but we have no evidence the neuron encodes licks (beyond the extent to which licks are correlated with cues). In our example fit in 5E, you can see how, although licks track value, they cannot recapitulate the temporal dynamics of this cue neuron. We added more description of this distinction in the manuscript.

      8) The ordering analysis with the 89 permutations is very nice for showing that, across the population, the "value ordered" gains are the best explanation of the neural activity. However, it doesn't tell you that any one neuron significantly encodes value, or the strength of this effect if they do. For the former, they could compare to a null distribution of shuffled order of neural vs CS data, and consider neurons for which the model is better than chance (a 0.05 FDR on a null distribution would be appropriate). This is important for supporting their conclusion of the fraction of neurons encoding value for each region.

      In fact, with so many alternative models, the probability of a neuron being best fit by the value model but not encoding value above chance is extremely low. To confirm this, we ran the reviewer’s suggested shuffle analysis, and found that 100% of value neurons performed above the 0.05 FDR. We have added this result to the methods:

      “To verify the robustness of value coding in the neurons best fit by the ranked value model, we fit each of those neurons with 1000 iterations of the cue value model with shuffled cue order to create a null distribution. The fits of the original value model exceeded the 98th percentile of the null for all value neurons.”
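
      The logic of this shuffle control can be sketched in a few lines. This is only an illustration under stated assumptions, not our analysis code: it swaps in a Pearson correlation between trial responses and assigned cue values for the full kernel regression fit, and the six cue values, trial counts, gain, and noise level are hypothetical.

```python
import numpy as np

def shuffle_percentile(responses, cue_ids, cue_values, n_shuffles=1000, seed=0):
    """Percentile of the observed value fit within a shuffled-cue-order null.

    The fit is approximated by the Pearson correlation between each trial's
    response and the value assigned to that trial's cue (an assumption made
    for brevity; it stands in for the kernel-based model fit).
    """
    rng = np.random.default_rng(seed)
    cue_values = np.asarray(cue_values, dtype=float)
    observed = np.corrcoef(cue_values[cue_ids], responses)[0, 1]
    null = np.empty(n_shuffles)
    for i in range(n_shuffles):
        shuffled = rng.permutation(cue_values)  # reassign values to cues at random
        null[i] = np.corrcoef(shuffled[cue_ids], responses)[0, 1]
    percentile = float(np.mean(null < observed) * 100.0)
    return observed, percentile

# Hypothetical neuron: six cues, 60 trials each, firing scales with cue value.
demo_rng = np.random.default_rng(1)
cue_values = [1.0, 1.0, 0.5, 0.5, 0.0, 0.0]
cue_ids = np.repeat(np.arange(6), 60)
responses = 2.0 * np.asarray(cue_values)[cue_ids] + demo_rng.normal(0.0, 0.5, cue_ids.size)

observed, percentile = shuffle_percentile(responses, cue_ids, cue_values)
print(f"observed r = {observed:.2f}, percentile within null = {percentile:.1f}")
```

      Because some shuffles reproduce the true value assignment by chance, the percentile of a genuinely value-coding neuron sits just below 100 rather than at exactly 100.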

      9) Similarly the 65% cutoff for trial history relative to shuffled is unusually low and therefore not convincing these neurons significantly encode the value. Usually, 95% or 99% is selected to give you a more standard significance criterion (FDR).

      We have changed the cutoff to 95%. We originally selected 65% because neurons in the 65% to 95% range had clear history effects, especially at the population level, but we appreciate the importance of rigorous selection. Note this shuffle is very strict, preserving CS+, CS50, CS- ranking but shuffling within-cue fluctuations in value due to trial history. With the stricter value and history shuffling, we now observe fewer history neurons, and they are most prevalent in PFC (Fig. 5I).

      10) "Regions with non-overlapping CIs were considered to have significantly different fractions of neurons of that coding type." This isn't a statistical test. Confidence intervals are not the same as significance.

      We now perform Bonferroni-corrected pairwise contrasts between all regions in the generalized linear mixed effects model. We added the p-values for all the comparisons that previously relied on non-overlapping confidence intervals in supplementary tables.

      Minor comments:

      The methods are hard to read. Most of the information seems to be there but in general, paragraphs need to be read over multiple times for meaning to emerge.

      We have edited for clarity, and if there are particular sections that remain unclear, we would be happy to know which ones.

      Why is there a block predictor in the encoding model?

      Because not every odor is present in every block, we did not want our models to use the specific cue predictors to try to account for differences in baseline activity that naturally occur across the session. Thus, each of the six blocks has its own predictor that serves as a constant that can adjust for changing baseline firing rate. Importantly, the block predictor simply marks the passage of blocks and contains no information about the odors present. We added more information about this to the methods:

      “For electrophysiology experiments, the model also included 6 constants that identified the block number, accounting for tonic changes in firing rate across blocks. Because not all cues were present in every block, this strategy prevented the cue kernels from being used to explain baseline changes across blocks.”
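
      As a minimal sketch of what these block constants look like in a design matrix (the function name and the per-trial layout are illustrative assumptions, not our actual code), each block gets one indicator column that is 1 for every sample in that block and 0 elsewhere:

```python
import numpy as np

def block_predictors(block_of_sample, n_blocks=6):
    """One indicator column per block; carries no information about which odors were present."""
    block_of_sample = np.asarray(block_of_sample)
    return (block_of_sample[:, None] == np.arange(n_blocks)[None, :]).astype(float)

# Hypothetical session: four samples drawn from blocks 0, 0, 1 and 5.
X_block = block_predictors([0, 0, 1, 5])
print(X_block)
```

      Each row contains exactly one 1, so the fitted weight of each column acts as that block's baseline firing rate, leaving the cue kernels free to describe cue-evoked changes only.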

      Did you use an elastic net rather than a lasso? What is the alpha parameter for lasso?

      We used an elastic net with alpha = 0.5. We added this information to the methods.
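
      For readers unfamiliar with the parameter, the sketch below shows what alpha = 0.5 means, assuming the glmnet-style convention in which alpha mixes the L1 and L2 penalties (alpha = 1 is pure lasso, alpha = 0 is pure ridge). The function name and the example coefficients are hypothetical, and other libraries use "alpha" for the overall penalty strength instead.

```python
import numpy as np

def elastic_net_penalty(beta, lam, alpha=0.5):
    """glmnet-style penalty: lam * (alpha * ||beta||_1 + (1 - alpha) / 2 * ||beta||_2^2)."""
    beta = np.asarray(beta, dtype=float)
    l1 = np.sum(np.abs(beta))
    l2_squared = np.sum(beta ** 2)
    return lam * (alpha * l1 + 0.5 * (1.0 - alpha) * l2_squared)

beta = [1.0, -2.0]  # hypothetical coefficients
penalty_lasso = elastic_net_penalty(beta, lam=1.0, alpha=1.0)
penalty_ridge = elastic_net_penalty(beta, lam=1.0, alpha=0.0)
penalty_mixed = elastic_net_penalty(beta, lam=1.0, alpha=0.5)
```

      With alpha = 0.5 the penalty is the average of the lasso and ridge terms, which keeps some sparsity while handling correlated predictors more gracefully than a pure lasso.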

      Figure 3F legend doesn't seem to match the figure.

      Corrected.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Consolidated response to public comments:

      We are grateful to the reviewers for their careful examination of our manuscript and for their insights for improving our work. We appreciate that they recognize the potential of the TARDIS approach for diverse transgenesis applications.

      We address two primary concerns that the reviewers raise. First is a concern that this approach is not as innovative as stated. We acknowledge that our work builds upon previous studies in the field, such as those by Nonet, Mouridi et al., with Malaiwong coming after our initial preprint. However, we believe that our approach offers a unique contribution, in that prior work does not provide a protocol or process for large-scale multiplexed transgenesis. Specifically, we introduce large sequence library arrays (TARDIS Library Arrays or TLAs). While high throughput multiplexed transgenesis is discussed in the Nonet and Mouridi manuscripts, it is never demonstrated. It is the combination of library construction, heritable transmission of the library itself, and then induced transgenesis of library components at a defined location within single individuals that makes this approach particularly useful.

      Second, there were concerns that we have not demonstrated that this approach will work beyond C. elegans. We agree that our discussion of the potential application of TARDIS beyond C. elegans is speculative at this point. Our intention was to highlight the potential for future development and application in other systems. In some cases, large integrations into the genome are possible, such as at the H11 locus in mice, which could provide a means to inherit a sequence library. We are hopeful that our success in C. elegans will inspire work in other systems. The motivation for this will naturally depend on the usefulness of actual TARDIS implementations, which will be forthcoming in due course.

      Reviewer #1 (Recommendations For The Authors):

      1. Section titled "Integration from TARDIS array to F1" beginning on line 161 has some missing details that make it difficult to follow. Many of those details are present in the following section titled "Generation and Integration of TARDIS promoter library", but should have been present sooner.
         a. How many barcodes were in the array in line PX786?
         b. Clarify the use of G-418, heat shock, hygromycin, etc. in this paragraph.
         c. Please clarify that the L1 death is due to selection with G-418 - "We found that a portion of the initially plated worms die, likely due to lack of array inheritance." is confusing unless you add that they are selected in this step.
         d. "These results suggest that approx. 100-200 worms need to be heat shocked to obtain an integrated line" - the math actually looks like 200-300, and this would be to get a single integrant.
      2. In general, the barcoding study and results reported here read like a teaser/proof-of-concept but do not really robustly demonstrate the application of the method for barcoding and tracing individual lineages in a population of C. elegans. How many barcodes were in the array, and how many ended up in F1s? Would one need to screen for duplicate barcodes after integration?
      3. The promoter library study is impressive but again, rather limited.
      4. The Discussion section about extending this technology to other systems is fairly balanced, acknowledging the limitations that would need to be overcome. The language in the abstract and introduction is less balanced and oversells the current translation of this approach to systems outside C. elegans.

      Reviewer #2 (Recommendations For The Authors):

      As I mentioned in the Public Review, I appreciate the design of the selection markers for integration. However, I do not see a major advance in the field. The use of barcoding of individuals to address a biological question would change that impression.

      Regarding the integration of promoters, I think this is something that anyone could address in diverse forms using existing knowledge.

      Suggestions:
      - Use one or two more landing pads for barcoding of animals and check numbers, efficacy, enrichments, etc. About 500 sequences overrepresented may be too much for future applications;
      - Increase the number of landing pads for inserting promoters. Genomics context matters and this could help to have a better summary of the real expression patterns driven by the promoter of interest;
      - Other references about landing pads would be Vicencio et al, Genetics 2019, and Nonet microPublication Biology 2021.

      In addition to the general comments, the reviewers provided useful suggestions to the text that we have used to clarify the manuscript.

    1. Author Response:

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      The authors investigated state-dependent changes in evoked brain activity, using electrical stimulation combined with multisite neural activity across wakefulness and anesthesia. The approach is novel, and the results are compelling. The study benefits from an in-depth sophisticated analysis of neural signals. The effects of behavioral state on brain responses to stimulation are generally convincing.

      It is possible that the authors' use of "an average reference montage that removed signals common to all EEG electrodes" could also remove useful components of the signal, which are common across EEG electrodes, especially during deep anesthesia. For example, it is possible (in fact from my experience I would be surprised if it is not the case) that under isoflurane anesthesia, electrical stimulation induces a generalized slow wave or a burst of activity across the brain. Subtracting the average signal will simply remove that from all channels. This does not only result in signals under anesthesia being affected more by the referencing procedure than during waking but also will have different effects on different channels, e.g. depending on how strong the response is in a specific channel.

      We thank the reviewer for the positive comments and for raising this point. We do not believe that the average reference montage is obscuring an evoked slow wave in the isoflurane-anesthetized mice. Electrical stimulation did elicit a brief activation in nearby neurons that was followed by roughly 200 ms of quiescence, but no significant changes in firing in the other regions we recorded from (Author response image 1).

      Author response image 1

      ERP and evoked population activity during isoflurane anesthesia do not show evidence of global responses.

      (Top). ERP (-0.2 to +0.8 s around stimulus onset) with all EEG electrode traces superimposed. Data represented is the same: red traces have been processed with the average reference montage, black traces have not. (Bottom) Population mean firing rates from the areas of interest from the same experiment as above.

      We are familiar with the work from Dasilva et al. (2021), a study similar to ours because they also performed cortical electrical stimulation in mice anesthetized with isoflurane. They show widespread evoked multi-unit activity (derived from LFP) in isoflurane-anesthetized mice in response to electrical stimulation, but critical experimental differences may underlie the conflicting results presented in our study. Both works use similar levels of isoflurane to maintain anesthesia (we use a level roughly equivalent to their “deep” level). However, our experiments use only isoflurane, whereas Dasilva et al. induced anesthesia with ketamine and medetomidine followed by isoflurane. It has been shown that isoflurane and ketamine have different effects on neural dynamics (Sorrenti et al., 2021). Typically, isoflurane causes reduced spontaneous firing rates and decreased evoked response amplitudes compared to wakefulness, whereas ketamine has been shown to increase firing rates and evoked response amplitudes (Aasebø et al., 2017; Michelson & Kozai, 2018). Perhaps a more relevant difference is the set of electrical stimulation parameters used to perturb the brain. Dasilva et al. used 1 ms pulses of 500 μA, which would have a much larger effect than the stimulation used in this work, 0.2 ms pulses of 10-100 μA.

      Additionally, we would like to clarify that the average reference montage is not impacting the main findings of this work. As the reviewer correctly pointed out, the average reference montage does change the appearance of the ERP in the butterfly plots (Top panel in Author response image 1). However, all the quantitative analyses of the EEG-ERPs are performed on the global field power, computed by taking the standard deviation across all EEG channels, which is not affected by the average reference montage.
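
      The invariance follows from the definition: the standard deviation across channels already subtracts the across-channel mean at each time point, so any signal common to all channels cancels. The sketch below illustrates this on synthetic data; the channel count, epoch length, and the shape of the common wave are hypothetical and not taken from our recordings.

```python
import numpy as np

def average_reference(eeg):
    """Subtract the across-channel mean at every time point (eeg: channels x time)."""
    return eeg - eeg.mean(axis=0, keepdims=True)

def global_field_power(eeg):
    """Standard deviation across channels at every time point."""
    return eeg.std(axis=0)

rng = np.random.default_rng(0)
n_channels, n_samples = 30, 1000  # hypothetical montage size and epoch length
eeg = rng.normal(size=(n_channels, n_samples))

# Add a large signal that is identical on every channel.
common_wave = 5.0 * np.sin(np.linspace(0.0, 2.0 * np.pi, n_samples))
eeg_with_common = eeg + common_wave

gfp_raw = global_field_power(eeg_with_common)
gfp_rereferenced = global_field_power(average_reference(eeg_with_common))
gfp_without_common = global_field_power(eeg)
```

      All three traces agree to numerical precision, which is why re-referencing changes the butterfly plots but not the global field power.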

      Reviewer #2 (Public Review):

      […] The conclusions regarding the thalamic contributions to the ERP components are strongly supported by the data.

      The spatiotemporal complexity is almost a side point compared to what seems to be the most important point of the paper: showing the contribution of thalamic activity to some components of the cortical ERP. Scalp ERPs have long been regarded as purely cortical phenomena, just like most EEGs, and this study shows convincing evidence to the contrary.

      The data presented seemingly contradicts the results presented by Histed et al. (2009), who assert that cortical microstimulation only affects passing fibers near the tip of the electrodes, and results in distant, sparse, and somewhat random neural activation. In this study, it is clear that the maximum effect happens near the electrodes, decays with distance, and is not sparse at all, suggesting that not only passing fibers are activated but that also neuronal elements might be activated by antidromic propagation from the axonal hillock. This appears to offer proof that microstimulation might be much more effective than it was thought after the publication of Histed 2009, as the uber-successful use of DBS to treat Parkinson's disease has also shown.

      We thank the reviewer for their positive comments and thoughtful suggestions. We appreciate and agree with the reviewer’s perspective that the thalamic contribution to the cortical ERP is one of the key points of this study. We also thank the reviewer for their comment on the apparently contradictory results reported by Histed et al. (2009). This gives us the opportunity to further highlight the important contribution of our study to the field.

      First, we would like to highlight some key experimental differences between the two studies. In our study we used single pulse stimulation with currents between 10 and 100 μA, whereas Histed et al. used trains of pulses (100 ms in duration at 250 Hz) with lower current intensities (between 2 and 50 μA). We varied the depth of stimulation, targeting superficial and deep cortical layers; Histed et al. exclusively stimulated superficial cortical layers. In addition, the two studies used recording methods that are orthogonal in nature. We used Neuropixels probes that record from neurons that span all cortical layers depth-wise while Histed et al. used two-photon calcium imaging to record from a horizontal plane of neurons (again, in the superficial cortical layers).

      Because of these important methodological differences, it is more appropriate to compare the Histed et al. results to our results from superficial stimulation at comparable current intensities. In this case, we believe the two studies show similar results: stimulation activated a small fraction of neurons even hundreds of microns away from the stimulating electrode (see Figure 4A from our manuscript). However, our study adds an important observation pointing to the critical role of the depth of the stimulating electrode. We observe significant excitation of local cortical neurons (Figure 4D) and trans-synaptic activation of the thalamus only when we delivered deep stimulation (Figure 5A). This effect is likely mediated by activation of large, myelinated cortico-thalamic fibers, which are thought to be more excitable than non-myelinated horizontal fibers (Tehovnik & Slocum, 2013).

      To summarize, Histed et al. (2009) concluded that microstimulation causes a sparse activation of a distributed set of neurons with little evidence of synaptically driven activation. In contrast, we showed that microstimulation can robustly activate local neurons and trans-synaptically activate distant neurons when stronger stimuli are directed to deep cortical layers. Based on this, we conclude that electrical stimulation is indeed highly effective, and is a valid tool that can be used to probe and characterize the cortico-thalamo-cortical network in any behavioral state.

      ----------

      Reviewer #1 (Recommendations for the authors):

      1. I am not clear how "putative pyramidal" or RS and "putative inhibitory" fast-spiking neurons were identified. Please provide some further details on that, including average spike wave shapes, and distribution of firing rates, and it would be interesting to know the proportion of "putative" RS and FS neurons in your recorded population. Obviously, caution is warranted here because, without further work, you cannot be sure that those are indeed pyramidal cells or interneurons! Is this subdivision necessary at all?

      We added details regarding the cell-type classification to the Results (lines 136-140) and the Methods section. This classification is common practice in cortical extracellular electrophysiology recordings given that cell-type specific analyses can reveal important differences between the two putative populations (Barthó et al., 2004; Bortone et al., 2014; Bruno & Simons, 2002; Jia et al., 2016; Niell & Stryker, 2008; Sirota et al., 2008). Based on our findings that the two populations respond to electrical stimulation in similar ways (excitation followed by a period of quiescence and rebound excitation), we agree the subdivision is not necessary to support our conclusions. However, we believe that some readers will appreciate seeing the two putative populations presented separately.

      2. I wonder how the authors know whether the animals were awake, specifically when they were not running. Did you observe animals falling asleep when head-fixed? Providing some analyses of spontaneous EEG/LFP signals in each state could add some reassurance that only wakefulness was included, as intended.

      While we cannot conclusively rule out that mice were asleep during the “quiet wakefulness” periods we analyzed, we believe they are likely to be awake for two main reasons: 1) all the experiments are performed during the dark phase of the light/dark cycle, when the mice are less likely to enter a sleep state (Franken et al., 1999); 2) the animals are not undergoing specific training to promote drowsiness or sleep. Indeed, many sleep-focused studies in head-fixed mice are performed during the light phase of the animal’s cycle to maximize the likelihood of capturing sleep states (Kobayashi et al., 2023; Turner et al., 2020; Yüzgeç et al., 2018; Zhang et al., 2022). We have added this note to the Discussion section (lines 402-406).

      Because we do not specifically record during sleep states and our recording does not include electromyography, which is commonly used in conjunction with EEG to classify sleep stages, we cannot accurately perform spectral comparison between “quiet wakefulness” and sleep states in our recordings.

      3. I was unsure about the meaning of some of the terminology, specifically "rebound", "rebound spiking", "rebound excitation" etc. Why do you call it "rebound"?

      “Rebound” is a term often used to describe a period of enhanced spiking following a period of prolonged silence or inhibition (Guido & Weyand, 1995; Roux et al., 2014). Grenier et al. (1998) list “postinhibitory rebound excitation” as an intrinsic property of cortical and thalamic neurons. We added this description to the text (lines 79-80).

      Reviewer #2 (Recommendations For The Authors):

      Regarding analysis, I would make three main points:

      Regarding the CSD analysis, I think the authors have done a good job of circumventing several of the known issues of this technique, especially by using ERPs rather than ongoing activity. However, although I do not immediately have access to the literature to back up this claim, I've heard that many assumptions behind CSD require a laminar structure with electrodes positioned perpendicular to these layers. In Figure 1B it seems like the neuropixels probe is not really perpendicular to the cortical layers, and I wonder if this might be an issue. I am also wondering how to interpret the thalamic CSD, as this structure is not laminar, lacks the mass of neatly stacked neuronal dipoles present in the cortex, and does not have an orderly array of synaptic inputs and outputs. I understand that CSD analysis helps minimize the contributions of volume conduction, but in this case, I also wonder if the thalamic CSD is even necessary to back up the paper's claims.

      One-dimensional CSD is computed assuming that the electrode is inserted perpendicular to cortex. This is mainly important for the interpretation of sinks and sources, since CSD can be also computed on radial voltages (e.g., EEG [Tenke & Kayser, 2012]). In general, our Neuropixels probes do not significantly deviate from perpendicular (mean deviation from perpendicular 15.3 degrees, minimum 5.2 degrees, and maximum 36.6 degrees). The probe represented in Figure 1B deviates from perpendicular by 31.2 degrees, which is an outlier compared to the rest of the insertions. Any deviation from perpendicular would result in the “effective” cortical thickness being larger by a factor of 1/cos(angle deviation from perpendicular) and thus would not affect the relative location of sources and sinks. We have added a statement to clarify this in the text (lines 126 and 454-456).
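
      To give a sense of scale, the stretch factor 1/cos(angle) can be evaluated for the deviations quoted above. This is a small illustrative calculation; the function name is ours and the angles are simply the mean, minimum, maximum, and Figure 1B values reported in the preceding paragraph.

```python
import math

def effective_thickness_factor(angle_deg):
    """Factor by which cortex appears thicker along a probe tilted angle_deg from perpendicular."""
    return 1.0 / math.cos(math.radians(angle_deg))

reported_angles = {"mean": 15.3, "minimum": 5.2, "maximum": 36.6, "Figure 1B": 31.2}
factors = {label: effective_thickness_factor(a) for label, a in reported_angles.items()}
for label, factor in factors.items():
    print(f"{label}: {factor:.3f}")
```

      The factor is a uniform stretch along the probe axis, so it rescales apparent depth without reordering sinks and sources.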

      We agree with the statement regarding CSD analysis in the thalamus. We originally included the CSD for the thalamus in Figure 2F for completeness. As the reviewer pointed out, thalamic CSD was not used to perform any subsequent analysis and is, therefore, not necessary to back up any claims. As such, we have removed the CSD plot from Figure 2F to avoid any confusion and made a comment to this effect in the legend (lines 1175-1177).

      On the merits of using the z-score normalization for spike rates vs. other strategies like standardizing to maximum firing, I am aware that both procedures have limitations, but the z-score changes the range of the firing rate from [0, +Inf] to [-Inf, +Inf]. This does not seem correct considering that negative spiking rates do not exist. The standardization to maximum rate keeps the range within [0, 1], not creating negative rates. Another point that it will be worth discussing is the reported values of the z-scored values. For example, what does it mean to be 54 standard deviations away from the mean? 6 standard deviations is already a big distance from the mean.

      For Figure 2, we chose to represent the neural firing rates as z-scores because we found it important to report the magnitude of both the increase and decrease of the evoked firing rates in the post-stimulus period relative to the pre-stimulus rate. The normalization we used helps to visualize the magnitude of the effects of electrical stimulation on neuronal activity in both directions, which is an important result of the study. Despite the differences between the two normalization methods, the normalization based on the maximum firing rate does not significantly change the qualitative interpretation of Figure 2 in the manuscript (Author response image 2).

      Author response image 2

      Evoked firing rates for neurons in the areas of interest in response to deep stimulation in MO during the awake state. (Left) Firing rates of all neurons normalized by the average, pre-stimulus firing rate. (Right) Firing rates of all neurons normalized by the maximum post-stimulus firing rate.

      Regarding Figure 3 and the associated text, we would like to clarify that the magnitude metric is not simply a z-score value (with units of s.d.) but rather it is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). This can help explain why we see values of ~50 s.d.∙s. We chose to z-score firing rates, LFP, and CSD to normalize across the different signals and magnitudes of the evoked responses. We often observed the largest responses in the LFP (see Figure 3A), which may be partly due to the signal naturally having a larger dynamic range than the measured neural firing rates. Then we integrated the z-scored response time series to capture the dynamics of the signal over the response window, rather than a static value such as the mean or maximum z-score. After performing a thorough literature search, we found no other ways to capture and compare the magnitudes of the different signals. We have added language to clarify the magnitude metric (lines 155-156) and added the appropriate units.
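
      A minimal sketch of such a metric is given below to make the units concrete. It is an illustration under assumptions rather than our analysis code: the baseline and response windows are hypothetical, the z-score uses the pre-stimulus mean and standard deviation, and the response is rectified before trapezoidal integration (whether to rectify is a choice made here for the example).

```python
import numpy as np

def response_magnitude(signal, times, baseline=(-0.2, 0.0), window=(0.0, 0.3), rectify=True):
    """Integrated area under the z-scored response, in s.d. x seconds."""
    signal = np.asarray(signal, dtype=float)
    times = np.asarray(times, dtype=float)
    base_mask = (times >= baseline[0]) & (times < baseline[1])
    resp_mask = (times >= window[0]) & (times <= window[1])
    # z-score relative to the pre-stimulus period
    z = (signal - signal[base_mask].mean()) / signal[base_mask].std()
    z_resp = np.abs(z[resp_mask]) if rectify else z[resp_mask]
    t_resp = times[resp_mask]
    # trapezoidal rule written out explicitly
    return float(np.sum(0.5 * (z_resp[1:] + z_resp[:-1]) * np.diff(t_resp)))

# Synthetic trace sampled at 1 kHz: baseline with mean 0 and s.d. 1,
# then a sustained response sitting 10 s.d. above baseline for 0.3 s.
times = np.arange(-200, 301) / 1000.0
signal = np.concatenate([np.tile([1.0, -1.0], 100), np.full(301, 10.0)])
magnitude = response_magnitude(signal, times)
print(magnitude)
```

      A response held at 10 s.d. for 0.3 s integrates to 3 s.d.∙s, so values in the tens of s.d.∙s reflect large and sustained deflections rather than any single sample lying that many standard deviations from the mean.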

      In reporting the p-values, I recommend increasing the number of significant digits to four because the p-value seems to be the same for different tests in several places (e.g.: lines 207 to 218), which seems odd. I also wonder whether this could be an artifact of the z-scoring procedure. In the figures, I would like to advise the use of 1 asterisk to denote "weak evidence to reject the null hypothesis (0.05 > p > 0.01)" and two asterisks to denote "strong evidence to reject the null hypothesis (0.01 > p)", and make a note of it accordingly in the manuscript and/or figure legends.

      According to the reviewer’s suggestion, we have changed the statistics language to “* weak evidence to reject null hypothesis (0.05 > p > 0.01), ** strong evidence to reject null hypothesis (0.01 > p > 0.001), *** very strong evidence to reject null hypothesis (0.001 > p)” throughout the manuscript.

      We have also increased the number of significant digits to four throughout the manuscript. It is true that some of the p-values reported for Figure 3 (lines 169-180) are the same for different tests. This is not an artifact of the z-scoring, but rather a consequence of performing the Wilcoxon signed-rank test (an ordinal statistical test) with small sample numbers. Because the p-value depends only on the relative ordering, not the continuous distribution of values, the small sample size (N=6-14) increases the likelihood of obtaining the exact same p-value if the relative ordering of samples is the same.
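
      The effect can be reproduced with an exact signed-rank calculation on made-up data. The sketch below is a simplified illustration (it assumes no zero differences and no tied absolute differences, and the two data sets are invented): with N = 6 and all differences of the same sign, the two-sided exact p-value is 2/64 regardless of how large the differences are.

```python
from itertools import product

def exact_signed_rank_p(diffs):
    """Two-sided exact Wilcoxon signed-rank p-value (no zeros or tied |diffs| assumed)."""
    n = len(diffs)
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0] * n
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    total = n * (n + 1) // 2
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    statistic = min(w_plus, total - w_plus)
    # Enumerate all 2^n sign assignments under the null hypothesis.
    extreme = 0
    for signs in product((0, 1), repeat=n):
        w = sum(r for r, s in zip(range(1, n + 1), signs) if s)
        if min(w, total - w) <= statistic:
            extreme += 1
    return extreme / 2 ** n

small_effects = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]        # invented paired differences
large_effects = [12.0, 55.0, 7.0, 31.0, 90.0, 18.0]   # very different values, same ordering of signs
p_small = exact_signed_rank_p(small_effects)
p_large = exact_signed_rank_p(large_effects)
p_one_negative = exact_signed_rank_p([-0.1, 0.2, 0.3, 0.4, 0.5, 0.6])
```

      Both all-positive data sets give p = 0.03125, and flipping the sign of the smallest difference moves the p-value to the next attainable level, 0.0625. With so few attainable values at small N, identical p-values across different comparisons are expected.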

      Line 202: If the magnitude corresponds to z-score data, please add "s.d." after the number, as z-scored values are expressed in standard deviation units. Please update this throughout the paper.

      As stated above, the magnitude metric is the integrated area under the z-scored response over the response window (with units of s.d.∙seconds). We have added the correct units in all places.

      Line 214: Please report how the multiple comparisons correction was performed

      We have added the test used for multiple comparisons in line 169 (formerly line 214) and in the Methods section (line 770).

      Line 462: please replace "Neuropixels activity" with "LFP and single-unit activity".

      We changed the wording to specify “LFP, and single neuron responses…” (now line 337).

      Line 475: a short explanation of the bi-stability phenomena will be helpful for the reader.

      We added the following description: “a state characterized by spontaneous alternation between bouts of activity and periods of silence” (lines 350-351).

      Line 601: It is asserted that "Electrical stimulation directly activates local cells and axons that run near the stimulation site via activation of the axon initial segment" and the paper by Histed et al. 2009 is cited. This does not seem like an appropriate citation, as Histed et al. explicitly state that electrical microstimulation does not activate local neuronal bodies near the electrode tip. See my comment above.

      Upon further reading, we believe we are seeing evidence of direct axonal activation and subsequent antidromic activation of local cell bodies, as you suggested in your comment above and as has been proposed by many, including Histed et al. (2009) and Nowak and Bullier (1998). We edited our sentence accordingly, kept the Histed et al. citation, and added other relevant citations (lines 487-490).

      References

      • Aasebø, I. E. J., Lepperød, M. E., Stavrinou, M., Nøkkevangen, S., Einevoll, G., Hafting, T., & Fyhn, M. (2017). Temporal Processing in the Visual Cortex of the Awake and Anesthetized Rat. ENeuro, 4(4), 59–76. https://doi.org/10.1523/ENEURO.0059-17.2017

      • Barthó, P., Hirase, H., Monconduit, L., Zugaro, M., Harris, K. D., & Buzsáki, G. (2004). Characterization of Neocortical Principal Cells and Interneurons by Network Interactions and Extracellular Features. Journal of Neurophysiology, 92(1), 600–608. https://doi.org/10.1152/jn.01170.2003

      • Bortone, D. S., Olsen, S. R., & Scanziani, M. (2014). Translaminar Inhibitory Cells Recruited by Layer 6 Corticothalamic Neurons Suppress Visual Cortex. Neuron, 82, 474–485. https://doi.org/10.1016/j.neuron.2014.02.021

      • Bruno, R. M., & Simons, D. J. (2002). Feedforward Mechanisms of Excitatory and Inhibitory Cortical Receptive Fields. The Journal of Neuroscience, 22(24), 10966–10975. https://doi.org/10.1523/JNEUROSCI.22-24-10966.2002

      • Dasilva, M., Camassa, A., Navarro-Guzman, A., Pazienti, A., Perez-Mendez, L., Zamora-López, G., Mattia, M., & Sanchez-Vives, M. V. (2021). Modulation of cortical slow oscillations and complexity across anesthesia levels. NeuroImage, 224, 117415. https://doi.org/10.1016/j.neuroimage.2020.117415

      • Franken, P., Malafosse, A., & Tafti, M. (1999). Genetic Determinants of Sleep Regulation in Inbred Mice. SLEEP, 22(2). https://academic.oup.com/sleep/article/22/2/155/2731698

      • Grenier, F., Timofeev, I., & Steriade, M. (1998). Leading role of thalamic over cortical neurons during postinhibitory rebound excitation. Proceedings of the National Academy of Sciences of the United States of America, 95(23), 13929–13934. https://doi.org/10.1073/pnas.95.23.13929

      • Guido, W., & Weyand, T. (1995). Burst responses in thalamic relay cells of the awake behaving cat. Journal of Neurophysiology, 74(4), 1782–1786. https://doi.org/10.1152/JN.1995.74.4.1782

      • Histed, M. H., Bonin, V., & Reid, R. C. (2009). Direct Activation of Sparse, Distributed Populations of Cortical Neurons by Electrical Microstimulation. Neuron, 63(4), 508–522. https://doi.org/10.1016/j.neuron.2009.07.016

      • Jia, X., Siegle, J., Bennett, C., Gale, S., Denman, D. R., Koch, C., & Olsen, S. (2016). High-density extracellular probes reveal dendritic backpropagation and facilitate neuron classification. Journal of Neurophysiology, 121(5), 1831–1847. https://doi.org/10.1101/376863

      • Kobayashi, G., Tanaka, K. F., & Takata, N. (2023). Pupil Dynamics-derived Sleep Stage Classification of a Head-fixed Mouse Using a Recurrent Neural Network. The Keio Journal of Medicine, 2022-0020-OA. https://doi.org/10.2302/KJM.2022-0020-OA

      • Michelson, N. J., & Kozai, T. D. Y. (2018). Isoflurane and ketamine differentially influence spontaneous and evoked laminar electrophysiology in mouse V1. Journal of Neurophysiology, 120(5), 2232. https://doi.org/10.1152/JN.00299.2018

      • Niell, C. M., & Stryker, M. P. (2008). Highly selective receptive fields in mouse visual cortex. Journal of Neuroscience, 28(30), 7520–7536. https://doi.org/10.1523/JNEUROSCI.0623-08.2008

      • Nowak, L. G., & Bullier, J. (1998). Axons, but not cell bodies, are activated by electrical stimulation in cortical gray matter. II. Evidence from selective inactivation of cell bodies and axon initial segments. Experimental Brain Research, 118(4), 489–500. https://doi.org/10.1007/S002210050305

      • Roux, L., Stark, E., Sjulson, L., & Buzsáki, G. (2014). In vivo optogenetic identification and manipulation of GABAergic interneuron subtypes. Current Opinion in Neurobiology, 26, 88–95. https://doi.org/10.1016/j.conb.2013.12.013

      • Sirota, A., Montgomery, S., Fujisawa, S., Isomura, Y., Zugaro, M., & Buzsáki, G. (2008). Entrainment of Neocortical Neurons and Gamma Oscillations by the Hippocampal Theta Rhythm. Neuron, 60(4), 683–697. https://doi.org/10.1016/j.neuron.2008.09.014

      • Sorrenti, V., Cecchetto, C., Maschietto, M., Fortinguerra, S., Buriani, A., & Vassanelli, S. (2021). Understanding the Effects of Anesthesia on Cortical Electrophysiological Recordings: A Scoping Review. International Journal of Molecular Sciences, 22(3), 1286. https://doi.org/10.3390/IJMS22031286

      • Tehovnik, E. J., & Slocum, W. M. (2013). Two-photon imaging and the activation of cortical neurons. Neuroscience, 245(March), 12–25. https://doi.org/10.1016/j.neuroscience.2013.04.022

      • Tenke, C. E., & Kayser, J. (2012). Generator localization by current source density (CSD): Implications of volume conduction and field closure at intracranial and scalp resolutions. Clinical Neurophysiology, 123(12), 2328–2345. https://doi.org/10.1016/J.CLINPH.2012.06.005

      • Turner, K. L., Gheres, K. W., Proctor, E. A., & Drew, P. J. (2020). Neurovascular coupling and bilateral connectivity during nrem and rem sleep. ELife, 9, 1. https://doi.org/10.7554/ELIFE.62071

      • Yüzgeç, Ö., Prsa, M., Zimmermann, R., & Huber, D. (2018). Pupil Size Coupling to Cortical States Protects the Stability of Deep Sleep via Parasympathetic Modulation. Current Biology, 28(3), 392. https://doi.org/10.1016/J.CUB.2017.12.049

      • Zhang, X., Landsness, E. C., Chen, W., Miao, H., Tang, M., Brier, L. M., Culver, J. P., Lee, J. M., & Anastasio, M. A. (2022). Automated sleep state classification of wide-field calcium imaging data via multiplex visibility graphs and deep learning. Journal of Neuroscience Methods, 366, 109421. https://doi.org/10.1016/J.JNEUMETH.2021.109421

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript, "A versatile high-throughput assay based on 3D ring-shaped cardiac tissues generated from human induced pluripotent stem cell-derived cardiomyocytes" developed a unique culture platform with PEG hydrogel that facilitates the in-situ measurement of contractile dynamics of the engineered cardiac rings. The authors optimized the tissue seeding conditions, demonstrated tissue morphology with expressions of cardiac and fibroblast markers, mathematically modeled the equation to derive contractile forces and other parameters based on imaging analysis, and ended by testing several compounds with known cardiac responses.

      To strengthen the paper, the following comments should be considered:

      1) This paper provided an intriguing platform that creates miniature cardiac rings with merely thousands of CMs per tissue in a 96-well plate format. The shape of the ring and the squeezing motion can recapitulate the contraction of the cardiac chamber to a certain degree. However, Thavandiran et al (PNAS 2013) created a larger version of the cardiac ring and found the electrical propagation revealed spontaneous infinite loop-like cycles of activation propagation traversing the ring. This model was used to mimic a reentrant wave during arrhythmia. Therefore, it presents great concerns if a large number of cardiac tissues experience arrhythmia by geometry-induced re-entry current and cannot be used as a healthy tissue model. It would be interesting to see the impulse propagation/calcium transient on these miniature cardiac rings and evaluate the % of arrhythmia occurrence.

      Size is a key factor affecting electrical propagation within the generated tissues. Our ring-shaped cardiac tissues have a diameter of 360 µm, which is considerably smaller than other tissues proposed so far, including those in Thavandiran et al (PNAS 2013), where circular tissues had a reported size > 1 mm. As shown in Figure 4E (and highlighted below in Author Response Figure 1), tissues under basal conditions display regular beating rates without spontaneous arrhythmias. Videos also show that the tissue contraction is homogeneous around the pillar, suggesting that the smaller size favors electrical propagation and limits the occurrence of spontaneous reentrant waves. Optical mapping measurements will be performed in the future to assess the occurrence of reentrant waves.

      Author Response Figure 1: Poincaré plot of successive RR intervals, each interval plotted against the one that follows it (data from Figure 4E in basal conditions). The linear regression with its 95% confidence interval is consistent with the identity line.

      2) The platform can produce 21 cardiac rings per well in 96-well plates. The throughput has been the highest among competing platforms. The resulting tissues have good sarcomere striation due to the strain from the pillars. Now the emerging questions are culture longevity and reproducibility among tissues. According to Figure 1E, there was uneven ring formation around the pillar, which leads to the tissue thinning and breaking off. There is only 50% survival after 20 days of culture in the optimized seeding group. Is there any way to improve it? The tissues had two compartments, cardiac and fibroblast-rich regions, where fibroblasts are responsible for maintaining the attachment to the glass slides. Do the cardiac rings detach from the glass slides and roll up? The SD of the force measurement is a quarter of the value, which is not ideal with such a high replicate number. As the platform utilizes imaging analysis to derive contractile dynamics, calibration should be done based on the angle and the distance of the camera lens to the individual tissues to reduce the error. On the other hand, how reproducible are the pillars? It is highly recommended to mechanically evaluate the consistency of the hydrogel-based pillars across different wells and within the wells to understand the variance.

      Figure 2B reports the early results obtained as the system was tested and developed. Since then, we have tested different iPSC lines and confirmed that the overall yield is higher (up to 20 tissues at D14 for some cell lines), although it depends on the cell line.

      The tissues do not detach from the glass slides. It is very rare to see tissues roll up on the central pillar. As shown in Figure 1B, the pillars have a specific shape that prevents tissues from rolling up as they develop and contract.

      3) Does the platform allow the observation of non-synchronized beating when testing with compounds? This can be extremely important as the intended applications of this platform are drug testing and cardiac disease modeling. The author should elaborate on the method in the manuscript and explain the obtained results in detail.

      The arrhythmogenic effect of a drug can be derived from the regularity of the beat-to-beat time. Indeed, we show that dofetilide increases the variability in the beat-to-beat time by plotting, for each beat, the beat-to-beat time with the next beat as a function of the beat-to-beat time with the previous beat.
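
      The beat-to-beat plot described in this response can be sketched as follows. This is a minimal illustration and not the authors' analysis code: the function names are hypothetical, the beat times are made-up values, and the SD1/SD2 descriptors are the standard Poincaré summary statistics, which the response itself does not mention.

```python
import math
import statistics


def rr_intervals(beat_times):
    """Return beat-to-beat intervals from a sorted list of beat times (s)."""
    return [t1 - t0 for t0, t1 in zip(beat_times, beat_times[1:])]


def poincare_points(rr):
    """Pair each interval (x axis) with the interval that follows it (y axis)."""
    return list(zip(rr[:-1], rr[1:]))


def poincare_sd1_sd2(rr):
    """Spread of the points across (SD1) and along (SD2) the identity line."""
    pts = poincare_points(rr)
    diffs = [y - x for x, y in pts]
    sums = [y + x for x, y in pts]
    sd1 = math.sqrt(statistics.pvariance(diffs) / 2)
    sd2 = math.sqrt(statistics.pvariance(sums) / 2)
    return sd1, sd2


# Illustrative beat times only (not data from the study).
regular = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
irregular = [0.0, 0.8, 2.0, 2.7, 4.1, 4.8]
```

      Perfectly regular beating puts every point on the identity line, so SD1 is zero; irregular beating scatters the points away from that line and SD1 grows.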

      4) The results of drug testing are interesting. Isoproterenol is typically causing positive chronotropic and positive inotropic responses, where inotropic responses are difficult to obtain due to low tissue maturity. It is inconsistent with other reported results that cardiac rings do not exhibit increased beating frequency, but slightly increased forces only. Zhao et al were using electrical pacing at a defined rate during force measurement, whereas the ring constructs are not.

      We agree. The difference in the response to isoproterenol compared with previous papers may be explained by different incubation times with the drug. In our case, the tissues were incubated for 5 minutes at 37 °C before being recorded.

      Overall, the manuscript is well written and the designed platform presented the unique advantages of high throughput cardiac tissue culture. Besides the contractile dynamics and IHC images, the paper lacks other cardiac functional evaluations, such as calcium handling, impulse propagation, and/or electrophysiology. The culture reproducibility (high SD) and longevity (<20 days) still remain unsolved.

      Since the submission, we have managed to keep some tissues and analyze them up to 32 days. At that time point the tissues are still beating. Nevertheless, a specific study of tissue longevity has not been carried out, as the tissues were usually fixed after 14 days for staining and structural analysis.

      Reviewer #2 (Public Review):

      The authors should be commended for developing a high throughput platform for the formation and study of human cardiac tissues, and for discussing its potential, advantages and limitations. The study is addressing some of the key needs in the use of engineered cardiac tissues for pharmacological studies: ease of use, reproducible preparation of tissues, and high throughput.

      There are also some areas where the manuscript should be improved. The design of the platform and the experimental design should be described in more detail.

      It would be of interest to comprehensively document the progression of tissue formation. To this end, it would be helpful to show the changes in tissue structure through a series of images that would correspond to the progression of contractile properties shown in Figure 3.

      Our results indicate that the fibroblasts/cardiomyocytes segregation likely happens as soon as the tissue is formed, as the fibroblasts are critical for tissue generation. The change with time in the shape of the contractile ring is reported in Figure 1E, with a series of images which correspond to the timepoints of Figure 3.

      The very interesting tissue morphology (separation into the two regions) that was observed in this study is inviting more discussion.

      Finally, the reader would benefit from more specific comparisons of the contractile function of cardiac tissues measured in this study with data reported for other cardiac tissue models.

    1. Author Response

      We believe that these findings make a significant contribution to the field of CNS endothelial cell biology and blood-brain barrier. We thank you for your time and consideration.

      Reviewers' 1 and 2 concern on endothelial cells (ECs) transcription changes on culture.

      We would like to express our gratitude to the reviewers for their critical comments. We are pleased to address the concerns raised by performing FACS sorting of the CNS ECs from E-13.5 and adult brain. However, it is important to note that both E-13.5 ECs and adult ECs were cultured in the same media. It is worth mentioning that this work was initiated in 2017, whereas the article mentioned by Reviewer 1 was published in 2020. We went through a series of standardization steps before identifying the Corning endothelial cell culture media (Cat#355054) with 2% FCS as the optimal medium for preserving EC identity in culture. Conversely, if PromoCell media (C-22110) is used, a decrease in the Wnt pathway can be observed, and the use of 5% FCS enhances the Wnt pathway in E-13.5 ECs. The article mentioned by Reviewer 1 (https://elifesciences.org/articles/51276) did not take these differences in culture media into account. Additionally, we did not employ puromycin for obtaining pure ECs, and the ECs were cultured for a maximum of 8 days. Our in vitro study serves as a model for identifying the epigenetic regulators HDAC2 and PRC2 as controllers of BBB gene transcription, which is subsequently validated in an in vivo model.

      Reviewer-1 Comment 2- An additional concern is that for many experiments, siRNA knockdowns are performed without validation of the efficacy of the knockdown

      In the revised version of this manuscript, we will include validation results to demonstrate the effectiveness of siRNA knockdown experiments.

      Reviewer-1 Comment 3- Some experiments in the paper are promising, however. For example, the knockout of HDAC2 in endothelial cells resulting in BBB leakage was striking. Investigating the mechanisms underlying this phenotype in vivo could yield important insights.

      We appreciate your positive comment. The in vivo HDAC2 knockout experiment will serve as a validation of our in vitro findings, indicating that the epigenetic regulator HDAC2 can control the expression of endothelial cell (EC) genes involved in angiogenesis, blood-brain barrier (BBB) formation, and maturation. We are actively working on this model, and we plan to publish additional molecular data on epigenetically regulated CNS vascular development and maintenance in our future publications.

      Reviewer 2 Comment-2 The use of qPCR assays for quantifying ChIP and transcript levels is inferior to ChIPseq and RNAseq. Whole genome methods, such as ChIPseq, permit a level of quality assessment that is not possible with qPCR methods. The authors should use whole genome NextGen sequencing approaches, show the alignment of reads to the genome from replicate experiments, and quantitatively analyze the technical quality of the data.

      We appreciate the reviewer's comment. While it is true that whole-genome methods such as ChIP-seq and RNA-seq provide comprehensive and high-throughput analysis compared to qPCR assays, it would be incorrect to consider qPCR as inferior. qPCR assays offer advantages in terms of sensitivity, specificity, validation, confirmation, and targeted analysis. We agree that performing a comprehensive analysis of HDAC2 and PRC2 targeted endothelial cell (EC) genes is important. We are currently in the process of generating this data, and as soon as it is complete, we will publish it accordingly.

      Reviewer 2 Comment-3 Third, the observation that pharmacologic inhibitor experiments and conditional KO experiments targeting HDAC2 and the Polycomb complex perturb EC gene expression or BBB integrity, respectively, is not particularly surprising as these proteins have broad roles in epigenetic regulation in a wide variety of cell types.

      We appreciate the comments from the reviewers. Our results provide valuable insights into the specific epigenetic mechanisms that regulate BBB genes. It is important to recognize that different cell types possess distinct, stage-specific epigenetic landscapes and regulatory mechanisms. Rather than having broad roles across diverse cell types, it is more likely that HDAC2 (even though there are several other classes and subtypes of HDACs) and the Polycomb complex exhibit specific functions within the context of EC gene expression or BBB integrity.

      Moreover, the significance of our findings is enhanced by the fact that epigenetic modifications are often reversible with the assistance of epigenetic regulators. This makes them promising targets for BBB modulation. Targeting epigenetic regulators can have a widespread impact, as these mechanisms regulate numerous genes that collectively have the potential to promote vascular repair.

      A practical advantage is that FDA-approved HDAC2 inhibitors, as well as PRC2 inhibitors (such as those mentioned in clinical trials NCT03211988 and NCT02601950), are already available. This facilitates the repurposing of drugs and expedites their potential for clinical translation.

      Please note: illustrations of Fig-1, 4 and 6 are created using Biorender.com, license purchased by Spiros Blackburn. This will be added to the Acknowledgments.

    1. Author Response

      eLife assessment

      This study presents a potentially valuable discovery which indicates that activation of the P2RX7 pathway can reduce the degree of lung fibrosis caused by other inflammatory pathways. If confirmed, the study could clarify the role of specific immune networks in the establishment and progression of lung fibrosis.

      Thanks for this positive comment. Indeed, knowing that lung fibrosis is partly driven by inflammation, with a dysregulated Th1/Th2/Th17 ratio (PMID 20176803, PMID 19682929), we hypothesized that modulating the immune response would be able to attenuate lung fibrosis. To address this issue, we proposed to boost the activation of P2RX7, a purinergic receptor with immunomodulatory properties (PMID 8614837, PMID 11035104), in the well characterized bleomycin-induced lung fibrosis mouse model (PMID 25959210). In this study, we used a pyroglutamic derivative compound (HEI3090) able to specifically enhance P2RX7-dependent biological activities (cationic channel and macropore opening) only in the presence of extracellular ATP, which was qualified as the first representative of an immunotherapy relying on the activation of P2RX7 expressed by dendritic cells (PMID 33510147), and we showed that lung fibrosis is attenuated in mice treated with HEI3090 as compared to vehicle treated mice.

      However, the presented data and analyses are incomplete as they rely on limited pharmacological treatments and because there is an absence of key control studies, validation experiments and statistical analyses.

      Quantification of lung fibrosis:

      Quantification of lung fibrosis was made on the basis of a modified Ashcroft score which assigns 8 grades to quantify lung fibrosis reliably and reproducibly (PMID 18476815). To be even more accurate and to avoid bias from the patchy lesions observed in all existing induced lung fibrosis mouse models, the whole lungs (left and right lobes) were divided into sections of 880 µm² and each section was scored individually. A total of 80 to 110 sections were analyzed per mouse. We agree that our text requires clarification. In parallel, the collagen amount given by the polarization intensity of the Sirius red staining of the lung slices was quantified with a homemade ImageJ/Fiji macro program. Further, we recently analyzed by FACS the percentage of PDGFRα (a specific marker of fibroblasts and myofibroblasts) positive cells in lungs isolated from vehicle and HEI3090-treated mice. All 3 of these markers of lung damage show that HEI3090 attenuates bleomycin-induced lung fibrosis and therefore validate the use of the Ashcroft score to accurately study the extent of lung fibrosis. We are going to add quantification of collagen fibers in all figures.
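
      As an illustration of the per-section scoring described above, a per-mouse summary could be computed as sketched below. The response does not say how section grades are combined, so taking the mean is an assumption, as are the function name and the 0 to 8 grade range used for the validity check.

```python
from statistics import mean


def mouse_fibrosis_score(section_scores):
    """Average the grades of all scored lung sections of one mouse.

    Assumption: the per-mouse score is the mean of the section grades,
    and grades run from 0 to 8.
    """
    if not section_scores:
        raise ValueError("no sections scored")
    if any(not 0 <= s <= 8 for s in section_scores):
        raise ValueError("grades are assumed to lie between 0 and 8")
    return mean(section_scores)
```

      Scoring every section and averaging keeps a few heavily damaged patches from dominating the result, which is the stated motivation for the whole-lung approach.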

      Limited pharmacological treatments:

      We have designed and characterized HEI3090 in a previous study and have shown that it is a positive modulator of P2RX7 (PMID 33510147).

      To test its effect on lung fibrosis, we tested two pharmacological regimens using HEI3090 and have shown that both regimens are effective in limiting the progression of fibrosis. While having shown the requirement of P2RX7 for the activity of HEI3090 (PMID 33510147), we used in this study p2rx7 KO mice which were adoptively transferred with splenocytes isolated from p2rx7 KO mice to demonstrate the involvement of P2RX7 to mediate the antifibrotic effect of HEI3090. This experiment also serves as control to validate the adoptive transfer experiment.

      We agree that further proving and validating that activation of the P2RX7/IL-18 pathway can limit the progression of fibrosis requires the use of other activators of P2RX7. However, to date, HEI3090 is the only pharmacological compound described to activate the receptor. Indeed, the other chemical compounds described in the literature are negative allosteric modulators of P2RX7 (PMID 27935479).

      Absence of key control studies and validation experiments:

      The importance of P2RX7 in the antifibrotic effect of HEI3090 was demonstrated using P2RX7 KO mice (supplementary figure S6B). We are going to supplement this figure with data from additional mice.

      The importance of immune cells was demonstrated by adoptive transfer of WT splenocytes (expressing P2RX7) into P2RX7 KO mice. We agree that lung fibrosis is attenuated in vehicle-treated P2RX7 KO mice, but lung fibrosis is still present and can be modulated by treatment, as demonstrated by adoptive transfer of splenocytes isolated from IL-1B KO mice, which still respond to HEI3090 as shown in supplementary figure S6C.

      As suggested by the reviewers, we examined the effect of genetic background using a two-way ANOVA test, and the result is “the interaction is considered not significant”.
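
      For readers unfamiliar with the interaction term of a two-way ANOVA, the sketch below computes its F statistic for a balanced design (for example treatment × genetic background). It is an illustration under stated assumptions, not the authors' analysis: the function name is hypothetical, every cell must contain the same number of replicates, and the residual variance must be non-zero. Turning F into a p-value needs the F distribution (for instance `scipy.stats.f.sf`), which is left out to keep the sketch dependency-free.

```python
from statistics import mean


def interaction_f(cells):
    """F statistic for the A x B interaction in a balanced two-way ANOVA.

    cells[i][j] is the list of replicate values for level i of factor A
    and level j of factor B. Returns (F, df_interaction, df_error).
    """
    a, b = len(cells), len(cells[0])
    n = len(cells[0][0])
    if n < 2 or any(len(cells[i][j]) != n for i in range(a) for j in range(b)):
        raise ValueError("balanced design with at least 2 replicates required")
    cell_mean = [[mean(cells[i][j]) for j in range(b)] for i in range(a)]
    row_mean = [mean(cell_mean[i]) for i in range(a)]
    col_mean = [mean(cell_mean[i][j] for i in range(a)) for j in range(b)]
    grand = mean(row_mean)
    # Interaction: what is left of each cell mean after removing both main effects.
    ss_int = n * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(a) for j in range(b)
    )
    # Error: scatter of the replicates around their own cell mean.
    ss_err = sum(
        (x - cell_mean[i][j]) ** 2
        for i in range(a) for j in range(b) for x in cells[i][j]
    )
    df_int = (a - 1) * (b - 1)
    df_err = a * b * (n - 1)
    return (ss_int / df_int) / (ss_err / df_err), df_int, df_err
```

      When the two factors act purely additively, the interaction sum of squares is zero and so is F; a non-significant interaction means the treatment effect does not differ detectably between backgrounds.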

      The prevalence of transferred immune cells on endogenous cells is demonstrated in supplementary figure S5, where intravenous injection of splenocytes isolated from P2RX7 KO mice into WT mice abolishes the antifibrotic effect of HEI3090. This experiment further validates the requirement of immune cells and the efficacy of the adoptive transfer approach.

      Statistical analyses:

      In this study we compared side by side the effect of HEI3090 versus vehicle in different genetic backgrounds in order to characterize the involvement of the actors of the P2RX7/IL-18 pathway in the antifibrotic effect of HEI3090. We also examined the effect of genetic background using the two-way ANOVA test. Following European recommendations, and in agreement with the ARRIVE guidelines for mouse studies, we performed provisional statistics to estimate the number of mice required in the study and stopped the experiments when statistically significant results were observed. We agree that the results are heterogeneous; however, this heterogeneity does not prevent data analysis, as shown in supplementary figure S6D, where adoptive transfer of splenocytes isolated from IL-1B KO mice into P2RX7 KO mice dampens BLM-induced lung fibrosis (with an Ashcroft score of 1.8 versus 3 in WT mice) but the mice still respond to HEI3090, thus indicating that IL-1B is not required to mediate the antifibrotic effect of HEI3090.

    1. Author Response

      We thank all reviewers for their constructive critiques. We plan to perform new experiments and revise our manuscript accordingly. The text and Figures are currently undergoing revision. Our revision plan is outlined below.

      eLife assessment

      The findings of this article provide valuable information on the changes of cell clusters induced by chronic periodontitis. The observation of a new fibroblast subpopulation, which was named as AG fibroblasts, was quite interesting, but needs further evidence. The strength of evidence presented is incomplete.

      RESPONSE: We discovered a new subpopulation of gingival fibroblasts, named AG fibroblasts, using non-biased single cell RNA sequencing (scRNA-seq) of mouse gingival samples undergoing the development of ligature-induced periodontitis. AG fibroblasts exhibited a unique gene expression profile: [1] constitutive expression of type XIV collagen; and [2] ligature-induced upregulation of chemokines such as CXCL12. As a biomedical data science experiment, we validated the scRNA-seq observation using immunohistochemical experiment, which showed the presence of type XIV collagen-positive and CXCL12-positive gingival fibroblasts localized immediately under the gingival epithelium and the coronal region of periodontal ligament.

      We agree that the functional/pathological role of AG fibroblasts must be further explored. We have hypothesized that AG fibroblasts initially sense the pathological stress including oral microbial stimuli and secrete inflammatory signals through chemokine expression. To address this hypothesis, in this revision, we plan to analyze a separate scRNA-seq data for AG fibroblast gene expression profile derived from mouse gingival tissues that have been stimulated by Toll-Like Receptor 9 (TLR9) ligand (unmethylated CpG oligonucleotide) and TLR2/4 ligand (LPS). This approach mimics the initial pathological stress applied to gingival tissue. The new insight of AG fibroblasts will be presented in the revision.

      Reviewer #1 (Public Review):

      In this article, the authors found a distinct fibroblast subpopulation named AG fibroblasts, which are capable of regulating myeloid cells, T cells and ILCs, and proposed that AG fibroblasts function as a previously unrecognized surveillant to orchestrate chronic gingival inflammation in periodontitis. Generally speaking, this article is innovative and interesting, however, there are some problems that need to be addressed to improve the quality of the manuscript.

      RESPONSE: We appreciate this comment. As suggested, we further investigated the surveillant function of AG fibroblasts by reanalyzing the scRNA-seq data for stress-sensing receptors such as Toll-Like Receptors (TLR). Specifically, we analyzed the AG fibroblast gene expression profile when the putative ligands of TLR2/4 and TLR9 are applied to mouse gingival tissue instead of ligature placement. We believe that this first-step analysis should warrant further dissection of the function of AG fibroblasts in the future.

      Results:

      1) It is recommended to add HE staining and immunohistochemistry staining to observe the inflammation, tissue damage, and repair status from 0 to 7 days, so that readers can understand cell phenotype changes corresponding to the periodontitis stage. The observation index can include inflammation and vascular related indicators.

      RESPONSE: As recommended, representative histological figures will be included. We will further perform new immunohistochemistry experiment of mouse gingival tissue (D0, D1, D4, D7). We plan to highlight the infiltration of CD45+ immune cells. We also plan to highlight the progressive degeneration of gingival collagen fiber by picrosirius red staining.

      2) Figure 1A-1D can be placed in the supplementary figure.

      RESPONSE: Combining the new data above, Figure 1 will be revised as suggested.

      3) I suggest the authors to put the detection of the existence of AG fibroblasts before exploring its relationship with other types of cells.

      4) The layout of the picture should be closely related to the topic of the article. It is recommended to readjust the layout of the picture. Figure 1 should be the detection of AG cells and their proportion changes from 0 to 7 days. In other figures, the authors can separately describe the proportion changes of myeloid cells, T cells and ILCs, and explored the association between AG fibroblasts and these cell types.

      RESPONSE: As suggested, the presentation order of Figures and text will be revised to bring the information about AG fibroblasts first. The chemokine-receptor analysis is moved below.

      Methods:

      It is recommended to separately list the statistical methods section. The statistical method used in the article should be one-way ANOVA.

      RESPONSE: A separate statistical method section is created. As pointed out, we used one-way ANOVA with post-hoc Tukey test (when multiple groups were compared).

      Reviewer #2 (Public Review):

      This study proposed the AG fibroblast-neutrophil-ILC3 axis as a mechanism contributing to pathological inflammation in periodontitis. However, the immune response in the vivo is very complex. It is difficult to determine which is the cause and which is the result. This study explores the relevant issue from one dimension, which is of great significance for a deeper understanding of the pathogenesis of periodontitis. It should be fully discussed.

      RESPONSE: We agree with this comment. We expanded the description of the current understanding of oral immune signal communication in the Discussion and highlighted how AG fibroblasts may fit into it.

      1) Many host cells participate in immune responses, such as gingival epithelial cells. AG fibroblast is not the only cell involved in the immune response, and the weight of its role needs to be clarified. So the expression in the conclusion should be appropriate.

      RESPONSE: Following this critique, we revised INTRODUCTION, DISCUSSION and CONCLUSION, to highlight how AG fibroblasts function within a comprehensive immune response network.

      2) This study cannot directly answer the issue of the relationship between periodontitis and systemic diseases.

      RESPONSE: We agree with this critique. We either deleted or de-emphasized the relationship between periodontitis and systemic diseases throughout the text.

    1. Author Response:

      We appreciate the thorough, fair and concise comments and agree with most, if not all, of the interpretations and critiques. We also value the recommendations and guidance for what constitute the most important additional experiments and analyses. Thank you for your hard work and time. Your investment helps improve the impact and clarity of our work and that is very much appreciated. We look forward to submitting a revised version soon.

    1. Author Response:

      Reviewer #1 (Public Review):

      Overall, I find the work performed by the authors very interesting. However, the authors have not always included literature that seems relevant to their study. For instance, I do not understand why two papers Dunican et al 2013 and Dunican et al 2015, which provide important insight into Lsh/HELLS function in mouse, frog and fish were not cited. It is also important that the authors are specific about what is known and in particular about what is not known about CDCA7 function in DNA methylation regulation. Unless I am mistaken, there is currently only one study (Velasco et al 2018) investigating the effect of CDCA7 disruption on DNA methylation levels (in ICF3 patient lymphoblastoid cell lines) on a genome-wide scale (Illumina 450K arrays). Unoki et al 2019 report that CDCA7 and HELLS gene knockout in human HEK293T cells moderately and extremely reduces DNA methylation levels at pericentromeric satellite-2 and centromeric alpha-satellite repeats, respectively. No other loci were investigated, and it is therefore not known whether a CDCA7-associated maintenance methylation phenotype extends beyond (peri)centromeric satellites. Thijssen et al performed siRNA-mediated knockdown experiments in mouse embryonic fibroblasts (differentiated cells) and showed that lower levels of Zbtb24, Cdca7 and Hells protein correlate with reduced minor satellite repeat methylation, thereby implicating these factors in mouse minor satellite repeat DNA methylation maintenance. Furthermore, studies that demonstrate a HELLS-CDCA7 interaction are currently limited to Xenopus egg extract (Jenness et al 2018) and the human HEK293 cell line (Unoki et al 2019). Whether such an interaction exists in any other organism and is of relevance to DNA methylation mechanisms remains to be determined. 
      Therefore, in my opinion, the conclusion that "Our co-evolution analysis suggests that DNA methylation-related functionalities of CDCA7 and HELLS are inherited from LECA" should be softened, as the evidence for this scenario is not very compelling and seems premature in the absence of molecular data from more species.

      We appreciate this reviewer’s thorough reading of our manuscript.

      Regarding the citation issues, we will cite Dunican 2013 and Dunican 2015.

      As pointed out by the reviewer, the role of CDCA7 in genome DNA methylation was extensively studied in Velasco et al 2018. The result, together with Thijssen et al (2015), and Unoki et al. (2018), supports the idea that ZBTB24, CDCA7 and HELLS act within the same pathway to promote DNA methylation, the pattern of which is overlapping but distinct from DNMT3B-mediated methylation. This observation suggests that a ZBTB24-CDCA7-HELLS mechanism for DNA methylation may involve an alternative DNMT. Interestingly, our analysis of the gene presence-absence pattern revealed that the presence of CDCA7 coincides with DNMT1 more than DNMT3 genes. Indeed, while CDCA7 is lost from diverse branches of eukaryote species, genomes encoding CDCA7 always encode HELLS, and almost always encode DNMT1. Based on this observation, we speculate the role of CDCA7 is tightly linked to HELLS and DNA methylation throughout evolution.

      As pointed out by Reviewer 1, the link between CDCA7, HELLS and DNA methylation has not been determined experimentally across these species. However, based on our previously published and unpublished data, we are confident about the functional interaction between CDCA7 and HELLS in Xenopus laevis and Homo sapiens. Furthermore, the importance of HELLS homologs in DNA methylation has been extensively studied in human, mouse and plants. We hope our current study will motivate the field to experimentally test the evolutionary conservation of HELLS-CDCA7 interaction, as well as their importance in DNA methylation, in other species.

      The authors used BLAST searches to characterize the evolutionary conservation of CDCA7 family proteins in vertebrates. From Figure 2A, it seems that they identify a LEDGF binding motif in CDCA7/JPO1. Is this correct and if yes, could you please elaborate and show this result? This is interesting and important to clarify because previous literature (Tesina et al 2015) reports a LEDGF binding motif only in CDCA7L/JPO2.

      We searched for a LEDGF binding motif ({E/D}-X-E-X-F-X-G-F, also known as IBM described in Tesina et al 2015) in vertebrate CDCA7 proteins, and reported their position in Figure 2A. Examples of identified LEDGF-binding motifs will be presented.
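
      As an illustration only, the consensus quoted above ({E/D}-X-E-X-F-X-G-F) can be written as a regular expression and scanned along a protein sequence. The sketch below is not the search procedure used for Figure 2A; the function name `find_ibm_motifs` and the toy sequence are hypothetical, and a real analysis would read CDCA7 sequences from a FASTA file.

```python
import re

# Consensus from the response: {E/D}-X-E-X-F-X-G-F, where X is any residue.
# The lookahead wrapper lets overlapping matches be reported as well.
IBM_PATTERN = re.compile(r"(?=([ED].E.F.GF))")


def find_ibm_motifs(sequence):
    """Return (1-based start position, matched residues) for each consensus match."""
    return [(m.start() + 1, m.group(1)) for m in IBM_PATTERN.finditer(sequence.upper())]


if __name__ == "__main__":
    # Hypothetical toy sequence, NOT a real CDCA7 protein.
    toy_sequence = "MSTAEAEKFSGFQQDLEPFAGFK"
    for start, motif in find_ibm_motifs(toy_sequence):
        print(start, motif)
```

      A plain consensus match of this kind only flags candidate motifs; whether a candidate actually binds LEDGF would still need experimental support.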

      To provide evidence for a potential evolutionary co-selection of CDCA7, HELLS and the DNA methyltransferases (DNMTs) the authors performed CoPAP analysis. Throughout the manuscript, it is unclear to me what the authors mean when referring to "DNMT3". In the Material and Methods section, the authors mention that human DNMT3A was used in BLAST searches to identify proteins with DNA methyltransferase domains. Does this mean that "DNMT3" should be DNMT3A? And if yes, should "DNMT3" be corrected to "DNMT3A"? Is there a reason that "DNMT3A" was chosen for the BLAST searches?

      As described in the Methods section, both human DNMT1 and DNMT3A were used to initially identify any proteins containing a domain homologous to the DNA methyltransferase catalytic domain. Within Metazoa, if their orthologs exist, the top hits from BLAST searches using human DNMT1 and DNMT3A show an E-value of 0.0, and thus their orthology is robust. This is even true for DNMT1 and DNMT3 homologs in the sponge Amphimedon queenslandica, which is one of the earliest-branching metazoan species. For other DNMTs, such as DNMT2, DNMT4 and DNMT6, we conducted separate BLAST searches using those proteins as baits, as described in Methods. The domain was then isolated using the NCBI conserved domains search. The selected DNMT domain sequences were aligned with CLUSTALW to generate a phylogenetic tree to further classify DNMTs (Figure S6). It has been suggested that vertebrate DNMT3A and DNMT3B are derived from duplication of a DNMT3 gene of the chordate ancestor (e.g., Liu et al 2020, PMID 31969623). As such, many invertebrates encode only one DNMT3. As previously shown (Yaari et al., 2019, PMID 30962443), plants have two distinct DNMT3-like protein families: the ‘true DNMT3’ and DRM, the plant-specific de novo DNMT that is often considered to be a DNMT3 homolog (see Reviewer 2’s comment). Our phylogenetic analysis successfully separated the clade of DNMT3 and DRM from the rest of the DNMTs (Figure S6). Yaari et al noted that PpDNMT3a and PpDNMT3b, the two DNMT3 orthologs encoded by the basal plant Physcomitrella patens, are not orthologs of mammalian DNMT3A and DNMT3B, respectively. Therefore, to minimize such nomenclature confusion, any DNMTs that belong to either the DNMT3 or DRM clades indicated in Figure S6 are collectively referred to as ‘DNMT3’ throughout the paper (see Figure S2 for overview).

      CoPAP analysis revealed that CDCA7 and HELLS are dynamically lost in the Hymenoptera clade and that their loss co-occurs with either DNMT3 or DNMT1/UHRF1 loss, which seems important. Unfortunately, the authors do not provide sufficient information in their figures or supplementary data about what is already known regarding DNA methylation levels in the different Hymenoptera species to further consider a potential impact of this observation. What is "the DNA methylation status" of all these organisms? This information cannot be easily retrieved from Table S2. A clearer presentation of what is actually known already would improve this paragraph.

      As the DNA methylation status of the species in the Hymenoptera clade has not been comprehensively tested, this precluded us from adding this information to Figure 7. However, we have included the published reports of DNA methylation status for these species in Supplementary Table S2 (see column ‘5mC’; species for which 5mC is detected are marked with Y and the relevant PMID). As indicated, DNA methylation was detected in most tested species except for Microplitis demolitor. Many of these data are based on Bewick et al. 2017 (PMID 28025279). During the preparation of this response, we realized that the DNA methylation status reported for some species in Bewick et al. was inferred from the CpG frequency instead of the direct experimental detection of methylated cytosines. Therefore, we have amended Table S2 to indicate the presence of DNA methylation only for those species where this was experimentally tested. As such, we now consider the DNA methylation status of Fopius arisanus, which lacks DNMT1 and CDCA7, to be unknown. In addition, we realized that Bewick et al. reported that DNA methylation is absent in Aphidius ervi. We originally conducted synteny analysis on Aphidius gifuensis, which lacks DNMT1 and CDCA7, since Aphidius ervi protein data were not available in NCBI. By conducting tBLASTn search against the Aphidius ervi genome, we confirmed that the presence and absence pattern of CDCA7, HELLS, DNMT1, DNMT3 and UHRF1 in Aphidius ervi is identical to that of Aphidius gifuensis. In other words, DNA methylation is known to be absent in Aphidius ervi, which has lost DNMT1 and CDCA7. Altogether, among the 17 Hymenoptera species that we analyzed (listed in the amended Table S2), the 6 species that have detectable DNA methylation all encode CDCA7, whereas the 2 species that do not have detectable DNA methylation lack CDCA7. We will note this finding in the revised text.
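
      As an illustration only, the counts stated above (6 species with detectable DNA methylation that all encode CDCA7, and 2 species without detectable DNA methylation that both lack CDCA7) can be arranged as a 2x2 table and summarized with a two-sided Fisher's exact test. This is a sketch and not an analysis reported in the response: the function name is hypothetical, only the 8 species with experimentally tested methylation status are counted, and the test ignores the shared phylogeny of the species, so the resulting value should be read with caution.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def table_probability(x):
        # Hypergeometric probability of seeing x in the top-left cell
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    observed = table_probability(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(
        table_probability(x)
        for x in range(lowest, highest + 1)
        if table_probability(x) <= observed * (1 + 1e-9)
    )


if __name__ == "__main__":
    # Rows: CDCA7 present / absent; columns: 5mC detected / not detected.
    p_value = fisher_exact_two_sided(6, 0, 0, 2)
    print(f"two-sided p = {p_value:.4f}")
```

      With so few species of known methylation status, a summary like this can only indicate that the pattern is worth testing in more species.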

      Furthermore, A. thaliana DDM1, and mouse and human Lsh/Hells are known to preferably promote DNA methylation at satellite repeats, transposable elements and repetitive regions of the genome. On the other hand, DNA methylation in insects and other invertebrates occurs in genic rather than intergenic regions and transposable elements (e.g. Bewick et al 2017; Werren JH PlosGenetics 2013). It would be helpful to elaborate on these differences.

      This point was discussed in the third paragraph of the Discussion, but we will better highlight this. It should be noted that, in the Arabidopsis ddm1 mutant, reduction of CG methylation of gene bodies is common (50% of all methylated euchromatic genes) (Zemach et al, 2013). In addition, hypomethylation is not limited to satellite repeats and transposable elements in ICF patients defective in HELLS or CDCA7 (Velasco et al., 2018).

      Reviewer #2 (Public Review):

      In this manuscript, Funabiki and colleagues investigated the co-evolution of DNA methylation and nucleosome remodeling in eukaryotes. This study is motivated by several observations: (1) despite being ancestrally derived, many eukaryotes lost DNA methylation and/or DNA methyltransferases; (2) over many genomic loci, the establishment and maintenance of DNA methylation relies on a conserved nucleosome remodeling complex composed of CDCA7 and HELLS; (3) it remains unknown if/how this functional link influenced the evolution of DNA methylation. The authors hypothesize that if CDCA7-HELLS function was required for DNA methylation in the last eukaryote common ancestor, this should be accompanied by signatures of co-evolution during eukaryote radiation.

      [...]

      The data and analyses reported are significant and solid. However, using more refined phylogenetic approaches could have strengthened the orthologous relationships presented. Overall, this work is a conceptual advance in our understanding of the evolutionary coupling between nucleosome remodeling and DNA methylation. It also provides a useful resource to study the early origins of DNA methylation-related molecular processes. Finally, it brings forward the interesting hypothesis that since eukaryotes are faced with the challenge of performing DNA methylation in the context of nucleosome-packed DNA, losing factors such as CDCA7-HELLS likely led to recurrent innovations in chromatin-based genome regulation.

      Strengths: - The hypothesis linking nucleosome remodeling and the evolution of DNA methylation. - Deep mapping of DNA methylation related process in eukaryotes. - Identification and evolutionary trajectories of novel homologs/orthologs of CDCA7. - Identification of CDCA7-HELLS-DNMT co-evolution across eukaryotes.

      Weaknesses: - Orthology assignment based on protein similarity. - No statistical support for the topologies of gene/proteins trees (figure S1, S3, S4, S6) which could have strengthened the hypothesis of shared ancestry.

      We appreciate the reviewer’s accurate summary, which nicely emphasizes the importance of our study. We agree that better phylogenetic analysis for orthology assignment will strengthen our conclusion, and we would like to explore this. Having anticipated this weakness, we specifically conducted a CoPAP analysis exclusively for Ecdysozoa species, where orthology assignment is straightforward, which supported our major conclusion. For example, if we conduct a BLAST search of the clonal raider ant Ooceraea biroi using human HELLS as a query, the top hit is a protein sequence annotated as one of three isoforms of ‘lymphoid-specific helicase’ (i.e., HELLS), with an E-value of 0.0. Similarly, a BLAST search of Ooceraea biroi using human DNMT1 as a query returns isoforms of DNMT1 as the top hits, with an E-value of 0.0. As such, there is little dispute in orthology assignment in Ecdysozoa. Outside of Chordata, regardless of the alternative methods employed for orthology assignment, this will never be perfect (particularly in Excavata and SAR). Our current orthology assignment for the major targets in this study (HELLS, DNMT1, DNMT3, DNMT5) is largely consistent with published results (Ponger et al., 2005 PMID 15689527; Huff et al, 2014 PMID 24630728; Yaari et al., 2019 PMID 30962443; Bewick et al., 2019 PMID 30778188). However, while preparing this response and re-crosschecking our assignments against these references, we realized that we had erroneously missed DNMT5 orthologs of Leucosporidium creatinivorum, Postia placenta, Armillaria gallica and Saitoella complicata, and a DNMT6 ortholog from Fragilariopsis cylindrus. We had also recognized that DNMT4 orthologs were identified in Fragilariopsis cylindrus and Thalassiosira pseudonana in Huff et al 2014 (PMID 24630728), but in our phylogenetic analysis, these proteins form a distinct clade between DNMT1/Dim-2 and DNMT4 (Figure S6). Due to this ambiguity, we did not count them as DNMT1 or DNMT4 in our CoPAP analysis. 
These minor errors and ambiguity should not affect our presence-absence pattern in our original CoPAP analysis, and thus we feel that further refinement is unlikely to significantly affect our major conclusion.

    1. Author Response:

      The following is the authors' response to the current reviews.

      Reviewer #1 (Public Review):

      This revised manuscript by Walker et al. addresses some of the editorial points and conceptual discussion, but in general, most of my suggestions (as the previous reviewer #1) for additional experimentation or addition were not addressed as discussed below. Therefore, my overall review has not changed.

      In our previous response, we i) included extra experimental data illustrating the reproducibility of our results and ii) added transcription start site data at the request of this reviewer. We included the information because we agreed with the reviewer that these were important points to address. For the points raised again below, we explained why the additional analysis was unlikely to add much in terms of insight or rigour. We have elaborated further below.

      1) For example, in point 1, the suggested analysis was not performed because it is not trivial. My reason for making this suggestion is that the original manuscript was limited to Vibrio cholerae, and the impact of the manuscript would increase if the findings here were demonstrated to be more broadly applicable. I expect papers published in eLife to have such broad applicability. But no changes were made to the manuscript in this regard. The revised version is still limited to only Vibrio cholerae.

      Our paper is focused on the unexpected co-operative interactions between HapR and CRP. Such co-binding of two transcription factors to the same DNA site is unexpected. Consequently, it is this mode of DNA binding that is likely to be of broad interest. With this in mind, we did provide experimental, and bioinformatic, analyses for other regulatory regions and other vibrio species (Figures S3 and S6). This, in our view, is where the “broad applicability” for papers published in eLife comes from.

      The analysis the reviewer suggests is not related to the main message of our paper. Instead, the reviewer is asking how many HapR binding sites seen here by ChIP-seq are also seen in other vibrio species by ChIP-seq. This is only likely to be of interest to readers with an extremely specific interest in both vibrio species and HapR. The reviewer states above that we did not make the change “because it is not trivial”. This is an oversimplification of the rationale we presented in our response. The analysis is indeed not straightforward. However, much more importantly, the outcome is unlikely to be of interest to many readers, and has no bearing on the rigour of the work. With this in mind, we do not think our position is unreasonable. We also stress that, should a reader with this very specific interest want to explore further, all of our data are freely available for them to do so.

      2) For point 2, the activity of FLAG-tag luxO could have been simply validated in a complementation assay. Yes, they demonstrated DNA binding, but that is not the only activity of LuxO.

      DNA binding by LuxO is the only activity of the protein with which we are concerned in our paper. Furthermore, LuxO is very much a side issue; we found binding to only the known targets and potentially, at very low levels, one additional target. No further LuxO experiments were done for this reason. Indeed, even if these data were removed completely, our conclusions would not change or be supported any less vigorously. We are happy to remove the LuxO data if the reviewer would prefer but this would seem like overkill.

      3) For point 7, the transcriptional fusions were not explored at different times or different media, which is also something that was hinted at by other reviewers. In regard to exploring expression at different time points, this seems particularly relevant for QS regulated genes.

      In their previous review, the reviewer did not request that such experiments were done. Similarly, no other reviewer requested these experiments. Instead, this reviewer i) commented that lacZ fusions were not as sensitive as luciferase fusions ii) asked if we had done any time point experiments. We agreed with the first point, whilst also noting that lacZ is not unusual to use as a reporter. For the second point, we responded that we had not done such experiments (which by the reviewer’s own logic would have been complicated using lacZ as a reporter). This seems like a perfectly reasonable way to respond.   

      We should stress that these comments all refer to Figure 2a, which was our initial screening of 23 promoter::lacZ fusions, supported by separate in vitro transcription assays. Only one of these fusions was followed up as the main story in the paper. Given that the other 22 fusions were not investigated further, and do not form part of the main story, there would seem little value in now going back to assay them at different time points.

      4) For point 13, the authors express that doing an additional CHIP-Seq is outside of the scope of this manuscript. Perhaps that is the case, but the point of the comment is to validate the in vitro binding results with an in vivo binding assay. A targeted CHIP-Seq approach specifically analyzing the promoters where cooperative binding was observed in vitro could have addressed this point.

      We did appreciate the original comment, and responded as such, but we do think additional ChIP-seq assays are outside the scope of this paper.

      Reviewer #2 (Public Review):

      This manuscript by Walker et al describes an elegant study that synergizes our knowledge of virulence gene regulation of Vibrio cholerae. The work brings a new element of regulation for CRP, notably that CRP and the high density regulator HapR co-occupy the same site on the DNA but modeling predicts they occupy different faces of the DNA. The DNA binding and structural modeling work is nicely conducted and data of co-occupation are convincing. The work seeks to integrate the findings into our current state of knowledge of HapR and CRP regulated genes at the transition from the environment and infection. The strength of the paper is the nice ChIP-seq analysis and the structural modeling and the integration of their work with other studies.

      We thank the reviewer for the positive comments.

      The weakness is that it is not clear how representative these data are of multiple hapR/CRP binding sites

      This comment does not consider all data in our paper. We did test our model experimentally at multiple HapR and CRP binding sites. These data are shown in Figure S6 and confirm the co-operative interaction between HapR and CRP at 4 of a further 5 shared binding sites tested. We also used bioinformatics to show the same juxtaposition of CRP and HapR sites in other vibrio species (Figure S3). Hence, the model seems representative of most sites shared by HapR and CRP.

      or how the work integrates as a whole with the entire transcriptome that would include genes discovered by others.

      At the request of the reviewers, our revision integrated our ChIP-seq data with dRNA-seq data. No other suggestions to integrate transcriptome data were made by the reviewers.

      Overall this is a solid work that provides an understanding of integrated gene regulation in response to multiple environmental cues.

      We thank the reviewer for the positive comment.

      —————

      The following is the authors' response to the original reviews.

      Reviewer #1 (Public Review):

      This manuscript by Walker et al. explores the interplay between the global regulators HapR (the QS master high cell density (HCD) regulator) and CRP. Using ChIP-Seq, the authors find that at several sites, the HapR and CRP binding sites overlap. A detailed exploration of the murPQ promoter finds that CRP binding promotes HapR binding, which leads to repression of murPQ. The authors have a comprehensive set of experiments that paints a nice story providing a mechanistic explanation for converging global regulation.

      We thank the reviewer for their positive evaluation.

      I did feel there are some weak points though, in particular the lack of integration of previously identified transcription start sites

      For completeness, we have now added the position and orientation of the nearest TSSs to each HapR or LuxO binding peak in Table 1 (based on Papenfort et al.).

      the lack of replication (at least replication presented in the manuscript) for many figures,

      We assume that the reviewer is referring to gel images rather than any other type of assay output (where error bars, derived from replicates, are shown). As is standard, we show representative gel images. All associated DNA binding and in vitro transcription experiments have been done multiple times. Indeed, comparison between figures reveals several instances of such replication (e.g. Figures 4b & 5d, Figures 4d & 5e). We have added details of the repeats done to the methods section.

      some oddities in the growth curve

      We do not know why cells lacking hapR have a growth curve that appears biphasic. We can only assume that this is due to some regulatory effect of HapR, distinct from the murQP locus. Despite the unusual shape of the growth curve, the data are consistent with our conclusions.

      and not reexamining their HapR/CRP cooperative binding model in vivo using ChIP-Seq.

      We agree that these would be interesting experiments and, in the future, we may well do such work. Even without these data, our current model is well supported by the data presented (and the reviewer seems to agree with this above).

      Reviewer #2 (Public Review):

      This manuscript by Walker et al describes an elegant study that synergizes our knowledge of virulence gene regulation of Vibrio cholerae. The work brings a new element of regulation for CRP, notably that CRP and the high density regulator HapR co-occupy the same site on the DNA but modeling predicts they occupy different faces of the DNA. The DNA binding and structural modeling work is nicely conducted and data of co-occupation are convincing. The work could benefit from doing a better job in the manuscript preparation to integrate the findings into our current state of knowledge of HapR and CRP regulated genes and to elevate the impact of the work to address how bacteria are responding to the nutritional environment. Importantly, the focus of the work is heavily based on the impact of use of GlcNAc as a carbon source when bacteria bind to chitin in the environment, but absent the impact during infection when CRP and HapR have known roles. Further, the impact on biological events controlled by HapR integration with the utilization of carbon sources (including biofilm formation) is not explored.

      We thank the reviewer for their overall positive evaluation.

      The rigor and reproducibility of the work needs to be better conveyed.

      Reviewer 1 made a similar comment (see above) and we have modified the manuscript accordingly.

      Specific comments to address:

      1)  Abstract. A comment on the impact of this work should be included in the last sentence. Specifically, how the integration of CRP with QS for gene expression under specific environments impacts the lifestyle of Vc is needed. The discussion includes comments regarding the impact of CRP regulation as a sensor of carbon source and nutrition and these could be quickly summarized as part of the abstract.

      We have added an extra sentence. However, we have used cautious language as we do not show impacts on lifestyle (beyond MurNAc utilisation) directly. These can only be inferred.

      2)  Line 74. This paper examines the overlap of HapR with CRP, but ignores entirely AphA. HapR is repressed by Qrrs (downstream of LuxO-P) while AphA is activated by Qrrs. With LuxO activating AphA, it has a significant sized "regulon" of genes turned on at low density. It seems reasonable that there is a possibility of overlap also between CRP and AphA. While doing an AphA CHIP-seq is likely outside the scope of this work, some bioinformatic or simply a visual analysis of the promoters of known AphA-regulated genes would be of interest to comment on, with speculation in the discussion and/or supplement.

      In short, everything that the reviewer suggests here has already been done and was covered in our original submission (see text towards the end of the Discussion). Also, we would like to point the referee to our earlier publication (Haycocks et al. 2019. The quorum sensing transcription factor AphA directly regulates natural competence in Vibrio cholerae. PLoS Genet. 15:e1008362).

      3)  Line 100. Accordingly with the above statement, the focus here on HapR indicates that the focus is on gene expression via LuxO and HapR, at high density. Thus the sentence should read "we sought to map the binding of LuxO and HapR of V. cholerae genome at high density".

      Note that expression of LuxO and HapR is ectopic in these experiments (i.e. uncoupled from culture density).

      4)  Line 109. The identification of minor LuxO binding site in the intergenic region between VC1142 and VC1143 raises whether there may be a previously unrecognized sRNA here. As another panel in figure S1, can you provide a map of the intergenic region showing the start codons and putative -10 to -35 sites. Is there room here for an sRNA? Is there one known from the many sRNA predictions / identifications previously done? Some additional analysis would be helpful.

      We have added an extra panel to Figure S1 showing the position of TSSs relative to the location of LuxO binding. We have altered the main text to accommodate this addition.

      5)  Line 117. This sentence states that the CHIP seq analysis in this study includes previously identified HapR regulated genes, but does not reveal that many known HapR regulated genes are absent from Table 1 and thus were missed in this study. Of 24 HapR regulated genes investigated by Tsou et al, only 1 is found in Table 1 of this study. A few are commented on in the discussion and Figure S7. It might be useful to add a Venn Diagram to Figure 1 (and a list table in the supplement) for the results of Tsou et al, Waters et al, Lin et al, Nielson et al and any others. A major question is whether the trend found here for genes identified by CHIP-seq in this study hold up across the entire HapR regulon. There should also be comments in the discussion on perhaps how different methods (including growth state and carbon sources of media) may have impacted the complexity of the regulon identified by the different authors and different methods.

      We have added a list of known sites to the supplementary material (new Table S1). We were unsure what was meant by the comment “A major question is whether the trend found here for genes identified by CHIP-seq in this study hold up across the entire HapR regulon”. We have added the extra comment to the discussion re growth conditions, also noting that most previous studies relied on in vitro, rather than in vivo, DNA binding assays.

      6)  The transcription data are generally well performed. In all figures, add comments to the figure legends that the experiments are representative gels from n=# (the number of replicate experiments for the gel based assays). Statements to the rigor of the work are currently missing.

      See responses above. We have added a comment on numbers of repeats to the methods section.

      7)  Line 357-360. The demonstration of lack of growth on MurNAc is nice for the impact of the work. However, more detailed comments are needed for M9 plus glucose for the uninformed reader to be reminded that growth in glucose is also impaired due to lack of cAMP in glucose replete conditions and thus minimal CRP is active. But why is this now dependent of hapR? A reminder also that in LB oligopeptides from tryptone are the main carbon source and thus CRP would be active.

      We find this point a little confusing and, maybe, two issues (murQP regulation, and growth in general) are being conflated. In particular, we do not understand the comment “growth in glucose is also impaired due to lack of cAMP in glucose replete conditions and thus minimal CRP is active”.

      Growth in glucose should indeed result in lower cAMP levels*, and hence less active CRP, but this does not impair growth. This is simply the cell’s strategy for using its preferred carbon source. If the reviewer were instead referring to some aspect of PmurQP regulation then yes, we would expect promoter activity to be lower because less active CRP would be available in the presence of glucose. The reviewer also comments “why is this now dependent of hapR?”. We assume that they are referring to some aspect of growth in minimal media with glucose. If so, the only hapR effect is the change in growth rate as cells enter mid-late log-phase (i.e. the growth curve looks somewhat biphasic). A similar effect is seen in all conditions. We do not know why this happens and can only conclude this is due to some unknown regulatory activity of HapR. Overall, the key point from these experiments is that loss of luxO, which results in constitutive hapR expression, lengthens lag phase only for growth with MurNAc as the sole carbon source.

      *Although in V. fischeri (PMID: 26062003) cAMP levels increase in the presence of glucose.

      8)  A great final experiment to demonstrate the model would have been to show co-localization of the promoter by CRP and HapR from bacteria grown in LB media but not in LB+glucose or in M9+glycerol and M9+MurNAc but not M9+glucose. This would enhance the model by linking more directly to the carbon sources (currently only indirect via growth curves)

      This is unlikely to be as straightforward as suggested. The sensitivity of CRP binding to growth conditions is not uniform across different binding sites. For instance, the CRP dependence of the E. coli melAB promoter is only evident in minimal media (PMID: 11742992) whilst the role of CRP at the acs promoter is evident in tryptone broth (PMID: 14651625). Similarly, as noted above, in Vibrio fischeri glucose causes an increase in cAMP levels (PMID: 26062003).

      9) Discussion. Comments and model focus heavily on GlcNAc-6P but HapR has a regulator role also during late infection (high density). How does CRP co-operativity impact during the in vivo conditions?

      We really can’t answer this question with any certainty; we have not done any infection experiments in this work.

      Does the Biphasic role of CRP play a role here (PMID: 20862321)?

      Again, we cannot answer this question with any confidence as experimentation would be required. However, the suggestion is certainly plausible.

      Reviewer #3 (Public Review):

      Bacteria sense and respond to multiple signals and cues to regulate gene expression. To define the complex network of signaling that ultimately controls transcription of many genes in cells requires an understanding of how multiple signaling systems can converge to affect gene expression and ensuing bacterial behaviors. The global transcription factor CRP has been studied for decades as a regulator of genes in response to glucose availability. Its direct and indirect effects on gene expression have been documented in E. coli and other bacteria, including pathogens such as Vibrio cholerae. Likewise, the master regulator of quorum sensing (QS), HapR, is a well-studied transcription factor that directly controls many genes in Vibrio cholerae and other Vibrios in response to autoinducer molecules that accumulate at high cell density. By contrast, low cell density gene expression is governed by another regulator, AphA. It has not yet been described how HapR and CRP may together work to directly control transcription and what genes are under such direct dual control.

      We thank the reviewer for their assessment of our work.

      Using both in vivo methods with gene fusions to lacZ and in vitro transcription assays, the authors proceed to identify the smaller subset of genes whose transcription is directly repressed (7) and activated (2) by HapR. Prior work from this group identified the direct CRP binding sites in the V. cholerae genome as well as promoters with overlapping binding sites for AphA and CRP, thus it appears a logical extension of these prior studies is to explore here promoters for potential integration of HapR and CRP. This rationale was not stated when CRP protein was introduced into the in vitro experiments.

      We understand the reviewer’s comment. However, the rationale for adding CRP was not that we had previously seen interplay between AphA and CRP (although this is a relevant discussion point, which we did make). Rather, we had noticed that there was an almost perfect CRP site perfectly positioned to activate PmurQP. Hence, CRP was added.

      Seven genes are found to be repressed by HapR in vivo, but the promoter regions of only six are repressed in vitro with purified HapR protein alone. The authors propose and then present evidence that the seventh promoter, which controls murPQ, requires CRP to be repressed by HapR, using both in vivo and in vitro methods. This is a critical insight that drives the rest of the manuscript's focus. The DNase protection assay conducted supports the emerging model that both CRP and HapR bind at the same region of the murPQ promoter, but interpretation is difficult due to the poor quality of the blot.

      There are areas of apparent protection at positions +1 to +15 that are not discussed, and the areas highlighted are difficult to observe with the blot provided.

      We disagree on this point. The region between +1 and +15 is inherently resistant to attack by DNase I and there are only ever very weak bands in this region (lane 1). Other than seeing small fluctuations in overall lane intensity (e.g. lanes 7-12 have a slightly lower signal throughout) the +1 to +15 banding pattern does not change. Conversely, there are dramatic changes in the banding pattern between around -30 and -60 (again, compare lane 1 to all other lanes). That CRP and HapR bind the same region is extremely clear. Also note that this is backed up by mutagenesis of the shared binding site (Figure 4c).

      The model proposed at the end of the manuscript proposes physiological changes in cells that occur at transitions from the low to high cell density. Experiments in the paper that could strengthen this argument are incomplete. For example, in Fig. 4e it is unclear at what cell density the experiment is conducted.

      Such details have been added to the figure legends and methods section.

      The results with the wild type strain are intermediate relative to the other strains tested.

      This is correct, and exactly what we would expect to see based on our model.

      Cell density should affect the result here since HapR is produced at high density but not low density. This experiment would provide important additional insights supporting their model, by measuring activity at both cell densities and also in a luxO mutant locked at the high cell density. Conducting this experiment in conditions lacking and containing glucose would also reveal whether high glucose conditions mimic the crp results.

      We agree with this idea in principle but note that the output from our reporter gene, β-galactosidase, is stable within cells and tends to accumulate. This is likely to obscure the reduction in expression as cells transition from low to high cell density. Since we have demonstrated the regulatory effects of HapR and CRP both in vivo using gene knockouts, and in vitro with purified proteins, we think that our overall model is very well supported. Further experimental additions may provide an incremental advance but will not alter our overall story. Also note the unexpected increase in intracellular cAMP due to addition of glucose, in Vibrio fischeri (PMID: 26062003).
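The point about reporter stability can be illustrated with a minimal sketch. It assumes a fully stable reporter that is lost only by dilution through cell growth, and an abrupt drop in synthesis rate. Both are simplifying assumptions, and the function name and the tenfold drop are hypothetical illustrations, not values from the manuscript.

```python
def reporter_level(generations: float, before: float = 10.0, after: float = 1.0) -> float:
    """Per-cell level of a fully stable reporter after its synthesis rate drops.

    Assumption: the reporter is never degraded, only diluted by growth, so the
    per-cell level relaxes from the old steady state (`before`) to the new one
    (`after`) with a half-time of exactly one generation.
    """
    return after + (before - after) * 0.5 ** generations


# Hypothetical tenfold drop in synthesis: track the measured level over time.
for g in (0, 1, 2, 3):
    print(g, reporter_level(g))
```

Under these assumptions, one generation after a tenfold drop in synthesis the reporter level has fallen by less than half, which is why a stable reporter can mask a rapid reduction in promoter activity.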

      Throughout the paper it was challenging to account for the number of genes selected, the rationale for their selection, and how they were prioritized. For example, the authors acknowledged toward the end of the Results section that in their prior work, CRP and HapR binding sites were identified (line 321-22).

      This is not quite what we say, and maybe the reviewer misunderstood, which is our fault. The prior work identified CRP sites whilst the current work identified HapR sites. We have made a slight alteration to the text to avoid confusion.

      It is unclear whether the loci indicated in Table 1 are all from this prior study. It would be useful to denote in this table the seven genes characterized in Figure 2 and to provide the locus tag for murPQ.

      Again, we are unsure if we have confused the reviewer. The results in Table 1 are all HapR sites from the current work, not a prior study. However, some of these also correspond to CRP binding regions found in prior work.

      The reviewer mentions “the seven genes characterised in Figure 2” but 23 targets were characterised in Figure 2a and 9 in Figure 2b. The “VC” numbers used in Figure 2 are the same as used in Table 1 so it is easy to cross reference between the two. We have added a footnote to Table 1, also referred to in the Figure 2 legend, to allow cross referencing between gene names and locus tags (including for murQP and hapR).

      Of the 32 loci shown in Table 1, five were selected for further study using EMSA (line 322), but no rationale is given for studying these five and not others in the table.

      This is not quite correct, we did not select 5 from the 32 targets listed in Table 1. We selected 5 targets from Table 1 that were also targets for CRP in our prior paper. This was the rationale.

      Since prior work identified a consensus CRP binding motif, the authors identify that the DNA sequence to which HapR binds overlaps with a sequence also predicted to bind CRP. Genome analysis identified a total of seven sites where the CRP and HapR binding sites were offset by one nucleotide, as seen with murPQ. Lines 327-8 describe EMSA results with several of these DNA sequences but provide no data to support this statement. Are these loci in Table 1?

      This comment is a little difficult to follow, and we may have misunderstood, but we think that the reviewer is asking where the EMSA data referred to on lines 327-328 resides. We can see that the text could be confusing in this regard. We had referred to the relevant figure (Figure S6) on line 324 but did not again include this information further down in the description of the result. We have changed the text accordingly.

      Using structural models, the authors predict that HapR repression requires protein-protein interactions with CRP. Electromobility shift assays (EMSA) with purified promoter DNA, CRP and HapR (Fig 5d) and in vitro transcription using purified RNAP with these factors (Figure 5e) support this hypothesis. However, the model purports that HapR "bound tightly" and that it also had a "lower affinity" when CRP protein was used that had mutations in a putative interaction interface. These claims can be bolstered if the authors calculate the dissociation constant (Kd) value of each protein to the DNA. This provides a quantitative assessment of the binding properties of the proteins.

      The reviewer is correct that we do not explicitly provide a Kd. However, in both Figures 5d and 5e, we do provide very similar quantification. In 5d, our quantification is the % of the CRP-DNA complex bound by HapR (using either wild type or E55A CRP). Since the % of DNA bound is shown, and the protein concentrations are provided in the figure legend, information regarding Kd is essentially already present. In 5e, we show the % of maximal promoter activity. This is a reasonable way of quantifying the result. Furthermore, Kd is not a metric we can measure directly in this experiment, which is not a DNA binding assay.
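As a rough illustration of how a fraction-bound value and a protein concentration relate to an apparent Kd, here is a minimal sketch. It assumes simple 1:1 binding with protein in large excess over the DNA complex (so free protein is approximately total protein). That model, the function name, and the example numbers are assumptions for illustration only and are not taken from the manuscript.

```python
def apparent_kd(protein_conc_nM: float, fraction_bound: float) -> float:
    """Apparent Kd (nM) estimated from a single titration point.

    Assumed model: theta = [P] / (Kd + [P]) with protein in excess,
    which rearranges to Kd = [P] * (1 - theta) / theta.
    """
    if not 0.0 < fraction_bound < 1.0:
        raise ValueError("fraction_bound must be strictly between 0 and 1")
    return protein_conc_nM * (1.0 - fraction_bound) / fraction_bound


# Hypothetical point: half of the complex is shifted at 200 nM protein.
print(apparent_kd(200.0, 0.5))
```

In this simplified picture, the apparent Kd is just the protein concentration at which half of the complex is bound. A proper estimate would fit the whole titration series and would need replicate gels to attach an error.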

      The concentrations of each protein are not indicated in panels of the in vitro analysis, but only the geometric shapes denoting increasing protein levels.

      The protein concentrations are all provided in the figure legend. It is usual to indicate relative concentrations in the body of the figure using geometric shapes.

      Panel 5e appears to indicate that an intermediate level of CRP was used in the presence of HapR, which presumably coincides with levels used in lane 4, but rationale is not provided.

      There was no particular rationale for this, it was simply a reasonable way to do the experiment.

      How well the levels of protein used in vitro compare to levels observed in vivo is not mentioned.

      The protein concentrations that we use (in the nM to low μM range) are very typical for this type of work and consistent with hundreds of prior studies of protein-DNA interactions. The general rule of thumb is that 1000 molecules of a protein per bacterial cell equates to a concentration of around 1 μM. However, molecular crowding is likely to increase the effective concentration. Additionally, the DNA concentration is higher in vitro.
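The rule of thumb above can be checked with a short calculation. This is a minimal sketch that assumes a cell volume of about 1 fL, a commonly quoted figure for a rod-shaped bacterium but not a value given in the source; the function name is hypothetical.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molar_concentration_uM(molecules_per_cell: float, cell_volume_fL: float) -> float:
    """Convert a per-cell copy number into a micromolar concentration."""
    moles = molecules_per_cell / AVOGADRO
    litres = cell_volume_fL * 1e-15  # 1 fL = 1e-15 L
    return moles / litres * 1e6      # mol/L converted to umol/L


# Assumed cell volume of 1 fL; 1000 copies then comes out in the low micromolar range.
print(round(molar_concentration_uM(1000, 1.0), 2))
```

With the assumed 1 fL volume, 1000 molecules corresponds to roughly 1.7 μM, consistent with the "around 1 μM" approximation. Larger cells give proportionally lower values.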

      The authors are commended for seeking to connect the in vitro and vivo results obtained under lab conditions with conditions experienced by V. cholerae in niches it may occupy, such as aquatic systems. The authors briefly review the role of MurPQ in recycling of the cell wall of V. cholerae by degrading MurNAc into GlcNAc, although no references are provided (lines 146-50). Based on this physiology and results reported, the authors propose that murPQ gene expression by these two signal transduction pathways has relevance in the environment, where Vibrios, including V. cholerae, forms biofilms on exoskeleton composed of GlcNAc.

      We have added a citation to the section mentioned.

      The conclusions of that work are supported by the Results presented but additional details in the text regarding the characteristics of the proteins used (Kd, concentrations) would strengthen the conclusions drawn. This work provides a roadmap for the methods and analysis required to develop the regulatory networks that converge to control gene expression in microbes. The study has the potential to inform beyond the sub-field of Vibrios, QS and CRP regulation.

      As noted above, quantification essentially equivalent to Kd is already shown (% of bound substrate is indicated in figures and all protein concentrations are given in the figure legends).

      Reviewer #1 (Recommendations For The Authors):

      1.  As similar experiments have been performed in other Vibrios, it would be interesting to do a more detailed analysis of the similarities and differences between the species. Perhaps a Venn diagram showing how many targets were found in all studies versus how many are species specific.

      We appreciate this suggestion but would prefer not to make this change. A cross-species analysis would be very time consuming and is not trivial. The presence and absence of each target gene, for all combinations of organisms, would first need to be determined. Then, the presence and absence of binding signals for HapR, or its equivalent, would need to be determined taking this into account. For most readers, we feel that this analysis is unlikely to add much to the overall story. Given the amount of effort involved, this seems a “non-essential” change to make.

      2.  Line 101-Are the FLAG tagged versions of LuxO and HapR completely functional? Can they complement a luxO or hapR deletion mutant?

      The activity of FLAG tagged HapR (LuxR in other Vibrio species) has been shown previously (e.g. PMIDs 33693882 and 23839217). Similarly, N-terminal HapR tags are routinely used for affinity purification of the protein without ill effect. We have not tested LuxO-3xFLAG for “full” activity, though this fusion is clearly active for DNA binding, the only activity that we have measured here, since all known targets are pulled down.

      3.  Line 106-As the authors state later that there are additional smaller peaks for HapR that could be other direct targets, I think a brief mention of the methodology used to determine the cutoff for the 5 and 32 peaks for LuxO and HapR, respectively, would be informative here.

      We have added a little more text to the methods section. The added text states “Note that our cut-off was selected to identify only completely unambiguous binding peaks. Hence, weak or less reproducible binding signals, even if representing known targets, were excluded (see Discussion for further details)”.

      4.  Line 118-Need a reference here to the prior HapR binding site.

      This has been added.

      5.  Fig. 1e-What do the numbers on the x-axis refer to? Why not just present these data as bases? The authors also refer to distance to the nearest start codon, but this is irrelevant for 4/5 of the luxO targets as they are sRNAs. They should really refer to the distance to the transcription start site. Likewise, for HapR, distance to the nearest start codon is not as informative as distance to the nearest transcription start site. A recent paper used transcriptomics to map all the transcription start sites of V. cholerae, and these results should be integrated into the authors' study rather than just using the nearest start codon (PMID: 25646441).

      The numbers are kilo base pairs, this has been added to the axis label. We have also changed “start codon” to “gene start” (since “gene start” is also suitable for genes that encode untranslated RNAs).

      Re comparing binding peak positions to transcription start sites (TSSs) rather than gene starts, this analysis would be useful if TSSs could be detected for all genes. However, some genes are not expressed under the conditions tested by PMID: 25646441, so no TSS is found. Consequently, for HapR or LuxO bound at such locations, we would not be able to calculate a meaningful position relative to the TSS. We stress that the point of the analysis is to determine how peaks are positioned with respect to genes (i.e. that sites cluster near gene 5’ ends). Also note that nearest TSSs are now shown in the revised Table 1. In some cases, these are unlikely to be the TSS actually subject to regulation (e.g. because the regulated gene is switched off).

      6.  Fig. 1e-Is there directionality to the site? In other words, if a HapR binding site is located between two genes that are transcribed in opposite directions, is there a way to predict which gene is regulated? It looks like this might be the case with the list presented in Table 1, but how such determination is made and what the various symbol in Table 1 mean are not clear to me. This also has ramifications for Fig. 2a as the direction to construct the fusion is critical for the experiment.

      The site is a palindrome so lacks directionality. The best prediction re regulation is likely to be positioning with respect to the nearest TSS (which is now included in Table 1). However, this would remain just a prediction and, where TSSs are in odd locations with respect to binding sites (taking into account the caveats above) predictions would be unreliable.

      We are unsure which symbol the reviewer refers to in Table 1, a full explanation of any symbols used is provided in the table footnotes.

      With respect to Figure 2a, if sites were between divergent genes, and met our other criteria, we tested for regulation in both directions. For example, see the divergent genes VCA0662 (classified inactive) and VCA0663 (classified repressed).

      7.  Fig. 2a-It is a little disappointing that the authors use LacZ fusions to measure transcription as this is not the most sensitive reporter gene. Luciferase gene fusions would have been much more sensitive. Also, did the authors examine multiple time points? The methods only describe "mid-log phase" but some of the inactive promoters could be expressed at other time points. Also, it would be simple to repeat this experiment in different media, such as minimal with glucose or another non-CRP carbon source, to expand which promoters are expressed.

      The reviewer is correct regarding the sensitivity of β-galactosidase, which is very stable and so accumulates as cells grow. Even so, this reporter has been used very successfully, across thousands of studies, for decades. We did not examine multiple timepoints. We appreciate that the 23 promoter::lacZ fusions could be re-examined using varying growth conditions but this is unlikely to impact the overall conclusions, though it could generate some new leads for future work.

      8.  Fig. 2a legend-typos

      This has been corrected.

      9.  Line 138-I think you mean Fig. 2a here.

      This has been corrected.

      10.  Fig. 2b and many additional figures quantify band intensity but do not show any replication or error. Therefore, it is impossible to gauge reproducibility of these experiments.

      We have added a reproducibility statement (all experiments were done multiple times with similar results) as is standard throughout the literature. Also note that there is a lot of internal replication between figures. Figure 4d and Figure 5e lanes 1-9 show essentially the same experiment (albeit with slightly different protein concentrations) and very similar results. To the same effect, Figure 5e lanes 10-18 and lanes 19-27 show the same experiment for two different mutations of the same CRP residue. Again, the results are very similar. Also see the response to your comment 15 below.

      11.  Fig. 4a-lanes 2-4-the footprint does not change with additional CRP. In other words, it looks the same at the lowest concentration of CRP versus the highest concentration of CRP. The footprints for HapR look similar. This is somewhat troubling as in these types of experiments one would like to observe a dose dependent change in the footprint correlating with more DNA occupancy.

      For CRP we agree but are not concerned at all by this. The site is simply fully occupied at the lowest protein concentration tested. Given that the footprint exactly coincides with a near consensus CRP site (which, when mutated, abolishes CRP binding in EMSAs, and regulation by CRP in vivo) all our results are perfectly consistent. Note that i) our only aim in this experiment was to determine the positions of CRP and HapR binding and ii) our conclusions are independently backed up using gel shifts and by making promoter mutations. With respect to HapR, there are changes at the periphery of the main footprint.

      12.  Fig. 4e-Why does the transcriptional activation of murQP decrease with increasing concentrations of CRP? This is also seen in Fig. 5e.

      In our experience, this often does happen when doing in vitro transcription assays (with CRP and many other activators). The anecdotal explanation is that, at higher concentrations, the regulator can start to bind the DNA non-specifically and so interfere with transcription.

      13. The authors demonstrate in vitro that HapR requires binding of CRP to bind the murQP promoter. It would strengthen their model if they demonstrated this in vivo. To do this, the authors only need to repeat their ChIP-Seq experiment in a delta CRP mutant and the HapR signal at murQP would be lost. In fact, such an experiment would experimentally confirm which of the in vivo HapR binding sites are CRP dependent.

      We agree, appreciate the comment, and do plan to do such experiments in the future as a wider assessment of interactions between transcription factors. However, doing this does have substantial time and resource implications that we cannot devote to the project at present. We feel that our overall conclusions, regarding co-operative interactions between HapR and CRP at PmurQP, are well supported by the data already provided. This also seems the overall opinion of the reviewers.

      14.  Fig. 5b-I am confused by the Venn diagram. The text states that "In all cases, the CRP and HapR targets were offset by 1 bp", but the diagram only shows 7 overlapping sites. The authors need to better describe these data.

      We mean that, in all cases where sites overlap, sites are offset by 1 bp (i.e. we didn’t find any sites overlapping but offset by 2, 3, 4 bp etc).

      15. Line 287-288 and Fig. 5d-The authors state that HapR binds with less affinity to the CRP E55A mutant protein bound to DNA. There does seem to be a difference in the amount of shifted bands at the equivalent concentrations of HapR, but the difference is subtle. In order to make such a conclusion, the authors should show replication of the data and calculate the variability in the results. The authors should also use these data to determine the actual binding affinities of HapR to WT and the E55A mutant CRP, along with error or confidence intervals.

      All of these experiments have been run multiple times and we are absolutely confident of the result. With respect to Figure 5d, this was done many times. We note that not all experiments were exact repeats. E.g. some of the first attempts had fewer HapR concentrations. Even so, the defect in HapR binding to the CRP E55A complex was always evident. The two gels to the left show the final two iterations of this experiment (these are exact repeats). The top image is that shown in Figure 5d. The lower image is an equivalent experiment run a day or so previously. Both clearly show a defect in HapR binding to the CRP E55A complex. We appreciate that our conclusion re these experiments is somewhat qualitative (i.e. that HapR binds the CRP E55A complex less readily) but this is not out of kilter with the vast majority of similar literature and our results are clearly reproducible.

      16.  Fig. 6a-It is odd that the locked low cell density mutants have such a growth defect in MurNAc, minimal glucose, and LB. To my knowledge, such a growth defect is not common with these strains. Perhaps this has to do with the specific growth conditions used here, but I can't find that information in the manuscript (it should be there). Furthermore, the growth rate of the luxO and hapR mutants appears to be similar up to the branch point (i.e. slope of the curve), but the lag phase of the luxO mutant is much longer. The authors need to address these issues in relationship to previous published literature and specify their growth conditions because the results are not consistent with their simple model described in Fig 6b.

      This comment is a little difficult to pick apart as it covers several different issues. We’ll try and answer these individually.

      a)     The unusual “biphasic” growth curve with hapR and hapRluxO cells: We do not know why cells lacking hapR have a growth curve that appears biphasic. We can only assume that this is due to some regulatory effect of HapR, distinct from the murQP locus. Despite the unusual shape of the growth curve, the data are consistent with our conclusions.

      b)     The extended lag phase of the luxO mutant in minimal media + MurNAc: We appreciate this comment and had considered possible explanations prior to submission. In the end, we left out this speculation but are happy to include it as part of our response. The extended lag phase might be expected if CRP/HapR regulation is largely critical for controlling the basal transcription of murQP. The locus is likely also regulated by the upstream repressor MurR (VC0204) as in E. coli. So, if derepression of MurR overwhelms the effect of HapR on murQP, we think you would expect that once the cells start growing on MurNAc, the growth rates are unchanged. But the extended lag is due to the fact that it took longer for those cells to achieve the critical threshold of intracellular MurNAc-6-P necessary to drive murR derepression. Obviously, we cannot provide a definitive answer.

      c)     We have added further details regarding growth conditions to the methods section and the Figure 6a legend.

      17.  Fig. S6-The data to this point with murPQ suggested a model in which CRP binding then enabled HapR binding. But these EMSA suggest that both situations occur as in some cases, such as VCA0691, HapR binding promotes CRP binding. How does such a result fit with the structural model presented in Fig. 5?

      This is to be expected and is fully consistent with the model. Cooperativity is a two-way street, and each protein will stabilise binding of the other. Clearly, it will not always be the case that the shared DNA site will have a higher affinity for CRP than HapR (as at PmurQP). Depending on the shared site sequence, it is expected that sometimes HapR will bind “first” and then stabilise binding of CRP.

      18. Line 354-356-The HCD state of V. cholerae occurs in mid-exponential phase and several cell divisions occur before stationary phase and the cessation of growth, at least in normal laboratory conditions. Therefore, there is not support for the argument that QS is a mechanism to redirect cell wall components at HCD because cell wall synthesis is no longer needed.

      We did not intend to suggest cell wall synthesis is not needed at all, rather that there is a reduced need. We made a slight change to the discussion to reflect this.

      19. Line 357-360-Again, as stated in point 16, the statement that cells locked in the HCD are "defective for growth" is an oversimplification. The luxO mutants have a longer lag phase, but they actually outgrow the hapR mutants at higher cell densities and reach the maximum yield much faster.

      In fairness, we do go on to specify that the defect is an extended lag phase. Also see our response above.

      Reviewer #2 (Recommendations For The Authors):

      Comments to improve the text

      1)  Line 103-106, line 130, line 136, etc. Details of the methods and the text directing to presentations of figures should be in the methods and/or figure legends with (Figure x) in citation after the statement. The sentences in lines indicated can be deleted from the results. Although several lines are noted specifically here, this comment should be applied throughout the entire results section.

      We appreciate this comment but would prefer not to make this change (it seems mainly an issue of personal stylistic choice). It is sometimes helpful for the reader to include such information as it avoids them having to cross reference between different parts of the manuscript.

      2)  Line 115. Recommend a paragraph between content on LuxO and HapR (before "Of the 32 peaks for HapR binding")

      We agree and have made this change.

      3)  Line 138 and Figure 1a. I am not convinced this gel shows that VC1375 is activated by HapR. Is the arrow pointing to the wrong band? There does seem to be an induced band lower down.

      We understand this comment as it is a little difficult to see the induced band. This is because this is a compressed area of the gel and the transcript is near to an additional band.

      4)  Line 147. Add the VC0206-VC0207 next to murQP (and the gene name murQP into Table 1).

      We have added the gene name to the figure foot note. The text has been changed as requested.

      5) Methods. It is essential for this paper to have detailed methods on the bacterial growth conditions. Referring to prior paper, bacteria were grown in LB (add composition...is this high salt LB often used for vibrios or low salt LB often used for E. coli). Growth is to "mid log". Please provide the OD at collection. Is mid log really considered "high density"? Provide a reference regarding HapR activity at mid log to support the method. Could the earlier collection of bacteria account for missing known HapR regulated genes? In preparing the requested ç, include growth conditions for other experiments in the legends.

      Note that we have included a new supplementary table, rather than a Venn diagram. We have also added further details of growth conditions as mentioned above. Also note that, for the ChIP-seq, HapR and LuxO were expressed ectopically and so uncoupled from the switch between low and high cell density.

      6)  Content of Table 1, HapR Chip-seq peaks, needs to be closely double checked to the collected data as there seems to be some errors. Specifically, VC0880 and VC0882 listed under Chromosome I are most likely VCA0880 (MakD) and VCA0882 (MakB), both known HapR induced genes on Chromosome II with VCA0880 previously validated by EMSA. This notable error suggests the table may have other errors and thus requires a very detailed check to assure its accuracy.

      We appreciate the attention to detail! We have double checked, thankfully this is not an error, the table is correct (even so, we have also checked all other entries in the table). As an aside, VCA0880 is one of the locations for which we see a weak HapR binding signal below our cut-off (included in the new Table S1). In cross checking between Table 1 and all other data in the paper we noticed that we had erroneously included assay data for VC0620 in Figure 2A. This was not one of our ChIP-seq targets but had been assayed at the same time several years ago. This datapoint, which wasn’t related to any other part of the manuscript, has been removed.

      If VCA0880 and VCA0882 are correctly placed on Chr. I, then add comment to text that the Mak toxin genomic island found on Chromosome II in N16961 is on Chr. I in E7946. (See recent references PMID: 30271941, 35435721, 36194176, 34799450).

      See above, this is not an error.

      7)  Alternatively for both comments 8 & 9, are these problems of present/missing genes or misannotations the result of the annotation of E7946 gene names not aligning with gene names of N16961? (if so, in Table 1, please give the gene name as in E7946 but include a separate column with the N16961 name for cross study comparison)

      See above and below, this is not an issue.

      8)  Line 126-127. Also regarding Table 1, please add a column with function gene annotation. For example, VC0916 needs to be identified as vpsU. If function is unknown, type unknown in the column. This will help validate the approach of selecting "HapR target promoters where adjacent coding sequence could be used to predict protein function."

      We added an extra column to Table 1 in response to a separate reviewer request (TSS locations). This leaves no space for any additional columns. Instead, to accommodate the reviewer’s request, we have added alternative gene names to the footnote.

      Not following up on VCA0880 (promoter for the mak operon) is a sad missed opportunity here as it is one of the most strongly upregulated genes by HapR (PMC2677876)

      As noted above, this was not an error and VCA0880 was not one of our 32 HapR targets. As such, we would not have followed this up.

      9)  Figure Legends. Add a unit to the bar graphs in Figure 1e (should be kb??)

      This has been corrected.

      10) The yellow color text labels in figures 3c, 4a, 4c are difficult to read. Can you use an alternative darker color for CRP.

      We have made this slightly darker (although to our eye it is easily readable). We haven’t changed the colour too much, for consistency with colour coding elsewhere.

      11) Figure S3. Binding is misspelled. Add units to the x-axis

      This has been corrected.

      12) Figure S7. The text in this figure is too small to read. Figure could be enlarged to full page or text enlarged. Are these 4 the only other known regulated promoters? Could all the known alternative promoters linked to HapR be similarly probed?

      We have increased the font size and included a new Table S1 for all previously proposed HapR sites.

      13) Figure S8. Original images..are any of these the replicate gels (see public comment 6)

      We have added a statement regarding reproducibility, and also note the internal reproducibility between different figures in our reviewer response. The gels in Figure S8 are full uncropped versions of those shown in the main figures.

      Reviewer #3 (Recommendations For The Authors):

      None

      Whilst there were no specific recommendations from this reviewer, we have still responded to the public review and made changes if required.

    1. Author Response:

      Regarding the two main points emphasized by the eLife assessment:

      • Potentially confounding effects of overcrowding: This is indeed an important point, which we avoided, unfortunately without explicitly mentioning it in the manuscript (assuming that it went without saying). We will point out that our proliferation assays, already part of the original manuscript, indicated that cells were not overcrowded. Nevertheless, we will include additional evidence indicating that our cells were not overcrowded and remained subconfluent.

      • Mechanisms: We will mention even more explicitly than we already did that this is beyond the scope of this story and why that is. As we did say, there are lots of factors directly or indirectly involved in translation that depend on Hsp90. Figuring out which one or which ones it might be is a whole new and totally open-ended project.

      Regarding some of the other public comments:

      • While we did provide quantitative (!) data on changes in cytoplasmic density (e.g. diffusion coefficients, total amount of protein relative to cell size), we will emphasize in the revised manuscript that the changes in cell size, as measured by both flow cytometry and image analyses, are a relative and approximate measure of the 3D changes in cell volume. Although our data on the diffusion coefficients, which report on cytoplasmic density, are directly comparable, our measurements of the amounts of protein relative to cell size (if this is what the comment meant by "cell density") have at least relative value.

      • Results of proteomic data not shown in sufficient detail: We recognize that it is not trivial to "read" the data as presented in the paper (volcano plots, full datasets as an Excel file and through ProteomeXchange). We will add subsets of the proteomic data to the Excel file and include some Gene Ontology analyses.

      • We did demonstrate that Hsf1 most likely acts transcriptionally to promote the observed cell size increase.

      • We acknowledge that a large fraction of our data is "observational", but some experiments clearly go beyond providing correlations. When we manipulate some of the players genetically (KO, knockdown, overexpression) or pharmacologically, we get results that support our conclusions about underlying mechanistic connections.

      • GADD34: This protein is not known to be an Hsp90 client (or interactor), which is also supported by our mass spec data since its steady-state levels don't change in Hsp90α or β KO cells compared to wild-type cells.

      • Non-dividing cells: it would indeed be exciting to determine whether the same phenomena and mechanisms apply to non-dividing cells. However, there are likely to be substantial technical challenges. We would need primary human (or alternatively murine) cells such as B-cells or hepatocytes, and it is difficult to predict whether they would tolerate mild heat stress for several days. It might also be possible to explore this with a mouse model, but clearly, this must be left to future studies.

  3. May 2023
    1. Author Response:

      The following is the authors' response to the original reviews.

      We’d like to take this opportunity to thank the reviewers and editors for their consideration of our work. As detailed below, we have made the majority of the corrections suggested by the reviewers and believe these have greatly improved our manuscript. The reviewers’ comments are in blue font below and our responses to each of these are in black font.

      Reviewer #1 (Recommendations For The Authors):

      Suggestions to improve the manuscript:

      -  Line 33 and 34: "This protein" is vague. Please reword to state whether you are referring to TcaA or to WTA

      This has been corrected in the revised manuscript (Line 33)

      -  Intro: It would be helpful to provide more rationale for testing serum as a surrogate to whole blood in the GWAS screen. Serum is obviously lacking components of the clotting cascade, and some of these components have antimicrobial functions. However, this is easily justified in the text- e.g. to avoid clumping during the screen, to focus only on serum-derived antimicrobial compounds, etc.

      This has been edited in the revised manuscript (Line 84-86)

      -  Line 120: Please state if the 300 clinical isolates represent 300 distinct patients, or if some of the isolates came from the same patient during sequential collections. If the latter, were there any instances in which the tcaA SNP appeared during the course of infection?

      They each came from individual patients so we were unfortunately unable to look for within host events. This information has been added to the revised manuscript (line 104).

      -  Line 133: the closed parenthesis sign is missing after "CC22"

      This has been corrected in the revised manuscript (Line 135)

      -  Table 1a - NE1296 is misspelled as ME1296. Also there is a typo in the last entry of this table for the locus tag

      This has been corrected in the revised manuscript.

      -  Table 1b - the authors should comment (in the discussion) on the potential reasons why tcaA was not identified in the CC30 background.

      A comment to this effect has been added to the revised manuscript (Lines 553-59)

      -  Figure 2a - Why is the mutant with the empty complementation vector not significantly different from WT JE2?

      The most widely used and reliable expression plasmid for complementation of mutated phenotypes in S. aureus is the pRMC2 plasmid, which requires chloramphenicol selection and anhydrotetracycline to induce expression of the cloned gene. These antibiotics, and the presence of the plasmid often affect the expression of other genes by the bacteria (as noted by this reviewer). As such, to verify complementation of a mutation the comparison we make is between the strain containing the empty plasmid induced with anhydrotetracycline with a strain with the gene containing plasmid induced with anhydrotetracycline. In that situation, the only difference between those two strains under those conditions is whether the gene is expressed or not. A comment explaining this has been added to the revised manuscript (lines 149-153).

      -  Line 188: Statistical analyses should be applied to figure 3C, which also appears to be underpowered.

      P values have been added to this in the revised manuscript. We present data points from three biological replicates, each of which is the mean of three technical replicates, which we believe is sufficiently powered for this analysis.

      -  Figure 3 legend - Teicoplanin is mentioned in the title, but the data are not included here

      This legend title has been revised (Line 193).

      -  Figure 4 - here is an example where testing the actual tcaA SNP could have been enlightening. For example, what if the selective pressure makes the SNP more relevant to a specific AMP or AA?

      While we agree that this would be an interesting experiment to perform, the complementing vector that we would need to use to compare the wild-type and SNP-containing genes requires one antibiotic to select for the plasmid and another to induce expression. As such, it becomes quite a complex and messy experiment in which synergy between the antimicrobial agents would be likely, and the results would be difficult to interpret.

      -  Lines 317-321 - Suggest moving this to discussion

      We have left this here as we felt it a necessary summation/explanation of the results described in that section. It is discussed again later in the discussion section.

      -  Line 341 - I believe "serum" should actually be "teicoplanin"

      This has been corrected in the revised manuscript (Line 342).

      -  Figure 6e - wouldn't it be more powerful to determine the WTA levels in the supernatants of these strains and conditions?

      We could have done this both ways, but we focussed here only on how TcaA ligates WTA into the cell wall in the presence of serum.

      -  Figure 6 - What is the explanation for the different growth yields for JE2 in teicoplanin in panel A versus panel F? Are these actually two different concentrations? If so, please update the figure legend and the methods.

      The concentration used for panel A was inhibitory and that used for panel F was sub-inhibitory. To improve the clarity of this we have now used a table displaying the MICs for the six strains as panel A. We have also included the concentration of teicoplanin used for each experiment in the legend.

      -  Line 413: Consider more precise language than "the cell wall is stronger". E.g. More crosslinks?

      This has been edited in the revised manuscript (Line 421)

      -  Line 415: Consider changing "altered" to a directional term such as increases. It can be difficult for the reader to follow the expected change when you are discussing how the lack of a gene versus the presence of a gene changes susceptibility in one direction and another phenotype in the opposite direction.

      This has been edited in the revised manuscript (Line 423).

      -  Figure 7: The conclusions made from panels A and B need to be supported by statistical analyses. It is unclear if these lines are truly different from one another.

      These have been included in the revised fig 7.

      -  Line 426: I believe "tcaA" is missing following "producing"

      This has been corrected in the revised manuscript (Line 434).

      -  Line 446: "increase" to "increases"

      This has been corrected in the revised manuscript (Line 460).

      -  Figure 8C: if one goal of the mouse experiment was to look at survival during transit in whole blood, earlier timepoints are indicated based on the described kinetics of bloodstream dissemination in this model.

      The primary goal of this experiment was to see if TcaA contributed positively or negatively to the development of the infection. Work on this protein is ongoing, and so we hope in coming years to be able to provide more detail on its activity in vivo.

      -  Line 506: "changes to the structural integrity of peptidoglycan" seems overstated without additional studies.

      This has been edited in the revised manuscript (Line 524).

      -  Line 564: "represents" to "represent"

      This has been corrected in the revised manuscript (Line 603).

      -  Line 588: The figures all refer to "100 net". Please confirm the concentration used.

      This has been corrected in the revised manuscript (Line 628).

      -  Line 609: This refers to capsule production? Is this a copy error from a prior paper?

      Yes it is, and has been corrected in the revised manuscript (Line 650).

      - Line 763: Please provide the concentrations of arachidonic acid used for each experiment.

      This has been included in the revised manuscript (Line 805)

      - Line 836 and 837: This mentions a time course for blood culture from the infected mice. Where are these data?

      Apologies, this is another cut and paste mistake from another paper, and has been removed.

      -  Line 870: please discuss how multiple comparisons testing was handled.

      This has been included in the revised manuscript (Line 908).

      -  Supplemental figure 5 - Please add statistical analyses to support the conclusions in the manuscript. For example, there appears to be no differences for dalbavancin. Please also italicize tcaA in the legend.

      These have been included and corrected in the revised manuscript.

      Reviewer #2 (Recommendations For The Authors):

      Line 65 - I would suggest adding the reference (doi: 10.1128/Spectrum.00116-21), which shows increased mortality in S. aureus bacteremia patients due to agr deficient isolates.

      The suggested manuscript shows this effect of Agr dysfunction to be limited to patients with moderate to severe SOFA scores. As such it would require a nuanced description here that we think will detract from the flow of the introduction.

      Line 68 - Please add DOI: 10.1016/j.cmi.2022.03.015 as a reference to support the mortality rate in S. aureus bacteremia. A systematic review and meta-analysis provides the highest level of evidence, and this is a contemporary study performed in 2022

      This has been included in the revised manuscript (Line 68).

      Line 70 - please add supporting reference for this statement

      This has been included in the revised manuscript (Line 70).

      Figure 2 - This image is low quality and appears pixelated. Please revise

      This has been replaced with a higher resolution image in the revised manuscript.

      Figure 3c Also appears slightly pixelated

      This has been replaced with a higher resolution image in the revised manuscript.

      Line 173 - I think it would helpful to mention the catalytic activity encoded by tcaA (aside from mediating sensitivity to glycopeptides) is unknown.

      This has been included in the revised manuscript (Line 174)

      Line 174 - also confers sensitivity to vancomycin https://doi.org/10.1128/AAC.48.6.1953-1959.2004

      This has been included in the revised manuscript, albeit at a later point than suggested here (Line 406)

      Line 209 - did the authors test any other antimicrobial fatty acids such as palmitoleic acid? If common mechanism would also expect decreased sensitivity to other HDFA

      No, we focused on arachidonic acid as this is the most relevant antimicrobial fatty acid in serum and it is produced by neutrophils and macrophages during the inflammatory burst.

      Figure 4a-D: it would be useful to know what the MIC to these different components is and how that MIC relates to the concentration in human serum

      We do not have MICs for all of these compounds tested here but can confirm that the concentrations used are physiologically relevant.

      Figure 4b - Can you mention in the legend how the killing assays varied for arachidonic acid versus the other AMPs? I am not immediately clear how this experiment was performed, despite referring to methods

      This has been included in the text of revised manuscript (Line 211-213) and the figure legend.

      Figure 5 - there is no panel D

      This has been corrected in the revised manuscript.

      Figure 6a: Lines 328-329 state the experiment was performed in the MIC for each strain. The legend (line 374) states 0.5 ug/ml teicoplanin was used, which is below the MIC for all of the strains tested per supp table 2. Please correct this discrepancy.

      This figure has been revised and the additional information included to improve the clarity of this section in the revised manuscript.

      Figure 6a: On line 328, the authors state that the tcaA knockout increases the MIC for teicoplanin in each background. Figure 6a is performed in the presence of teicoplanin at 1x the MIC of the wild type (which will be below the MIC for the knockout). Therefore, we know each tcaA mutant will be able to grow in the presence of sub-MIC concentrations of teicoplanin. Would a more informative way of conveying this information be to have MIC on the Y axis and background on the X axis?

      This has been corrected and clarified in the revised manuscript with a table showing the MICs (fig. 6a).

      Figure 6b-c: Similarly, would it be more helpful to show how the MIC varies with the different clinical isolate tcaA mutants?

      While MICs have uses in clinical settings, they are a relatively crude and binary (growth vs. no growth) way to measure and compare sensitivity. For these two groups of isolates the MICs did not vary, which is why we used a concentration that sat at the threshold and quantified growth of all the isolates in this. This information has been added to the legend.

      Figure 6e: The figure legend instructs us to refer to supplemental figure 3 to see the densitometry results. However, Figure 6e appears to be 4 conditions (WT and mutant +/- serum) and only examines the cell wall, whereas the supplemental figure refers to two conditions (WT + mutant) and looks at the cell wall and supernatant. I would recommend providing the densitometry data associated with the conditions in figure 6e, especially as differences seem more subtle by eye.

      This has been included in the revised manuscript (fig. 6f)

      Line 689-691 - description of teicoplanin concentrations used in figure 2. However, no teicoplanin was used in figure 2. Assume is referring to a different figure (figure 6?)

      This has been corrected and clarified in the revised manuscript. Line 724.

      Please add a section in the methods describing how the MIC was determined for JE2, SH1000 and Newman. Was it performed in CA-MHB or the media that the experiment in figure 6a was performed in? Serum can alter the MIC of several antibiotics.

      This has been corrected and clarified in the revised manuscript. Line 724-29.

      Please add a section to the methods describing the whole blood killing assay, ideally describing how the blood was not frozen and used same day as venipuncture. This is important as freeze/thaw or time periods >12 hours are likely to severely affect the function of phagocytes, especially neutrophils.

      This has been corrected and clarified in the revised manuscript. Lines 635-639

      Line 588: ng/ul should read ng/µl

      This has been corrected in the revised manuscript to ng/ml (Line 628).

      Reviewer #3 (Recommendations For The Authors):

      We have now included a graphical abstract (Fig. 9)

      Major:

      1-    Line 102: I was not able to find the accession numbers of these 300 genomes, did the authors submit it to any public repository (e.g. NCBI)?

      These were submitted previously to a public repository and the associated reference cited, but we have provided these in supplementary Table 1.

      Minor:

      1 -    Typo in line 133. Fix parenthesis after CC22.

      Corrected.

      2 -    Typo: Fix figure 5 panels (5e should be 5d).

      Corrected.

      3 -    Line 276: It is not clear why the extract for this experiment was supplemented at 2% while the other part of the experiment was done with 10%. Clarification is needed.

      The experiments at 10% used overnight supernatant, whereas those at 2% used a purified WTA extract. This has been clarified in the revised manuscript (line 283 and in the figure legend).

      4 -    Line 278: Typo: Figure 6e should be figure 5d.

      Corrected. (Line 278)

      5 -    Figure 5f: There is no explanation in the text or in the figure legend what the purpose of using mprF was.

      A comment has been included in the figure legend.

      6 -    Line 328: It would be good if the authors reported the CC of Newman and SH1000 to give readers better context.

      This has been added. (Line 332)

      7 -    Line 341: Did the authors mean less sensitive to teicoplanin?

      Corrected. (Line 342)

      8 -    Line 367: Dose dependent effect does not seem to be followed not only in panel H of Supp. Fig. 4(LL37 and EMRDA15) but also panels C, D and G.

      Corrected.

      9 -    Line 587: Typo: Table 2.

      These have all been corrected and/or clarified in the revised manuscript.

    1. Author Response:

      First and foremost, we would like to thank all the editors and reviewers for their thoughtful and thorough evaluations of our manuscript. We greatly appreciate their assessment of the novelty and strengths of this study and will revise the manuscript according to their recommendations. Here we offer a provisional response to Reviewer 2 to clarify our rationale for using TH-Cre rather than DAT-Cre mice in our study of frontal cortical dopaminergic projections.

      We agree with Reviewer 2 that the DAT-Cre line can provide specific labeling of midbrain dopamine neurons projecting to the striatum, as discussed in the cited study (Lammel et al., 2015). Unfortunately, mesocortical dopamine neurons in the VTA are known to express very little DAT (Lammel et al., 2008; Li, Qi, Yamaguchi, Wang, & Morales, 2013; Sesack, Hawrylak, Matus, Guido, & Levey, 1998). This limitation in the use of the DAT-Cre line to target mesocortical dopamine neurons has been acknowledged in the cited publication (Lammel et al., 2015). It is an issue we have also observed when testing the DAT-Cre line in our lab. Additionally, and interestingly, recent extensive evaluation of the DAT-Cre line reported ectopic labeling of multiple non-dopaminergic neuronal populations (Papathanou, Dumas, Pettersson, Olson, & Wallen-Mackenzie, 2019; Soden et al., 2016; Stagkourakis et al., 2018). Our own evaluation of the DAT-Cre line’s utility for cortical imaging also captured sporadic ectopic labeling of cortical cell somas.

      Because mesocortical dopamine neurons have stronger TH expression than DAT (Lammel et al., 2008; Lammel et al., 2015; Li et al., 2013; Sesack et al., 1998), TH-Cre lines have been frequently used to study the mesocortical pathway (Ellwood et al., 2017; Gunaydin et al., 2014; Lammel et al., 2012; Lohani, Martig, Deisseroth, Witten, & Moghaddam, 2019; Vander Weele et al., 2018). While TH-Cre expression itself is not restricted to dopaminergic neurons, we targeted our viral injections to the VTA and optogenetic stimulation to the cortical dopaminergic projection target area (Patriarchi et al., 2018) to specifically modulate mesocortical dopaminergic axons. In addition, we tested the effects of a D1 antagonist in our manipulations. Although we targeted dopamine neurons in our adolescent stimulation, the final behavioral outcome likely includes contributions from co-released neurotransmitters and non-dopaminergic neurons via network effects. We will revise our discussion and methods sections to clarify these points. Additionally, we will provide DAT-Cre images in the revised supplementary materials to further explain our choice of the TH-Cre line rather than the DAT-Cre line for our study.

      References

      Ellwood, I. T., Patel, T., Wadia, V., Lee, A. T., Liptak, A. T., Bender, K. J., & Sohal, V. S. (2017). Tonic or Phasic Stimulation of Dopaminergic Projections to Prefrontal Cortex Causes Mice to Maintain or Deviate from Previously Learned Behavioral Strategies. J Neurosci, 37(35), 8315-8329. doi:10.1523/JNEUROSCI.1221-17.2017

      Gunaydin, L. A., Grosenick, L., Finkelstein, J. C., Kauvar, I. V., Fenno, L. E., Adhikari, A., ... Deisseroth, K. (2014). Natural neural projection dynamics underlying social behavior. Cell, 157(7), 1535-1551. doi:10.1016/j.cell.2014.05.017

      Lammel, S., Hetzel, A., Haeckel, O., Jones, I., Liss, B., & Roeper, J. (2008). Unique properties of mesoprefrontal neurons within a dual mesocorticolimbic dopamine system. Neuron, 57(5), 760-773. doi:10.1016/j.neuron.2008.01.022

      Lammel, S., Lim, B. K., Ran, C., Huang, K. W., Betley, M. J., Tye, K. M., ... Malenka, R. C. (2012). Input-specific control of reward and aversion in the ventral tegmental area. Nature, 491(7423), 212-217. doi:10.1038/nature11527

      Lammel, S., Steinberg, E. E., Foldy, C., Wall, N. R., Beier, K., Luo, L., & Malenka, R. C. (2015). Diversity of transgenic mouse models for selective targeting of midbrain dopamine neurons. Neuron, 85(2), 429-438. doi:10.1016/j.neuron.2014.12.036

      Li, X., Qi, J., Yamaguchi, T., Wang, H. L., & Morales, M. (2013). Heterogeneous composition of dopamine neurons of the rat A10 region: molecular evidence for diverse signaling properties. Brain Struct Funct, 218(5), 1159-1176. doi:10.1007/s00429-012-0452-z

      Lohani, S., Martig, A. K., Deisseroth, K., Witten, I. B., & Moghaddam, B. (2019). Dopamine Modulation of Prefrontal Cortex Activity Is Manifold and Operates at Multiple Temporal and Spatial Scales. Cell Rep, 27(1), 99-114 e116. doi:10.1016/j.celrep.2019.03.012

      Papathanou, M., Dumas, S., Pettersson, H., Olson, L., & Wallen-Mackenzie, A. (2019). Off-Target Effects in Transgenic Mice: Characterization of Dopamine Transporter (DAT)-Cre Transgenic Mouse Lines Exposes Multiple Non-Dopaminergic Neuronal Clusters Available for Selective Targeting within Limbic Neurocircuitry. eNeuro, 6(5). doi:10.1523/Eneuro.0198-19.2019

      Patriarchi, T., Cho, J. R., Merten, K., Howe, M. W., Marley, A., Xiong, W. H., ... Tian, L. (2018). Ultrafast neuronal imaging of dopamine dynamics with designed genetically encoded sensors. Science, 360(6396), 1420-+. doi:10.1126/science.aat4422

      Sesack, S. R., Hawrylak, V. A., Matus, C., Guido, M. A., & Levey, A. I. (1998). Dopamine axon varicosities in the prelimbic division of the rat prefrontal cortex exhibit sparse immunoreactivity for the dopamine transporter. J Neurosci, 18(7), 2697-2708. doi:10.1523/JNEUROSCI.18-07-02697.1998

      Soden, M. E., Miller, S. M., Burgeno, L. M., Phillips, P. E. M., Hnasko, T. S., & Zweifel, L. S. (2016). Genetic Isolation of Hypothalamic Neurons that Regulate Context-Specific Male Social Behavior. Cell Rep, 16(2), 304-313. doi:10.1016/j.celrep.2016.05.067

      Stagkourakis, S., Spigolon, G., Williams, P., Protzmann, J., Fisone, G., & Broberger, C. (2018). A neural network for intermale aggression to establish social hierarchy. Nat Neurosci, 21(6), 834-842. doi:10.1038/s41593-018-0153-x

      Vander Weele, C. M., Siciliano, C. A., Matthews, G. A., Namburi, P., Izadmehr, E. M., Espinel, I. C., ... Tye, K. M. (2018). Dopamine enhances signal-to-noise ratio in cortical-brainstem encoding of aversive stimuli. Nature, 563(7731), 397-401. doi:10.1038/s41586-018-0682-1

    1. Author Response:

      I appreciate the time and effort of both Reviewers, who have raised important points that I would like to briefly discuss before I start working on a full revision of the paper.

      Generality. First, there is the question of how much these conclusions broadly apply across experimental paradigms and subjects, which could give rise to potentially very different TGMs. As the Reviewers mention, I have focussed on one specific TGM that I assumed prototypical, and it could be that these conclusions fit other TGMs less well. Further, the model has quite a few hyperparameters so that it can flexibly accommodate a broad span of scenarios. This flexibility comes at a price, as pointed out by Reviewer 1: that “a different selection of parameters could lead to similar results”, i.e. that other configurations could fit this specific TGM just as well. This is related to the next point, so I will address them jointly.

      Lack of quantitative evaluation, “making it hard to draw firm conclusions”. Indeed, I have not explicitly quantified the fit of the hyperparameters to this empirical TGM using a specific measure, and (related to the previous point) I have not made a systematic search through the space of model configurations based on such a measure.

      There is here a trade-off between generality and specificity. In fact, it is intentional that I did not optimise the hyperparameters to this specific TGM, and that I chose not to show a quantitative measure of fitness. This is because the TGM that I show in the paper is only meant as an example. Instead of focussing on fitting a specific TGM, I aimed at characterising some prominent general features that we often see throughout the literature, which this specific TGM shows in its own specific way. That is, if the paper was meant to focus on a specific paradigm (e.g. passive vision), then the use of a specific metric to fit the model to one or various empirical TGMs would have perhaps been more adequate, but this was not the case here. In future work, when focussing on specific paradigms, I will adapt methods of Bayesian optimisation (Lorenz et al., 2017) for this purpose, as mentioned in the Discussion. Note that doing this right is not trivial and would complicate the paper significantly; for this reason, I feel it should belong to a different piece of work.

      I would also like to note that evaluating the different features of the data one by one (“in a stepwise manner”) was necessary for interpretation. One can loosely think of it as a sort of F-test: one is showing how important a feature is by comparing the full model vs. a nested model that does not have that feature. While the Reviewer is right that there might be interactions between the features that we can only unveil through a joint evaluation, my approach is at least valid as a first approximation. I will discuss this limitation in an updated version of the paper in more detail.
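      The nested-model comparison invoked above can be illustrated with a minimal sketch. It assumes an ordinary least-squares setting with synthetic data, which is only an analogy to the stepwise feature evaluation; the function name `nested_f_statistic`, the variables, and the effect sizes are hypothetical and are not part of genephys or of the analyses in the paper.

```python
import numpy as np


def nested_f_statistic(X_full, X_reduced, y):
    """F statistic comparing a full linear model with a nested reduced model."""

    def rss(X):
        # Residual sum of squares of the least-squares fit of y on X.
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    n = len(y)
    p_full, p_reduced = X_full.shape[1], X_reduced.shape[1]
    rss_full, rss_reduced = rss(X_full), rss(X_reduced)
    # Drop in RSS per removed parameter, scaled by the full model's residual variance.
    numerator = (rss_reduced - rss_full) / (p_full - p_reduced)
    denominator = rss_full / (n - p_full)
    return numerator / denominator


# Synthetic example (all values hypothetical): y depends on both x1 and x2.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.5 * x2 + rng.normal(scale=1.0, size=n)

X_full = np.column_stack([np.ones(n), x1, x2])  # model with the feature x2
X_reduced = np.column_stack([np.ones(n), x1])   # nested model without x2

f_stat = nested_f_statistic(X_full, X_reduced, y)
print(f"F statistic for dropping x2: {f_stat:.2f}")
```

      A large F statistic indicates that removing the feature degrades the fit by more than the residual noise would explain. As noted above, comparing one feature at a time in this way does not reveal interactions between features.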

      In a future revision of the paper, I will argue more specifically why and how these model configurations are, in general terms, necessary to produce these main effects in the TGM, and why other alternative configurations could not easily generate them.

      Practical guidelines for researchers. It was suggested to make it clearer how researchers could leverage this model in their own studies to understand their data better and to help relating their TGMs to specific neurobiological mechanisms.

      In a future revision of the paper, I will introduce a new section explaining how to use genephys practically, emphasising both opportunities and current limitations.

      Neurobiological interpretation. It was criticised that the results were a mere characterisation of sensor space data, and that these were not related clearly to any neurobiological aspect.

      In a future revision, I will work toward relating the main findings to existing literature in order to strengthen the neurobiological interpretation of the results, and toward a better justification of how genephys can help shed light on specific brain mechanisms.

      Above and beyond these specific points, I intend to restructure the text so that the main goals of the study become clearer. This includes stating less ambiguously in the Introduction which knowledge gap this work specifically tackles.

      Again, I would like to thank the Reviewers for helping me realise the limitations of the current version of the paper.

    1. Author Response:

      The following is the authors' response to the original reviews.

      In brief, we incorporated all wording and clarity suggestions into the manuscript. We also updated figure legends to include additional details, including replicate numbers. New data have been added in response to requests from the reviewers. Volumetric intake data are included as a supplemental figure (Figure 1–Figure Supplement 1A) and we will include movies of the confocal stacks from our CaMPARI imaging. We worked hard to address all the reviewers’ concerns and provide a detailed response below to the reviewers’ public comments as well as their author-specific comments.

      Reviewer #1 (Public Review):

      1) All feeding data presented in the manuscript are from the interactions of individual flies with a source of liquid food, where interaction is defined as 'physical contact of a specific duration.' It would be helpful to approach the measurement of feeding from multiple angles to form the notion of hedonic feeding since the debate around hedonic feeding in Drosophila has been ongoing for some time and remains controversial. One possibility would be to measure food intake volumetrically in addition to food interaction patterns and durations (e.g. via the modified CAFE assay used by Ja).

      We acknowledge that our FLIC assays address only one dimension of feeding behavior, physical interaction with liquid food. However, there is clear evidence that interactions are strongly predictive of consumption, and it would be technically difficult to measure feeding durations at the resolution of milliseconds using a Café assay.  Nevertheless, we appreciate the spirit of this comment and agree that expanding our inference to other measures of feeding, as well as feeding environments, is an important next step. To this end, we now include measures of feeding on more traditional solid food, using the ConEx assay, and find that flies in the hedonic environment consume twice as much sucrose volume compared to flies in the control environment. These have been added as supplemental data (Figure 1 – Figure Supplement 1A), and the text has been updated to reflect our findings.

      2) Some of the statistical analyses were presented in a way that may make understanding the data unnecessarily difficult for readers. Examples include:

      a) In Table I the authors present food interaction classifications based on direct observation. These are helpful. However, the classification system is updated or incompletely used as the manuscript progresses, most importantly changing from four categories with seven total subcategories to three categories and no subcategories. In subsequent data analyses, only one or two of these categories are assessed. It would be helpful, especially when moving from direct observation to automated categorization, to quantify the exact correspondences between all of the prior and new classifications, as well as elaborate on the types of data that are being excluded.

      We appreciate the feedback on our usage of the behavioral classification system and have made several adjustments to improve it. We renamed some of the behaviors to make them more intuitive (see Reviewer #2, comment #1), and updated the main text and Table 1 to reflect these changes. We updated the text and figures to be more transparent about when we group subcategories into main categories for quantification and when we quantify all subcategories separately. Because these videos required manual scoring by an experimenter, after our initial characterizations we opted to score only main categories (which contain subcategories). We agree that it would be useful to quantify correspondence between subcategories and the automated FLIC signal. However, we believe this task is better suited for more advanced and automated video tracking software, and, incidentally, more sophisticated analysis of FLIC data, which has a very high-dimensional character that has yet to be properly exploited. At the moment, therefore, we are not confident in the ability to understand the data at the desired resolution.

      b) The authors switch between a variety of biological and physiological conditions with varying assays, which makes the train of reasoning nearly impossible to follow. For example, the authors introduce us to circadian aspects of feeding behavior to introduce the concept of 'meal' and 'non-meal' periods of the day. It is then not clear in which of the subsequent experiments this paradigm is used to measure food interactions. Is it the majority of the subsequent figure panels? However, the authors also use starved flies for some assays, which would be incompatible with circadian-locked meals. The somewhat random and incompletely reported use of males and females, which the authors show behave differently, also makes the results more difficult to parse. Finally, the authors are comparing within-fly for the 'control environment' and between flies for their 'hedonic environment' (Figure 3A and subsequent panels), which I believe is not a good thing to do.

      We apologize for our difficulties conveying our inference, which was also noted by Reviewer #2.  We have worked hard to improve this component in the revision. With respect to the confusion about circadian feeding, we introduced circadian meal-times to complement starvation as a second (perhaps more natural) way to measure behaviors associated with hunger. Importantly, we do not use circadian meal-times beyond Figure 1; all subsequent FLIC experiments were conducted during non-meal times of day for 6 hours, which avoids confounding our data with circadian-locked meals even when we use starved flies. We have clarified this point in the revision.

      The reviewer also points out that we make both within-fly and between-fly comparisons, which is a point that we note. Perhaps some concern arises, again, from the challenges that we faced in properly delineating our inferences about different types of feeding measures (and motivations). Inference about homeostatic feeding was made using within-fly measures, comparing events on sucrose vs. those on yeast. Inference about hedonic feeding was made using between-fly measures (average durations of different flies on 2% vs. 20% sucrose). Treatment comparisons to control always used measures of the same type, such that inference was not made using between-fly measures for treatment and within-fly measures for control (i.e., each of our figure panels was either within-fly or between-fly). We have worked to clarify this in the revision.

      Importantly, our approach to all experiments avoided confounding by using randomized designs at multiple levels (e.g., randomizing control and hedonic environments to FLIC DFMs, alternating food choice sidedness in the DFMs), by ensuring that flies in both environments were siblings that came from the same vial environment before being tested, and by performing each experiment multiple times.

      c) Statistical analyses are not always used consistently. For example, in Figures 3B and C, post hoc test results are shown for sucrose vs. yeast interactions, but no such statistics are given for 3E and 3F, preventing readers from assessing if the assay design is measuring what the authors tell us it is measuring.

      We report p-values for two-way ANOVA interaction terms for all appropriate experiments. If (and only if) the interaction term is significant, we conduct post-hoc tests for more detailed statistical analysis and report the p-values. The reviewer points out that we do not perform post-hoc tests in Figures 3E and 3F. These figures had a non-significant interaction term, and thus we did not feel a post-hoc test was warranted.

      Reviewer #2 (Public Review):

      1) The dissection of feeding into distinct behavioral elements and its correlation with electrical FLIC signals that allow interpreting feeding types is a fundamental new method to dissect feeding in flies. However, the categories of micro-behaviors in Table 1 are not intuitive.

      We agree and have updated the Table, figures, and main text. Please see also our response to Reviewer #1, comment #1.

      2) The details for the behavioral data analysis are not clear and should be made more obvious. For example, how many males and females were used in each experiment? Were any of the females mated or were they all virgins? If all virgins, why not use mated females? Mating status may have an effect on the feeding drive. If mated and virgin females were used, are there any differences between them? Similarly, for diurnal feeding experiments, it is not immediately clear from the graphs how many animals were used and how the frequencies were obtained (Fig. 1F, presumably averages for each category per fly but that is inconsistent with the legend in the supplement for this figure). Why does the transition heat map not include all micro-behaviors (Fig. 1E, no LQ data which are significant in diurnal feeding)?

      We have clarified the number of flies and events for each behavioral experiment in Figure 1, and we updated the figure legend appropriately. We note that these behavioral datasets are non-overlapping, and each time we mention the number of events scored in the text, that number includes only “new” videos. Female and male flies for all experiments were mated, and we have clarified this in the main text and methods.

      For the diurnal experiment in Figure 1F, we scored over 700 events from new (non-overlapping) video compilations and updated the number of flies and event number in the figure legend. The diurnal data we present in the supplement for this figure is a separate experiment conducted on 38 flies, intended only to demonstrate the circadian nature of fly feeding.

      For the transition heat map, analysis of this sort seems to require a large amount of data to have sufficient power to return a transition matrix. LQ events are relatively low in frequency, so we opted to combine them with L events for this analysis. We have updated the figure and figure legend to reflect this.

      3) The CaMPARI images do not look great, particularly in the pan-neuronal condition (Fig. 5A). It would be useful to include the movie of the stack. Did any other brain regions show activity differences, such as SEZ or PI? These regions are known to be involved in feeding so it seems surprising they show no effect.

      We find that CaMPARI imaging is subject to high levels of noise and background, especially when using a broad driver, as the reviewer has pointed out. This is why we opted to follow up our pan-neuronal CaMPARI experiment using a more specific mushroom body driver and to test our correlational findings of increased MB activity in hedonic environments with genetic approaches in the remainder of Figure 5. We have included movies of the confocal stacks for both CaMPARI experiments, as requested.

      Reviewer #1 (Recommendations For The Authors): 

      Main concern: 

      No measurements of intake, either in volume or in caloric value. Hence, 'hedonic' feeding is only indirectly supported. 

      I would like to suggest to the authors that they measure intake volumetrically in addition to food interaction patterns and durations. For example, William Ja developed a modified CAFE assay that measures consumption volume in real-time in freely behaving flies (http://dx.doi.org/10.1038/nprot.2017.096). Liming Wang has another capable assay. Additional values of expanding measurement methods for feeding are that it helps tie the research more directly to that of others, and it helps remove the concern that any one assay may introduce unknown biases. 

      For the CaMPARI, it would be helpful to provide a demonstration of its effectiveness by recapitulating a deep brain neural pathway known to be engaged by a stimulus by GCaMP or electrophysiology. 

      Additional concern: 

      The authors assume satiety states during different circadian periods (line 253, for example). It seems critical to directly measure the satiety state. 

      Technical concerns: 

      Figure 5 A, B: there is reported near zero UV transmission through the head: https://doi.org/10.1364%2FBOE.6.000514, hence the CaMPARI measurements are suspect. It appears that there may be an effect in the optic lobes that may receive greater UV illumination by being more peripheral. A positive control to demonstrate deep brain access by UV is needed. 

      Y-axes vary for the same measurement types within figures, for example, Figure 5 C-G. Also Figures 3F, G, I, K, M and Figures 3D, E, H, J, L. This hinders direct comparisons. 

      Figure 2: why are there no statistics to distinguish interaction (I) events from F and L? Why are the example graphs presented using different scale x-axes? For A-C, why no averaged response graphs for the classifications? Were there other events that did not fit these classifications? 

      In lines 224-226, the claim of statistical significance at p=0.061 makes the reader suspicious of the statistical interpretations throughout the manuscript. 

      Figure 3B starved looks the same as Figure 3C sated for females, using the same assay and conditions. This implies a huge amount of variance in behavior between experiments. 

      We appreciate the recommendations from Reviewer #1 and have done our best to address many of their concerns. Regarding their main concerns, we have added volumetric feeding data to the manuscript, included movies of the confocal stacks for the CaMPARI experiments, and clarified the circadian timing of our behavioral experiments. These details are outlined in our public response to both reviewers. The reviewer also expressed a few technical concerns, mostly regarding statistical analyses. We agree that there seems to be a large amount of biological variability between experiments, which we do indeed find to be the case with behavioral experiments of this sort. For this reason, we avoid making direct comparisons on absolute values between experiments, as the reviewer suggests, and thus allow our Y-axes to vary for each figure to better facilitate within-experiment comparisons. The reviewer also points out that, in one instance, we refer to a p-value of 0.061 as statistically significant in the text. While we have changed our language to reflect the perceived convention, we note that there is little inferential difference between these values, and we report exact p-values to allow the reader to make an informed decision.

      Reviewer #2 (Recommendations For The Authors): The writing and data presentation in this paper is somewhat dense and confusing at times. Comments and questions below are intended to help improve data presentation and resolve questions that will help the reader navigate and understand the data to better appreciate the significance of the findings. 

      Comments and questions: 

      Line 160 cites Chen et al, 2002 as an example of behavioral characterization that is useful for read-outs of neural states, but no neural states were defined in that work. A better example where a circuit was linked to a specific behavioral category is PMID30415997 (Duistermars et al., 2018). 

      Line 171: were the females mated or virgin or was it variable? 

      The classification system in Table 1 is a bit confusing. For example, the distinction is made between Fast and Long feeding events as well as interactions with food and other events. FH meet the requirements of F and H, presumably meaning that flies are fast feeding and touching the food with their front legs. Why are front legs and hind legs touching food abbreviated H and FF respectively instead of something more obvious like IF and IH (referring to Interaction with Front legs or Interaction with Hind legs)? 

      Also was there never any tasting with the middle legs? In Fig1B, all the I events are grouped. Are most of these H or FF events? The frequency in Fig. 1B is shown as normalized as a frequency of all events. The statistical analyses are all parametric. Are these data normally distributed? 

      Lines 224-229: the relative frequency of L-type feeding is increased in starved flies and the relative frequency of F feeding is decreased. Is the relative L- or F-type feeding frequency considered on total behavior or just the sum of long and fast feeding or the sum of all types of feeding? 

      The events that are analyzed vary throughout the paper. Line 173 mentions 300 events, line 222, 500 events, and line 257, 700 feeding events. Are these all independent experiments, or are these overlapping data sets analyzed for different parameters? 

      For diurnal feeding behavior, the authors analyzed 700 events and found significantly more LQ events during meal time (i.e. at the beginning and end of the day). Based on the figure legend in the supplement to Figure 1, it appears that these data were collected on 38 female flies. But in Fig 1F, there are ~8 points per feeding type (F, L, and LQ) during meal and non-meal conditions. Shouldn't all 38 flies have an average frequency for each type of feeding during meal and non-meal times? Were these females mated or not? Is this effect also true for males? To help the reader understand the data better, it would be helpful to note the number of flies used in each experiment or in each analysis in the different figures and wherever the data are mentioned in the manuscript. It also seems likely that the mating state may have an effect on feeding so knowing the result in mated versus unmated would be a useful analysis. 

      It is interesting that there is a difference in feeding in starved flies versus diurnal feeding in the presumably hungry versus sated phase (meal versus non-meal phases). As mentioned by the authors earlier in the manuscript, starved flies have a relative increase in L-type feeding. However, they perform less LQ feeding than sated flies, and yet LQ feeding is the only significantly different type of feeding in the hungry state of diurnal feeding. In the morning, the transition to feeding is very abrupt compared to the gradual increase in the evening. Is there any difference between the type of feeding or the transition matrix in the evening versus morning meal times? Also, why is LQ feeding not included as a category in the transition matrix in Fig 1E? 

      In Fig 2, the authors examine FLIC signals with video data to identify feeding types from FLIC signals. Why are there signal durations for F-type feeding that are longer than 3 seconds when it is defined as 1-3 sec of the proboscis contact with food and conversely signals of L-type feeding shorter than 4 seconds when it is defined as >4 seconds of continuous proboscis contact? Does this mean that signal can be longer or shorter than the actual time the proboscis is in the food? 

      With these parameters, the authors develop an assay to identify homeostatic and hedonic feeding by applying the signal analysis to food choices representing homeostatic (2% sucrose versus yeast) and hedonic (2% sucrose versus 20%) conditions. In Fig 3C, they show that fully-fed females show a stronger preference for yeast food than sugar food compared to males (line 335). Is this in fully fed animals? The yeast preference in females looks almost the same as in the starved females in Fig 3B. 

      The CaMPARI images shown in Fig 5A (and to a lesser extent Fig 5B) are not particularly convincing although the quantification looks clear. Providing the movies of the stacks may help the reader better appreciate the difference in MB red signal in the hedonic state. It would also help to show the number of flies that were tested in these experiments as well as the sex and mating status. Provide the n in the figure legend and in the relevant sections in the text. 

      Were the mushroom bodies the only brain region with significant, measurable activity changes? One might expect changes in other feeding areas, such as the subesophageal zone (SEZ) and the peptidergic regions of the brain (PI), which are both known to affect feeding in flies. This may also be a useful method to examine differences in mated versus unmated flies. 

      In Fig 5C the caption reads MB lambda lobe inhibition. Shouldn't this be gamma lobe inhibition as suggested in the figure legend? 

      The paper largely distinguishes homeostatic from hedonic feeding only. It may be useful to discuss other non-homeostatic mechanisms as well or at least make the distinction in the introduction and or discussion.

      We thank reviewer #2 for their thoughtful suggestions to improve the clarity of the manuscript. They suggest several improvements, which we implemented, including that we improve the classification system in Table 1 to make it more intuitive, state how we normalized observed behavioral frequencies, clarify that the number of events we cite for each experiment are non-overlapping, and explain the use of circadian meal vs. non-meal times. We also noticed, as did this reviewer, that the usage of L vs. LQ events differs between starved flies and flies observed during meal-time. We agree that it may be interesting to sort out the nuances of why and how these differences occur, as it suggests that starvation may in some ways be different from physiological hunger. However, our method of manually observing flies would make this difficult at present. We hope to utilize more advanced video tracking software in the future to investigate this question. The reviewer also posed several questions about the hunger/satiety state of flies that we used for each experiment, which we clarified throughout the main text, figure legends, and methods.

      This reviewer points out two technical concerns, which we have addressed. The concerns about our CaMPARI imaging are noted, and we have discussed them in response to reviewer #1 and in our public response. We now include movies of the confocal stacks, as requested. There was also a question about FLIC durations of F and L events in Figure 2, with some visually identified F events producing FLIC signals longer than 4 seconds and some L events producing FLIC signals shorter than 4 seconds. Although we show that population averages from the FLIC can reliably recapitulate our visual metrics, there is occasional noise at the individual level. For example, although a fly may have contact of its proboscis with the food for less than 4 seconds, the FLIC signal may persist slightly beyond that interaction due to sustained contact with a non-proboscis body part or due to liquid food contacting the signal pad. We also occasionally observed L events that we visually identified to last longer than 4 seconds, but nevertheless did not produce a FLIC signal of equal length. This can occur when a fly feeds on the liquid food but transiently loses contact with the signal pad. Although there is some noted technical noise, we show that population-level data is sufficient to reflect our visual observations.
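
      The relationship described above between visually scored events and FLIC signal durations can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' analysis pipeline: the thresholds follow the definitions quoted in the review (F: 1-3 s of proboscis contact; L: more than 4 s), while the function names, the "unclassified" label for durations outside those ranges, and the toy values are all assumptions made for the example.

```python
# Thresholds (seconds) taken from the definitions quoted in the review.
F_MIN_S, F_MAX_S = 1.0, 3.0   # fast (F) events: 1-3 s
L_MIN_S = 4.0                 # long (L) events: more than 4 s


def classify_event(duration_s):
    """Label one event from its signal duration (hypothetical helper)."""
    if F_MIN_S <= duration_s <= F_MAX_S:
        return "F"
    if duration_s > L_MIN_S:
        return "L"
    # Durations below 1 s or between 3 and 4 s match neither definition.
    return "unclassified"


def agreement(visual_labels, signal_durations_s):
    """Fraction of events whose duration-based label matches the visual label."""
    if len(visual_labels) != len(signal_durations_s):
        raise ValueError("inputs must have the same length")
    if not visual_labels:
        raise ValueError("at least one event is required")
    matches = sum(
        visual == classify_event(duration)
        for visual, duration in zip(visual_labels, signal_durations_s)
    )
    return matches / len(visual_labels)


# Toy values for illustration only (not data from the study): the third event
# was scored visually as F, but its signal outlasted the proboscis contact.
visual = ["F", "L", "F"]
durations = [2.0, 6.0, 4.5]
score = agreement(visual, durations)
```

      In this toy case one of the three events disagrees with its visual label, which mirrors the kind of individual-level noise described above; a population-level summary across many events would be less sensitive to such cases.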

    1. Author Response

      Reviewer 1:

      The reviewer indicated the data convincingly demonstrates absence of Perlecan causes a severe perturbation of the ECM-based neural lamella, that synaptic terminals degenerate, and that axons and even entire nerve bundles break. The reviewer noted that future studies will be important to define the precise source of Perlecan and the underlying mechanism for axonal breakage, and suggested several follow-up experiments. We address these comments below.

      1. The reviewer noted our data indicate Perlecan’s role in synaptic retraction is not due to its absence from neurons and that some of the wording is confusing in this regard.

      We’ve tried to make it clear throughout the manuscript that Perlecan functions non-cell autonomously, as our failure to rescue with neuronal re-expression or recapitulate the phenotype with neuronal-only RNAi indicates. As such, we agree that the phenotypes are not due to Perlecan loss within neurons, consistent with our data showing breakdown of the neural lamella ECM and subsequent axonal breakage. These phenotypes do manifest in neurons, but the defect is triggered non-cell autonomously as described in our study and stated by the reviewer here.

      2. The reviewer suggested future experiments to resolve the source(s) of Perlecan secretion from defined tissues that control neuronal stability, noting that showing ubiquitous rescue with a pan-cellular Gal4 driver would be useful.

      We did do pan-cellular rescue and overexpression experiments with the ubiquitous Tubulin-Gal4 driver, but expression of our two UAS-trol transgenes with this strong driver resulted in lethality. This observation indicates too much Perlecan expression is also detrimental for ECM function. Interestingly, we found that NMJ synapses do not retract following ubiquitous Perlecan overexpression in wildtype larvae, so another aspect of ECM dysfunction is responsible for lethality under this condition. As reported in the manuscript, we found driving a Trol RNAi with multiple Gal4 lines expressed in specific cell populations was unable to recapitulate the synaptic retraction phenotype, including pan-neuronal (elavC155), neuronal and muscle (elavC155 and mef2-Gal4), glial (repo-Gal4), fat body (ppl-Gal4, Lsp2-Gal4), hemocytes (Hml-Gal4), and fat body and hemocytes (c564-Gal4) driven expression. These data suggest Perlecan secretion is required by multiple cell types to achieve sufficient accumulation in the ECM to prevent neuronal instability.

      3. The reviewer indicates future studies of the blood-brain barrier might reveal insights into the pathology and axonal breakage we observe. The reviewer also suggests we establish a detailed timeline of axonal breakage.

      We agree with the reviewer that examination of the blood-brain barrier and glial dysfunction will be exciting experiments for future studies. For the phenotypic timeline, this was an important component of our study and was done in two ways and described in the manuscript. In Figure 4, we describe serial in vivo imaging of synapses with briefly anesthetized larvae over 4 full days of imaging. In Figure 9, we describe fixed imaging of larval axons at specific developmental stages (2nd, early 3rd, wandering 3rd instar). This set of experiments provided a detailed timeline for synaptic retraction and axonal breakage. As suggested, we also used single neuron drivers (MN1-Ib) to label a single motoneuron and examine axonal breakage and synaptic retraction at this scale. This data is shown in Figure 9E. Together, these experiments provided a timeline for the biology we observe – disruptions of the neural lamella ECM, disorganization of the axonal microtubule cytoskeleton, followed by axonal breakage and fragmentation (usually in a hemi-segment coordinated manner), with subsequent synaptic retraction at NMJs.

      4. The reviewer indicates the final model in Figure 10 may not be fully representative.

      We feel this model best describes our complete dataset on the Trol mutant. We provide evidence for each of these phenotypic events in detail in the paper. The disruptions to the neural lamella are described in Figure 8. The onset of synaptic retraction does occur in the 3rd instar stage and not the 2nd instar stage: Figure 4 shows this with serial in vivo imaging, where we see normal synaptic morphology on Day 1 (2nd instar stage) and degeneration over the 3rd instar period (Days 2-4). The figure does not indicate that Perlecan functions in synaptic stability by residing at the NMJ, only that synaptic retraction occurs. Indeed, as stated in the text, we argue against a role for Perlecan directly at the NMJ in the phenotypes we describe; rather, synaptic retraction is a downstream consequence of ECM disruption and the axonal breakage that follows.

      Reviewer 2:

      The reviewer noted the work provided a strong and thorough genetic analysis of the role of Perlecan in neuronal stability and axonal retraction. The reviewer provided some suggestions for future experiments and requested a few clarifications.

      1. The reviewer wondered whether mutations in other neural lamella components also cause synaptic retraction, and asked about potential genetic interactions between Trol and Vkg.

      We agree further genetic studies of other neural lamella components will be of interest. In the case of Vkg, null mutations in the locus result in embryonic lethality, suggesting it plays a more critical role in overall ECM function. Although we did not perform genetic interaction studies between the two mutants (for example trans-heterozygotes), they have been shown to interact in multiple other contexts as described in the manuscript.

      2. The reviewer noted the lack of whole animal Trol rescue.

      As described in point #2 above, we did do pan-cellular rescue experiments with the ubiquitous driver Tubulin-Gal4, but driving our two UAS-trol transgenes resulted in lethality, indicating a strong dosage sensitivity of Perlecan function.

      3. The reviewer indicated the hyperactive Mhc mutant was an interesting experiment but only examines one alternative. They wondered if we could reduce muscle contraction and see if that "rescues" the trol phenotype.

      The Mhc1 null mutant is embryonic lethal, and the retraction phenotypes do not occur until the 3rd instar stage, so that experiment would not be possible. However, we did attempt to block muscle contraction by expressing a UAS-tetanus toxin to eliminate evoked neurotransmitter release with our MN1-Ib Gal4 driver (pan-neuronal expression of tetanus is lethal). This did not alter the synaptic retraction phenotype, but it was difficult to make strong conclusions for this experiment as the co-innervating Is motoneuron was not expressing tetanus toxin. As such, we did not include this data in the manuscript, though it does generally support the model that synaptic retraction is independent of muscle contraction and rather occurs downstream of the axonal breakage that we highlight.

      4. The reviewer wondered whether other Wnt signaling manipulations might be useful to test interactions with the Trol retraction phenotype.

      Given we used the same Sgg-CA that was used to block the previously reported ghost bouton phenotype in Trol mutants and saw no effect on retraction, we did not feel that was a fruitful pathway to keep pushing on. Indeed, all our evidence points to a non-Wnt role, with neural lamella disruption and axonal breakage being the key insults.

      Reviewer 3:

      The reviewer indicated the work described an interesting and important role for Perlecan in motor neuron axon maintenance. The reviewer suggested experiments to elucidate the mechanism of action of Perlecan would benefit the study.

      1. The reviewer indicated it would be beneficial to validate the Wnt and Wallerian degeneration transgenic lines used in the study to provide a positive control.

      Our study used previously published and well-established Sarm RNAi and Sgg-CA transgenic lines (Sarm RNAi from the DiAntonio lab and Sgg-CA from Kamimura et al., 2013, via BDSC) that have been published multiple times and are well-validated in the field. These were not new lines that we generated. We also blocked Wallerian degeneration with a number of other perturbations to the pathway and did not see rescue of synaptic retraction in these cases either. Sarm is an upstream pathway component and is thus the manipulation we included in the manuscript.

      2. The reviewer notes similar questions on cell-autonomy that we addressed in point 2 to Reviewers 1 and 2 above.

      The reviewer noted it would be helpful to show that the single cell-type RNAi experiments are working by western blotting for Perlecan. We took a similar approach by examining knockdown of the endogenous Trol-GFP by the RNAi with immunostaining. Pan-cellular knockdown with Tubulin-Gal4 eliminates the staining (validating the RNAi line, Figure 1D-I), while knockdown with the individual drivers does not (Figure 5C-G). Although we used well-established cell-type specific Gal4 drivers that have been used in many other studies, we cannot eliminate strength of expression of the driver as an issue for failure to recapitulate the phenotypes. However, other experiments we performed and presented in the figures support a non-cell autonomous role for Perlecan in axonal breakage and synaptic retraction.

      3. The reviewer suggested an approach similar to the one Reviewer 2 raised in point 3 above, concerning the role of muscle contraction.

      We agree eliminating muscle contraction altogether would be a nice assay for the role of mechanical stress, but we don’t have muscle-specific drivers to eliminate contraction from only a single muscle (eliminating it everywhere is lethal). However, we did attempt to block muscle contraction by expressing a UAS-tetanus toxin to block evoked neurotransmitter release with our MN1-Ib Gal4 driver as described above. Future experiments with the newly described BoNT-C toxin produced by the Dickman lab might be a promising approach for a full elimination of all motoneuron release to achieve a similar effect and test in the Trol mutant.

      4. The reviewer wondered what other components of the ECM are affected beyond Vkg in the Trol mutant.

      This is an exciting question to pursue in future studies. Together with genetic interaction experiments with other ECM components, as well as a detailed analysis of the effects on glia that surround larval nerves, such studies will further refine the mechanism by which loss of Perlecan triggers axonal breakage and downstream synaptic retraction.

    1. Author Response

      Reviewer #1 (Public Review):

      In the present study, Yasuko Isoe, Ryohei Nakamura & colleagues follow a lineage analysis study aiming at identifying the clonal organization of the dorsal telencephalon. The authors use the teleost fish medaka to conduct their experiments since it displays a clearly delineated dorsal pallium. After identifying the clonal units that constitute the dorsal telencephalon, they analyze the epigenetic landscape in each unit. The authors then identify differential open chromatin patterns that they relate to functional aspects of each unit, and additionally, use the epigenetic landscape to infer the identity of transcription factors operating as putative regulators. Overall, the study consists of an impressive amount of data that shed light on the organization of a central brain region in vertebrates.

      The findings in the manuscript are organized into two main sections: lineage analysis and epigenetic organization. The authors combine genetic tools with laser dissections of specific clones and ATAC-seq and RNA-seq analysis in multiple samples, an approach that is very elegant and follows high technical standards. For lineage analysis, the authors used a basic, but appropriate, tool to induce and follow clones generated in early embryos, with the side note that lineages are followed using a non-ubiquitous promoter so that the authors restrict their analysis to neural progenitors. My overall impression is that the authors have collected a massive amount of high-quality data, which unfortunately is not properly integrated or discussed in the manuscript. There is only a superficial effort in incorporating the two main findings, which contrasts with the depth of acquired data.

      The observation of clonal sectors in the pallium is a great finding that deserves a more detailed analysis in terms of their developmental and evolutionary origin: How many progenitors are used to set up the entire pallium? What is the smallest clone that contributes to it? Is there any laterality bias in the clonal architecture?

      Thank you for the question. We interpret the first question as, “how many neural progenitors (or neural stem cells) at the early developmental stage contribute to the adult pallium?”. Based on the number of clonal units visualized in the pallium, we assume that there are around 50 neural stem cells at the neurula stage that provide cells in the pallium.

      The smallest clone we found consisted of about a dozen cells in the anterior lateral pallium region (Dla). However, since HuC promoter activity is not strong in Dla (shown in Figure 1 – figure supplement 2B), we did not observe these clones reproducibly, so we removed the clones in Dla from the comprehensive structural analysis. The second smallest clone is in the Dcpm, in which only a few dozen cells were labeled at once.

      As for the last question, we did not find any laterality bias in the clonal architecture of the telencephalon (shown in Figure 1 – figure supplement 3A, 3C).

      We added the explanation above in the revised manuscript. (page 29, line 591 - 595)

      Is the clonal architecture exclusive for progenitors or does it extend to neurons as well?

      Although we used the HuC promoter to visualize the clones, which should label neural progenitors, we observed long axonal projections from Dp to the olfactory bulb. This suggests that the transgenic line labels both neural progenitors and young mature neurons, at least in some brain regions. So yes, we assume this clonal architecture extends to neurons as well, and we added descriptions to the revised manuscript. (page 10, line 205-207)

      How has the clonal architecture impacted the morphological diversity of the pallium among teleosts? What are possible evolutionary paths to explain this phenomenon? The authors' discussion on this point circles around one concept, illustrated in the following sentence: "(The clonal architecture) ... possibly explains how the difference in diversity between the pallium and subpallium has emerged: the subpallium is conserved because cells belong to various clonal units intertwined with each other, which has constrained their modification during evolution; whereas the pallium is diverse because of the modular nature of the clonal units which allows for the emergence of diversity". This is the concept that I have the most problems with. The authors reason that a more defined clonal structure (pallium) makes a system more prone to evolutionary novelties, while a region where clones intermingle (subpallium) is more rigid and therefore more conserved between species. Is there experimental data that backs up this statement in any other systems? If there is, I urge the authors to share it here. If this is just speculation, then the argument would benefit from an explanation of how this clonal organization allows for evolutionary novelty.

      We appreciate the reviewer’s question. In order to make our point, we added the following paragraph to the revised manuscript,

      “Our structural analysis in the adult medaka telencephalon revealed that the clonal architecture between the pallium and subpallium differs in the distribution of cells in clonal units: clonal units in the subpallium intertwine with each other, whereas the pallium is formed by the compartmentalized clonal units, giving rise to a modular structure. Modular structure is frequently seen in the animal body, including brain; central complex in insect 40, cerebellum in vertebrates 41. And the modularity of cell populations or organs is generally thought to contribute to evolutionary flexibility; one module can acquire a new phenotype without impacting the others 42, 43, 44. We assume that the modular nature of the clonal units in the pallium plays a key role in the diversity across teleost.” (page 23, line 448-452)

      Would it happen by the appearance of more clones at the early stages of development? The authors leave this central point untouched even when discussing the evolutionary origin of the pallium in teleosts.

      Thank you for the comment. As shown in the previous report, when the Cre-loxp recombination was induced at the early developmental stage, a wider expression of GFP is observed across the whole brain (Okuyama et al. 2013). This suggests that the neural stem cells at the earlier developmental stage generate daughter neural stem cells which produce neural progenitors later. We added a few sentences mentioning this in the revised manuscript. (page 7, line 146-149)

      Having shown the clonal architecture of the pallium and conducted a detailed epigenetic analysis in clones, the authors could also speculate on what is special about this type of organization. Particularly, how they envision that cells belonging to the same clone inherit a common epigenetic landscape that will define their function later on.

      Thank you for the comment. To explain the epigenetic feature of this pallial organization, we added the following paragraph in the revised manuscript.

      “As shown in mammals, the epigenetic landscape can be inherited from apical progenitors, which have a multipotency, to the late neural progenitors during development 37. Since the teleost exhibit post-hatch neurogenesis in the entire life, we think that the common epigenetic landscape is inherited in each clonal unit in the adult medaka telencephalon. And as a result, we make the assumption that function and characteristic of each clonal unit is defined already in progenitors by specific regulators (e.g. TFs), and those progenitors continuously produce neurons that possess the same property to function in a coordinated manner.” (page 22, line 433-439)

      There is little analysis of the cellular organization of each clone, mainly because the authors labeled only a subset of the real, genetic clone. The authors present images of entire brains and optical horizontal and transverse sections, which largely sustain their claims for a clonal organization. The difference in the clonal arrangements between the Dld and the Vd is clear, but the authors could provide a higher-resolution image of some clones in the telencephalon to get an idea of the cellular composition of the regions they use for their analysis.

      Here, we added a new panel in Figure 2 which is a combination of previous supplemental figures S3-1,2,3 to show our analysis on the cellular organization of each clone. We showed how the pallial regions, other than Dld, are formed by multiple genetic clones in different colors, and also the projection from each clone. (page 9, Figure 2B)

      What is the extent of non-GFP cells in the regions they use for RNAseq and ATACseq? From the images shown it is very difficult to realize whether all cells in the clonal sector do indeed belong to the clone.

      Thank you for your question. In our revised manuscript, we analyzed the ratio of cells labeled in this transgenic line (HuC:loxp-DsRed-loxp-GFP). We found that a large portion of cells (around 60-70% of cells) are DsRed-positive in our transgenic line (Figure 1 – figure supplement 2B). (page 7, line 142-143)

      Reviewer #2 (Public Review):

      In this study, Isoe and team produced an atlas of the telencephalon of the adult medaka fish with which they better defined pallial and subpallial regions, characterized the expression of neurotransmitters, and performed clonal analysis to address their organization and maintenance during the continuous neurogenesis. They show that pallial anatomical regions are formed by independent clonal units. Furthermore, the authors demonstrate that pallial compartments exhibit region-specific chromatin landscapes, suggesting that gene expression is differentially regulated. Specifically, synaptic genes have a distinct chromatin landscape and expression in one of the regions of the dorsal pallium, the Dd2. Using the region-specific RNA expression and chromatin accessibility data they have generated, the authors propose several transcription factors as candidate regulators of Dd2 specification. Lastly, the authors use the enrichment of transcription factor binding motifs to establish homology between medaka and human telencephalon, aiming to describe an evolutionary origin for the Dd2 region.

      Overall, the study carefully describes diverse aspects of neurogenesis in the telencephalon of the adult medaka fish. As such, the manuscript has the potential to contribute insights to the understanding of circuits and neurogenesis in teleosts and the medaka fish, as well as the evolution of cellular heterogeneity and organization of the telencephalon. Furthermore, the atlas, if easily accessible to the broader community, could be a substantial resource to researchers interested in medaka and teleosts neuroscience. However, there are some conceptual and technical concerns that should be addressed to strengthen this work.

      Improving the atlas: The different interpretations of the imaging data generated remain isolated or fragmented and could be better integrated to describe anatomical, connectivity, and ontogeny differences through pallial and subpallial regions.

      In the revision, we described the anatomical and connectivity differences among the adult pallial and subpallial regions in detail in Table 2. It also includes a comparison of the brain regions with previous atlases.

      In terms of the ontogeny differences, we described the localization of neural stem cells in the telencephalon in Figure 1 – figure supplement 4: “The cell-body distribution in the pallium and subpallium is consistent with the pattern of the neural stem cell (radial glial) (Figure 1 – figure supplement 4). In the teleost telencephalon, the cell bodies of radial glia are located in the surface of the hemispheres and project inside the telencephalon 15. Since neural progenitors migrate along those axons, it is consistent that the cell bodies of the pallial clonally-related units are clustered along those axons in a cylindrical way.” (page 8, line 175-179; page 22, line 427-431)

      Molecular differences across regions and species: Differential gene expression and chromatin accessibility throughout regions should be better and more deeply characterized and presented, exhibiting more region-specific features, and leading to a better description of candidate transcription factors that could differentially regulate regional gene expression.

      The comparison between medaka fish and human telencephalon regions would benefit from a more extensive molecular analysis. Comparison of gene expression and accessible regions could expand the analysis together with TF-binding motif enrichment.

      To compare gene expression across brain regions in different vertebrate species, we examined mammalian gene expression data (in situ hybridization) from the Allen Institute database. We analyzed the expression of all the genes specifically expressed in Dd (809 genes) across the mammalian brain regions (12 regions), but we could not observe strong correlations with any specific brain region in mice. Therefore, we have revised our conclusions regarding the correspondence between medaka's Dd2 and mammalian brain regions to be more cautious. (page 20, line 396 - page 21, line 401)

      Lineage tracing: The authors claim that the functional compartmentalization of the pallium relies on different cell lineages, which also mostly share connectivity patterns and, at least to some extent, expression patterns. It would be interesting to see how homogenous these lineages are at the molecular level and whether their compartmentalization is retained when neurons reach maturity.

      Thank you for the comment. We think that future single-cell RNA-seq of cell lineages will allow us to see how homogeneous cells derived from the same lineage are at the molecular level, and to assess their cell types.

    1. Author Response

      Reviewer #2 (Public Review):

      The paper by Arribas et al. examines the coding properties of adult-born granule cells in the hippocampus at both single cell and network level. To address this question, the authors combine electrophysiology and modeling. The main findings are:

      Noisy stimulus patterns produce unreliable spiking in adult-born granule cells, but more reliable responses in mature granule cells.

      Analysis of spike patterns with a spike response model (SRM) demonstrates that adult-born and mature GCs show different coding properties.

      Whereas mature GCs are better decoders on the single cell level, heterogeneous networks comprised of both mature and adult-born cells are better encoders at the network level.

      Based on these results, the authors conclude that granule cell heterogeneity confers enhanced encoding capabilities to the dentate gyrus network.

      Although the manuscript contains interesting ideas and initial data, several major points need to be addressed.

      Major points:

      1) The authors use a noisy stimulation paradigm to activate granule cells at a relatively high frequency. However, in the intact network in vivo, granule cells fire much more sparsely. Furthermore, granule cells often fire in bursts. How these properties affect the coding properties of granule cells proposed in the present paper remains unclear. At the very least, this point needs to be better discussed.

      In vivo whole cell recordings of granule cells are very scarce. In our study, we based the design of our stimulus on recordings from the intact network in vivo (Pernia-Andrade and Jonas 2014), which show that granule cells receive a wide range of frequencies, with a power spectrum that exhibits a power law decay. These properties are built into our noisy stimuli. These in vivo recordings have also reported the presence of theta oscillations, showing a peak in the spectrum. However, in our approach we deliberately removed these oscillations from our stimuli because it is best to fit GLMs using white noise or noise with an exponentially decaying autocorrelation (Paninski et al. 2004).

      Thus, our choice of the stimuli is far from arbitrary, but rooted in experimental evidence from intact network in vivo recordings, together with previous knowledge about GLM/SRM fitting. This comment reveals to us that we did not clarify this enough in the manuscript. We are grateful to the reviewer for revealing this omission, since this is in fact an important aspect of the strategy of the study. In the revised manuscript, we brought these points up front in the results section when we introduce the stimulus for the first time, and discussed them more thoroughly in the Methods section that describes the stimulus.

      Still, the bursts observed in granule cells are an important feature and they have been observed to be phase locked to the theta-gamma oscillations in vivo (Pernia-Andrade and Jonas 2014). In the revised version of the manuscript, we included new experiments and simulations with stimuli that include a peak in theta frequency. We found that immature neurons also improve decoding performance with these theta-modulated stimuli.
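
      As a minimal sketch of one of the stimulus classes named in this response, noise with an exponentially decaying autocorrelation, the following Python snippet generates an Ornstein-Uhlenbeck process and checks its autocorrelation. It is not the stimulus-generation code of the study; the function names and all parameter values (time step, correlation time, amplitude) are assumptions chosen only for illustration.

```python
import math
import random


def ou_noise(n_steps, dt, tau, sigma, seed=0):
    """Zero-mean noise whose autocorrelation decays as exp(-lag / tau).

    Exact discretization of an Ornstein-Uhlenbeck process. All parameter
    values passed in below are hypothetical, not those of the study.
    """
    rng = random.Random(seed)
    decay = math.exp(-dt / tau)                 # correlation between steps
    kick = sigma * math.sqrt(1.0 - decay ** 2)  # keeps the variance at sigma^2
    x = rng.gauss(0.0, sigma)                   # start in the stationary state
    samples = []
    for _ in range(n_steps):
        samples.append(x)
        x = decay * x + kick * rng.gauss(0.0, 1.0)
    return samples


def autocorrelation(samples, lag):
    """Sample autocorrelation coefficient at an integer lag."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    cov = sum((samples[i] - mean) * (samples[i + lag] - mean)
              for i in range(n - lag)) / (n - lag)
    return cov / var


# Hypothetical settings: 1 ms steps, 10 ms correlation time.
stimulus = ou_noise(n_steps=50000, dt=0.001, tau=0.01, sigma=1.0)
```

      With these assumed settings, the autocorrelation at a lag of 10 steps (one correlation time) should be close to exp(-1).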

      2) The authors induce spiking in granule cells by injection of current waveforms. However, in the intact network, neurons are activated by synaptic conductances. As current and conductance have been shown to affect spike output differently, controls with conductance stimuli need to be provided. Dynamic clamp is not a miracle anymore these days.

      The use of dynamic clamp sounds in principle like a good suggestion. However, in the manuscript we have taken a different approach to enable the use of a single neuron GLM that uses currents as inputs. To control for the differences between mature and immature neurons we used currents with amplitude normalized by the input resistance, and both types of neurons were measured with the same technique to allow for the comparison.

      Importantly, the GLM type model that we use assumes that the membrane potential is a linear convolution of the input, which permits a straightforward and robust fitting approach. We argue that this is not a minor issue, since using dynamic clamp would require a drastic modification of the model. Furthermore, the use of conductance stimuli would not allow for the straightforward model fitting we perform with our approach. The key point here is that the membrane potential would not be correctly approximated as a linear function of the conductance stimulus, precluding the fitting strategy.

      Finally, at the moment we do not have the equipment to perform the suggested experiment, so this suggestion would require a substantial amount of time to acquire the equipment and set up the experiments in mature and immature neurons. In addition, we would have to change the model and develop a different fitting strategy. With the controls that we already have in the manuscript, we do not think dynamic clamp experiments would fundamentally change the conclusions of the manuscript. Thus, we argue that this is beyond a reasonable timeframe for this revision, but could be something to explore further in the future. We now mention this possibility in the discussion.

      3) The greedy procedure is a good idea, but there are several issues with its implementation. First, it is unclear how the results depend on the starting value. Would we end up with the same mixed network if we would start with adult-born cells? Second, the size of the greedy network is very small. It is unclear whether the main conclusion holds in larger networks, up to the level of biological network size (1 million). Finally, the fraction of adult-born granule cells in the optimal network comes out very large. This is different from the biological network, where clearly four or five-week-old granule cells cannot represent the majority. Much more work is needed to address these issues.
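
      The linearity assumption at the heart of this response can be illustrated with a short Python sketch in which the modeled membrane potential is a causal convolution of the injected current with a filter, followed by an exponential nonlinearity for the firing rate. This is a simplified illustration and not the model fitted in the manuscript: the exponential filter shape, the parameter values and the function names are assumptions, and spike-history terms are omitted.

```python
import math


def exponential_kernel(n_taps, dt, tau):
    """Causal exponential filter; tau is a hypothetical membrane time constant."""
    return [math.exp(-k * dt / tau) * dt / tau for k in range(n_taps)]


def filtered_potential(current, kernel):
    """Causal convolution: v[t] = sum_k kernel[k] * current[t - k]."""
    potential = []
    for t in range(len(current)):
        taps = min(t + 1, len(kernel))
        potential.append(sum(kernel[k] * current[t - k] for k in range(taps)))
    return potential


def firing_rate(potential, threshold, slope, base_rate):
    """Exponential nonlinearity mapping the filtered potential to a rate."""
    return [base_rate * math.exp((v - threshold) / slope) for v in potential]


kernel = exponential_kernel(n_taps=50, dt=0.001, tau=0.01)
current_a = [math.sin(0.05 * t) for t in range(200)]
current_b = [1.0 if 50 <= t < 100 else 0.0 for t in range(200)]
mixed = [2.0 * a + b for a, b in zip(current_a, current_b)]

v_a = filtered_potential(current_a, kernel)
v_b = filtered_potential(current_b, kernel)
v_mixed = filtered_potential(mixed, kernel)  # equals 2 * v_a + v_b
```

      Because the potential is linear in the current, the response to a weighted sum of currents is the same weighted sum of the individual responses. A conductance input, which multiplies a voltage-dependent driving force, would in general not satisfy this superposition property.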

      The reviewer approves the greedy procedure that we apply in our manuscript and poses three issues for consideration.

      First, the reviewer queries what would be the result of starting the procedure with a different pool of simulated neurons, and whether we would obtain “the same mixed network if we would start with adult-born cells”. Let us remark that the outcome of the greedy procedure is not always the same mixed population of neurons. For each different mature neuron that we use to start the procedure, the trajectory (see Fig. 4A) of selected neurons will be different. Thus, the final population (network) will be different, and this is reflected in the error bars that we obtain in Fig. 4. Presumably, starting with adult-born cells will change the outcome of the greedy procedure. However, note that this is not the point of the approach. The motivation to start with mature neurons is to ask whether adult-born cells can contribute something to decoding, given that mature cells on their own perform better.

      Second, the reviewer questions the size of the population that we reach with the greedy procedure. Note that for the population sizes that we show in the manuscript the decoding performance already begins to saturate (Fig. 4F-H). Furthermore, it is infeasible to construct a population of 1M neurons because of the computational cost, that is, the time it takes to run the algorithm. These two facts motivated us to stop at 12 neurons, which strikes a good balance between computational time and saturation. Importantly, as we expand on below, the aim of the greedy procedure simulation is not to reconstruct the actual network of the dentate gyrus. Rather, we seek to understand whether immature neurons could improve coding in a population.

      Third, the reviewer observes that the fraction of adult-born cells in the populations reconstructed using the greedy procedure is large compared to the biological network. Again, note that the aim of the whole in-silico experiment is not to recover the biological network, where other aspects are at play. More simply, we query the possible contribution of adult-born cells to coding. In fact, if we obtained the same proportion it would be by chance, since we do not think that adult-born cells in the dentate gyrus are chosen according to the greedy algorithm.

      Still, this comment from the reviewer motivated us to include further simulations of the greedy procedure with constraints. In the revised manuscript we show new results using the greedy procedure, but constraining the fraction of immature neurons in the resulting populations, see Figure 4-figure supplement 2.

      More generally, we think that these comments reveal a possible misunderstanding about the approach, its purpose and the interpretation of the results. The point of the greedy procedure is to show that immature neurons do in fact contribute to improving the decoding, despite being generally worse individually. We do not claim that the population obtained with the greedy procedure faithfully reflects the actual shape of the in vivo network. We are aware that it does not. We see that this may not have been clear in the original version. In the revised version, we now explain the purpose of the greedy procedure when we introduce it. Additionally, we comment on the proportion of immature neurons in the same paragraph.
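
      The logic of a greedy procedure of this kind can be sketched in a few lines of Python. The sketch is a toy and makes several assumptions that are not taken from the manuscript: the decoder is a plain average of the selected responses rather than a model-based decoder, neurons are selected without replacement, and the three "neurons" are hand-made signals constructed so that a unit that decodes poorly on its own is nevertheless the best addition to the population.

```python
def r_squared(estimate, target):
    """Coefficient of determination between a decoded and a true signal."""
    mean_t = sum(target) / len(target)
    ss_res = sum((e - t) ** 2 for e, t in zip(estimate, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


def population_score(responses, target):
    """Toy decoder (an assumption): average the selected responses."""
    n = len(responses)
    estimate = [sum(r[i] for r in responses) / n for i in range(len(target))]
    return r_squared(estimate, target)


def greedy_select(pool, start_index, target, size):
    """Grow a population one neuron at a time, always adding the candidate
    that maximizes the score of the enlarged population."""
    selected = [start_index]
    scores = [population_score([pool[start_index]], target)]
    while len(selected) < size:
        best_idx, best_score = None, float("-inf")
        for idx in range(len(pool)):
            if idx in selected:
                continue
            trial = [pool[j] for j in selected] + [pool[idx]]
            score = population_score(trial, target)
            if score > best_score:
                best_idx, best_score = idx, score
        selected.append(best_idx)
        scores.append(best_score)
    return selected, scores


# Hypothetical toy data: neuron 1 is individually worse than neuron 2,
# but its errors are opposite to those of the starting neuron 0.
target = [0.0, 1.0, 2.0, 3.0]
pool = [
    [0.5, 0.5, 2.5, 2.5],   # neuron 0: starting unit
    [-0.5, 1.5, 1.5, 3.5],  # neuron 1: complementary errors
    [0.2, 1.0, 2.0, 3.0],   # neuron 2: individually the most accurate
]
selected, scores = greedy_select(pool, start_index=0, target=target, size=2)
```

      In this constructed example the procedure adds neuron 1 rather than the individually more accurate neuron 2, because what matters at each step is the gain for the population, not single-unit performance.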

      4) Likewise, the idea of dynamic pattern separation seems quite nice. However, the authors focus on the differences between mixed and pure networks, which are extremely small. Furthermore, the correlation coefficients of "low", "medium", and "high" correlation groups are chosen completely arbitrarily. A correlation coefficient of 0.99, considered low here, would seem extremely high in other contexts. Whether dynamic pattern separation is possible over a wider range of input correlation coefficients is unclear (see O'Reilly and McClelland, 1995, Hippocampus, for a possible relationship). Finally, aren't code expansion and lateral inhibition the key mechanisms underlying pattern separation? None of these potential mechanisms are incorporated here.

      The reviewer positively appreciates the idea of the pattern separation task that we propose in the manuscript, and poses some questions concerning the extent of the contribution of adult-born neurons.

      We agree that code expansion and lateral inhibition are key mechanisms for pattern separation in the DG, and we do not claim that adult neurogenesis is the key mechanism behind pattern separation. Rather, in our work we explore the role of adult-born immature neurons in coding in general, and in pattern separation in particular, given that it is a function commonly attributed to the DG.

      We note that the correlation in O'Reilly and McClelland 1994 (actually, what they call pattern overlap) is of a very different nature than the one we compute in our work. They compute the overlap between different patterns of activation in a population of neurons, that is the probability that a single neuron is active in two different patterns of activation. In our manuscript we compute the correlation between different continuous time-varying stimuli that stimulate single neurons.

      Importantly, previous work has shown that ablating neurogenesis particularly affects fine spatial discrimination, that is when the separation between patterns is small, but not when it is large (Clelland 2009, Science). Hence, we were actually expecting the impact of adult-born neurons to be important only for relatively large correlation coefficient values.

      In the revised manuscript, we now explain the rationale for the choice of correlation values, both in the main text when we introduce the task, and in the Methods when we set the values for the low, medium and high correlation classes. We also added a sentence to the discussion on pattern separation, bringing in the importance of the ideas of lateral inhibition, code expansion, and the work of O’Reilly 1994.
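
      To make concrete what a correlation coefficient between two continuous time-varying stimuli means, as opposed to the overlap between population activation patterns, the following Python sketch builds pairs of signals with a prescribed correlation and measures it. It is an illustration only: the signals are white Gaussian samples without the temporal structure of the stimuli in the study, the function names are hypothetical, and the value 0.99 is simply the one quoted in the reviewer's comment.

```python
import math
import random


def correlated_pair(n_samples, rho, seed=0):
    """Two zero-mean, unit-variance signals with expected correlation rho."""
    rng = random.Random(seed)
    first = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    mix = math.sqrt(1.0 - rho ** 2)  # weight of the independent component
    second = [rho * a + mix * rng.gauss(0.0, 1.0) for a in first]
    return first, second


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length signals."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


stim_a, stim_b = correlated_pair(n_samples=20000, rho=0.99)
measured = pearson(stim_a, stim_b)
```

      Two stimuli with a correlation of 0.99 still differ at every time point by a small independent component, which is the regime in which fine discrimination is required.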

      5) A main conclusion of the paper is that while mature GCs are better decoders on the single cell level, heterogeneity in mixtures improves coding in neuronal networks. However, this seems to be true only for r^2 as a readout criterion (Fig. 4F). For information, the result is less clear (Fig. 4G). The results must be discussed in a more objective way. Furthermore, intuitive explanations for this paradoxical observation are not provided. Saying that "this is an interesting open question for future work" is not enough.

      This is an interesting point raised by the Reviewer. While r^2 is quantified by comparing the decoded stimuli with the true stimuli, mutual information is related to the uncertainty about the decoding. That is, mutual information quantifies the correspondence between decoded and true stimuli, but does not tell us whether the decoded stimulus is a good approximation of the true one. For example, a decoder could achieve perfect mutual information but result in a poor reconstruction by performing a perfectly scrambled one-to-one mapping of the true stimulus [Schneidman et al. 2003], see also our reply to point [5] by Reviewer #1 above.

      We agree that this is an important point and we realize that it was not clear in the original version of the manuscript. In the revised manuscript we added some sentences to clarify this point.
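
      The scrambled-mapping example can be worked through numerically. In the Python sketch below, a hypothetical stimulus takes eight discrete levels and a hypothetical "decoder" returns a fixed permutation of those levels. The permutation, the number of levels and the function names are assumptions made for illustration, and mutual information is computed with a simple plug-in estimate from counts.

```python
import math
from collections import Counter


def mutual_information_bits(x, y):
    """Plug-in mutual information (in bits) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    count_x, count_y = Counter(x), Counter(y)
    return sum((c / n) * math.log2(c * n / (count_x[a] * count_y[b]))
               for (a, b), c in joint.items())


def r_squared(estimate, target):
    """Coefficient of determination between a decoded and a true signal."""
    mean_t = sum(target) / len(target)
    ss_res = sum((e - t) ** 2 for e, t in zip(estimate, target))
    ss_tot = sum((t - mean_t) ** 2 for t in target)
    return 1.0 - ss_res / ss_tot


# Hypothetical stimulus: eight equally frequent levels, 0..7.
true_stimulus = list(range(8)) * 100

# Hypothetical decoder: a fixed one-to-one scrambling of the levels.
scramble = [3, 7, 0, 5, 1, 6, 2, 4]
decoded = [scramble[s] for s in true_stimulus]

info = mutual_information_bits(true_stimulus, decoded)
fit = r_squared(decoded, true_stimulus)
```

      Here the mutual information equals the full stimulus entropy of 3 bits, because the mapping is invertible, while r^2 is negative, meaning that the reconstruction is worse than predicting the mean. The two readouts therefore answer different questions.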

      6) The authors ignore possible differences in the output of mature and adult-born granule cells in their thinking. If mature and adult-born granule cells had different outputs, this could affect their contributions to the code (either positively or negatively). At the very least, this possibility should be discussed.

      Newborn neurons contact the same targets as mature neurons born during development: pyramidal cells in CA3, and interneurons in CA3 and the DG. During maturation, there is a sequence of connectivity with CA3 and within the DG (Toni et al. 2008). At 4 weeks, newborn cells are already contacting their postsynaptic targets. Still, there may be subtle differences in the strength of these connections compared to mature neurons.

      So, although the targets are the same, there may be quantitative differences in the way they contribute to the code. Thus the point raised by the reviewer is interesting, so we decided to discuss it further in the revision.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript by Toshima et al. describes a study of the organization of traffic in the endomembrane system of the budding yeast Saccharomyces cerevisiae. The authors address the relation between endocytosis and the Golgi (TGN: a collection of maturing membrane elements derived from the trans-Golgi). The study builds on a previous article by the group of Benjamin Glick. In that study (Day et al., 2018), it was postulated that the TGN is the first destination for yeast endocytic traffic after internalization from the plasma membrane. Additionally, Day et al. had shown that endocytic recycling traffic towards the plasma membrane departs from the TGN as well. Therefore, early endosome and recycling endosome compartments would be identical to the TGN or contained within it. Here, Toshima et al. use super-resolution confocal live imaging microscopy (SCLIM) to refine a model of endocytic pathway organization. This powerful imaging technology allows them to show that out of two partially overlapping TGN markers, namely Tlg2 and Sec7, the syntaxin Tlg2 correlates better with the arrival of fluorescently labeled endocytic cargo than alternative TGN marker Sec7. Building on this main finding, the authors conclude that a specific part of the TGN (an "independent sub-compartment") functions as the early endosome. Further experiments in mutants for GGA clathrin adaptors, required for departure of endocytic cargo from the TGN to the Rab5-positive prevacuolar endosome, show again that endocytosed cargo accumulation correlates better with Tlg2 than with Sec7. Furthermore, in GGA mutants the overlap between Tlg2 and Sec7 is decreased, suggesting that GGA is required for maturation of this Tlg2 sub-compartment.

      The study is well conducted and its main conclusion that a Tlg2 subregion within the TGN functions as the early endosome seems well supported by the superb live imaging and the analysis of GGA mutants.

      Although a technical feat in live superresolution imaging, this single kind of data (moving, shape-shifting blobs of fluorescently-labeled proteins) does not totally fill with meaning the terms "compartment", "sub-compartment", or "independent sub-compartment". This, I think, is the main limitation of the study. Are these compartments or sub-compartments individuated membrane elements, collections of vesicles, regions of the same cisterna or saccule? For this, electron microscopy would be needed.

      We are very grateful for the reviewer’s favorable evaluation of our study. In accordance with the editors’ judgment in "Essential Revision", we have not performed electron microscopy analysis for this revision. However, we have addressed all other valuable comments.

      Reviewer #2 (Public Review):

      In this manuscript Toshima et al document the use of sophisticated microscopy - with powerful spatial and time resolution - to image markers of the yeast endosomal system.

      The initial work documented in this paper does a good job of defining the compartment endocytic cargoes internalise to. This is convincingly shown to be a compartment that is not marked by Sec7 but is instead a distinct (sub)compartment marked by the SNARE protein Tlg2. This agrees with many previous studies, (including biochemical experiments and microscopy of cargoes in a series of membrane trafficking mutants) but has different conclusions to another study (Day et al 2018 - Developmental Cell). Although the microscopy techniques used in the two studies are different, the yeast system and many of the reporters (FP tagged Tlg1, Sec7, Vps21 and fluorescently labelled mating factor) are the same. The Day et al study is suitably referenced throughout the manuscript but as to why the authors have come to fundamentally different answers about endocytic cargoes internalising to a Sec7+ compartment, is not discussed.

      Following the reviewer's suggestion, we have added a paragraph discussing this (line 533-539).

      The work goes on to show endocytic carriers (marked by Abp1) and endocytic cargoes like fluorescently labelled mating factor internalise to the Tlg2+ compartment. The forward trafficking of these molecules is then observed to transit to a later endosome compartment labelled by Vps21. The super-resolution and time lapse imaging, sometimes even using 3 colours - is of very high quality and fully support the model presented at the end of the paper for this trafficking itinerary. Trafficking mutants are also used (such as a defective allele of arp3 and deletion of VPS21 / YPT52 GTPases) to interrupt trafficking routes and define the pathways followed by endocytosed mating factor.

      The endocytic trafficking from Tlg2+ to Vps21 compartments is shown to be defective in mutants lacking GGA adaptors (gga1∆ gga2∆), with cargoes accumulating in the Tlg2+ compartment and other clathrin adaptor mutants not causing this defect. This research avenue also reveals that the GGA proteins are required to maintain the distinct Tlg2 sub compartment.

      The final section of the paper uses the same tools to analyse the localisation of the recycling v-SNARE protein Snc1. This is arguably the most important set of experiments in the paper, not only is Snc1 a putative v-SNARE that functionally interacts with Tlg2, but this cargo, unlike pheromone, allows the investigation of recycling back to the PM from TGN/endosomes. However, the authors do not comment on the fact that Snc1 does not localise to the plasma membrane in either of the experiments using different microscopy techniques (Figure 5A + 5B), calling into question whether the recycling pathway is operating properly or whether the FP-tagged machinery has disrupted processing. The steady state localisation of Snc1 in WT cells only looks normal in the Supplemental figure; this discrepancy should be discussed or addressed.

      As the reviewer points out, fluorescent protein-tagged Snc1p usually localizes to the plasma membrane in addition to cytosolic puncta, as shown in Fig. 6–figure supplement 1A. In Fig. 6A, localization of GFP-Snc1p is demonstrated by focusing on the cell surface using a TIRF microscope, which differs from that focusing on the medial focal plane. Therefore, Fig. 6A shows that GFP-Snc1p localizes to the plasma membrane, albeit with evident punctate localization.

      Localization of mCherry-Snc1p to the plasma membrane was also observed in the images obtained by SCLIM. However, since the intracellular signals of mCherry-Snc1p are partially blocked by those around the plasma membrane, in Fig. 6B the intracellular localization has been emphasized by modulating the contrast, thereby reducing the fluorescence signals at the plasma membrane. In the new manuscript, we have added an image with only slight contrast (Fig. 6–figure supplement 1C) in the same cell as that shown in Fig. 6B.

      Reviewer #3 (Public Review):

      The manuscript by JY Toshima et al. is an excellent and important study that demonstrates very clearly the existence of an endosomal compartment in yeast, distinct from the trans-Golgi network, to which endocytic vesicles fuse upon internalization. They show that this compartment is enriched in the SNARE protein Tlg2, a yeast homologue of syntaxin, and is segregated from the Golgi-localized Sec7-containing compartment, indicating that the organization of the endocytic system in yeast is similar to that of animal cells. Furthermore, they demonstrate the trafficking machinery required for maturation of this compartment, and that it is also a station on the pathway back to the plasma membrane. Because there have been conflicting reports in the literature as to the existence of an endosomal compartment in yeast distinct from the trans-Golgi network, this paper is of great importance for the cell biology community.

      Major strengths of this study are the cutting-edge imaging technology used, and the careful, quantitative analyses carried out. The authors use a super-resolution live cell imaging approach that allows them to discriminate to a high resolution different compartments and membrane domains of highly dynamic yeast organelles, and to follow an internalizing cargo over time. With their manuscript, they have provided a full set of movies, along with quantifications, to support their conclusions.

      The authors use fluorescent-protein-labelled endocytic cargo (alpha-factor) and fluorescent-protein-labelled compartment markers, assaying them in high resolution and super-resolution live cell imaging microscopy systems. In this way, they are able to follow trafficking of cargo through compartments in real time. The authors first demonstrate that the alpha-factor cargo substantially colocalized with the SNARE protein Tlg2, a marker of early endosomes, but very little with Sec7. They also show that Tlg2 marks a sub-compartment distinct from the Sec7 compartment, but adjacent to it. Furthermore, they demonstrate using super-resolution microscopy and triple color 4D imaging that endocytosed alpha-factor cargo structures make contact with the Tlg2 compartment, adjacent to the Sec7 compartment, then disappear, supporting the conclusion that endocytic vesicles first fuse with the Tlg2 compartment. Next the authors show that alpha factor is transported from the Tlg2 compartment to the Vps21 compartment, a process that requires the GGA adaptors Gga1 and Gga2. Finally, the authors show that recycling of the endocytic R-SNARE Snc1 also occurs by passage through the Tlg2 compartment.

      The use of mutants that affect different stages of endosomal trafficking is a strength of the manuscript, as it allows elucidation of the mechanism of transport through successive compartments. Importantly, using a gga1-delta gga2-delta mutant, the authors demonstrate convincingly that the GGA adaptors Gga1 and Gga2 are required for alpha factor transport from the Tlg2 compartment to the Vps21 compartment.

      Throughout this study, the authors use fluorescent protein-labelled cargo and compartment markers (EGFP, mCherry, iRFP), but don't explicitly state to what extent these fusion proteins are functional compared to the endogenous proteins. They could cite previous publications or their results describing the functionality of the fusion proteins used.

      According to the reviewer's suggestion, we have cited previous publications for GFP-Tlg2 (Seron et al., MBoC, 1998), Sec7-GFP/-mCherry (Seron et al, MBoC, 1998; Llinares et al., Sci Rep, 2015), Abp1-mCherry (Kaksonen et al., Cell, 2003; Picco et al., eLife, 2015), GFP-Vps21 (Toshima et al., Nat. Comm, 2016), Gga2-mCherry (Daboussi et al., NCB, 2012), GFP-Snc1p (Lewis et al., MBoC, 2000), and GFP-Ypt31 (Kim et al., Dev Cell, 2016). We have also added data showing the functionality of Abp1-mCherry (Fig. 2–figure supplement 1A), Sec7-iRFP (Fig. 1–figure supplement 1F), Gga2-mCherry (Fig. 5–figure supplement 2G), and GFP-Ypt31p (Fig. 7–figure supplement 1A) in the new manuscript.

    1. Author Response

      Reviewer #1 (Public Review):

      Luu et al. have developed a genome-edited African elite rice variety, Komboka. The work was initiated in response to the outbreak in Eastern Africa by Xanthomonas oryzae strains that are phylogenetically related to Asian strains and carry TALes, similar to strains from China, possessing an expanded repertoire of TALes compared to those in endemic strains. As these emerging strains contain TALe targeting SWEET11a, as well as that suppressing Xa1, pthXo1, and iTALes, the authors have generated edited lines targeting promoter regions of SWEET11a, 13 and 14 in African elite rice variety, Komboka. The same team has previously generated genome-edited lines targeting the promoter regions of SWEET11a, 13, and 14 in varieties Kitaake, IR64, and Ciherang-Sub1. Bacterial blight outbreaks and emerging pathogen lineages remain a threat to rice production. Thus, efforts in targeting pathogen weaknesses to generate genome-edited varieties possessing broad-spectrum resistance are required. The survey, collection of isolates, and strain characterization studies on >800 strains are commendable. This study has taken advantage of this ongoing collection to stay ahead in the arms race to deploy broad-spectrum resistance in an elite rice variety using TALe targets.

      Overall conclusions presented here are supported to some extent; however, I have listed some of my comments and concerns below.

      1) Data in supplementary table 2 suggests that Komboka is still a moderately resistant variety under field conditions in Africa, with a disease severity scale of 2 i.e. 4-6% disease severity, compared to other varieties having a disease severity scale of 5. Thus, I am not convinced that emerging strains are of concern on the Komboka variety under field conditions, thus, question the justification of Komboka being a choice for editing to tackle emerging strains.

      We apologize; Table 2 is admittedly hard to read with the geo data. We have thus added a new Figure 1 with maps. Please note that the data in this table are from 2022. If you look at, for example, the Morogoro region (Dakawa and Lukenge), it appears that there, too, the initial scale (number of plants infected) was low and became more severe in the subsequent years, as one might expect. We thus hypothesize that, in the upcoming analyses, the scale will also become much higher; this snapshot therefore cannot serve as a measure of general susceptibility. As we noted in the response to the Editor, the Kaufmann clipping assays are widely used by breeders to evaluate resistance in greenhouse conditions, and since the assay uses severe wounding and extremely high bacterial inocula, it is a reliable measure of susceptibility. Note also that Komboka was chosen before the outbreak was characterized. Our data show that Komboka is highly susceptible to Asian strains, as well as to the introduced strains. We also characterized the R gene complement as far as feasible, and found two R genes that can explain the resistance to the endemic African strains. Single, double and triple R gene mutant combinations have been broken in India; we therefore deemed it necessary to create a rational approach that prevents SWEET gene recruitment to generate broad spectrum resistance. xa13 has likely only been broken by circumventing SWEET11a (by using SWEET13 or 14), but still stands up in quintuple breeding combinations in India. Thus, we expect that our lines will be rather robust, which will have to be tested in future field trials in Kenya, where this variety is widely cultivated. We added text to the Results and Discussion sections and a new section on sampling in Methods, with respective references that show the correlation of data from assays with the same strains in greenhouse and field.

      2) Is Xa4 from Komboka related to Xa4_Teqing? The breakdown of Xa4T was due to the mutant allele of avrXa4 in virulent Xoo CR6. But this breakdown was accompanied by a fitness penalty and residual QTL had a significant residual effect on virulent strains. Would this be why Komboka carrying Xa1 (Xa45(t) and Xa4 under field conditions still showed moderate resistance? But Xoo strains showed susceptibility in leaf clipping assays.

      We apologize, this was a typo that has been corrected. Komboka is a high yielding variety; we thus cannot comment on any yield penalty here. It is superior and now widely accepted in Kenya. We responded regarding the moderate resistance in the previous paragraph. Komboka is fully susceptible to the Asian strains that induce SWEET11a.

      3) I felt a bit of a disconnect in sections on phenotypic assays confirming the virulence profile of strains on Komboka and then understanding mechanisms underlying virulence since the same strains used in path data were not the ones mentioned in WGS and TALe analysis, leaving the readers with only one strain to support the hypothesis of the basis for higher disease severity on Komboka due to new TALes, pthXo1, and iTALe. Do the authors have pathogenicity data for African strains T19, Dak16, and Xoo3-1 that grouped with endemic African strains on Komboka? The authors present data on CIX4457, 4458, and 4462 being virulent on Komboka, and show that they cluster with Asian strains. However, in the tree, 4462 is the only one shown to be closely related to Chinese strains. Where are 4457 and 4458 placed? Do 4457 and 4458 also contain pthXo1 and iTALe? The authors could also provide path data for 4506/4509 that they included in the TALe figure and in the phylogenetic tree.

      We had initiated WGS of 8 strains (3 from Dakawa and 5 from Lukenge), but at the time of submission, not all genomes were fully polished. Although not all are in a publishable state by now, we were able to determine the similarity as well as the presence of pthXo1 and iTALes. The number of SNPs among the 8 strains is extremely low (between 1 and 4), strongly intimating that they are siblings. They are so similar that we cannot at present trace the origin. All eight strains isolated in Dakawa in 2019 and in Lukenge in 2021 contain iTALes and the PthXo1B variant. It is nearly certain that they are all derived from a single introduction event. We fully understand your comment. We apologize, since we should not have used the CIX nomenclature, which was introduced to obtain a more reliable code for the strains. We have introduced a clearer nomenclature while keeping the code for the database. We added a new Figure 2-supplement 1 which shows that Komboka is susceptible not only to the three strains isolated in Dakawa in 2019, but also to one of the strains isolated from Lukenge in 2021. We replaced Fig. 3 with a new phylogenetic tree including the eight strains and provide more detailed information on the relation of those strains. In principle it would be sufficient to use a single isolate in this case. We now provide, as far as possible, the new data (analysis is ongoing) as well as new data for some strains collected in 2022, and conclude that the strains identified in 2022 are also derivatives of an initial introduction in the Morogoro region. It is also clear from Fig. 2 and its supplement that Komboka is fully susceptible to the strains isolated from Dakawa and Lukenge, as susceptible as to the Philippine reference strain PXO99A, which also uses PthXo1.

      4) The authors present pathogenicity data on EBE-edited T0, T1, and T2 lines of Komboka which are promising against the Tanzanian strains carrying new TALes. The cas9/cpf1 system developed here to target multiple EBEs will be a valuable contribution to the scientific community. What are the agronomic characteristics of these edited lines? As the edited lines have not been tested against a diversity panel of Asian and African strains, I would be skeptical of the choice of the term "broad-spectrum" yet.

      Virulence of Xoo depends critically on the recruitment of at least one of the three SWEETs (11a, 13 or 14). Single R genes, such as xa13 can be overcome by using SWEET13 or 14. All strains that are virulent carry at least one TALe that targets a SWEET. Thus, by blocking all known EBEs, we obtain broad spectrum resistance. We have not observed a single case yet where this is not working. Note that in the case of EBE edited Kitaake, we tested about 100 different strains from a world-wide collection, for IR64 and Ciherang-Sub 1 also many representative strains, and we now show data for Komboka and additional varieties. Thus, based on the current knowledge, including the information gained from Xoo genome sequences that have been published, e.g., recently from India, there is at present no strain known that can overcome this resistance.

      Regardless of my comment earlier on Komboka being moderately resistant under field conditions and thus a questionable choice for EBE-editing here, the genome-edited lines in any variety imparting resistance to bacterial blight remain a valuable contribution to managing disease outbreaks.

      We commented on the interpretation of moderate resistance above, but appreciate the comment that these lines will be valuable.

      5) As this manuscript utilizes the diversity of African strains to generate edited lines, it would be good to make diagnostics and path data for 833 strains available to the scientific community (instead of select strains as indicated in the supplementary table), especially for the fact stated here in the manuscript about scarce data on Xoo in Africa and the goal of systematic comparison of the pathogen population.

      We are currently preparing a manuscript that will include an extensive analysis of these strains, and focus on the diversity of African Xoo strains, i.e., MLVA-based diversity of the collection. This manuscript, which is in preparation, will include the requested data.

      Reviewer #2 (Public Review):

      This study describes the emergence of virulent strains of the rice bacterial blight pathogen Xanthomonas oryzae pv. oryzae in the Morogoro rice-growing region in Tanzania. The aims of the study were to describe the virulence features of the emerging population, as compared to previous bacterial blight outbreaks in Africa, and generate an elite rice variety that is resistant to both pathogen populations. To achieve these aims, the authors characterized the genetic basis of the virulence of these new strains by sequencing the genomes of three representative strains and phenotyping using excellent genetic resources for identifying the susceptibility gene targets of this pathogen in rice. They then used two rounds of hybrid CRISPR-Cas9/Cpf1 to successfully edit six targets of the pathogen in an East African rice variety, which conferred resistance to all strains tested.

      The strengths of this paper are the systematic analysis of the virulence of emerging pathogen strains relative to strains from previous outbreaks and the successful creation of edited lines that will form the basis for continued efforts to gain regulatory approval for the introduction of resistant rice in East Africa. The creation of the edited line is a substantial and important contribution; indeed, the authors include strains collected in 2021 and include disease severity data from 2022 in the supplementary data, illustrating the urgent need for solutions.

      The weaknesses of the study are largely related to the quick turnaround between data collection and manuscript submission.

      1) Different strains are used for different experimental work and sequence analysis, making relationships between different parts of the work unclear and also more challenging for the reader to follow because of changing strain designations. CIX4457, CIX4458, and CIX4462 were virulent on rice near-isogenic-lines, CIX4457 and CIX4505 were used for identifying SWEET targets and phenotyping edited lines, while whole genome sequencing was conducted with CIX4462, CIX4506, CIX4509.

      We added new information which demonstrates that the strains isolated in 2019 in Dakawa and the strains from Lukenge (2021) are very closely related and differ by only 1 to 4 core genome SNPs (see new supp Fig. 3A). We added a new Figure 2-supplement 1 and expanded Table 1 to show that the strains from Lukenge and Dakawa behave in a similar manner. We are aware of the differences in the figures but hope to have now addressed them in an acceptable manner; we did not want to combine data from different experiments. The differences in strain use are due to i) the different timing of strain sampling and isolation, and ii) the variable quality of whole genome sequencing of the strains. Regarding timing, the strains from 2019 were isolated first, so the long and tedious work of leaf-clipping the whole set of NILs with the full diversity strain panel did not include the Tanzanian strains from 2021, which were isolated much later, in part because of a long delay in having the infected leaf material sent out; including them in the NIL testing would have taken us another year given the volume of this experiment. Overall, we have sequenced the genomes of 8 newly introduced strains, including 3 from Dakawa_2019 and 5 from Lukenge_2021 (see new suppl. Table 3, which gives a detailed overview of the genomic analysis of these strains). The best genome sequences were obtained for strains CIX4462, CIX4506 and CIX4509 (renamed in the revised version of this MS, for the sake of clarity, as iTzDak19-3, iTzLuk21-1 and iTzLuk21-2), for which a circularized chromosome could be generated. Unfortunately, these were not the strains that we had selected for SWEET characterization and phenotyping of edited lines, for which one representative strain of each collection had been randomly picked, namely CIX4457 and CIX4505 (now iTzDak19-1 and iTzLuk21-3, respectively).

      To reconcile these two sets of data and show that strains from Dakawa and Lukenge are actually extremely similar, we have performed a SNP-based phylogenetic analysis of the 8 strains, demonstrating that they all cluster as one homogeneous genetic lineage, in line with a scenario whereby all these strains result from a single introduction event from Asia. Careful analysis of these additional genomes also confirmed the presence of a pthXo1-like allele (pthXo1B) and iTALes in all Tanzanian strains introduced from Asia. One exception is strain iTzLuk21-3 (CIX4505), where the poor quality of the pthXo1B sequence, with potential frameshifts, prevented any confirmatory analysis. Taken together, these data support the hypothesis that all new isolates, irrespective of the year of sampling, are genetically very close and share the same virulence characteristics.

      2) Disease survey results from 2022 are listed in Supplementary Table 2, but it is challenging for the reader to summarize across many lines of data, which appear to represent individual samples.

      We agree that this was not the best way to show the data. In addition to the new suppl. Tables 1 and 3 we have now generated a new Figure 1 which contains maps of the disease distribution and severity across Tanzania in the different years as well as photos from the fields in Dakawa from 2019 and Lukenge in 2021 that highlight the massive infections.

      3) The focus of the editing is Komboka but bacterial blight in 2022 was mostly on other varieties. It would be helpful to have more context on this variety and what has prevented adoption by the growers in the Morogoro region to date.

      The variety was chosen several years ago after extensive consultations with breeders from IRRI, IRRI Africa, and India, since it is high yielding and was specifically generated for Kenya, where it has a high level of adoption. Tanzania has apparently not yet adopted this variety, as you can see from Table 2. Also, Tanzania does NOT have any regulations for genome edited crops and we can thus NOT provide the lines to Tanzania. By contrast, Kenya has established a regulatory framework by which the local government authorities can import transgene-free edited lines. We are currently segregating the transgenes out and have established a thorough set of measures to validate whether the lines still contain transgenes (including vector backbone and T-DNA remnants). Tanzania will have to establish suitable guidelines. We would like to note that establishing protocols for different elite varieties is challenging and time consuming, and we had decided early on, in 2019, to initiate work on transformation protocols for this variety. If Tanzania also adopts regulations, it would be possible to provide the lines to Tanzania as well, and possibly by then Tanzania will have a higher level of adoption of Komboka. If you look at the maps we show, it is very likely that the disease will spread to all neighboring countries, including Kenya. Thus, our lines may become one possible measure to try to address the outbreak.

      Reviewer #3 (Public Review):

      One key finding of this work is the identification of Xanthomonas oryzae pv. oryzae (Xoo) strains in Africa that, based on their genome sequences and their TALE repertoires, have high similarity with Asian strains. Asian Xoo strains typically overcome NLR-mediated recognition of TALEs in rice by so-called iTALEs. Moreover, some Asian strains contain a TALE resembling PthXo1, a TALE protein that was shown to overcome xa5 resistance.

      The authors now found that some of the newly identified African strains have iTALEs and PthXo1-like TALEs. Such newly evolved African strains were found to be fully virulent on the African rice elite variety Komboka, which is resistant to a broad panel of African Xoo strains.

      Previous studies have shown that TALEs bind to effector binding elements (EBEs) present in promoters of rice SWEET genes to promote disease. Work from the lab of the authors and other labs has shown that TALEs can no longer promote the disease if matching EBEs are changed or deleted by CRISPR or TALEN-mediated mutagenesis. In fact, in pioneering work, Bing Yang, one of the authors of this article, published a Nature Biotechnology article about ten years ago in which he showed that rice plants with mutated EBEs are resistant to Xoo. Recently, a combined effort of the Yang and Frommer labs resulted in two further Nature Biotechnology publications (2019), in which they described, along with other useful tools, rice lines where multiple EBEs were mutagenized in parallel and that provide broad spectrum resistance.

      The work under review describes now CRISPR mutagenesis of an African elite cultivar resulting in a line that mediates resistance to Asian and newly evolved African strains.

      Overall, the work is technically sound. Yet, the approach that has been described - mutagenesis of multiple EBEs - has been used before and is a routine procedure for labs that are focused on such undertakings. While such approaches do not provide new insights for fundamental research, they nevertheless are certainly important and useful in translational research, as demonstrated here.

      We thank the reviewer for the comments. If we may, we would like to add aspects of novelty. We detected an outbreak that is spreading. We determined the disease mechanism, and we used CPF1 to obtain ‘optimal’ mutations at all sites (a massive improvement over the 2019 publication, which used Cas9), and we try to provide a solution for the outbreak when it spreads to Kenya, or when Tanzania and neighboring countries adopt similar guidelines. This seems highly urgent, as Reviewer 2 points out.

    1. Author Response

      Reviewer #1 (Public Review):

      This study used intersectional genetic approaches to stimulate a specific brainstem region while recording swallow/laryngeal motor responses. These results, coupled with histology, demonstrate that the PiCo region of the IRt mediates swallow/laryngeal behaviors, and their coordination with breathing. The data were gathered using solid methods and difficult electrophysiological techniques. This study and its findings are interesting and relevant. The analysis (and/or the presentation of the analysis) is incomplete, as there are analyses that need to be added to the manuscript. The interpretation of the data is mostly valid, but there are claims that are too speculative and are not well-supported by the results. The introduction and discussion would benefit from more citations and a deeper exploration of how this study relates to other work - especially a thorough accounting of and comparison to other studies concerning putative swallow gates.

      General/major concerns:

      The field of respiratory control is far from unified regarding the role of PiCo in breathing or any other laryngeal behaviors. If anything, the current consensus does not support the triple-oscillator hypothesis (in which PiCo is one of 3 essential respiratory oscillators). The name "PiCo", short for "post-inspiratory complex", suggests a function that has not been well-supported by data - it is a putative post-inspiratory complex, at best. I suggest putting this area in context with other discussions, i.e. IRt (such as in Toor et al., 2019) or Dhingra et al. 2020, who showed broad activation of many brainstem sites at the post-I period (including pons, BotC, NTS).

      The reviewer’s comment refers to our previous publication and not the present one. With all due respect to the reviewer, the submitted study investigates PiCo’s involvement in swallow and laryngeal activation and its coordination with breathing.

      We did not feel that it is appropriate for us to critique the Dhingra paper in the present study. However, since this seems to be important to this reviewer, we would like to clarify: Because of filter characteristics, and the low temporal and spatial resolution of these field recordings, the approach used by Dhingra is inappropriate for providing insights into the presence or absence of PiCo. We therefore developed an alternative approach, which provides more detailed insights into population activity, the Neuropixel approach. This Neuropixel recording from PiCo (black trace) exemplifies how field recordings (yellow) fail to pick up post-I activity. We could provide many more examples, but as stated above, addressing the study by Dhingra is tangential to the present study.

      We would also emphasize that the study by Dhingra was never designed to provide negative evidence, and Dhingra et al. never claimed that their study demonstrates the absence of PiCo. Unfortunately, the data by Dhingra were misinterpreted by Swen Hülsmann in his Journal of Physiology editorial, which created considerable confusion, but also sensation, in the field. Objectively, Toor et al. reproduced the Anderson study in rats, as we will elaborate below. Unfortunately, Toor et al. added to the confusion by renaming the PiCo area as IRt. The field of respiration would also have been confused if the first study reproducing the Smith et al. 1991 study in a different rodent species had refused to call this area preBötC and had instead called it, e.g., ventrolateral reticular field.

      Did you perform control experiments in which the opto stimulations were done on animals without the genetic channels (for example, WT or uncrossed ChAT-ires-cre, etc.), or in mice with the genetic channels that weren't crossed (uncrossed Ai32 mice)? If so, please include. If not, why?

      Yes, we performed many control experiments. Aside from many recordings in which viral injections were targeted outside PiCo, we also performed optogenetic stimulations in mice lacking channelrhodopsin. We have now added the following statements and supplemental figure.

      Optogenetic stimulation in mice lacking channelrhodopsin

      Stimulation of PiCo, across all stimulation durations, in 3 Ai32+/+ mice and 4 ChATcre:Vglut2FlpO:ChR2 mice where the ChR2 did not transfect ChATcre:Vglut2FlpO, as confirmed by a post-hoc histological analysis, resulted in no response (Fig. S3).

      How do you know that your opto activations simulate physiological activation? First, the intensive optical activation at the stim site does not occur in those neurons naturally.

      This seems like a generic critique of the optogenetic approach. In none of the 10,000+ published optogenetic studies is it known to what extent optogenetic activation stimulates exactly the same neurons and the same degree of activity as during a natural behavior. What we know is that PiCo neurons are activated during postinspiration (Anderson et al. 2016), that optogenetic activation stimulates these neurons, and that this activation recruits the same muscles in the same temporal sequence as a water-evoked swallow. We assume that the reviewer’s comment does not intend to imply that “swallows” evoked by nonspecifically stimulating the SLN are more physiological than the optogenetically-evoked swallows of a specific neuron population? From the reviewer’s other comments, it is obvious that the reviewer has no problems with the results of the Toor study that used exclusively SLN stimulations, an approach which is known to be very non-specific.

      Doing a natural (water) stim for comparison is good, but it cannot necessarily be directly compared to the opto stim. The water stim would activate many other brainstem regions in addition to PiCo.

      Can the reviewer provide any hard evidence that “many other brainstem regions” are activated by water stimulation in comparison to optogenetic stimulation?

      A caveat is that opto PiCo stim =/= water stim (in terms of underlying mechanisms) should be included. Second, in looking at the differences between water vs opto swallows in Table S2: it appears that the ChAT animals (S2A) have something weaker than a swallow with opto stim. For the Vglut2 and ChAT/Vglut2 (S2B&C), the opto swallows also aren't as "strong" as the water swallows (the X and EMG amplitudes are smaller). The interpretation/discussion attributes this to the lack of sensory input during opto stim, but does not mention the strong possibility that there is a difference in central mechanisms occurring. It also seems to be dismissed with the characterization of the swallow as "all-or-none" (see note on Fig 3 results).

      With all due respect, we are somewhat surprised that the reviewer dismisses the entire paragraph in the discussion that specifically addresses the comparison between water-swallows and PiCo-stimulated swallows. We discussed the possibility that PiCo stimulated swallows may not activate the full pathway/mechanism as does the water swallow. We carefully compared and confirmed that PiCo-stimulated swallows have the same temporal motor sequence of the same muscles as those activated in water swallows. As already stated, it is surprising that the reviewer has no problem with accepting the validity of previously published methods like electrical non-specific stimulations of the cNTS or SLN, a frequently used and accepted model to produce and study swallow.

      The writing needs extensive copy editing to improve clarity and precision, and to fix errors.

      Thank you for this comment. We have revised and reviewed the writing.

      Results/Fig 1: What proportion had no/other motor response (non-swallow, non-laryngeal) to the opto stim? I can extrapolate by subtraction, but it would be nice to see the "no/other response" on the plot.

      With all due respect to this reviewer, it is not possible to address this question. Specifically, it is not possible to know whether a “no response” (meaning that no behavioral output occurred in response to PiCo stimulation) would otherwise have resulted in a swallow or laryngeal activation. However, Figure 2 contains responses other than swallows, i.e. “non-swallows”, which include both laryngeal activation and “no responses”, meaning no behavioral response to PiCo stimulation. This was determined to assess how the respiratory rhythm is affected when a swallow is not produced by PiCo stimulation.

      The explanation of genetics is too spread out and confusing. There needs to be more detail about all the genetic tools used, using the standard language for such tools, in one spot. Please also provide a clear explanation of what those tools accomplish. Include a figure if necessary.

      We apologize for creating confusion. We added more explanations to the text.

      Pick a conventional designator/abbreviation for the different strains, define them in the methods and in the first paragraph of the results section, and use those abbreviations throughout. I think that using ChAT as an abbreviation for your ChAT-ires-cre x Ai32 mice is confusing because it makes it sound like you're talking about the enzyme rather than the specific strain/neurons. Saying "ChAT stimulated swallows... swallows evoked by water or ChAT" makes it sound like the enzyme choline acetyltransferase itself is stimulating swallow. As is convention, pick a more precise abbreviation like ChAT-cre/Ai32 or ChAT:Ai32 or ChAT-ChR2 or ChAT/EYFP. This goes for the other strains as well.

      Thank you for pointing this out. To avoid confusion, the strains/neurons are now referred to as ChATcre:Ai32, Vglut2cre:Ai32, and ChATcre:Vglut2FlpO:ChR2.

      For Fig S2C&D, why does it say mCherry? Isn't it tdTomato? Is it just an anti-ChAT antibody and then the tdTomato Ai65 is only labeling Vglut2? I don't see this in the methods section.

      Thank you for pointing this out. We apologize for our mistake, and we have corrected the manuscript to say tdTomato.

      I also don't see methods for all the staining in Fig S3. The photomicrograph says Vglut2-cre Ai6, but there's no mention of Ai6 anywhere else. Which mice are these? Did you cross Vglut2-cre with an Ai6 reporter mouse? How can you image an Ai6 mouse (which I assume expresses ZsGreen? and that you excited at 488?) and a 488 anti-goat in the same section (that's the only secondary listed in the methods that would work with your goat anti-ChAT)? Is there an error in listing the fluorophores in the methods? Please give more details on the microscopy including which filters were used for the triple staining.

      We have decided to remove the CTb data from the manuscript.

      Regarding the staining: I would expect the staining/maps for the 2 different ChAT/Vglut2 intersectional strains to be similar (Fig 5A/B and S2C/D). The photomicrographs look very different to me, while the heat maps (this goes for all the heat maps in the paper) have barely distinguishable differences. In Fig 5, the staining looks much stronger than in Fig S2C. Why does it look like there are so many more transfected neurons in Fig 5A2 than there are red neurons in the corresponding panel Fig S2C2? And for Fig 5A4 and Fig S2C44? The plot and results text for Fig 5 says the avg number of neurons was 123 ± 11. The plot for Fig S2D says 112 ± 15, but the results text says 242 ± 12 (not sure which is the correct number).

      Thank you for your comments. Previously, the heat maps in Fig 5A/B and S2C/D (now Figure S4C/D) had different scale bars. We have changed the heat maps so that they all use the same scale. Regarding the representative photomicrographs, Fig 5A/B and Fig S4C/D represent the same cluster of cells (PiCo ChAT/Vglut2+). Figure S4D states 242 ± 12 neurons (also stated in the results section).

      However, we want to point out that there are several technical differences between the two: 1) Figure 5A represents the transfection produced by the virus injection, which affects the number of cells stained/transfected (133 ± 16 neurons); 2) Figure S4C/D represents an intersectional mouse, ChATcre:Vglut2FlpO:Ai65 (242 ± 12 neurons). In this case, we have more tdTomato-positive cells because this genetic approach is able to detect most of the ChAT and Vglut2 cells. A difference of this kind between figures is considered normal for anatomical studies; in some studies the same bregma level can show different numbers of cells. Thus, the differences are due to the different types of approach (viral expression vs. intersectional approach).

      We have also added experiments to Figure 5 (now N=7), which is reflected in the text and figures.

      The results text for Fig S2C also says the staining is "similar to the previous ChAT staining...", which I assume refers to S2A/B. The plot and results text for Fig S2B reports 403 ± 39 neurons, while S2D is either 112 or 242 (not sure?). The plots have different Y scales, which should be changed to be the same. But why do the photomicrographs and the heat maps look so similar? I would expect far fewer neurons to be stained in the intersectional mice (Fig 5 and Fig S2C/D) than in the ChAT staining (Fig S2A/B). I am having trouble reconciling the different presentations/quantifications and making sense of the data in these histology figures.

      We removed “similar to the previous ChAT staining” and we have reviewed the heat maps. Since the original submission, we have performed more experiments and added more animals to the analysis (now N=7); each heat map now represents the correct number of neurons in PiCo for its respective experiment.

      The Y scales have been adjusted to better compare the ChAT staining with the triple-conditioned intersectional mice.

      How can you distinguish PiCo from non-PiCo in the histology, especially in the ChAT-only staining? It seems that you have arbitrarily defined the PiCo region, and only counted neurons within that very constrained area.

      Even in ChAT-only staining, the N. ambiguus is very distinct from the cholinergic neurons located more medial to the N. ambiguus. This can be unambiguously confirmed by combining ChAT with glutamatergic in situ staining, as done in the Anderson et al. study, or unambiguously demonstrated with the viral approach, as done in the present study. Thus, we don’t see why it is arbitrary to define the distribution of PiCo neurons. What is arbitrary is the definition of the preBötC, yet the field of respiration seems to have no problem with this. We assume that the reviewer knows that Dbx1 neurons are spread along the entire ventral respiratory column and the dorsal portion of the PreBötzinger Complex up to the level of the XII nucleus. Yet it is commonly accepted for authors to refer to the PreBötzinger Complex by counting Dbx1 neurons within a constrained area of what is believed to be the PreBötzinger Complex, even though the borders are arbitrary. It is, e.g., known that some of the ventrally located preBötC neurons are presumed rhythmogenic while the more dorsally located Dbx1 neurons are premotor. The transition from rhythmogenic to premotor is gradual. Similarly, NK1 staining or SST staining is not restricted to the preBötC, and it is arbitrary to define where the preBötC begins and what to include. Indeed, our PNAS paper indicates that inspiratory bursts can be generated by optogenetically stimulating Dbx1 neurons along the entire VRC column, so it is not clear where the rhythmogenic portion of the preBötC begins rostrocaudally and dorsoventrally, or where the rhythmogenic portion and the preBötC itself end. Thus, we want to reiterate and emphasize that, for the present study, we developed a method using the cre/FlpO approach to unambiguously define the PiCo region. It is surprising that this reviewer does not acknowledge this technical advance, which added significantly more specificity to the anatomical and physiological characterization of PiCo than the Toor et al. study, and also the Anderson et al. study.

      I can see stained neurons in the area immediately outside of PiCo, and I'd like to see lower-magnification images that show the staining distribution in a broader region surrounding PiCo as well, especially in the rest of the reticular formation.

      We characterized the PiCo area based on the histological phenotype and the in vitro and in vivo experiments performed by Anderson et al., 2016. PiCo is an area located close to the NAmb, presenting the same ChATcre phenotype. As stated above, the distribution and agglomeration of the NAmb is clearly very compact, and different from that of the ChATcre:Vglut2FlpO:Ai65 neurons observed outside of the NAmb. It is also important to emphasize that, as is the case for the preBötC, neurons of other transmitter phenotypes are also present in the PiCo region (i.e. GABA or Dbx1). However, the study by Anderson et al., 2016 described only the functions of cholinergic neurons located in PiCo, and we always planned to publish a paper on the other neurons within PiCo; this area contains, e.g., pacemaker neurons. But we hope that the reviewer acknowledges that many investigators have studied the preBötC for the past 30 years. Hence, much more information has been accumulated on this region (which, by the way, was at least as controversial at the beginning), and it will likely take at least another 30 years to fully identify and characterize PiCo.

      Similarly, how can you be sure you're stereotaxically targeting PiCo precisely (600um in diameter?) with your opto fiber (200um in diameter). Wouldn't small variations in anatomy put the fiber outside the tiny PiCo area?

      We assume the reviewer means “stereotactically”. And yes, the reviewer is correct, it is necessary to position the laser at a consistent anatomical location. Placement of the optical fibers outside of this area does not result in activation of PiCo. We have added an additional supplemental figure (Figure S6) to address this.

      Please put N's and stats results in Table S1 for both swallow and laryngeal activity. I took what I assume to be the Ns (10, 11, and 4) and did some stats like the ones you presented for the laryngeal duration. The differences between vagus duration for 40 and 200 ms pulse durations are all significant for each strain, by my calculations. Also, I think there must be an error in the orange swallow plot in Fig 3A. The orange dots don't correspond to the table values. I plotted all the Table S1 values for each strain. Each line looks similar to the blue laryngeal activation plot in Fig 3A. The slopes of the Vglut2 were less than the other strains, and the slopes for the swallow behavior were less than the laryngeal behavior for all strains. Otherwise, they all look similar. Please double-check your values/stats to address these discrepancies. If it is indeed true that the stim pulse duration affects swallow duration, revise the interpretations and manuscript accordingly.

      We thank the reviewer for the diligence in reviewing our manuscript. But, with all due respect, the reviewer is incorrect and has misunderstood the data. To clarify: Table S1 presents data only for laryngeal activation; swallow data are presented in Table S2. The orange data points in Fig 3A are not detailed in Table S1 or S2. Table S2 gives the average of all swallows across all laser pulse durations, since the laser pulse duration does not affect swallow behavior duration. All data will be publicly available after publication of the manuscript.

      Figure 3A represents only the ChATcre:Vglut2FlpO:ChR2 column of Table S1.

      The N’s have been added to Table S1.

      Please add more details on stats in general, including the specific tests that were performed, F values and degrees of freedom, etc.

      Thank you, this has been added throughout the results section; please refer to the results section for these additions. An example is provided below.

      An example: A two-way ANOVA revealed a significant interaction between time and behavior (p<0.0001, df= 4, F= 23.31) in ChATcre:Vglut2FlpO:ChR2 mice (N=7).

      How do you know that you're not just activating motoneurons in the NA when you stimulate your ChAT animals, especially given the results in Fig 1B? In this case, the phase-specific results could be explained by inhibitory inputs (during inspiration) to motoneurons in the region of the opto stim.

      As stated in this paper as well as in the Anderson et al. 2016 paper (and, for that matter, also in the Toor et al. study), this is a caveat. This major caveat motivated the development and use of the ChATcre:Vglut2FlpO:ChR2 experiments (specifically targeting the PiCo neurons that co-express ChAT and Vglut2, not laryngeal motor neurons), which produce mostly the same response as the ChATcre:Ai32 mice. We cannot say this is due to inhibitory inputs to laryngeal motoneurons, since the cre/FlpO-specific experiments do not directly activate laryngeal motoneurons. But we do not want to entirely exclude that some premotor mechanisms may also occur in PiCo. The reviewer may know that there is overlap of rhythmogenic and premotor functions for the Dbx1 neurons in the PreBötC. But addressing this issue is beyond the scope of this study. In fact, we are working on a separate connectivity study using novel, still unpublished anterograde and retrograde vectors that do not reveal any direct connections to laryngeal motoneurons. Hence, we expect that the connectivity from PiCo to laryngeal motoneurons is more complex, and addressing this question cannot be done as a simple add-on to an already complex study. Again, we would refer to the PreBötzinger complex, where nobody expects that one study can resolve all the physiological and anatomical characterizations that have been accumulated over 30 years. We would argue that in some ways our cre/FlpO approach is more specific than the Dbx1 stimulations, which activate not only rhythmogenic PreBötzinger complex neurons but also premotoneurons as well as glial cells, and many neurons rostral and caudal to the PreBötzinger complex. We are aware of these caveats, and we have discussed them in the original submission and also in the revision.

      While the study from Toor et al is cited, there needs to be a much more thorough discussion of how their findings relate to the current study.

      Many thanks for asking for a more thorough discussion of Toor et al., which we are happy to provide here. Perhaps we were too polite in our original manuscript to emphasize all the problems in that study.

      They demonstrated that PiCo isn't necessary for the apneic portion of swallow. Inhibiting this region also didn't affect TI.

      Please note that the fact that Toor et al. did not find an effect on TI confirms Anderson et al. 2016: in Figure 3G,3F of the Nature paper, the reviewer will find that injections of DAMGO and SST into PiCo inhibited post-I activity without affecting inspiratory duration. This figure also shows that the inspiratory burst can terminate in the absence of postinspiratory activity.

      The reviewer states: “They demonstrated that PiCo isn't necessary for the apneic portion of swallow”. With all due respect to this reviewer, this is NOT correct. Toor et al. showed that inhibiting PiCo did block SLN-evoked fictive swallows but not the apnea caused by SLN stimulation. This is not the apnea caused by swallows (which was never studied by Toor et al.), but the apnea caused by the SLN stimulation. The apnea evoked by SLN stimulation most likely has nothing to do with the apnea caused by swallows. Unfortunately, the Toor et al. study makes the same misleading claim as the reviewer.

      PiCo cannot be the sole source of post-I timing, and the evidence overwhelmingly favors the major involvement of other regions such as the pons.

      This comment seems to be unrelated to the main thrust of this paper, which studies PiCo’s involvement in swallow and laryngeal activation in coordination with breathing. However, since this comment seems to discredit the Ramirez lab in general, we would like to clarify that inhibiting PiCo with DAMGO and SST inhibits post-I activity (Anderson et al 2016, Fig.3G,3F). Thus, we don’t understand the rationale or actual data for the reviewer’s conclusion that PiCo cannot be the sole source of post-I timing. We also don’t understand the basis for the reviewer’s conclusion that “the evidence overwhelmingly favors the major involvement of other regions such as the pons”. We also want to add that nowhere in the Anderson et al. study did we state that the pons plays NO role. Indeed, we specifically stated: “In this context it will be interesting to resolve the role of the PiCo in specific postinspiratory behaviors and to identify how the PiCo interacts with other neural networks such as the Kolliker-Fuse nucleus, a pontine structure that has been hypothesized to gate postinspiratory activity and the periaqueductal grey a structure involved in vocalization and the control of postinspiration”.

      They also showed that inhibition of all neurons (not just ChAT/Vglut) in the PiCo region suppresses post-I activity in eupnea. This suppression was overcome by the increased respiratory drive during hypoxia.

      Before comparisons are made with Toor et al., it is important to note the species and methodological differences. Toor et al. used an anesthetized, vagotomized, paralyzed and artificially ventilated rat model in which fictive swallows (deafferented and paralyzed) were evaluated. By contrast, this study uses an anesthetized, vagus-intact, freely breathing mouse model and evaluates natural physiologic swallows evoked by water and by central stimulation. It seems that the reviewer does not acknowledge one of the main innovations of this study. For this study we introduced a genetic approach to specifically target and activate ChATcre/Vglut2FlpO PiCo neurons. This has never been done before, and developing this approach took more than 4 years of breeding, crossing and testing different options.

      As for Toor et al., these authors pharmacologically and bilaterally inhibited neurons in the area of PiCo with isoguvacine, a specific GABA-A agonist. Even though this pharmacological intervention does not specifically inhibit cholinergic/glutamatergic neurons in PiCo, these authors essentially confirm the study by Anderson et al. We do not find this finding controversial. Perhaps the reviewer finds the definition of PiCo “controversial” because Toor et al. called the identical area IRt instead of PiCo, even though they exactly reproduce the finding by Anderson et al. Toor et al. not only arrive at the same conclusion as Anderson et al. but also add more details, none of which contradicts the results of Anderson et al. Here are excerpts from the Toor study: “We therefore conclude that the ongoing activity of neurons in the IRt contributes to eupneic respiratory and sympathetic post-I activities without exerting significant control on other respiratory or cardiovascular parameters” “IRt significantly inhibited the post-I components of VNA” “IRt inhibition was also associated with a reduction in PNA” “increase in respiratory cycle frequency” “due to a reduction in TE” “with no effect on TI observed”. “Bilateral microinjection of isoguvacine selectively reduced the magnitudes of post-I VNA and rSNA, but not PNA responses to acute hypoxemia”.

      In this statement the reviewer probably refers to one particular aspect, i.e. the fact that Toor et al. did not significantly block some of the post-I activity. They state: “had no significant effect on the AUC of post-I rSNA (305+/- 24 vs 230+/- 28,p=0.16,n=6)”. Please note that there is a tendency, a reduction from 305 to 230. Perhaps the Toor study was not sufficiently powered to fully block the effect; perhaps the drug did not inhibit the entire PiCo. These are all open questions that a critical reader should be aware of. The reviewer will agree that it is as difficult, if not more difficult, to demonstrate the absence of an effect as to demonstrate its presence. To arrive at a negative conclusion, experiments should be done with the same scrutiny as those that demonstrate a positive result. We also assume that the reviewer is familiar with animal experiments and will understand that pharmacological injections are often difficult to interpret, in particular in the case of local in vivo injections. It is possible that Toor et al. inhibited, e.g., parts of the Bötzinger complex.

      We have added to the manuscript the following statement: It is important to note that SLN stimulation does not only trigger swallows, but also changes in the overall stiffness and tension of the vocal cords (Chhetri et al., 2013) as well as prolonged hypoglossal activation independent of swallowing (Jiang, Mitchell, & Lipski, 1991). It has been hypothesized that inhibition of the IRt blocks fictive swallow but not swallow-related apnea. Yet this apnea was generated by SLN stimulation and not by a natural swallow stimulation (Ain Summan Toor et al., 2019). It is known that SLN stimulation causes endogenous release of adenosine that activates A2A receptors on GABAergic neurons, resulting in the release of GABA on inspiratory neurons and subsequent inspiratory inhibition (Abu-Shaweesh, 2007), suggesting that the SLN-evoked apnea may not be the same as a swallow-related apnea. Moreover, microinjections of isoguvacine into the Bötzinger complex attenuated the apneic response but not the ELM burst activity (Sun, Bautista, Berkowitz, Zhao, & Pilowsky, 2011), suggesting the Bötzinger complex, not PiCo, could be involved in modulating apnea.

      We would also like to add that our current study characterized swallow-related specific muscles and nerves in both water-triggered and PiCo-triggered swallows to better characterize the physiological properties of this swallow behavior. By contrast, Toor et al. only characterized nerve activities that are involved in multiple upper airway activities and breathing. It is somewhat surprising that the reviewer did not consider the fact that Toor et al. characterized putative swallows that were triggered by SLN stimulation, and that Toor et al. were content with nerve recordings and failed to confirm that the behavior they evoked is actually a physiological swallow. According to the comments from this reviewer (see above), this leaves open the possibility of differences in central mechanisms between fictive and physiological swallows.

      While we have cited Toor et al. and their truly excellent work in the broad IRt, we did not feel it appropriate to critique them for confusing the field by using a different anatomical term for the area that was clearly defined by us as an area containing cholinergic-glutamatergic neurons. We also did not feel it appropriate to discuss results whose comparison amounts to comparing apples and oranges. Toor et al. never specifically manipulated glutamatergic-cholinergic neurons; thus their entire results rest on indirect stimulation affecting this general area, which will unavoidably also include laryngeal motoneurons. We don’t want to criticize this approach, since PiCo is heterogeneous, which is another misunderstanding that we find in the reviewer’s critique. We used cholinergic-glutamatergic neurons to define this area. However, like the preBötC, PiCo is also heterogeneous. This region contains inhibitory neurons; it also contains glutamatergic neurons that are not cholinergic, and cholinergic neurons that are not glutamatergic. Because of this heterogeneity we compared the effects of stimulating glutamatergic neurons and cholinergic neurons as well as cholinergic-glutamatergic neurons. This is an approach that is generally accepted in the field. As already stated, there is not a single marker that uniquely characterizes the PreBötC. Thus, stimulating Dbx1 neurons, glutamatergic neurons, or somatostatin neurons captures only subpopulations of this region. The recently published study by Menuet et al. in eLife used even more indirect methods to inhibit the preBötC. They used a pan-neuronal CBA promoter that targets neurons irrespective of phenotype. It is not our intention to discredit this very elegant study, but we object to the statement that we “have arbitrarily defined the PiCo region”.

      This study has not demonstrated some of the things that are depicted in Fig 7 and included in the discussion. While swallow can inhibit inspiration, there are many mechanisms by which this can happen other than a direct inhibitory connection from the DGS to PreBotC. You cite Sun et al., 2011 findings of "a group of neurons that inhibits inspiration" during SLN stim, but don't mention that it is the BotC and that the paper shows that swallow apnea is dependent on BotC. That is also supported by the Toor study. I don't understand how post-I (aka E) can be discussed without discussion of the BotC - this is a glaring omission.

      We have removed Figure 7, which was only meant as a hypothetical schematic.

      Why is it necessary for PiCo to innervate the cNTS?

      This was a hypothesis based on CTb data that we have now removed.

      That is true if the conjecture that PiCo gates swallowing is true, as the cNTS is the only known region for central swallow gating. However, PiCo could influence afferent input to the NTS less directly, and therefore not function as a gating hub per se. The experimental evidence does not warrant the claim that PiCo gates swallowing. The definition of a swallow gate(s) is a topic of much debate and no conclusive experimental evidence has emerged for swallow gating regions to exist anywhere except in the NTS. The current study's evidence also does not meet the criteria necessary to conclusively call PiCo a swallow gate. The authors should soften this claim and language throughout the manuscript.

      Although we do not know of any studies that have optogenetically gated swallow in the cNTS, it seems the reviewer objects to our use of the word “gate”. We have revised the manuscript and removed any wording stating that PiCo is a swallow “gate”. It would be interesting to know whether the reviewer has the same objections to the use of the word “relay”, as done by Toor et al.

      It is also unclear that PiCo acts directly on the swallow pattern generator to gate swallowing. It is not just "conceivable that the gating mechanism involves" the pons, but nearly certain. Swallow gating by respiratory activity may not be able to be ascribed to one particular location. At a minimum, it likely involves the NTS/DSG, pons, and possibly IRt (inclusive of PiCo). The authors are correct that "further studies are necessary to understand the interaction between PiCo and the pontine respiratory group on the gating swallow and other airway protective behaviors." This is why it shouldn't be stated that "this small brainstem microcircuits acts as a central gating mechanism for airway protective behaviors."

      We have removed all language stating PiCo is a swallow gate.

      PiCo is likely part of the VSG (and thus the swallow pattern generator). PiCo, as part of the IRt/VSG could indeed be surveilling afferent information and providing output that affects swallow or other laryngeal activation and the coordination of these behaviors with breathing. However, this is not the responsibility of PiCo alone. This role is likely shared by other parts of the SPG, and may require distributed SPG network participation to be functional. If one were to stim other regions of the distributed SPG, similar results might be expected. When Toor et al silenced the PiCo area (and locations that lie at least lightly beyond the borders of what the present study defines as PiCo), stim-evoked fictive swallows were greatly suppressed. However, swallow-related apnea was unaffected. This supports the role of PiCo as a premotor relay for swallow motor activation, but not as the site that terminates inspiration. Therefore, it cannot be called a gate.

      We have already addressed the issue that Toor et al. never demonstrated that the “swallow-related” apnea was unaffected. Toor et al. only demonstrated that the SLN-evoked apneas were unaffected, and their conclusions were based only on nerve recordings under fictive conditions (deafferented and paralyzed). Also, to the best of our knowledge, many aspects of the putative swallow pattern generator that this reviewer mentions are purely hypothetical. However, to avoid further arguments, we have removed the word “gate” and Figure 7 from this manuscript.

      Similarly, Fig 7 does not accurately depict things that are already well-supported by evidence. PiCo should be included as part of the swallow pattern generator (VSG), not as a separate entity acting on it. The BotC and pons are glaring omissions. This study has not demonstrated the labeled inhibitory connection from DSG to PreBotC. The legend states speculations as fact and needs to be dialed way back to either include statements with solid experimental evidence or to clearly mark things as putative/speculative.

      We have removed figure 7.

      The discussion of expiratory laryngeal motoneurons needs to be expanded and integrated better into the discussion of swallow, post-I, and other laryngeal motor activation. Why can't PiCo just be premotor to ELMs?

      If PiCo were “only” or “just” premotor to ELMs, then it would not be expected to trigger an all-or-none swallow response with a temporal activity pattern similar to that of a water-evoked swallow. We would also not expect the activity pattern to be independent of the laser stimulation duration, as demonstrated in Figure 3. This was our reasoning for originally calling PiCo a “gate”: at the correct phase it will gate/trigger a complex swallow sequence. But, as stated above, we avoid the word “gate” in the revised manuscript.

      Concerning the discussion of "PiCo's influence as a gate for airway protective behaviors is blurred...": The incomplete swallow motor sequence didn't seem super different in timing or duration compared to the fully transfected animals (comparing plots from Fig 6 to Fig S1, and Table S2 to Table S3). The values for swallow durations (XII and X) for each group for water and opto seem within similar ranges, as do the differences between water & opto-evoked swallows between strains. While the motor pattern is distinctive from the normal swallow, with laryngeal activity rather than submental activity leading, one might not even be able to call that a swallow. Is it evidence against a classic all-or-nothing swallow behavior any more than the graded swallow results from (fully transfected) Table S1?

      We fully agree that it is possible that this unidentified behavior may not be a swallow. We have changed the name of this behavior to “upper airway motor activity.” However, we also cannot rule out the possibility that this is some portion of a graded swallow, and a graded swallow response would itself be evidence against the classic all-or-nothing swallow behavior.

      Please expand on this point and put it into context with others' results: "This brings into question whether this is the first evidence against the classic dogma of swallow as an "all or nothing" behavior, and/or whether this is an indication that activating the cholinergic/glutamatergic neurons in PiCo is not only gating the SPG, but is actually involved in assembling the swallow motor pattern itself."

      This has been expanded and now includes citations of other studies. The following paragraph can be found in the discussion:

      Swallow has been thought of as an “all or nothing” response since as early as 1883 (Meltzer, 1883). Whether modulating spinal or vagal feedback (Huff A, 2020b), central drive for swallow/breathing (Huff, Karlen-Amarante, Pitts, & Ramirez, 2022) or lesions in swallow-related areas of the brainstem (Car, 1979; Robert W Doty, Richmond, & Storey, 1967; Wang & Bieger, 1991), swallow either occurred or did not. Swallows are thought to be a fixed action pattern, with duration of stimulation having no effect on behavior duration (Fig. 3) (Dick, Oku, Romaniuk, & Cherniack, 1993). Thus, it was particularly interesting that in instances when few PiCo neurons were transfected, either unilaterally or bilaterally, an unknown activation of upper airway activity occurred. Motor activity no longer outlasted laser stimulation but was instead contained within it, and the timing of the motor sequence was reversed in comparison to a water or PiCo evoked swallow (Fig. 6). Thus, if insufficient numbers of neurons are activated, PiCo’s influence to specifically activate swallow or laryngeal activation is blurred, resulting in the uncoordinated activation of muscles involved in both behaviors. This provides possible evidence against the classic dogma of swallow as an “all or nothing” behavior, or indicates the presence of an entirely different behavior. We are not the first to bring possible evidence against the classic dogma: “small swallows” have been described, but it was not determined whether these were in fact partial or incomplete swallows (Miller & Sherrington, 1915). The SPG is thought to consist of bilateral circuits (hemi-CPGs) that govern ipsilateral motor activities, but receive crossing inputs from contralateral swallow interneurons in the reticular formation, thought to coordinate synchrony of swallow movements (Kinoshita et al., 2021; Sugimoto, Umezaki, Takagi, Narikawa, & Shin, 1998; Sugiyama et al., 2011).
Incomplete activation of PiCo activates the muscular components of a swallow, without establishing the coordinated timing and sequence of the pattern. It is possible that PiCo is involved in assembling the swallow motor pattern itself and that unilateral activation of PiCo could either desynchronize swallow interneurons or activate only one side of the SPG. Since we did not record bilateral swallow-related muscles and nerves, this question needs to be further examined.

      Reviewer #3 (Public Review):

      Huff et al. further characterise the anatomy and function of a population of excitatory medullary neurons, the Post-inspiratory Complex (PiCo), which they first described in 2016 as the origin of the laryngeal adduction that occurs in the post-inspiratory phase of quiet breathing. They propose an additional role for the glutamatergic and cholinergic PiCo interneurons in coordinating swallowing and protective airway reflexes with breathing, a critical function of the central respiratory apparatus, the neural mechanics of which have remained enigmatic. Using single allelic and intersectional allelic recombinase transgenic approaches, Huff et al. selectively excited choline acetyltransferase (ChAT) and vesicular glutamate transporter-2 (VGluT2) expressing neurons in the intermediate reticular nucleus of anesthetised mice using an optogenetic approach, evoking a stereotyped swallowing motor pattern (indistinguishable from a water-induced swallow) during the early phase of the breathing cycle (within the first 10% of the cycle) or tonic laryngeal adduction (which tracked tetanically with stimulus length) during the later phase of the breathing cycle (after 70% of the cycle).

      They further refine the anatomical demarcation of the PiCo using a combination of ChAT immunohistochemistry and an intersectional transgenic strategy by which fluorescent reporter expression (tdTomato) is regulated by a combinatorial flippase and cre recombinase-dependent cassette in triple allelic mice (Vglut2-ires2-FLPO; ChAT-ires-cre; Ai65).

      Lastly, they demonstrate that the PiCo is anatomically positioned to influence the induction of swallowing through a series of neuroanatomical experiments in which the retrograde tracer Cholera Toxin B (CTB) was transported from the proposed location of the putative swallowing pattern generator within the caudal nucleus of the solitary tract (NTS) to glutamatergic ChAT neurons located within the PiCo.

      We would like to thank the reviewer for acknowledging the technical advances of the present study and for the positive statements in general.

      Methods and Results

      The experimental approach is appropriate and at the cutting edge for the field: advanced neuroscience techniques for neuronal stimulation (virally driven opsin expression within a genetically intersecting subset of neurons) applied within a sophisticated in vivo preparation in the anaesthetized mouse with electrophysiological recordings from functionally discrete respiratory and swallowing muscles. This approach permits selective stimulation of target cell types and simultaneous assessment of gain-of-function on multiple respiratory and swallowing outputs. This intersectional method ensures PiCo activation occurs in isolation from surrounding glutamatergic IRt interneurons, which serve a diverse range of homeostatic and locomotor functions, and immediately adjacent cholinergic laryngeal motor neurons within the nucleus ambiguus (seen by some as a limitation of the original study that first described the PiCo and its role in post-I rhythm generation; Anderson et al., 2016, Nature 536, 76-80). These experiments are technically demanding and have been expertly performed.

      Again, we would like to thank the reviewer for these positive comments acknowledging the advances of the present study.

      The supplemental tracing experiments are of a lower standard. CTB is a robust retrograde tracer with some inherent limitations, paramount of which is the inadvertent labelling of neurons whose axons pass through the site of tracer deposition, commonly leading to false positives. In the context of labelling promiscuity by CTB, the small number of PiCo neurons labelled from the NTS (maybe 5 or 6 at most in an optical plane that features 20 or more PiCo neurons) is a concern. Even assuming that only a small subset of PiCo neurons makes this connection with the presumed swallowing CPG within the cNTS, interpretation is not helped by the low contrast of the tracer labelling (relative to the background) and the poor quality of the image itself. The connection the authors are trying to demonstrate between PiCo and the cNTS could be solidified using anterograde tracing data the authors should already have at hand (i.e. EYFP labelling driven by the con-fon AAV vectors from PiCo neurons (shown in Fig5), which should robustly label any projections to the cNTS).

      We fully agree with the reviewer that the CTB staining is of a lower standard and have removed this approach.

      The retrograde labelling from laryngeal muscles seems unnecessary: the laryngeal motor pool is well established (within the nAmb and ventral medulla), and it would be unprecedented for a population of glutamatergic neurons to form direct connections with muscles (beyond the sensory pool).

      The authors support their claim that PiCo neurons gate laryngeal activity with breathing through the demonstration that selective activation of glutamatergic and cholinergic PiCo neurons is sufficient to drive oral/pharyngeal/laryngeal motor responses under anaesthesia and that such responses are qualitatively shaped by the phase of the respiratory cycle within which stimulation occurs. Optical stimulation within the first 10% of the respiratory cycle was sufficient to evoke a complete, stereotyped swallow that reset the breathing cycle, while stimuli within the later 70% of the cycle, evoked discharge of the laryngeal muscles in a stimulus length-dependent manner. Induced swallows were qualitatively indistinguishable from naturalistic swallow induced by the introduction of water into the oral cavity. The authors note that a detailed interpretation of induced laryngeal activity is probably beyond the technical limits of their recordings, but they speculate that this activity may represent the laryngeal adductor reflex. This seems like a reasonable conclusion.

      We thank the reviewer for this comment. Unfortunately, we felt compelled to remove the word “gating” based on the statements by reviewer 1.

      The authors propose a model whereby the PiCo impinges upon the swallowing CPG (itself a poorly resolved structure) to explain their physiological data. The anatomical data presented in this study (retrograde transport of CTB from cNTS to PiCo) are insufficient to support this claim. As suggested above, complementary, high-quality, anterograde tracing data demonstrating connectivity between these structures as well as other brain regions would help to support this claim and broaden the impact of the study.

      We fully agree with this reviewer. We have been working on a thorough anatomical characterization for more than 3 years using cutting edge anterograde and retrograde viruses in collaboration with vector experts at the University of Irvine. But these are partly novel, unpublished techniques that are in development, and require many careful controls and characterization. We feel that this is a separate study as it doesn’t relate to swallowing coordination and also includes partly different authors. We hope to submit this as a separate study later this year.

      This study proposes that the PiCo in addition to serving as the site of generation of the post-I rhythm also gates swallowing and respiration. The scope of the study is small, and limited to the subfields of swallowing and respiratory neuroscience, however, this is an important basic biological question within these fields. The basic biological mechanisms that link these two behaviors, breathing and swallowing, are elusive and are critical in understanding how the brain achieves robust regulation of motor patterning of the aerodigestive tract, a mechanism that prevents aspiration of food and drink during ingestion. This study pushes the PiCo as a key candidate and supports this claim with solid functional data. A more comprehensive study demonstrating the necessity of the PiCo for swallow/breathing coordination through loss of function experiments (inhibitory optogenetics applied in the same transgenic context) along with robust connectivity data would solidify this claim.

      Thanks again for the positive assessment of our study.

    1. Author Response

      Reviewer #1 (Public Review):

      This paper explores the potential regulatory role of a previously unstudied phosphorylation site in the Src kinase SH3 domain. The data presented conclusively demonstrate that a phosphomimetic mutation of this site, src90E, causes an elevation in Src kinase activity, changes the structure of the Src catalytic domain as determined with a FRET sensor, disrupts certain SH3 domain interactions, causes changes in kinase intracellular dynamicity, and promotes cell invasiveness. Based on the behavior of the phosphomimetic mutant, the idea that constitutive phosphorylation of Y90 could have all of these effects is well-supported by the data. However, in wild-type cells or cells transformed by activated forms of Src, there is no constitutive phosphorylation of this site. Therefore, the question remains whether Y90 phosphorylation occurs to any significant extent in cells, and the data suggesting that it could do so is limited. It also remains to be conclusively established whether Y90 phosphorylation occurs via autophosphorylation.

      Major comments:

      1) Y90 was identified as a site of phosphorylation in Luo et al. It would be helpful if more information were provided about its significance relative to other sites identified in that study. Was it detected in non-transformed cells? Was it a major site? How does it relate to Y416 in abundance? The reference to the identification of the site in a different study from the White lab is made in the discussion but not in the introduction (this should be corrected). How abundant was it that study? A fuller description of its detection would strengthen the rationale for this study. Any additional phosphoproteomics studies that identified it should also be included.

      As indicated in the manuscript (Figure 3C and new 3D), the amount of Y90 phosphorylation increases with the level of Src activation. Standard proteomic/phosphoproteomic data cannot be quantified in absolute values for technical reasons; only relative quantification is possible to some extent. To overcome this issue and address the question of the amount of Y90 phosphorylation, we newly prepared the corresponding stable isotope-labeled phosphopeptides and used them as internal standards. To the best of our knowledge, this allowed us to quantify for the first time the amount of specific tyrosine phosphorylation of Src kinase in cells. We found that in the case of WT Src, the major phosphorylation site localized in the activation loop of the kinase domain, Y416, is phosphorylated in 22% of molecules. In activated Src, this pool of Y416-phosphorylated molecules increases 2.5 times to 57%. Y90 is phosphorylated in approximately 1% of WT Src molecules but becomes 5 times more abundant in the case of the activated kinase (5.3% of phosphorylated molecules). These newly added data on absolute Src tyrosine phosphorylation (Figure 3D) are consistent with the values we obtained from relative MS quantification of Src variants differing in catalytic activity (Figure 3C). Although the enrichment of Y90 phosphorylation in the catalytically active kinase is lower compared to Y416 phosphorylation in terms of percentage of phosphorylated molecules, its increment with respect to the basal state is significantly higher. We believe that this broader dynamic range of Y90 phosphorylation is in agreement with the demonstrated regulatory function of Y90 phosphorylation. We incorporated these new results and methodological approach into the revised manuscript. We also extended the original description of the MS protocol to include a description of relative quantification, which was included in the original manuscript.

      Phosphorylation of Y90 was only detected in the Luo et al. and Johnson et al. phosphoproteomic screens. However, phosphorylation of tyrosines homologous to Src Y90 was described in a vast number of proteins. Some of them are mentioned in the discussion, e.g., Btk, Abl, p130Cas or the Src family kinases Yes and Fyn. The presence of phosphorylation on homologous tyrosines and the evolutionarily conserved nature of Y90 in SH3 domains support the relevance of Src Y90 phosphorylation despite the small number of studies that were able to identify it. In our opinion, this can be attributed to its low abundance in the basal state and the technical difficulties of its detection, as discussed below in response to point 2.

      We emphasize the Luo et al. study in the introduction because it was the only study reporting Y90 phosphorylation at the time of the project’s initiation and led us to study Y90 further. Both studies are then mentioned in the discussion, which we believe is appropriate and sufficient.

      2) Related to point 1, is there evidence from the literature indicating a significant site of phosphorylation in Src has been overlooked? Or, was this site only identified because of the recent advances in MS technology and increased sensitivity of this methodology? An introduction to these points would also enhance the rationale for the study.

      In the manuscript discussion, we mention an early study (Erpel et al., 1995) which mapped conserved residues within the binding surface of the Src SH3 domain. It showed that mutation of Y90 to alanine led to partially deregulated Src and reduced affinity of the SH3 domain. Although they acknowledged the importance of Y90 for SH3 domain binding ability, they did not probe or discuss the effect of Y90 phosphorylation status. Furthermore, the level of Src Y90 phosphorylation in untransformed cells is relatively low (20-fold lower than Y416 phosphorylation). It is therefore not surprising that it has not been identified in most general phosphoproteomic studies performed on untransformed cells. In fact, in many of these studies, Y416 phosphorylation was not detected either. The detection of Y90 phosphorylation by Luo et al. likely reflects the fact that it was performed in Src527F-transformed cells; similarly, Johnson et al. used HGF-activated cells. Lastly, we cannot exclude that the tryptic peptides with Y90/pY90 are less detectable in MS depending on the experimental conditions. In fact, the "heavy" Y90 peptide was consistently much less (10-80 times less) detectable in our hands than the Y416 peptide. This could be because of its worse ionizability, stability in vacuum or some other technical reasons.

      In our approach, we used immunoprecipitated Src molecules to maximize the amount of Src in the sample and targeted MS, which allowed us to specifically detect even low abundant ions/peptides. This represented the critical technical approach that allowed us to consistently detect Y90 phosphorylation in untransformed cells.

      3) The explanation of the MS experiment designed to show that Y90 phosphorylation happens in cells is insufficient in the text. It is not clear why the SYF cells were not used and not clear why the FRET sensor constructs were used. It is also not clear whether or how the proteins were purified before MS analysis. Also, rather than showing the MS data as a relative level, it would be preferable to provide the number of spectra obtained for each peptide/phosphopeptide and compare this also to Y416. A fuller comparison between the phosphorylation of Y90 to that of Y416 is necessary in order to place the potential Y90-mediated phosphoregulation in context.

      We are sorry for the confusing description. With the new quantification data, we have rewritten this section and hopefully made it clearer. We kept the original relative quantification data as they nicely show that abundance of Y90 phosphorylation increases with enhanced activity of Src. However, we added new MS analysis of Src tyrosine phosphorylation performed with labeled peptides as internal standards that provides absolute numbers of Y416 and Y90 phosphorylation in cells. The new measurements confirm the original data showing increased Y90 phosphorylation in activated Src variants and suggest that Y90 phosphorylation is not a rare event but represents an important regulatory element in Src activation. Our approach of MS quantification of phosphorylation events using labeled peptides as standards, allowed us, to the best of our knowledge, for the first time, to measure absolute quantities of Y416 and Y90 phosphorylation and therefore also the amount of activated Src molecules in cells.

      For technical reasons, the SrcFRET biosensor was used in all these experiments. We attempted to analyze endogenous Src in several cell lines to assess its Y90 phosphorylation. However, in our hands, the amount of Src efficiently precipitated was never sufficient to detect the "very elusive" phosphopeptide containing Y90. We believe this was not caused by low amounts of Src in the cells, but rather because the anti-Src antibody performed much worse than the anti-GFP antibody used for SrcFRET biosensor (two high affinity epitopes) immunoprecipitation. We have previously shown that the SrcFRET biosensor functions in the same way as endogenous Src (Koudelková et al., 2019), and therefore we presume that it is phosphorylated in a similar manner and rate as endogenous Src.

      4) I would like to see conclusive evidence that Y90 phosphorylation is due to autophosphorylation. This would involve relatively simple experiments. As one possibility, an IP kinase assay followed by immunoblotting with a site-specific antibody or MS or other types of phosphopeptide visualization/identification.

      We further addressed the question of Y90 autophosphorylation using a kinase dead version of Src527F bearing K295M substitution. To quantify the amount of phosphorylated Src we applied the identical approach with labeled standards and measured phosphorylation levels of Y416 and Y90. Compared to Src WT and Src527F, phosphorylation of both tyrosines in the kinase dead variant was negligible despite the presence of endogenous Src and other SFKs in the U2OS cells we used for the experiments. These results suggest that phosphorylation of Y90 does indeed depend on the intrinsic kinase activity of Src and is therefore very likely autophosphorylation.

      We have tried to address the question of Src autophosphorylation on Y90 by analyzing the level of Y90 phosphorylation in cells expressing a kinase-inactive SrcFRET construct with open conformation (527F-KD) by quantitative MS. Despite the open conformation, SrcFRET527F-KD did not display any significant phosphorylation of either Y90 or Y416, even though we used U2OS cells which express endogenous Src and other SFKs. These results suggest that phosphorylation of Y90 depends on catalytic activity of the kinase rather than on compactness of its conformation and is therefore very likely autophosphorylation.

      5) A few other mutations would be interesting to examine in both kinase and transformation assays for comparison to the mutants that were: Y527F Y416F; Y527F Y416F Y90E. The first is a low activity control and the second is for understanding whether Y90E could overcome the lack of Y416 phosphorylation.

      Due to lack of time, we did not perform these experiments. However, we analyzed our new kinase-dead 527F mutant for FRET and found that despite its inactive kinase domain and lack of Y416 phosphorylation, it still retains an open conformation. We believe that this is a strong indication that the Y90E kinase-dead mutant would behave the same way, maintaining an open conformation although the kinase domain is inactive.

      6) I recommend that the results are discussed in a more circumspect manner. The results presented in Figure 7 on the double mutant, Y527F Y90F, suggest that phosphorylation of Y90 is not a very significant component of Src kinase regulation, at least in these biological contexts. That Y90 phosphorylation isn't a major regulatory factor does not diminish the value of the work describing Y90 phosphorylation. However, it does alter the interpretations. I encourage a more conservative discussion of its importance and a broader discussion of why it isn't a major site of Src phosphorylation, particularly if it is due to autophosphorylation.

      We believe that given our new quantifications showing that Y90 phosphorylation is indeed considerably present and utilized in cells, the original discussion is consistent with the new data and does not need to be changed.

      Reviewer #2 (Public Review):

      The manuscript "Phosphorylation of tyrosine 90 in SH3 domain is a new regulatory switch controlling Src kinase" describes efforts to understand how phosphorylation of tyrosine (Y90) in the SH3 domain of Src affects the activity and function of this multi-domain kinase. The authors find that an Src variant containing a phospho-mimetic mutation (Glu) at position 90 demonstrates elevated activation levels in lysates and cells (Figure 1) and adopts a less compact autoinhibited conformation within the context of a SrcFRET biosensor in lysates (Figures 3A, 3B). A series of pulldown experiments with an isolated SH3 domain (Figure 2A, 2B) or full-length Src (Figure 2C, 2D) that contain the phospho-mimetic Y90E mutation demonstrates that phosphorylation of Tyr90 would likely disrupt the interaction of Src's SH3 domain with intermolecular binding partners and the linker that couples SH2 domain/C-tail binding to autoinhibition, which provides a mechanistic basis for the observed elevated kinase activity of Src Y90E. By performing a series of imaging experiments with a SrcFRET biosensor, the authors show that the Y90E mutation does not show enhanced localization at focal adhesions like a hyperactivated Src mutant (Y527F) that contains a non-phosphorylatable C-tail (Figure 4A). However, using ImFCS combined with TIRF microscopy (Figure 4B), the authors demonstrate that Src Y90E shows similarly reduced mobility (relative to the WT SrcFRET biosensor) at the plasma membrane (especially at focal adhesions) as Src Y527F. Consistent with the elevated kinase activity of Src Y90E, the authors go on to demonstrate that the Src Y90E variant shows an ability to transform fibroblasts, at levels that are intermediate between wild-type Src and the hyperactive Src mutant Y527F (Figure 5). Similarly, Src Y90E confers an intermediate level (between wild-type Src and Src Y527F) of invasiveness and ability to form spheroids.
Together, these comprehensive experiments with a Y90 phospho-mimetic strongly support a model where phosphorylation of Src's SH3 domain at Tyr90 would lead to a more intramolecularly disengaged SH3 regulatory domain and enhanced kinase activity in cells.

      Most of the conclusions in this paper are well supported by solid data, but confidence in several assays would be higher if additional technical detail or controls were provided and the biological significance of these findings would be higher if the role that Y90 phosphorylation plays in Src regulation and function were better delineated.

      1) The kinase activity assays in Figures 1C,1D, and 7A need to be scaled to the Src variant levels present in the lysate (quantification of relative Src levels is not provided).

      For kinase activity measurements, we used lysates of equal protein concentrations prepared from cell lines stably expressing Src variants. These cell lines were sorted and repeatedly tested for equal expression of Src constructs using immunodetection of Src on Western blots. We corrected the methods section and added this information to the description of kinase assays experimental setup.

      2) More details are required for the experiments quantifying Y90 phosphorylation levels in Figure 3C. The experimental section states that equal amounts of IP'd proteins were used for these analyses but there are no details on how this was confirmed. In addition, the experimental section states that normalized intensities were used for quantifying the Y90 phospho-peptide but no details are provided on how normalization was performed (the legend states that a base peptide was used but it is unclear what this means).

      The paragraph on mass spectrometry analysis in the Materials and Methods section has been updated with the required information.

      3) A key question is whether Y90 phosphorylation serves a regulatory role in Src's cellular activity and, if so, what is the regulatory network that mediates this phospho-event. Using a mass spectrometry readout with three Src variants (wild type vs. Y527F vs. E381G) that possess differing kinase activities, the authors demonstrate that Y90 phosphorylation levels correlate to Src's kinase activity (Figure 3C), which they suggest is an indication that this residue is an autophosphorylation site (or phosphorylated by another Src family kinase). However, as Src's kinase activity correlates with SH3 domain disengagement (which leads to a more accessible Y90), it is also entirely possible that another tyrosine kinase is responsible for this phosphorylation event. More importantly, it is unclear under which signaling regime Y90 phosphorylation would play a significant regulatory role. This phospho-event was observed in a previous phospho-proteomic study but it is unclear whether the phosphorylation levels of this site occur high enough stoichiometry to modulate the intracellular function of Src and whether there is a regulatory signaling network that influences Y90 phosphorylation levels.

      We have tried to address the question of Src autophosphorylation on Y90 by analyzing the level of Y90 phosphorylation in cells expressing a kinase-inactive SrcFRET construct with open conformation (527F-KD) by quantitative MS. Despite the open conformation, SrcFRET527F-KD did not display any significant phosphorylation of either Y90 or Y416, even though we used U2OS cells which express endogenous Src and other SFKs. These results suggest that phosphorylation of Y90 depends on catalytic activity of the kinase rather than on compactness of its conformation and is therefore very likely autophosphorylation.

      To further support our data on relevance of Y90 phosphorylation in cells, we performed a new MS analysis of Y90 and Y416 phosphorylation in WT and activated Src. This time we used corresponding stable isotope-labeled peptides and phosphopeptides as internal standards for MS quantification. This allowed us to measure absolute amounts of phosphorylated molecules and changes in their numbers, which is information that cannot be acquired by standard biochemical or proteomic approaches and is usually lacking for the majority of known phosphorylation sites. We found that in the case of WT Src, the major phosphorylation site localized in the activation loop of the kinase domain, Y416, is phosphorylated in 22.6% of molecules. In activated Src, this pool of Y416-phosphorylated molecules increases 2.5 times to 57%. Y90 is phosphorylated in approximately 1% of WT Src molecules but becomes 5.1 times more abundant in the case of the activated kinase (5.3% of phosphorylated molecules). These newly added data on absolute Src tyrosine phosphorylation (Figure 3D) are consistent with the values we obtained from relative MS quantification of Src variants differing in catalytic activity (Figure 3C). Although the enrichment of Y90 phosphorylation in the catalytically active kinase is lower compared to Y416 phosphorylation in terms of percentage of phosphorylated molecules, its increment with respect to the basal state is significantly higher. We believe that this broader dynamic range of Y90 phosphorylation is consistent with and reflects the demonstrated regulatory function of Y90 phosphorylation. We incorporated these new results and methodological approach into the revised manuscript.
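
      The arithmetic behind these figures can be illustrated with a minimal sketch. It assumes, as is common for quantification with stable isotope-labeled standards, that each endogenous ("light") peptide amount is estimated from its light/heavy intensity ratio multiplied by the known spiked amount; the response does not spell out the exact calculation used, so the function names, parameters, and intensities below are hypothetical. Only the percentages passed to `fold_change` are taken from the text above.

```python
def occupancy(light_phospho, heavy_phospho, light_unmod, heavy_unmod,
              spiked_phospho=1.0, spiked_unmod=1.0):
    """Fraction of molecules phosphorylated at one site (hypothetical helper).

    Each endogenous (light) peptide amount is estimated as its light/heavy
    intensity ratio times the known amount of spiked heavy standard.
    """
    phospho = light_phospho / heavy_phospho * spiked_phospho
    unmod = light_unmod / heavy_unmod * spiked_unmod
    return phospho / (phospho + unmod)


def fold_change(activated_percent, basal_percent):
    """Increase in site occupancy relative to the basal (WT) state."""
    return activated_percent / basal_percent


# Made-up intensities, for illustration only.
example = occupancy(light_phospho=2.0, heavy_phospho=10.0,
                    light_unmod=8.0, heavy_unmod=10.0)

# Percentages quoted in the response: Y416 22.6% -> 57%, Y90 ~1% -> 5.3%.
y416_fold = round(fold_change(57.0, 22.6), 1)  # 2.5
y90_fold = fold_change(5.3, 1.0)               # 5.3 with the rounded 1% basal value
```

      With the rounded basal value of 1%, the Y90 ratio comes out at 5.3, which is compatible with the roughly five-fold increase reported; the smaller absolute occupancy but larger relative change is the "broader dynamic range" referred to above.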

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript focuses on a set of neurons from the border between the central and medial amygdala (AMGc/m-PAG) that project to neurons in the periaqueductal gray (PAG) that gate ultrasonic vocalizations (USVs). These neurons suppress vocal production and are active in contexts where vocalizations would be inappropriate (e.g. in the presence of predator cues, or aggressive encounters with conspecifics). They then further characterized these neurons, demonstrating that like in males, these neurons are GABAergic in females and in both sexes, half of these neurons express estrogen receptor alpha (Esr1). To examine the inputs into these neurons, the authors performed monosynaptically-restricted transsynaptic rabies tracing and identified numerous cortical and subcortical projections. Of particular interest, neurons from the preoptic area of the hypothalamus (POA) in addition to terminating on PAG-USV neurons also project to AMGc/m-PAG neurons. Imaging the terminals of these neurons revealed elevated activity during vocalization-promoting contexts and optogenetically stimulating them resulted in evoking USVs. Together, these experiments further identify and quantify a circuit incorporating external factors (e.g. predatory factors, social interactions) in the drive to produce vocalizations.

      The authors are commended for use of male and female mice, demonstrating that even though they produce USVs in different social contexts, AMGc/m-PAG neurons share a function in suppressing USV production in both sexes. They do this convincingly with a variety of methodologies while incorporating appropriate controls (e.g. light-only and GFP-control in optogenetic experiments). The experiments are performed in a logical order and the data generated is elaborate.

      We appreciate the reviewer’s commendations and their recognition of the insights provided by our study. We provide detailed responses to their recommendations in the following section, and we hope the reviewer finds that these revisions satisfactorily address their concerns.

      Reviewer #2 (Public Review):

      The existence of PAG-USV-producing neurons has been recently established, alongside two independent pathways, POA->PAG, and AMG->PAG, that promote and inhibit the production of ultrasound vocalizations in female and male mice, respectively. Because vocalizations can be modulated in a variety of contexts, such as in the presence of a predator, the authors first show that the AMG->PAG pathway is activated in situations where mice stop vocalizing, such as in the presence of a predator or aggressive conspecifics, and can inhibit natural vocalizations in contexts where females vocalize (extending to their previous findings in male mice). Interestingly, AMG->PAG neurons also receive input from POA neurons that are known to promote vocalizations via their connection to PAG interneurons that inhibit PAG-USV-producing neurons. This POA->AMG and PAG pathway is inhibitory and therefore its capacity to promote vocalizations via these two parallel pathways might be achieved by its inhibition of AMG and PAG neurons that inhibit the PAG-USV producing neurons. While these results hint at possible mechanisms that could underlie the hierarchical control of vocalization, and how different external signals impinge on existing pathways to produce behavior flexibility, the study is missing important elements to draw such conclusions. Overall, the study is also missing important information on how experiments were performed.

      We appreciate the reviewer’s efforts to evaluate our manuscript and provide constructive feedback. In the following section, we have responded to all of the reviewer’s comments and concerns, and we provide all but one of the previously missing elements and information. We also maintain that the results and additional analysis we provide in this manuscript go beyond merely hinting at possible mechanisms, and instead provide explicit synaptic mechanisms by which vocal-promoting and vocal-suppressing signals interact in the mouse’s brain.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors worked towards a better understanding of the functional diversification of flavodoxins among diatoms, and this represents a quantum contribution building on the initial findings of Whitney, Lins, Hughes, Wells, Chappelle, and Jenkins (2011), with the inclusion of metatranscriptomic and other data from field collections and on-deck incubation experiments, relatively new genomic and transcriptomic datasets, and the adoption of reverse genetics tools that are not yet widely used in T. pseudonana. They hypothesize that clade I flavodoxins play a role in mitigating oxidative stress, while additional clade II flavodoxins would respond according to canon, in response to low iron availability.

      The authors embarked on several field campaigns across environmental gradients where iron-responsive and oxidative stress-responsive flavodoxins were expected to show differential expression. The use of metatranscriptomics allowed taxa-specific assignment of relative transcript expression levels, and the results of both measurements across the environmental gradient and manipulative incubation experiments show the widespread taxonomic distribution of iron-responsive clade II flavodoxin. The fieldwork was well thought out, and biogeochemical trends comported to expectations. It's worth noting that the concomitant inclusion of geochemical data such as dissolved iron further strengthened the work. The authors also found clade I flavodoxins were not iron-responsive (as expected), but rather exhibited diel patterns in transcript abundance that suggest responses to photo-oxidative stress. Taken together, these field data are stunning.

      We thank the reviewer for this kind assessment.

      Lab experiments with five diatom species grown under varied iron and induced oxidative (H2O2) stress and transcript abundances for flavodoxin genes are reported. One reservation concerns the untoward and unknown effects of inducing outright iron starvation with the strong chelator, DFB (as opposed to achieving steady-state growth rate limitation from low iron by use of weak chelators such as EDTA). With DFB it is also difficult to predict sample timing (when cells have hit that "correct" and reproducible iron-limited space) when independent replicates are collected on different dates. Similarly, the use of DFB also makes it difficult to sample low and high iron cells at the same density or to maintain densities among replicate samples collected on different dates. pH and CO2 availability change with density unless special measures are taken.

      We agree with the reviewer that DFB is a strong iron chelator that may affect diatom physiology in inadvertent ways. We designed the DFB experiments to allow us to screen multiple diatoms for whether they transcribed clade I and II flavodoxins in a short-term response to iron limitation.

      We added the logic behind this experimental design (L177-179):

      “In order to screen multiple diatoms for whether they transcribed clade I and II flavodoxins in response to iron limitation, we used the strong iron-chelator desferrioxamine B (DFB) and enhanced short-term iron limitation.”

      Additionally, we now discuss the possible effect of DFB in our discussion (L395-410):

      “Notably, we used the strong iron chelator DFB to enhance iron limitation in a variety of diatoms, as previously described (Andrew et al., 2019; Kranzler et al., 2021; Lampe et al., 2018; Timmermans et al., 2001; Wells, 1999), while recognizing that undesirable effects of DFB, that are not related to iron-limitation per se cannot be ruled out. Here, DFB was used in experiments designed to test whether transcription of the two flavodoxin clades differentially responded to iron limitation. The results from T. oceanica, and T. pseudonana agree with the literature, in which DFB was not added. In T. oceanica only the expression of one clade II flavodoxin was induced (Figure 2B-C, as in Lommer et al., 2012). The minor induction in mRNA of T. pseudonana clade I flavodoxin in response to iron limitation was detected in both long- and short-term adaptation to low iron, without added DFB (Goldman et al., 2019; Thamatrakoln et al., 2012). This flavodoxin seems to have diel regulation, and the observed induction might be specific to the circadian time and the setting of the diel cycle (Goldman et al., 2019).”

      Based on the reviewer comments, we realized that our transcriptome sampling protocol was not clear. Because the diatom species have different growth rates, as well as different rates of growth-inhibition by iron limitation, we adjusted the sampling day for each species based on cell counts and photosynthetic efficiency. Importantly, the 9 samples (triplicates of 3 conditions) of each species were sampled together, at the same date and time. We also ensured that the biological replicates of each species and treatment had similar cell density at the time of harvest.

      We clarified these concerns in the Results section (L188-206):

      “For each diatom 6 replicates were grown in iron-replete conditions and 3 replicates in iron-limiting conditions until the low-iron cultures displayed a decrease in maximum photochemical yield of photosystem II (Fv/Fm), 3-6 days (depending on species, Figure 2 -figure supplement 1A-C, Figure 2A, supplementary file 1c), indicative of iron limitation, at which point transcriptome samples were collected for both the iron-limited and iron-replete conditions. Three of the iron-replete replicates were exposed to oxidative stress, mimicked by a lethal dose of H2O2, and transcriptome samples were collected about 1.5 h after exposure, when the cell phenotype (Fv/Fm or cell abundance) was unaltered from control.”

      In the Materials and Methods section (L542-545)

      "Cells were harvested by filtration onto 0.22 µm filters. Full details of the number of cells harvested per treatments, per species, and samples that failed library preparation are indicated in supplementary file 1c. The 9 samples of each diatom species were sampled together, at the same date and time. Filters were snap-frozen…”

      A second set of lab experiments involved the (non-trivial) establishment and use of "knock out" clones of the clade I flavodoxin gene in the model diatom T. pseudonana to test the oxidative stress hypothesis. This is an exciting idea and the data suggest this flavodoxin may confer resistance to oxidative stress. The conclusion would be greatly strengthened if different phenotypes could be observed between WT and KO clones in response to environmentally relevant oxidative stress (such as supra-optimal irradiance), rather than exogenous H2O2 addition.

      Based on the reviewer's suggestion, we conducted a preliminary experiment with irradiance of up to 500 µE. As with the light level originally tested, there were no differences in growth rate or Fv/Fm between the WT and KO lines. We agree that future study of these knock-out lines under a series of much higher irradiance levels, photosynthetic inhibitors, and other environmental stresses would be interesting, but it is outside the scope of the current study.

      We now also mention this in the revised manuscript (L417-419):

      “Future studies in which the oxidative stress is driven by other environmental conditions as supra-optimal irradiation, UV radiation or biotic interactions are needed to further support the role of clade I flavodoxins in oxidative stress.”

      We clarify that our use of exogenous H2O2 additions was based on previous studies with Phaeodactylum and T. pseudonana indicating that exogenous addition of H2O2 in the micromolar range is representative of other oxidative stress responses (Graff van Creveld et al., 2015; Volpert et al., 2018; Mizrachi et al., 2019) (L185-188):

      “Oxidative stress was induced by the lowest lethal dose of H2O2 (200-250 µM), as similar treatment was shown to be representative to other environmentally-relevant oxidative stressors in T. pseudonana and Phaeodactylum (Graff van Creveld et al., 2015; Mizrachi et al., 2019; Volpert et al., 2018).”

      The relationship between the experimental conditions and results in Figure 3C and Supplemental Figure 3H was not clear.

      Figure 3C summarizes part of the information in Figure S3H. Figure S3D-I present the individual clones, whereas Figure 3 shows only WT vs Flav-KO.

      According to the reviewer comments, we modified Figure S3H (it is now Figure S3I), and specify this relationship in the legend:

      “H-I. Percentage of Sytox Green-positive (dead) cells, measured by flow cytometry 24 h after treatment with H2O2. Orange and gray box plots represent Flav-KO and WT, respectively; single measurements are marked, color-coded by the individual colonies. H. Results of a single dose-response experiment. I. Results from additional experiments; experiments marked with an asterisk are summarized in main Figure 3C.”

      In the introduction, the authors suggest that Fe-S-containing proteins are particularly sensitive to damage via oxygen and ROS and that reliance on ferredoxin (Fd) for electron shuttling carries an enhanced sensitivity to the ROS generated during photosynthesis. References would be helpful here. Fe-S cluster-containing proteins are not monolithic regarding their behavior or susceptibility towards ROS. My limited understanding is that (i) several 4Fe-4S cluster proteins (such as aconitase, isopropylmalate isomerase) are particularly sensitive but that (ii) this is less so for canonical 2Fe-2S cluster ferredoxins; (iii) in some phototrophs Fd catalyzes the reduction of molecular oxygen to superoxide, as part of a mechanism that keeps the electron transport chain less reduced under extremely high light. Thus, ferredoxins may not necessarily be susceptible to in vivo ROS-mediated damage.

      Thank you for these comments.

      We modified our original sentence (L37-39):

      “Moreover, iron-sulfur-containing proteins are particularly sensitive to damage via oxygen and reactive oxygen species (ROS).”

      Corrected sentence:

      “Moreover, iron containing proteins are sensitive to damage via oxygen and reactive oxygen species (ROS), and Fd is down-regulated in response to oxidative stress (Singh et al., 2010, 2004).”

      Reviewer #2 (Public Review):

      In their manuscript, Van Creveld et al. set out to demonstrate divergent functions for two clades of flavodoxin in diatoms. To achieve their goals, the authors combined metatranscriptomic results originating from three separate research cruises in the North Pacific Ocean with laboratory experiments with a clade I flavodoxin knock-out mutant in the diatom Thalassiosira pseudonana. Overall, their field study confirmed that Clade II flavodoxin is mostly up-regulated under iron limitation in most diatoms that were represented in their metatranscriptomic data (Figure 5 A-F). Their field study also demonstrated that clade I flavodoxin is expressed at levels that are several orders of magnitude lower than clade II flavodoxin (figure 5H). The lower expression of clade I flavodoxin was also observed in laboratory culture experiments (Figure 2). The laboratory experiments also demonstrated that the clade I flavodoxins were responsive to iron limitation in some of the species studied (Their Figure 2C), such that the assignment of function based solely on the clade I and clade II flavodoxin classification may not always be straight forward, and that exceptions will likely be found as more diatom species are studied.

      In their quest to determine whether Clade I flavodoxin plays a role in adaptation to oxidative stress, the authors created several knock-out mutants where the clade I flavodoxin is not functional. These mutant strains responded to iron limitation in the same way as the WT strains. However, the mutant strains defective in the clade I flavodoxin were slightly more sensitive to oxidative stress (created by exposure to lethal doses of hydrogen peroxide) than the wild-type strains. The results of the oxidative stress challenges would have been stronger if a broader concentration range of hydrogen peroxide had been used in the experiments leading to a dose-response curve for both the mutant and wild-type strains.

      Thank you for this suggestion. We now tested a broader range of H2O2 concentrations on the WT and KO strains and added a new Figure S3H, which includes responses to 0, 25, 50, 75, 100, 150, 200, 250 µM H2O2.

      The supplemental information provided in the main manuscript holds a lot of important information. Take for example Figure S4 showing the placement of reads for Clade I and Clade II in a Maximum-likelihood tree for flavodoxin in the North Pacific Ocean. The results show that clade II flavodoxin is much more commonly found in the transcripts than clade I flavodoxin.

      Perhaps different results would have been obtained by conducting a similar sampling of metatranscriptome in the Atlantic Ocean that is less subject to iron limitation.

      We agree completely and would love to analyze metatranscriptomes from the Atlantic Ocean in the future.

      Overall, the authors have provided results that support a role for Clade I flavodoxin in alleviating oxidative stress in Thalassiosira pseudonana, however, whether or not this role is universal for clade I flavodoxin in other diatom species will require further studies.

      We agree with this assessment; experiments with additional diatoms are a fruitful area for future research.

    1. Author Response

      Reviewer #1 (Public Review):

      In their study Mas Sandoval and colleagues estimate, from human genomic data, two important parameters that measure how intermarriages have been affected by social stratification in the Americas: sex-biased admixture (SB), which refers to sex differences in the chances to intermarry with another ethnic group, and ancestry-based assortative mating (AM), which refers to the higher probability of partners to intermarry when they carry similar genetic ancestries. To do so, the authors train a deep neural network (DNN) with simulations of admixture with non-random mating and use ancestry tract length distributions to infer the two parameters. They show that their approach estimates SB and AM parameters with a relatively good accuracy in a number of scenarios. When applying the DNN to empirical data, they find solid evidence that social stratification has constrained the admixture processes in the Americas for the last centuries.

      In contrast with the vast majority of population genetic studies, which assume random mating, this study assesses if mating has been random or not in American populations. Furthermore, the study is very valuable because it leverages, for the first time, a deep learning approach and local ancestry inference to co-estimate the extent of SB and AM from genomic data.

      One limitation of the study, however, is that it assumes that (i) the admixture date in the simulations is known and equals 19 generations and (ii) admixture started at the same time in all admixed American populations. The authors also implicitly assume that the variance of the difference between male and female ancestry proportions only depends on AM, and not admixture timing. This may be problematic, as it has been shown that linkage disequilibrium between local ancestry tracts depends both on AM and admixture timing (Zaitlen et al., Genetics 2017).

      To clarify the assumption of a fixed admixture date, we have added the following sentence to the Results section (line 170), where the model is first described: “In both models we assume a continuous admixture process that starts 19 generations ago, knowing that the populations analysed trace the first contact of Native American and European populations to the first half of the 16th century and assuming a generation time of 26.9 years (Wang et al., 2023). In contrast with the approaches that aim to find an admixture date assuming random mating, we assume that the admixture process starts with the contact, and that it is continuous and modulated by the mating parameters.”
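
      As a quick check of the arithmetic behind this assumption, the sketch below converts 19 generations into calendar years using the generation time quoted above. The reference year of 2020 is a hypothetical choice made only for illustration; it is not stated in the response.

```python
GENERATIONS = 19        # admixture onset assumed in the simulations
GENERATION_TIME = 26.9  # years per generation (Wang et al., 2023)


def years_elapsed(generations=GENERATIONS, generation_time=GENERATION_TIME):
    """Number of years spanned by the given number of generations."""
    return generations * generation_time


def onset_year(reference_year):
    """Calendar year of admixture onset, counted back from a reference year."""
    return reference_year - years_elapsed()


# 2020 is a hypothetical reference year, not a value from the response.
print(round(years_elapsed(), 1), round(onset_year(2020)))
```

      Nineteen generations of 26.9 years span roughly 511 years, so the implied onset date shifts with whichever reference year is taken for the sampled individuals.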

      We thank the reviewer for pointing out this important reference, which we had not included in our manuscript and whose findings support the basis of our approach. It is now cited on line 70 to justify the analysis of the length of the ancestry tracts: “Herein, we argue that the tract length information can measure the non-randomness of mating associated with genetic ancestry and, therefore, it can also monitor the permeability of socioeconomic and cultural barriers between subpopulations with different genetic ancestries (Zaitlen et al., 2017).”

      This is also suggested by the authors' results, showing that AM estimates are much lower in admixed Americans under the two-pulse model, relative to the one-pulse model, i.e., when admixture extends over time. Estimates of AM in admixed Americans may thus be biased, if admixture actually started less (or more) than 19 generations ago.

      We evaluated how closely the footprints left by assortative mating and by gene flow resemble each other. We first tested how a neural network trained on models with gene flow due to a second migration pulse predicts migration size from data generated by models with assortative mating only and no second migration pulse. We then tested how neural networks trained on models with assortative mating detect assortative mating in data generated with migration only and no assortative mating. The results are summarised in Figure 4-figure supplements 1 and 2 and show a strong correlation between the predicted size of the second migration pulse and the simulated level of assortative mating. Likewise, there is a strong correlation between the predicted assortative mating level and the size of the second migration pulse. Below, we respond to the reviewers in more detail regarding this question.

      Another potential limitation concerns local ancestry inference. The authors assume that RFMix makes no errors when inferring ancestry tracts. This can be a concern, as recent studies have shown that RFMix has reduced accuracy compared to other methods (Hilmarsson et al., bioRxiv 2022).

      In response to this comment, we performed a local ancestry analysis with Gnomix and generated the tract length profile from the results. One possible issue shared by Gnomix and RFMix is that they may infer a higher fraction of short tracts (at the expense of breaking longer ones). This issue was reported by Gravel et al. (2012). In that study, the authors decided to filter out the short tracts because these tracts showed a high rate of false positives and false negatives. Therefore, we conducted an experiment to test whether filtering out the shortest tract length window (i) improves the accuracy of the predictions on the simulated test data, as measured by the Mean Squared Error (MSE), (ii) decreases the uncertainty of the estimates, and (iii) increases the correlation between Gnomix- and RFMix-based estimates, as measured by the generalised variance.

      We also tested a modification of the tract lengths profile by dividing (or not) the tract lengths profile by the total amount of tracts in either the Autosomes or the X chromosome. Our goal was to force the neural network to focus on the profile shape rather than on the absolute value of tracts at each window to mitigate the possible bias in the tract length profile. Our experimental set-up consisted of three combinations of modifications of the tract length profile, in addition to the non-modified one.
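
      The two profile modifications described above (dropping the shortest tract length window, and dividing by the total number of tracts) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names and the example counts are hypothetical, and a profile is assumed to be a list of tract counts per length window, ordered from shortest to longest.

```python
def drop_shortest_window(counts):
    """Remove the first (shortest) tract length window from a profile."""
    return counts[1:]


def normalise_profile(counts):
    """Divide each window count by the total number of tracts."""
    total = sum(counts)
    if total == 0:
        raise ValueError("profile contains no tracts")
    return [c / total for c in counts]


# Made-up counts per length window; autosomes and the X chromosome
# are normalised separately, each by its own total.
autosome_counts = [120, 80, 40, 10]
x_counts = [30, 20, 10, 5]

for counts in (autosome_counts, x_counts):
    print(normalise_profile(drop_shortest_window(counts)))
```

      After normalisation, two profiles with the same shape but different absolute numbers of tracts become identical, which is the sense in which the network is pushed towards shape rather than magnitude.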

      In Figure 4-figure supplements 3-7, we show the mating parameters predicted using the modified tract length profiles resulting from the local ancestry inference. Each point represents a prediction using the RFMix and Gnomix tract length profiles (x and y axis, respectively) as input for each of the 1000 trained neural networks with the same architecture. We evaluated the uncertainty of the estimates for both Gnomix and RFMix, and the correlation between them, through the Generalised Variance. The Generalised Variance is the determinant of the covariance matrix, which increases as the covariance of the bivariate distribution decreases and as the respective variances increase. The parameter estimates based on the tract length profile normalised by dividing by the total number of fragments in the autosomes or the X chromosome had both low Generalised Variance in the Gnomix-RFMix bivariate distribution of predicted parameters and low MSE in the prediction of simulated test data. These results indicate that, after normalising the tract length profile by the total number of fragments, the distribution is still informative and is less sensitive to possible biases introduced by errors in the local ancestry analysis.
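
      For a bivariate distribution, the Generalised Variance defined above reduces to var(x)·var(y) − cov(x, y)². The sketch below computes it from paired predictions; it assumes sample (n − 1) estimators, which the response does not specify, and the function name and example values are hypothetical.

```python
def generalised_variance(xs, ys):
    """Determinant of the 2x2 sample covariance matrix of paired values."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with n >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs) / (n - 1)
    var_y = sum((y - mean_y) ** 2 for y in ys) / (n - 1)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # det [[var_x, cov_xy], [cov_xy, var_y]]
    return var_x * var_y - cov_xy ** 2


# Perfectly correlated predictions: the determinant collapses to zero.
print(generalised_variance([1, 2, 3], [2, 4, 6]))
# Uncorrelated predictions with the same spread: the determinant is large.
print(generalised_variance([-1, 1, -1, 1], [-1, -1, 1, 1]))
```

      A low value therefore indicates either tightly clustered predictions, strongly correlated Gnomix and RFMix predictions, or both.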

      Therefore, we present the results obtained from this RFMix profile in the main figures and tables, while showing the other predictions in the supplementary figures.

      In addition, the authors do not report a measure of uncertainty for the estimation of SB and AM, which is another important weakness. Interpretation of parameter estimates is limited if no measures of uncertainty are provided.

      We now provide the 95% CI for each parameter obtained from the distribution of predicted parameters from the 1000 trained neural networks, for both RFMix and Gnomix for the tract length profile.
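
      One common way to obtain such an interval from an ensemble of predictions is the percentile method, sketched below. The response does not state which method was used, so this is an assumption; the function name and the simulated predictions are hypothetical.

```python
import math
import random


def percentile_interval(values, level=0.95):
    """Nearest-rank percentile interval of a collection of predictions."""
    ordered = sorted(values)
    n = len(ordered)
    alpha = (1.0 - level) / 2.0
    lower = ordered[math.floor(alpha * (n - 1))]
    upper = ordered[math.ceil((1.0 - alpha) * (n - 1))]
    return lower, upper


# Hypothetical ensemble: one predicted parameter from each of 1000 networks.
rng = random.Random(0)
predictions = [rng.gauss(0.3, 0.05) for _ in range(1000)]
print(percentile_interval(predictions))
```

      The same function could be applied separately to the RFMix-based and Gnomix-based predictions to obtain one interval per parameter and method.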

      Finally, the authors compare the likelihood of two competing models, assuming a single or two admixture pulses, but do not determine the accuracy of their model choice procedure.

      We now include the confidence intervals of the composite likelihood by replicating the test for each of the 1000 bootstrapped tract length profiles for each population. None of the 95% confidence intervals includes both negative and positive results, and all of them support either the one-pulse or the two-pulse model, except for the sub-Saharan ancestry in the Colombian (CLM) population.

      Overall, besides these methodological limitations, I expect that the study by Mas Sandoval and colleagues could be of great and broad interest for the scientific community studying population genetics, anthropology, sociology and history.

      Reviewer #2 (Public Review):

      This paper introduces a method to quantify how genetic ancestry drives non-random mating in admixed populations. Admixed American populations are structured by racial, gender, and class hierarchies. This has the potential to cause both ancestry-related assortative mating, in which the ancestry of mates tends to be correlated, and ancestry-related sex bias, in which individuals have a preference for mates with a particular ancestry composition. By applying their method to several African American and Latin American populations, Sandoval et al. further our understanding of ancestry-based population structure in this region more broadly.

      Strengths

      As many others have recently done, Sandoval et al. leverage the ability of a neural network to predict demographic parameters from high-dimensional population genomic data. Sandoval et al. first develop a clever probabilistic model of mating by defining the probability of a male and female mating as a function of the difference in ancestry between the individuals. They use this model to simulate population genomic data under various demographic scenarios, and then train a neural network on these simulated data. Finally, they apply the neural network to empirical data and learn the parameters of the underlying probability distribution, which can be related back to assortative mating and sex bias.

      One clear strength of this paper is their ability to jointly assess assortative mating and sex bias, as well as their ability to apply their model to multiple contemporary admixed populations.

      Importantly, the authors couch their results in an intersectional understanding of populations and consistently refer to research from historians and other social scientists throughout their paper, which reflects a very thoughtful awareness of the interdisciplinary nature of this research.

      Weaknesses

      The definition of assortative mating is conceptually confusing - in the text, assortative mating is introduced as genetic similarity between mates, i.e. positive assortative mating. However, based on the definition of assortative mating in their model, a population can have high assortative mating for a particular ancestry component even when there is non-zero sex bias for that component (e.g. males with low Native American ancestry are more likely to mate with females with high Native American ancestry). Fundamentally, this scenario cannot reflect positive assortative mating; rather, it reflects negative assortative mating (i.e. there is structured genetic dissimilarity between mates). However, the authors do not discuss the fact that the interpretation of the assortative mating parameter changes with the value of the sex bias parameter.

      We acknowledge that our definition of assortative mating requires more clarity. We now define it on line 155 as: The AM parameter measures the non-randomness of mating associated with a genetic ancestry. This includes both positive assortative mating (genetic similarity between mates, when SB is zero) and negative assortative mating (genetic dissimilarity between mates, when SB is not zero). This approach accounts for the male-female direction of negative assortative mating through the SB parameter.

      In addition, the results of the inference in ASW are difficult to interpret. They find that males of high African ancestry are more likely to mate with females of low African ancestry. This result seems counterintuitive given the body of literature that suggests sex-biased admixture in African Americans has greater male European and female African contributions. The authors do not suggest potential explanations for this observation.

      We agree that results regarding the ASW population can be confusing. Our hypothesis to explain such results is that the sex bias parameter captures both sex-biased migrations and sex-biased admixture. Therefore, it is difficult to accommodate the complex genetic history of ASW. We have extended the discussion on this aspect as follows on line 380:

      In addition, African American populations might have a complex genetic history involving, on the one hand, male-biased sub-Saharan migration and, on the other hand, admixture that is female-biased in the sub-Saharan ancestry. However, our current model can only accommodate this demographic scenario with a single sex-bias parameter, and the results regarding this population should be interpreted with caution.

      Lastly, the authors have not done any simulations to assess how accurate parameter estimates are if the demographic model is misspecified, which weakens the interpretability of the results.

      We have performed a new analysis in which we vary AM to generate tract length profiles used to predict GFR, and vice versa. The results of this analysis are shown in the new Figure 4-figure supplement 1. They show that assortative mating and multiple-pulse migration leave similar footprints in the genome of the admixing populations. In the discussion we argue that both the One Pulse and Two Pulse models must be considered because they are supported by the results obtained using the X chromosome and the autosomes, respectively. We discuss how accounting for migration reduces AM values and how the resulting admixture dynamics are similar in both cases.

    1. Author Response

      Reviewer 1 (Public Review):

      1) In Figure 2, electron microscopy images represent n=1 cell, making it hard to know how generalizable the mitochondrial phenotypes are. It would be useful to see a quantitative summary of a larger dataset indicating how frequently the mitochondrial defects are seen.

      As requested, we performed quantitative analysis of mitochondrial ultrastructure in a larger dataset (n=163 analyzed in WT and n=206 in the KO) confirming that this finding is very consistent. This additional quantitative analysis that we included in the revised manuscript confirms a very significant and diffuse alteration of mitochondrial ultrastructure in Parl-/- vs WT spermatocytes (p=0.0002).

      2) In Figure 3, representative images are shown for a single field from n=1 animal. It is hard to decisively conclude that the phenotype of Pink1-/-;Pgam5-/- and Ttc19-/- testes is completely normal based on this limited data. There may be other tubules outside the field of view that are abnormal, or more subtle changes in cell ratios. This conclusion would be significantly strengthened by cell counting (e.g. # round spermatids per Sertoli cell per tubule and # spermatocytes per Sertoli cell per tubule) or other quantitation. Likewise, the similarities in phenotype between Parl-/-, Parl-/-;Pink2-/-, and Parl-/-;Pgam5-/- should be more thoroughly documented. At least some additional images should be shown.

      The goal of Figure 3 is to show that WT, Pink1-/-;Pgam5-/- and Ttc19-/- testes have no gross morphological abnormality and have preserved sperm production, in sharp contrast with Parl-/-, Parl-/-;Pink1-/-, Parl-/-;Pgam5-/- and the TKO, which show a total lack of sperm in the tubular lumen, indicating that the loss of Parl, alone or in combination, drives this phenotype. To strengthen these conclusions, we performed additional work. We stained testis sections from all strains with an antibody against AIF-1, a marker of post-mitotic spermatids/spermatozoa; the results are included in Figure 3-figure supplement 1. This additional experiment clearly confirms that production of differentiated germ cells occurs only in WT, Pink1-/-;Pgam5-/- and Ttc19-/-, but not in Parl-/-, Parl-/-;Pink1-/-, and Parl-/-;Pgam5-/-. These results are consistent with the reproductive capacity of these mouse lines (the first group is fertile, the second is infertile). We acknowledge that we cannot rule out minimal subclinical differences in reproductive fitness between the fertile mouse groups, but this is beyond the goal of our study.

      3) In Figure 4, it looks like there is a significant decrease in CIV-driven respiration in Parl knockouts, but the text describes this as "did not significantly enhance" - that is, the absence of an increase. This result is difficult to interpret without further explanation.

      We recognize this might be confusing, but the text specifies (line 195) that CIV-driven (TMPD+ascorbate) respiration, which relies on endogenous cytochrome c, is diminished in Parl-/- testis mitochondria. This test reflects cytochrome c oxidase respiratory capacity/activity. Immediately afterwards, we performed an additional experiment in which we added exogenous cytochrome c to the cuvette to test the integrity of the outer mitochondrial membrane, and checked whether CIV-driven respiration increased after the addition of cytochrome c compared to before. Exogenous cytochrome c does not cross intact mitochondrial outer membranes, so this test verifies the quality of the mitochondrial preparations and/or detects pathological changes in outer membrane integrity; it does not assess the function of CIV. CIV-driven respiration increased only modestly after the addition of cytochrome c, and to a similar extent in both WT and Parl-/-, indicating that the mitochondrial preparations are of good quality and that the outer mitochondrial membrane is overall well preserved in both WT and KO.

      4) In Figure 5B, there is some variation in band intensity between replicates. Quantifying the band intensity relative to the loading control would help to increase confidence in the conclusion that coQ levels are reduced.

      We performed this quantification, as suggested by the reviewer, and added the quantification in figure 5B. Quantification of the band intensity relative to the loading control confirms a significant difference between WT and KO. Moreover, we performed quantitative immunofluorescence of COQ4 in SCP-1 positive cells included now in Fig 5-figure supplement 1, which confirms a significantly decreased expression of COQ4 in Parl-/- primary spermatocytes.
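
      As an illustration only, quantifying band intensity relative to the loading control can be sketched as follows. The function name, the lane values, and the use of a simple ratio of means are assumptions made for this sketch; they are not the authors' data or their analysis pipeline.

```python
from statistics import mean


def normalize_to_loading_control(target, loading):
    """Divide each lane's target-band intensity by its loading-control intensity."""
    if len(target) != len(loading):
        raise ValueError("each lane needs one target and one loading-control value")
    if any(value <= 0 for value in loading):
        raise ValueError("loading-control intensities must be positive")
    return [t / l for t, l in zip(target, loading)]


# Hypothetical densitometry readings (arbitrary units), three lanes per genotype.
wt_normalized = normalize_to_loading_control([100, 110, 90], [50, 55, 45])
ko_normalized = normalize_to_loading_control([60, 50, 40], [60, 50, 40])

# Relative abundance in KO versus WT after normalization.
ko_vs_wt = mean(ko_normalized) / mean(wt_normalized)
print(f"KO/WT normalized intensity: {ko_vs_wt:.2f}")
```

      Normalizing each lane before averaging keeps lane-to-lane loading differences from being mistaken for differences in protein level; a statistical test on the per-lane normalized values would follow in practice.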

      5) GPX4 is not a Parl substrate, and no explanation is provided for why it might be reduced in Parl-/- testes. This makes the result and model difficult to interpret.

      We thank the reviewer for pointing this out. We acknowledged this limitation in the discussion. We mentioned in the discussion that decreased GPX4 levels have been observed in other conditions (chemical inhibition, pathological conditions, etc.) and no mechanism has so far been demonstrated to our knowledge, but some evidence raises a possible link with CoQ deficiency that we discussed. Potential mechanisms including protein degradation are likely although unproven. This remains an important and intriguing issue to address in future studies.

      6) Since Parl knockout induces necrosis in the brain, necrosis could be a contributing factor to cell death in spermatocytes alongside ferroptosis. No data is presented that can exclude this possibility.

      Ferroptosis is actually considered, by some authors, a form of regulated necrosis (Seibt TM FRBM 2019). Therefore, we can affirm that PARL deletion leads to regulated necrosis in testis via ferroptosis through specific ferroptosis pathways that do not appear to be activated in the brain, or at least not overtly. Importantly, there is no recognized marker or specific molecular pathway for generic «accidental» necrosis that can be tested to differentiate between the 2 different cell death modalities.

      7) The severe spermatogenesis phenotype implies that Parl knockout males should be infertile, but the fertility status is not described in the manuscript. It may be difficult to test fertility in these animals due to the neurodegeneration phenotype; if so, this can be clarified. If it is feasible to test fertility, demonstration of a fertility phenotype would significantly strengthen the conclusion that loss of Parl leads to spermatogenic arrest.

      We specify in the text that Parl-/- mice are sterile due to total lack of sperm production caused by arrested spermatogenesis, as evidenced by detailed histological analysis and AIF1 staining. This is not due to the neurodegeneration, since Parl-Ncre knockouts have normal sperm production, as presented in the paper. Fertility in Parl-/- cannot be tested in vitro since these mice have no sperm due to the complete block of spermatogenesis, nor in vivo since they die young due to neurodegeneration. Within these limitations, Parl-/- males and WT females have been kept together, and not a single pregnancy has been observed since the beginning of the colonies. Parl-/- mice are sterile.

    1. Author Response

      Reviewer #1 (Public Review):

      The authors tried to measure the accuracy of the decision-making of honey bees by carrying out behavioural experiments in which they trained the bees to forage on artificial flowers of 5 different colours that offered different levels of reward. Subsequently, the bees' decision-making behaviour was tested with flowers of the same or different colours, with no reward present. The authors found that bees tend to approach a flower only when they are highly certain of a reward, and these decisions are made quickly. The majority of flowers were rejected by the bees. Based on the results of the tests, the authors created a model to identify what circuit elements or connections would be necessary to mimic the bees' decisions. This model could be potentially used for robotics.

      The study is well supported by signal detection theory, and the experiments are well designed, which is a major strength. However, the methods are not completely clear, so a clearer description would be better. Another weakness is the lack of clear explanations of the importance and relevance of the model.

      Given the experimental design was optimal, the authors could potentially achieve the aims of this study.

      Thank you for expressing your interest and providing constructive inputs. Based on your suggestions, we have thoroughly revised our manuscript to offer a more comprehensive explanation of the rationale behind our approach, as well as its comparison to existing knowledge and methods in the field. We believe that these revisions will significantly enhance the comprehensibility of our study and facilitate a better understanding of our findings.

      Reviewer #2 (Public Review):

      By elegantly designing experiments, MaBouDi et al. elucidated honeybee's behavioral strategy to quantitatively associate sensory cues with valences. The description is simple and concise enough to understand the logic. Particularly, the authors clearly demonstrated how sensory evidence and reward likelihood quantitatively affect the decision-making process and animals' response time. Their behavioral characterization approach and proposed model could also be helpful for studies using higher animal species. I have a few doubts regarding the definition of rejection behavior and the structure of the model that is critical to lead their main conclusions.

      Thank you for your interest and valuable feedback. We greatly appreciate your input, and as a result, we have thoroughly reviewed your comments and implemented significant revisions to our manuscript. We have taken care to provide more comprehensive explanations of our methods, results, and the proposed model in order to enhance the overall comprehensibility of our study. Our intention is to ensure that readers can better understand our findings through these revisions.

    1. Author Response

      Reviewer #1 (Public Review)

      Using in vitro assays that take advantage of thymic slices, with or without the ability to present pMHC antigens, the authors define an early period in which CCR4 expression is induced, which induces their migration to the medulla and likely encounter with cDC2 and other APCs. Notably, the timing for CCR4 expression precedes that of CCR7 and illustrates the potential role for this early expression to initiate the movement of post-positive selection thymocytes to the medulla. The evidence for supporting a role for CCR4, as well as CCR7, in sequential tolerance induction is provided using multiple approaches, and although the observed changes amount to small percent changes, the significance is clear and likely biologically relevant over the lifespan of a developing T cell repertoire. Overall, the model provides a holistic view of how tolerance to self-antigens is likely induced during T cell development, which makes this work highly topical and influential to the field.

      We thank the reviewer for their comments and for highlighting the significance of identifying distinct roles for CCR4 and CCR7 in promoting medullary localization and inducing self-tolerance of thymocytes at different stages of T-cell development.

      Reviewer #2 (Public Review)

      This manuscript describes that CCR4 and CCR7 differentially regulate thymocyte localization with distinct outcomes for central tolerance. Overall, the data are presented clearly. The distinct roles of CCR4 and CCR7 at different phases of thymocyte deletion (shown in Figure 6C) are novel and important. However, the conclusion that expression profiles of CCR4 and CCR7 are different during DP to SP thymocyte development was documented previously. More importantly, the data presented in this manuscript do not support the conclusion that CCR7 is uncoupled from medullary entry. Moreover, it is unclear how the short-term thymus slice culture experiments reflect thymocyte migration from the cortex to the medulla.

      We thank the reviewer for pointing out the significance of our finding that CCR4 and CCR7 regulate different phases of thymocyte deletion. We agree that prior reports, including our own (Cowan et al. 2014, Hu et al., 2015) have shown that CCR4 and CCR7 are expressed by different post-positive selection thymocytes. However, the expression data we present here provides a higher resolution perspective on the specific thymocyte subsets that express these two receptors, as well as the different timing with which the receptors are expressed after positive selection. These data, coupled with chemotaxis assays of the granular thymocyte subsets responding to CCR4 versus CCR7 ligands, and 2-photon imaging data showing that CCR4 and CCR7 are required for medullary accumulation of distinct thymocyte subsets, are critical for delineating the unexpectedly distinct roles of these two chemokine receptors in promoting medullary entry and central tolerance.

      The reviewer raises an important question about our conclusion that CCR7 is “uncoupled” from medullary entry. We think there was likely a misunderstanding of our intended meaning, as we did not mean to imply that CCR7 does not promote medullary entry of thymocyte subsets; we have modified the wording of the abstract to replace “uncoupled” to clarify. As we detail in the Introduction, the role of CCR7 in directing chemotaxis of single-positive thymocytes towards the medulla and inducing their medullary accumulation is well established (Ehrlich et al., 2009; Kurobe et al., 2006; Kwan & Killeen, 2004; Nitta et al., 2009; Ueno et al., 2004). Instead, our data demonstrate that 1) the most immediate post-positive selection thymocyte subset (DP CD3loCD69+) does not require CCR7 for medullary entry, and 2) the next stage of post-positive selection thymocytes (CD4SP SM) express CCR7, but CCR7 recruits these cells only modestly into medulla. In contrast, CCR7 promotes robust medullary accumulation of more mature thymocyte subsets (CD4SP M1+M2), in keeping with the well-known role of CCR7 in promoting thymocyte medullary localization. We think these findings are highly significant for the field because currently, there is a widely held assumption that post-positive selection thymocytes that do not express CCR7 are located in the cortex, while those that express CCR7 are located in the medulla. Our data show that neither of these assumptions is true: CCR4 drives medullary accumulation of cells that do not yet express CCR7, and the earliest post-positive selection cells that express CCR7 continue to migrate in both the cortex and medulla. These findings form the basis of our statement that CCR7 expression is “not synonymous with” medullary localization. 
      The finding that thymocytes do not robustly accumulate in the medulla in a CCR7-dependent manner until the more mature SP stages has important implications for central tolerance, as localization of thymocytes in the cortex versus medulla will impact which APCs and self-antigens they encounter when testing their TCRs for self-reactivity.

      The reviewer also raised concerns about whether short-term thymus slice cultures reflect physiological thymocyte migration. Short-term live thymic slice cultures have been widely used to investigate the development, localization, migration, and positive and negative selection of thymocytes, as they have been shown to faithfully reflect these in vivo processes, including confirming the role of CCR7 in inducing chemotaxis of mature thymocytes from the cortex into the medulla (Au-Yeung et al., 2014; Dzhagalov et al., 2013; Ehrlich et al., 2009; Lancaster et al., 2019; Melichar et al., 2013; Ross et al., 2014). However, we acknowledge that thymic slices are not equivalent to intact thymuses and have now discussed limitations of this system in our revised Discussion.

      Comment 1: Differential profiles in the expression of chemokine receptors, including CCR4, CCR7, and CXCR4, during DP to SP thymocyte development were well documented. Previous papers reported an early and transient expression of CCR4, a subsequent and persistent expression of CCR7, and an inverse reduction of CXCR4 (Campbell, et al., 1999, Cowan, et al., 2014, and Kadakia, et al. 2019). The data shown in Figures 1, 2, and 3 are repetitive to previously published data.

      The expression profile of CCR4, CCR7 and CXCR4 on thymocytes has been documented previously in the studies cited above and in our prior publication (Hu et al., 2015). Campbell et al. (Campbell, Haraldsen, et al., 1999) investigated chemotactic effects of chemokines, but did not directly address expression of chemokine receptors by thymocyte subsets. Cowan et al. (Cowan et al., 2014) examined the expression of CCR4 versus CCR7 on DP and CD4SP thymocytes. However, our data provide a more detailed analysis of expression of these distinct chemokine receptors by subsets of DP, CD4SP, and CD8SP thymocyte subsets along the trajectory of differentiation after positive selection, using a gating scheme inspired by a study published after the above-cited papers (Breed et al., 2019). Our more nuanced evaluation of CCR4 versus CCR7 expression sets the stage for finding that they play distinct roles in promoting medullary entry and central tolerance of early- versus late-stage post-positive selection thymocytes. Without examining CCR4 and CCR7 expression patterns by distinct thymocyte subsets in detail, we would not have made the unexpected observation that although CCR7 is expressed at high levels by many CD4SP SM thymocytes, it does not induce strong chemotaxis or medullary accumulation of this subset, relative to its role in more mature SP thymocyte subsets. This finding has important implications for which APCs thymocytes encounter as they are tested for self-reactivity to enforce central tolerance. As we were working on these studies, Kadakia et al. reported that extinguishing CXCR4 expression was important for enabling medullary entry (Kadakia et al., 2019). 
      Thus, we thought it was important to place CXCR4 in the context of CCR4 and CCR7 expression on thymocyte subsets in our study, and in doing so found another example of asynchronous chemokine receptor expression and function, further indicating that expression of a chemokine receptor alone is not a reliable marker of functional activity or thymocyte localization, as cells migrate dynamically between the cortex and medulla.

      Through more extensive gating and simultaneous investigation of chemokine receptor expression and function, our data have provided new insights into how thymocytes respond to chemokine cues at different time points during their post-positive selection development. Moreover, our refined gating scheme (Figure 1) can be used to distinguish thymocyte subsets at different development stages without relying on chemokine receptor expression, thus providing an unbiased way of investigating chemokine receptor expression at different developmental stages.

      Comment 2: The manuscript describes the lack of CCR7 at early stages during DP to SP thymocyte development (Figure 1-3). However, CCR7 expression is detected insensitively in this study. Unlike CCR4 detection with a wide fluorescence range between 0 and 2x10^4 on the horizontal axis, CCR7 detection has a narrow range between 0 and 2x10^3 on the vertical axis (Figure 1C, 1D, 4B, 4C, 6B, S2, S3), so that flow cytometric CCR7 detection in this study is 10-times less sensitive than CCR4 detection. It is therefore likely that the "CCR7-negative" cells described in this manuscript actually include "CCR7-low/intermediate" thymocytes described previously (for example, Figure S5A in Van Laethem, et al. Cell 2013 and Figure 6 in Kadakia, et al. J Exp Med 2019).

      We provide new data to address the possibility that we were failing to detect low levels of CCR7 expression on early post-positive selection DPs (CD3loCD69+). We agree that CCR7 immunostaining of mouse cells is known to be more challenging than immunostaining of other chemokine receptors, including CCR4 and CXCR4. CCR7 immunostaining needs to be carried out at 37°C, which we did throughout our studies. We provide new data comparing CCR7 expression by Ccr7+/+ versus Ccr7-/- thymocyte subsets (Figure 1—figure supplement 2A-B), which confirm that CCR7 is not expressed at detectable levels by CD3loCD69+ DP cells above the background seen in CCR7-deficient cells. As thymocytes transition to the CD4SP SM stage, low/intermediate to high expression of CCR7 can be detected (Figure 1—figure supplement 2A). To further test whether we were failing to detect low levels of CCR7 by post-positive selection DPs, we incubated thymocytes at 37°C for up to 2 hours prior to immunostaining for CCR4 and CCR7, as a prior study indicated in vitro culture would enable increased cell surface expression of CCR7 by alleviating ligand-mediated CCR7 internalization (Britschgi et al., 2008). However, we did not observe increased CCR7 (or CCR4) expression by any thymocyte subset incubated at 37°C (Figure 1—figure supplement 2C-D). Lack of expression of CCR7 by CD3loCD69+ DP cells is consistent with their failure to undergo chemotaxis to CCR7 ligands in vitro, and initial expression of CCR7 by CD4SP SM is consistent with their chemotaxis towards CCR7 ligands in vitro (now shown in greater detail in Figure 2—figure supplement 1), albeit at a much lower migration index than subsequent thymocyte subsets.

      Comment 3: Low levels of CCR7 expression could be functionally evaluated by the chemotactic assay as shown in Figure 2. However, the data in Figure 2 are unequally interpreted for CCR4 and CCR7; CCR4 assays are sensitive where a migration index at less than 1.5 is described as positive (Figure 2A and 2B), whereas CCR7 assays dismiss such a small migration index and are only judged positive when the migration index exceeds 10 or 20 (Figure 2C and 2D). CCR7 chemotaxis assays should be carried out more sensitively, to equivalently evaluate the chemotactic function of CCR4 and CCR7 during thymocyte development.

      We thank the reviewer for his insight about the possibility that we could have overlooked CCR7-mediated chemotaxis at lower migration indexes. When data from the chemotaxis assays were evaluated separately for each thymocyte subset, CCR7-mediated chemotaxis of CD4SP SM and subsequent DP CD3+CD69+ co-receptor reversing thymocytes could be detected. However, DP CD3loCD69+ thymocytes still did not undergo CCR7-mediated chemotaxis, but were responsive to the CCR4 ligand CCL22 (Figure 2—figure supplement 1).

      We did not detect CCR7-mediated chemotaxis of CD4SP SM and DP CD3+CD69+ subsets in our previous analysis because their lower-level chemotactic index relative to mature thymocytes did not reach statistical significance when chemotaxis of all subsets was compared simultaneously (Figure 2D). We note that the magnitude of difference in the responsiveness of CD4SP SM cells compared to mature CD4SP and CD8SP M1 & M2 thymocytes (Figure 2D) is likely physiologically important, as CCR7 deficiency results in severely reduced medullary accumulation of CD4SP M1+M2 cells, but only a very mild reduction in medullary accumulation of CD4SP SM cells, which is only detected with our new paired analyses in Figure 5C. We feel these new analyses provide important new insights and thank the reviewer for this suggestion.
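
      For readers unfamiliar with the metric, a migration index is commonly computed as the number of cells migrating toward a chemokine divided by the number migrating toward medium alone. The sketch below assumes that definition, which the response does not spell out, and uses made-up cell counts purely to show how per-subset indexes can differ by orders of magnitude; none of the numbers come from the study.

```python
def migration_index(migrated_to_chemokine, migrated_to_medium):
    """Ratio of cells migrating toward chemokine versus medium alone (assumed definition)."""
    if migrated_to_medium <= 0:
        raise ValueError("medium-alone migration must be positive")
    return migrated_to_chemokine / migrated_to_medium


# Hypothetical counts per subset: (toward a CCR7 ligand, toward medium alone).
counts = {
    "DP CD3loCD69+": (100, 100),
    "CD4SP SM": (250, 100),
    "CD4SP M1": (4000, 100),
}

indexes = {subset: migration_index(*pair) for subset, pair in counts.items()}
for subset, index in indexes.items():
    print(f"{subset}: migration index {index:.1f}")
```

      An index near 1 means no chemokine-driven migration above background. When one subset has an index of tens and another of about 2, testing all subsets together can mask the smaller response, which is why a per-subset evaluation is informative.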

      Comment 4: Together, this manuscript suffers from the poor sensitivity for CCR7 detection both in flow cytometric analysis and chemotactic functional analysis. Conclusions that CCR7 is absent at early stages of DP to SP thymocyte development and that CCR7 is uncoupled from medullary entry are the overinterpretation of those results with the poor sensitivity for CCR7. The oversimplified scheme in Figure 3D is misleading.

      We agree that the scheme in Figure 3D, as previously constructed, did not ideally display the difference in scale between thymocyte responses to CCR7 ligands versus CCR4 and CXCR4 ligands (as detected in vitro). Thus, we have now modified the schematic to include the mild response to CCR7 ligands that we observed in CD4SP SM thymocytes (comment 3) and to emphasize the higher chemotactic response of mature thymocytes to CCR7 ligands than of DPs and CD4SP SM to CCR4 ligands. Likewise, we have modified the manuscript to clarify the importance of CCR7 expression in the medullary entry and accumulation of mature thymocyte subsets.

      We respectfully disagree that the sensitivity of CCR7 detection was poor in our flow cytometry and chemotactic analyses. Our CCR7 stains identified a range of CCR7 expression levels, from no expression by pre- and post-positive selection DP cells to high expression by CD4SP M1 cells, and we now provide new data confirming our ability to detect CCR7 expression (Figure 1—figure supplement 2), as described in response to Comment 3. Our chemotaxis assays detected CCR7 responses over a range of migration indexes from ~ 2 up to 100, showing our sensitive ability to detect CCR7-mediated chemotaxis in vitro (Figure 2 and Figure 2—figure supplement 1). In live thymic slices, we were also able to capture a range of biologic activities of CCR7, from mediating modest medullary accumulation of CD4SP SM cells to robust medullary accumulation of CD4SP M1+M2 cells (Figure 5A-C). Importantly, our results demonstrate that CCR7 is not the only chemokine receptor responsible for medullary entry and accumulation of thymocytes. Complex spatiotemporal regulation of thymocytes at distinct stages of development is achieved through tight orchestration of expression and signaling through multiple chemokine receptors, including CCR4, as shown by our data. However, our study does not negate an important role for CCR7 in mediating medullary entry of thymocytes, which we have clarified in the text.

      Comment 5: The short-term thymus slice culture experiments should be described more carefully in terms of selection events during DP to SP thymocyte development, which takes at least 2 days for CD4 lineage T cells and approximately 4 days for CD8 lineage T cells (Saini, et al. Sci Signal 2010 and Kimura, et al. Nat Immunol 2016). The slice culture experiments in this manuscript examined cellular localization within 12 hours and chemokine receptor expression within 24 hours (Figures 4, 5) even for the development of CD8 lineage T cells (Figure S2), which are too short to examine entire events during DP to SP thymocyte development and are designed to only detect early phase events of thymocyte selection.

      Experiments in Figures 4 and 5 were indeed designed to capture behaviors of thymocytes relatively early after introduction onto thymic slices. Figure 4 (and Figure 4—figure supplement 1) shows that the timing of CCR4 versus CCR7 expression after positive selection is dramatically different: CCR4 is expressed within hours of positive selection, concomitant with medullary entry, while CCR7 expression takes several days in the slices (sufficient time for CD8SP development, Figure 4—figure supplement 1). Figure 5 shows that medullary accumulation of CD4SP M1+M2 cells occurs robustly in the medulla of thymic slices within a couple of hours after introduction into the slices, and this localization is CCR7 dependent, while CCR4 induces more mild medullary accumulation of post-positive selection DPs. As indicated by the reviewer, it has been shown that it takes days for DP thymocytes to develop into mature CD4SP and CD8SP cells (Kimura et al., 2016; Lutes et al., 2021; Saini et al., 2010), as recapitulated in the thymus slice system (Figure 4—figure supplement 1) (Lutes et al., 2021). The relatively short time frame of our time-course experiments (up to 12 hours after addition of pre-positive selection thymocytes to positively selecting thymic slices) allowed us to detect expression of CCR4 within a few hours after positive selection and to determine that this timing correlated with medullary entry. Thus, the 12-hour time-course was important for temporal resolution of chemokine receptor expression and medullary localization after initial stages of positive selection.

      Comment 6: It is unclear what the medullary density alteration measured in the thymus slice culture experiments represents. Although the manuscript describes that the increase in the medullary density reflects the entry of cortical thymocytes to the medulla (Figure 4E and S2E), this medullary density can be affected by other mechanisms, including different survival of the cells seeded on the top of different thymus microenvironments. Thymocytes seeded on the medulla may be more resistant to cell death than thymocytes seeded in the cortex, for example, because of the rich supply of cytokines by the medullary cells. So, the detected alterations in the medullary density may be affected by the differential survival of thymocytes seeded in the cortex and the medulla. Also, the medullary density is measured only within a short period of up to 12 hours. The use of MHC-II-negative slices and CCR4- or CCR7-deficient thymocytes in the thymus slice cultures may verify whether the detected alteration in the medullary density is dependent on TCR-initiated and chemokine-dependent cortex-to-medulla migration.

      We thank the reviewer for pointing out these possibilities. The purpose of the positive selection timing experiment (Figure 4) was to establish the early correlation between receiving a positive selection signal, upregulating CCR4, and migrating into the medulla. In this system, cells enter only the cortex in the first hour after migration in the slice, consistent with prior studies of localization of pre-positive selection thymocytes to the cortex (Ehrlich et al., 2009; Ross et al., 2014); subsequently, they move into the medulla. Because CCR7 is widely accepted to be essential for medullary entry, we feel it is important to demonstrate the disconnect between the timing of medullary entry and CCR7 expression in multiple ways. The timing experiment design utilized MHCII-/- and β2m-/- slices to show that positive selection was necessary for expression of CCR4. To test whether CCR4 or CCR7 were required for medullary entry of early post-positive selection DPs, we evaluated medullary accumulation of this subset from WT, Ccr4-/-, Ccr7-/-, and Ccr4-/-Ccr7-/- mice. This experiment provided a more robust means of determining the extent to which CCR4 deficiency impacted medullary localization of a large cohort of cells that had passed positive selection (Figure 5), and again showed that the post-positive selection thymocytes, which express CCR4 but not CCR7, accumulate in the medulla in a CCR4-dependent manner. We note that in Figure 5, we show that all Ccr4-/-Ccr7-/- thymocyte subsets imaged have medullary:cortical density ratios of ~1, indicating an even distribution across cortex and medulla, which is highly consistent with an essential role for these two chemokine receptors in cooperating to mediate medullary accumulation of different stages of developing T cells.
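
      To make the medullary:cortical density ratio concrete, the sketch below computes it as cells per unit area in the medulla divided by cells per unit area in the cortex. Whether the study normalized by area or by volume is not stated here, so the area-based definition, the function name, and all numbers are assumptions for illustration.

```python
def medullary_cortical_ratio(medulla_cells, medulla_area, cortex_cells, cortex_area):
    """Density of cells in the medulla divided by density in the cortex."""
    if medulla_area <= 0 or cortex_area <= 0:
        raise ValueError("region areas must be positive")
    if cortex_cells <= 0:
        raise ValueError("ratio is undefined without cortical cells")
    medulla_density = medulla_cells / medulla_area
    cortex_density = cortex_cells / cortex_area
    return medulla_density / cortex_density


# Hypothetical imaged field: the cortex covers three times the area of the medulla.
even_distribution = medullary_cortical_ratio(50, 1.0, 150, 3.0)
medullary_enrichment = medullary_cortical_ratio(120, 1.0, 90, 3.0)

print(f"even distribution: {even_distribution:.1f}")
print(f"medullary enrichment: {medullary_enrichment:.1f}")
```

      Because each count is divided by the size of its region, a ratio of about 1 reflects an even distribution regardless of how much of the field is cortex, while values above 1 indicate medullary accumulation.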

      The reviewer makes an interesting point that survival cues could differ in the cortex versus medulla. However, if thymocytes lacking one or both chemokine receptors had impaired survival because they didn’t enter a region of the thymus efficiently to receive survival cues, we would expect to detect increased apoptosis in Ccr4-/-, Ccr7-/- and Ccr4-/-Ccr7-/- thymocytes. However, we found that chemokine receptor deficiencies resulted in diminished apoptosis of different thymocyte subsets (Figure 6). This finding is more consistent with reduced negative selection of these subsets due to reduced clonal deletion. We nonetheless discuss this possibility in our revised manuscript, as it is important to consider that chemokine-mediated migration of thymocytes into different microenvironments could alter their access to cytokines and other pro-survival cues.

      Reviewer #3 (Public Review)

      In this manuscript, Li et al. examine how the expression of the chemokine receptor CCR4 impacts the movement of thymocytes within the thymus. It is currently known that the chemokine receptor CCR7 is important for developing thymocytes to migrate from the cortical region into the medullary region and CCR7 expression is therefore often used to define medullary localization. This is important because key developmental outcomes, like enforcing tolerance to self-antigens amongst others, occur in the medullary environment. The authors demonstrate that the chemokine receptor CCR4 is induced on thymocytes prior to expression of CCR7 and thymocytes exhibit responsiveness to CCR4 ligands earlier in development. Using elegant live confocal microscopy experiments, the authors demonstrate that CCR4 expression is important for the entry and accumulation of specific thymocyte subsets while CCR7 expression is needed for the accumulation of more mature thymocyte subsets. The use of cells deficient in both CCR4 and CCR7 and competitive migration/accumulation experiments provide strong support for this conclusion. The elimination of CCR4 expression results in decreases in apoptosis of thymocyte subsets that have been signalled through their antigen receptor and are responsive to CCR4 ligands. As expected, more mature thymocyte subsets show decreased apoptosis when CCR7 is absent. Distinct antigen-presenting cells in the thymus express CCR4 ligands supporting a model where CCR4 expressing thymocytes can interact with thymic antigen-presenting cells for induction of apoptosis. The absence of CCR4 results in an increase in peripheral T cells that can respond to self-antigens presented by LPS-activated antigen-presenting cells providing further support for the model. Collectively, the manuscript convincingly demonstrates a previously unappreciated role for CCR4 in directing a subset of thymocytes to the medulla.

      We thank the reviewer for appreciating the novelty of the finding that CCR4 directs distinct subsets of thymocytes into the medulla relative to CCR7, as supported by multiple lines of evidence.

    1. Author Response

      Reviewer #1 (Public Review):

      The sustainability of vaccination programs is subject to multiple threats, from a pandemic like COVID-19 to political changes. The present study assesses different strategies, including gender-neutral vaccination, to better respond to threats in HPV national immunization programs. The authors showed that vaccinating boys against HPV (compared to vaccinating girls alone), would not only prevent more cases of cervical cancer but also limit the impact of disruptions in the program. Moreover, it would help attain the goal set by the World Health Organization of eliminating cervical cancer as a public health problem sooner, even in the case of disruptions.

      Strengths and weaknesses: I found the manuscript well-written and easy to read. Decision-makers may find the results helpful in policy development and other researchers may use the study as an example to investigate similar scenarios in their local contexts. Nevertheless, there are some limitations. First, it should be considered that the present study is only applicable to India and other countries with a similar HPV context. Second, because it is a study based on a mathematical model, errors might arise from the assumptions considered for its construction. It also relies on the quality of the data used to construct and calibrate the model.

      Models are important tools for decision-making; they allow us to assess different scenarios when obtaining real-world data is not feasible. They also allow multiple sensitivity analyses to be carried out to test the strength of the results. The study carries out a necessary assessment of different vaccination strategies to minimize the impact on cervical cancer prevention due to disruptions in the HPV immunization program. By using a mathematical model, the authors are able to assess different scenarios regarding vaccination coverage rates, disruption time, and cervical cancer incidence. Therefore, decision-makers can consider the scenario which best represents their current situation.

      The present study is valuable not only for decision-making but also from a methodological point of view, as future research can explore in more depth the impact of vaccination disruptions and prevention measures.

      The conclusions of this paper are mostly well supported by data, but some aspects of the methodology need clarification; furthermore, some aspects of the calculations can be improved. It would be more informative, and better for comparisons between the four scenarios, to have relative measures instead of the absolute numbers of cases prevented.

      We thank the reviewer for the kind acknowledgement of the merits of the paper. We have tried to address the suggestions and questions as much as possible in the revised manuscript.

      We agree with the weaknesses raised by the reviewer: the applicability of our results to other countries is limited, and errors may arise from the use of a mathematical model. We have added a more elaborate discussion of these points in the manuscript, as follows:
      - Page 15 lines 310-312: “Extrapolation of the results of this study to other populations will be limited to those sharing similar patterns of demography, social norms, and cervical cancer epidemiology as India.”
      - Page 17 lines 361-363: “…, within the limitations of our model, the model-based estimates show that shifting from GO to GN vaccination may improve the resilience of the Indian HPV vaccination programme while also enhancing progress towards the elimination of cervical cancer.”

      Furthermore, we have tried to clarify the rationale, advantages, and limitations of the measure of resilience we have adopted.

      Reviewer #2 (Public Review):

      This study evaluated the effect of population-based HPV vaccination programs in India which is suffering from the disease burden of cervical cancer. The authors used model simulations for estimating the outcomes by adopting the latest available data in the literature. The findings provide evidence-based support for policymakers to devise efficient strategies to reduce the impacts of cervical cancer in the country.

      Strengths.

      The study investigated the potential impact of cervical cancer elimination when HPV vaccination was disrupted (e.g., during the COVID-19 pandemic) and for meeting the WHO's initiatives. The authors considered several settings from the low to high effects of vaccination disruption when concluding the findings. The natural history was calibrated to local-specific epidemiological data which helps highlight the validity of the estimation.

      Weaknesses.

      Despite the importance and strengths, the current study may likely be improved in several directions. First, the study considered the scenario of using a recently developed domestic HPV vaccine but assuming vaccine efficacy based on another foreign HPV vaccine that has been developed and used (overseas) for more than 10 years. More information should be provided to support this important setting.

      Second, the authors are advised to discuss the vaccine acceptability and particularly the feasibility to achieve high coverage scenarios in relatively conservative countries where HPV vaccines aim to prevent sexually transmitted infection. Third, as the authors highlighted, the health economics of gender-neutral strategies, which is currently missing in the manuscript, would be a substantial consideration for policymakers to implement a national, population-based vaccination program.

      We thank the reviewer for the kind acknowledgement of the merits and strengths of the paper.

      We have tried to address the three weaknesses raised by the reviewer as comprehensively as possible in the revised manuscript.

      Regarding the first two weaknesses, we have provided more background information about the current situation of HPV vaccine introduction and screening in India (see the more specific replies below for where changes have been made), and some data on observed coverage in the Indian states where HPV vaccination has been introduced.

      Regarding the reviewer’s third point about the health economics of gender-neutral strategies, we agree fully that it is an important aspect for local policymakers to consider. However, a health economic assessment is outside the scope of the present paper, in which we are interested in highlighting the potential health benefits of GN HPV vaccination. Given the current context of HPV vaccination in India, we think it is too early to provide a realistic assessment of the health-economic balance of GN vaccination. Please note that one manuscript (de Carvalho et al., MedRxiv, doi: https://doi.org/10.1101/2023.04.14.23288563) based on the same modelling exercise and reporting a health economic assessment of girls-only (routine and catch-up) HPV vaccination in India is currently submitted for peer-review.

      Reviewer #3 (Public Review):

      The authors put together a rigorous study to model the impact of HPV vaccine programme disruptions on cervical cancer incidence and meeting WHO elimination goals in a low-income country - using India as an example. The study explores possible scenarios by varying HPV vaccination strategies for 10-year-old children between a) increasing vaccine coverage in a girls-only vaccination programme and b) vaccinating boys in addition to girls (i.e., a gender-neutral vaccination programme).

      The main strength of this study is the strength of the modelling methodology in helping to make predictions and in contingency planning. The study methodology is rigorous and uses models that have been validated in other settings. The study employs a high level of detail in calibrating and adapting the model to the Indian context despite poor data availability. The detailed methodology allows future studies to employ the model and techniques with locally-contextualised parameters to study the potential impact of HPV vaccine programme disruptions in other countries.

      The work in this field can begin to help lower-income countries explore varying HPV vaccination strategies to reduce cervical cancer incidence, keeping in mind the potential for future supply chains or other related disruptions. However, the scenarios could be better sculpted to model potentially realistic scenarios to guide policymakers to make decisions in situations with limited vaccine supplies - in other words comparing scenario alternatives based on a fixed number of vaccines being available. Using comparative alternatives will help policymakers grapple with the decisions that need to be made regarding planning national HPV vaccination programmes. The results could afford to provide readers with a clearer measure of vaccine strategy 'resilience'.

      In all, the authors are able to successfully explore the potential impact of varying HPV vaccination strategies on cervical cancer cases prevented in the context of vaccine disruptions, and make valid conclusions. The results produced are rich in information and are worthy of deeper discussion.

      We thank the reviewer for the kind acknowledgement of the merits and strengths of the paper.

    1. Author Response

      Reviewer #3 (Public Review):

      The strongest aspects of this study are the structural analysis of the 90 residue KER domain. This is an important advance, discovering a founding member of a novel class of DNA binding motifs, termed a SAH-DBD (single alpha helix-DNA binding domain). Interestingly, they define a subregion of KER (termed "middle-A", residues 155-204 of Cac1) that has nearly the same DNA binding affinity and confers similar in vivo phenotypes as the full KER domain.

      This study also shows that the biological role of KER partially overlaps compensatory factors in vivo, both within the same Cac1 protein subunit (e.g. the WHD domain) and also with other proteins acting in parallel (e.g. Rtt106). That is, the presence of either WHD or Rtt106 renders the drug-resistance and silencing assays employed here insensitive to loss of the KER domain.

      However, the drug resistance and gene silencing phenotypes are inherently indirect measures of the most important claim of this work, that KER is a molecular ruler for DNA for the purpose of ensuring sufficiently large templates for deposition of histone H3/H4 cargoes. Therefore, this study would be of greater impact if the authors more directly tested this measurement idea in assays that directly assess histone deposition. There are multiple options. Since the authors have in hand recombinant wild-type and mutant CAF-1 complexes, one could examine the number and/or spacing of nucleosomes formed during in vitro deposition reactions. Complementary in vivo experiments using the authors' existing mutant strains could be based on the finding that CAF-1 is particularly important for histone deposition onto nascent Okazaki fragments during DNA replication (Smith and Whitehouse, 2012; pmid: 22419157), and that the spacing pattern of nucleosomes on this DNA is greatly perturbed in cac1-delete cells.

      Thank you for the suggestion of approaches to obtain data that more directly addresses changes in nucleosome assembly due to CAF-1 KER mutants. We considered using an in vitro nucleosome assembly assay, such as the reconstitution of nucleosomes onto gapped DNA using purified components developed by Kadyrova et al., 2013 (doi: 10.4161/cc.26310). However, they found defects only in the amount of nucleosome assembly and not changes in nucleosome spacing without CAF-1. In addition, we didn’t have the system set up and knew that it would be unlikely to produce data in the time needed for a revision of the manuscript, or even show spacing changes in nucleosomes at all. Therefore, we chose an assay system in yeast that already has been used to assess the impact of CAF-1 DNA binding mutants on nucleosome assembly (Smith and Whitehouse, 2012; pmid: 22419157 and Mattiroli et al., 2017 doi: 10.7554/eLife.22799). This approach, developed by Smith and Whitehouse, uses a degradable Ligase I system in yeast, which reveals Okazaki fragment lengths, and shows a defect when CAF-1 activity is knocked out (Smith and Whitehouse, 2012). This assay also showed that mutations or deletions in the Cac1 WHD DNA binding domain, led to increased lengths of Okazaki fragments (Mattiroli et al., 2017). As the WHD DBD impacts Okazaki fragment lengths, we reasoned that mutations in the KER DBD might also.

      We generated numerous new yeast strains that included the degradable Ligase I system and collaborated with Dr. Duncan Smith (Smith and Whitehouse, 2012; pmid: 22419157) to detect nascent Okazaki fragments in various CAC1 mutants in strains that were RTT106 or rtt106∆. We found that the Okazaki fragment lengths from cac1∆ yeast were larger and less discrete than from CAC1 yeast (as Dr Smith published previously) and that the Okazaki fragments from the cac1∆ rtt106∆ strain were barely detectable, presumably because they were too long to be resolved on the gel. However, the assay did not have sufficient resolution to detect changes in the Okazaki fragment length distribution between wild type CAC1 and the ∆KER, ∆middle-A and 2xKER mutants of CAC1, in either the RTT106 or rtt106∆ background. Therefore, we were unable to detect direct effects of the KER mutants on Okazaki-fragment lengths. We considered combining the KER mutants with the WHD mutants, but as this would not directly assess the effects of the KER mutants, and CAF-1 proteins lacking both the KER and the WHD don’t bind to DNA (Figure 3 in Mattiroli et al., 2017), we didn’t pursue it. As the complete deletion of the KER, shortening of the KER and lengthening of the KER did not give detectable changes in this assay, we also did not pursue the other mutants tested in the manuscript. Although we are disappointed that the experiment did not reveal the effects we had hoped for, it provides support for the redundant functions of CAF-1 and Rtt106 in nucleosome assembly, which has not been shown using this assay. As such, we have added Figure 1-figure supplement 1g and text to the results section, methods section and strain table. We have included Prof. Duncan Smith and his student Anne Seck as authors.

      Added text lines 195 to 207: “Finally, to assess the impact of deleting the KER more directly on nucleosome assembly in vivo, we examined histone deposition onto nascent Okazaki fragments during DNA replication as we have shown previously that the length of Okazaki fragment lengths are determined by histone deposition into nucleosomes and is disrupted upon deletion of CAC1 (Smith and Whitehouse, 2012). We compared CAF-1 mutants in the WT yeast background and in yeast lacking Rtt106. We found that the Okazaki fragment length distributions of the ∆KER mutant was indistinguishable from that of WT while that of cac1∆ was disrupted (Figure 1-figure supplement 3g). That we did not detect effects on Okazaki-fragment lengths for the yCAF-1 mutants lacking the intact KER is consistent with the results of the viability and silencing assays for KER mutants, which also retained the WHD. Strikingly, the Okazaki fragments from rtt106∆ cac1∆ yeast were highly disrupted (Figure 1-figure supplement 3g) further highlighting the redundancy between Rtt106 and Cac1 for assembling histones onto newly replicated DNA. Therefore, t”

    1. Author Response

      Reviewer #3 (Public Review):

      The authors investigated the mechanism of transport of the GLUT5 sugar porter using enhanced sampling molecular dynamics simulations and biochemical analysis. The results suggest a possible general mechanism by which binding to a transported substrate stabilizes an occluded intermediate conformation between outward and inward-facing states of the alternating access conformational change of the protein, thereby enabling transport.

      The authors also identified key elements of this transition, associated with residues involved in sugar binding, and through elegant biochemical experiments demonstrated how mutations of the latter affect the protein function, including mutations of gating residues that can recover the function of inactive mutants.

      The general computational methodology used by authors is appropriate for addressing these questions and compared to other techniques has the advantage of bringing forth an unbiased molecular description of the transport process. The results are overall qualitatively in line with the proposed conclusions.

      A major weakness of this work is that, in contrast to previous studies with the same type of methodology, the authors do not report error analysis or careful statistical assessment of the computational results. Therefore, it is not clear whether the latter are solid or whether they support the proposed conclusions. The computational data might generally benefit from an improved methodological design, such as including more degrees of freedom (or collective variables) in the description of the minimum free energy pathway, e.g. the salt-bridges.

      This has now been addressed in the essential revisions above.

      Another weakness is that some of the details of the computational analysis are not reported, therefore other investigators would not know how to reproduce the results.

      We have extended the methods section to include much more detail about the MSM construction and other computational analysis. Data files needed for reproduction are now found in a public repository with links provided in the Methods section.

    1. Author Response

      Reviewer #1 (Public Review):

      This manuscript presents an inference technique for estimating causal dependence between pairs of neurons when the population is driven by optogenetic stimulation. The key issue is how to mitigate spurious correlations between unconnected neurons that can arise due to polysynaptic and other network-level effects during stimulation. The authors propose to leverage each neuron's refractory period (which begins at approximately random times, assuming Poisson-distributed spikes and conditional on network state) as an instrumental variable, allowing the authors to tease apart causal dependence by considering how the postsynaptic neuron fires when the presynaptic neuron must be muted (i.e., is in its refractory period). The idea is interesting and novel, and the authors show that their modified instrumental variable method outperforms similar approaches.

      We wish to thank the reviewer for this positive assessment.

      However, the scope of the technique is limited. The authors' results suggest that the proposed technique may not be practical because it requires considerable amounts of data (more than 10^6 trials for just 200 neurons, resulting in stimulation of more than 5000 times per neuron). Even with such data sizes, the method does not appear to converge to the true solution in simulations. The method is also not tested on any experimental data, making it difficult to judge how well the assumptions of the technique would be met in real use-cases. While the manuscript offers a unique solution to inferring causal dependence, its applicability for experimental data has not yet been convincingly demonstrated, and would, therefore primarily be of interest to those looking to build on these theoretical results for further method development.

      We thank the reviewer for this assessment and agree that the requirement for this many trials makes the estimators practically unsuitable for identifying causal interactions in large systems. However, in the revised manuscript we show, using an improved error measure (which we discovered thanks to these reviews), that the IV estimator can be beneficial after only a few thousand trials. Moreover, while we agree that this work will be of interest to the more theoretically oriented community for methodological improvements, we believe that the methods and causal inference framework will also be interesting and useful for the wider neuroscience community. For example, considering the first (new) example in the introduction, even under two-photon single-neuron stimulation, the IV framework should be used to avoid bias amplification.

      Reviewer #2 (Public Review):

      Lepperød et al. consider the problem of inferring the causal effect of a single neuron's activity on its downstream population. While modern methods can perturb neuronal activity, the authors focus on the issue of confounding that arises when attempting to infer the causal influence of a single neuron while stimulating many neurons together. The authors adapt two basic methods from econometrics that were developed to address causal inference in purely observational data: instrumental variables and difference-in-differences, both of which help correct for unobserved correlations that confound causal inference. The authors propose an experimental procedure where neurons have spike times measured with millisecond precision and a subset of neurons are optogenetically activated. As an instrumental variable, the authors propose using the refractoriness of a stimulated neuron, resulting in absent or delayed spiking which can be used to infer its causal effect in otherwise matched conditions.

      Based on this, they develop a collection of estimators to measure the pairwise causal relationship of one neuron on another. By simulating a variety of small networks, the authors show that, provided enough data is present, the proposed causal methods provide estimates that better match underlying connectivity than methods based on ordinary least squares or naive cross-correlograms (CCHs). However, the methods proposed require extensive data and highly targeted stimulation to converge.
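
      The instrumental-variable idea summarized above can be illustrated with a generic toy example. The sketch below is not the authors' estimator or network model: the linear data-generating process, the variable names, and the effect size `beta_true` are all assumptions made for illustration. It only shows why the ratio of covariances cov(Z, Y) / cov(Z, X) recovers a causal effect that a direct regression of Y on X overestimates when an unobserved confounder drives both.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta_true = 0.5                      # hypothetical causal effect of x on y

u = rng.normal(size=n)               # unobserved confounder driving both x and y
z = rng.integers(0, 2, size=n)       # binary instrument, independent of u
x = 1.0 * z + u + rng.normal(size=n)                # "upstream" variable
y = beta_true * x + 2.0 * u + rng.normal(size=n)    # "downstream" variable


def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]


# Naive regression slope: biased because u affects both x and y.
beta_ols = cov(x, y) / cov(x, x)

# Instrumental-variable (Wald) estimate: ratio of covariances with z.
beta_iv = cov(z, y) / cov(z, x)

print(f"true effect: {beta_true}")
print(f"OLS slope:   {beta_ols:.3f}")
print(f"IV estimate: {beta_iv:.3f}")
```

      In the setting described in the review, the refractory state of the stimulated neuron would play the role of `z`; here `z` is simply a random binary variable standing in for any valid instrument.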

      Strengths:

      The value of the paper comes from its attempt to find neuroscience applications for methods from fields where causal analysis of observational data is required. Moreover, as the field develops improved methods of measuring anatomical neuronal connectivity using molecular, physiological, and structural approaches, the question of the causal influence of one neuron's spiking on another remains vital. The authors thoughtfully lay out the necessary conditions - and difficulties - required to establish this type of causal functional influence and suggest one potential approach. The collection of models tested highlighted both the strengths and difficulties of the suggested approaches.

      We wish to thank the reviewer for the positive feedback; we are delighted to share your view that obtaining methodology for estimating causal influence is vital.

      Weaknesses:

      1) I found the paper's introduction to its analysis techniques to be very confusingly written, particularly as it is designed to bridge fields. It is vital that the ideas are communicated more clearly. Some topics are explained multiple times, even after being used previously, other ideas and notations are introduced and immediately dropped (e.g. the "do operator", the ratio of covariances in the introduction to instrumental variables), and still others are introduced with no clear explanation (e.g. the weight term w, the "|Y->Y-Y*" notation, and the notation in the methods with "Y(Z=0)").

      We thank the reviewer for pointing out this lack of clarity; we have extensively rewritten the paper to make it more accessible. The do operator is used in the methods to define Y(Z=0), but is now removed from the introduction to reduce the number of concepts introduced early in the text. The w term is now defined from the generative model. The difference-in-differences notation is written out fully to be clear, and a sketch of the method's intuition is added to Figure 1.

      1) Of particular importance, in the introduction of the Z, X, and Y variables in the first full paragraph on page five, it could be made much clearer that this method is pairwise: Z and X reference the spiking of one specific stimulated neuron at two time points and Y references one specific downstream neuron.

      2) In the third paragraph of the same page, the authors refer to the "refractoriness of X" and "spiking of X onto Y", but this language confuses the neurons with variables in a way that took considerable time to unpack.

      3) This was not helped by Figure 1b, which suggested that Z_i, X_i, and Y_i applied to all neurons and merely reflected time points around stimulation.

      4) Similarly, the introduction of the Y* variable in the difference of differences method, which the authors view as one of the main contributions, is given little clear explanation or intuition. I assume "shifted on window-size left" means measuring the presence of spiking at the same time step as X, but I see no clear definition of this.

      5) The confusion about variables remains when, in Figure 1d, a "transmission probability" goes below 0 and above 1.

      1) Thank you for pointing out this lack of clarity, the suggested explanation of the variables XYZ is adopted.

      2) The language is clarified such that variables and neurons are separated.

      3) Figure is fixed such that variables refer to the neurons they represent.

      4) We have now improved the explanation of DiD with a figure for intuition.

      5) We have now redefined the “transmission probability” to effective connectivity to reduce confusion.
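
      For readers unfamiliar with difference-in-differences (DiD), the textbook form of the estimator can be sketched as follows. This is a generic illustration, not the estimator defined in the manuscript: the function name `did_estimate`, the grouping into "treated" and "control" trials, and the binary spike indicators are hypothetical. The "pre" arrays stand in for a baseline window and the "post" arrays for the response window.

```python
import numpy as np


def did_estimate(y_post_treated, y_pre_treated, y_post_control, y_pre_control):
    """Textbook difference-in-differences on per-trial outcomes.

    Subtracting the change seen in the control group removes any shift
    that is shared by both groups between the two windows.
    """
    change_treated = np.mean(y_post_treated) - np.mean(y_pre_treated)
    change_control = np.mean(y_post_control) - np.mean(y_pre_control)
    return change_treated - change_control


# Hypothetical spike indicators (1 = downstream neuron spiked in the window).
y_pre_treated = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])    # mean 0.2
y_post_treated = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])   # mean 0.5
y_pre_control = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, 0])    # mean 0.1
y_post_control = np.array([1, 1, 0, 0, 0, 0, 0, 0, 0, 0])   # mean 0.2

estimate = did_estimate(y_post_treated, y_pre_treated,
                        y_post_control, y_pre_control)
print(f"DiD estimate: {estimate:.2f}")
```

      Differencing within each group before comparing groups is what lets the estimator tolerate a baseline offset between groups, provided both groups would otherwise have changed by the same amount.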

      I also found the network models studied after the first section and the relevant variables difficult to understand with the detail necessary to interpret the results. For example, the cartoon in Figure 2a does not seem to match the text description. I see no explanation for the external "excitatory confounder" and "inhibitory confounder" terms, nor what is done to control the (undefined) \sigma_max/\sigma_min term. I don't see anything in the methods about distinct inhibitory and excitatory neurons either. Further, the violin plots (e.g. Fig 2d) seem quite noisy (e.g. is Br, DiD really bimodal?), and it is not clear what distribution is being covered by them. If this is computational simulations, I would imagine more samples could be generated. The same vagueness issues hold for the networks in section 2.4 and 2.7.

      We have now clarified the implementation of the excitatory and inhibitory confounder and how we distinguish between excitatory and inhibitory neurons and defined the condition number. The violin plots were removed in Fig 2 since the large variance represented changes across external drive which produced largely incomparable statistics. To illustrate variance, we now show the standard deviation of the absolute error in line plots 2e and 2g.

      2) Broadly speaking, the causal estimates appear better in the sense of having smaller errors, but it's not clear to me if they are actually good or not. What does an error of 0.4 mean in terms of your ability to estimate the causal structure, and what exactly does the Error(w ≥ 0) notion refer to? It would be useful to see actual reconstructions of ground truth versus causally inferred connectivity to better understand the method's strengths and weaknesses.

      To improve clarity, we have added a paragraph in the text before Figure 2 explaining a new error measure. Since the estimators give the transmission probability and not the inferred connection strength directly, we previously computed a regressed error as in Das & Fiete 2020. This error measure is equivalent to the sine of the angle between $W$ and $\hat{W}$. It is not ideal, as it gives an indirect population measure with deviations scaled during the error regression. Upon further reflection, we realized that we could define the error directly using our definition of effective connectivity on the generative model to obtain a much cleaner and more interpretable measure. This further led us to remove one of the proposed methods (brew), as it did not perform well under this new error measure. All error measurements are updated in all figures. Error(w ≥ 0) means that we only look at positive weights; this is now clarified in the text.
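
      The equivalence between a regressed error and the sine of the angle between the two weight matrices can be checked numerically. The sketch below assumes that "regressed" means fitting a single least-squares scale factor between the flattened matrices and reporting the normalized residual; this is an interpretation, not a definition taken from the manuscript, and the matrices and function names are hypothetical.

```python
import numpy as np


def sine_error(w_true, w_hat):
    """Sine of the angle between two weight matrices, treated as flat vectors."""
    a = np.ravel(w_true).astype(float)
    b = np.ravel(w_hat).astype(float)
    cos_theta = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
    cos_theta = np.clip(cos_theta, -1.0, 1.0)   # guard against rounding
    return np.sqrt(1.0 - cos_theta ** 2)


def regressed_error(w_true, w_hat):
    """Normalized residual after fitting the best scalar rescaling of w_hat."""
    a = np.ravel(w_true).astype(float)
    b = np.ravel(w_hat).astype(float)
    scale = (a @ b) / (b @ b)                   # least-squares scale factor
    return np.linalg.norm(a - scale * b) / np.linalg.norm(a)


# Hypothetical ground-truth and estimated connectivity.
w = np.array([[0.0, 1.0], [2.0, 0.0]])
w_hat = np.array([[0.1, 0.9], [1.5, 0.3]])

print(sine_error(w, w_hat), regressed_error(w, w_hat))
```

      Because any rescaling of the estimate leaves the angle unchanged, a measure of this kind cannot distinguish an estimate with the correct magnitude from one that is off by a constant factor, which is one way to read the authors' remark that it is an indirect measure.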

      3) I found the section on optogenetic modeling to be unsatisfying in its realism. The general result that 1 photon excitation hits a wide collection of neurons is undisputed, but the simulation does not account for a number of key factors - optogenetic receptor expression is distributed across the axons and dendrites of a cell, not only soma, scattering in tissue greatly affects transmission, etc. Moreover, experiments that attempt to do highly targeted activation have other methods for exactly this reason, such as multiphoton activation or electrophysiology. The message of decreasing performance as a function of stimulus size is important, but I struggle with the idea of the model being "realistic".

      We thank the reviewer for pointing out this unsatisfactory comparison with realistic scenarios. To mitigate this, we have changed the wording but kept the simulation as is. As the reviewer points out, optogenetic receptor expression is distributed across the cell; here we have assumed an expression that only affects the soma (experimentally plausible according to Grødem et al. 2023 (10.1038/s41467-023-36324-3)). Scattering in tissue is included according to the Kubelka-Munk model.

      4) The authors spend a great deal of analysis on stimulation, but little time on measurement. It seems like this approach demands a highly precise measure of spike time to know if a neuron is firing or not at a given millisecond due specifically to being in a refractory state. A stimulated but refractory neuron will still likely spike as soon as it can after the momentary delay, and given the noise in the network this difference might not be easily detectable in the delay-to-spike of the downstream neuron, even assuming one spike in the presynaptic neuron is likely to cause a spike in the downstream. It would be useful to see this aspect considered with the same detail as the rest of the study.

      We thank the reviewer for pointing this out. We have now added a paragraph discussing this: “As outlined in \citep{ozturk2000ill}, ill-conditioning can affect statistical analysis in three ways and therefore similarly in inverse connectivity estimates from measured activity. First, measurement errors such as a temporal shift in spike time estimate e.g. due to low sampling frequency, inaccurate spike sorting, or general noisy measurement due to animal movement etc. In the presence of ill-conditioning the outputs will be sensitive (unstable) to small input changes. If errors are included in some variables, the inference procedures will require information about the distributional properties of these errors. Second, optimized inference can give misleading results in the presence of ill-conditioning, caused by bad design or sampling.

      There will always exist a natural variability in the observations which necessitates the assessment of ill-conditioning before performing statistical analysis. Third, rounding errors can lead to small changes in input under ill-conditioning. This numerical problem is often not considered in neuroscience but will become evermore relevant when large-scale recordings require large-scale inferences.”
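
      The sensitivity to small input changes described in the quoted paragraph can be made concrete with a minimal linear-algebra example. The 2x2 system below is hypothetical and unrelated to the networks in the manuscript; it only shows how a large condition number lets a tiny change in the measured right-hand side produce a large change in the inferred solution.

```python
import numpy as np

# Nearly singular (ill-conditioned) system: the two rows are almost identical.
a = np.array([[1.0, 1.0],
              [1.0, 1.0001]])
b = np.array([2.0, 2.0001])             # "measured" data
b_perturbed = np.array([2.0, 2.0002])   # same data with a 1e-4 measurement shift

cond = np.linalg.cond(a)                # 2-norm condition number
x = np.linalg.solve(a, b)
x_perturbed = np.linalg.solve(a, b_perturbed)

print(f"condition number:   {cond:.3g}")
print(f"solution:           {x}")
print(f"perturbed solution: {x_perturbed}")
```

      A change of 1e-4 in one entry of the data moves the solution from roughly (1, 1) to roughly (0, 2), which is the kind of instability that motivates checking the condition number before interpreting inferred connectivity.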

    1. Author Response

      Reviewer #1 (Public Review):

      HCN channels are atypically opened by the downward movement of gating charges during hyperpolarisation and have such weak coupling between the VSD and pore domain, and in the absence of an open state structure, extracting mechanistic information has been difficult. This manuscript is a continuation of a previous study on HCN channel gating that revealed how hyperpolarisation causes a downward movement of the VSD's S4, with breakage into two helices. The authors explore gating motions and the coupling between VSD and the pore domain using atomistic simulations. This includes microsecond MD with and without very strong -1V applied potentials to try to drive VSD-TMD changes to open the channel. In the end, however, the authors used a biased simulation approach (adiabatic bias) to enforce conformational change from resting to an open homology model of HCN based on hERG/rEAG. This microsecond simulation followed three interaction distances that were suggested to change between resting and open states based on free MD. This simulation caused pore opening and allowed a description of changes that may occur during gating, including a competition of S5-S6 and S6-S6 contacts and lipid binding locations, which may suggest lipid-dependent function and explain an unexpected closed structure at 0mV in micelles. While I feel the manuscript is written for the HCN expert audience, the mechanistic information in terms of hyperpolarisation-induced voltage gating makes it of much interest. The manuscript is presented at a high level, though there are a couple of points to address, including reproducibility of simulations and potential for more relation to experimental findings.

      We appreciate the comments, thank you, please find a detailed answer below.

      The authors carried out 1μs-MD simulations of the resting, activated, and a Y289D mutant at 0 mV, and then tried to drive the conformational change with a very large -1V voltage (double that studied previously). In 1 us MD, is the membrane stable with such a big voltage, as it would likely not be experimentally? Even with a volt applied, there was incomplete activation of the voltage sensors, despite timescales approaching that of activation.

      This reviewer is correct in cautioning against membrane rupturing effects in simulations with a voltage of this magnitude. We have indeed checked that the membrane and the protein remain intact under these conditions and can confirm that no poration occurs. As membrane poration is stochastic, it could indeed occur over microsecond timescales under 1V, but the probability remains low, and we were lucky to not face this situation herein. Note that whereas potentials of this magnitude could not be applied in experiments, they are relatively routinely used in MD simulations to speed up processes that are driven by changes in transmembrane potentials.

      Interestingly, other work from our lab (Rems et al. Biophysical Journal 119 (1) 190-205 (2020)) has shown that HCN1 voltage sensor domains are less prone to poration than those from other voltage sensor domains, for reasons that remain to be determined.

      Author Response Figure 1. Final snapshots from the simulations of the resting (blue), intermediate (yellow) and activated (red) states. The representation of the solvent (water+ions) in cyan showed no membrane poration at the end of the 1us simulations.

      For the pulling/ driving simulations (adiabatic bias MD) to change suspected interaction distances (V390-I302, N300-W281, and D290-K412), it seems to be just 1 simulation, without reproducibility. One has to wonder, if the simulation was redone from a very different initial conformation, would the results be the same (in addition to the distances themselves that were enforced by the ABMD). Moreover, the authors had to model the open state, such that the results depend on a homology model based on other CNBD channels, hERG / rEAG. Although the model stayed open for a microsecond, what other measures of accuracy of the homology model are there, such as preserved distances according to mutants/double mutants?

      The ABMD simulations were repeated, please refer to the response to essential revisions point 1 for details.

      For reasons mentioned by the reviewer as well as a reconsideration of our strategy to model channel opening, we have decided to omit homology models from the revised version of the paper.

      The authors find that activation involves hydrophobic forces that strengthen the intra-subunit S4/S5/S6 interface, as well as lipid headgroups that make contact with hydrophilic residues at this interface, with lipid tails also contributing to hydrophobic contacts. The authors see bending and rotation of the lower S4 and a displacement of S1 away from S4 that exposes the VSD-pore interface to lipids, with increased lipid contacts at S4 and S5 during activation. This indicates lipid tails may play a role in coupling in HCN1 and may explain the closed state micelle structure at 0mV. Two sites of lipid contact are identified, one engaging VSD residues and the other polar or charged residues on S5 and S6. No experiments are presented or proposed to test the predicted lipid sites. e.g. Mutation of key residues, such as the arginine and histidine seen binding lipid headgroups could be tested as proof of their involvement, or perhaps experiments with varied phosphate moieties? In the absence of new experiments, is there existing data that could help validate the findings?

      We thank this reviewer for this comment. As noted in the response to essential revisions point 3, such experiments are challenging, and have not been reported so far in HCN channels. We do agree that aspects of the mechanism we propose remain hypothetical awaiting further work, but are happy to report that the importance of lipid interactions with the crucial salt bridge pair mentioned in the response to essential revisions point 3 has been completely independently validated, thus strengthening our mechanistic hypothesis substantially.

      During free MD simulation, the authors see tilting of S5 caused by activation of the Y289D mutation that brings D290 and K412 positions into proximity. How do we know that the adjacent mutant of Y289 to aspartate has not caused this, or was this interaction also seen in wild-type simulation? Fig.3c might suggest the wt activated simulation may see such an interaction, but it is unclear given the large C_alpha distances, as opposed to H-bonding distances.

      Indeed, Figure 3 appears to indicate that this interaction between D290 and K412 is present in the activated state when the mutation is reverted to the WT sequence. We have recalculated the interaction propensity using all atoms of the residues and present an updated Figure 3c in response.
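
      The difference between a C-alpha distance and an all-atom interaction measure can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: the function names, the 4 Å cutoff, and the toy coordinates are all hypothetical, and the propensity is taken here simply as the fraction of frames in which any pair of atoms from the two residues lies within the cutoff.

```python
import numpy as np

def min_pair_distance(coords_a, coords_b):
    """Smallest distance between any atom of residue A and any atom of residue B (one frame)."""
    diffs = coords_a[:, None, :] - coords_b[None, :, :]
    return np.sqrt((diffs ** 2).sum(axis=-1)).min()

def contact_propensity(traj_a, traj_b, cutoff=4.0):
    """Fraction of frames in which the two residues are within `cutoff` (in Å).

    traj_a, traj_b: arrays of shape (n_frames, n_atoms, 3).
    The cutoff is an assumed, illustrative value.
    """
    dists = np.array([min_pair_distance(a, b) for a, b in zip(traj_a, traj_b)])
    return float((dists <= cutoff).mean())

# Toy two-frame "trajectory" with two atoms per residue (hypothetical coordinates, in Å).
traj_a = np.array([
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
])
traj_b = np.array([
    [[4.0, 0.0, 0.0], [10.0, 0.0, 0.0]],   # closest atom pair is 3 Å apart
    [[8.0, 0.0, 0.0], [10.0, 0.0, 0.0]],   # closest atom pair is 7 Å apart
])

propensity = contact_propensity(traj_a, traj_b)
print(propensity)
```

      Using the closest atom pair rather than a single backbone atom makes the measure sensitive to side-chain contacts, such as a salt bridge, that can form even when the C-alpha atoms remain far apart.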

      The authors predict that a D290-K412 salt bridge may be important for gating and sought to experimentally validate the interaction in the activated-open state using cysteine cross-bridging. As this is the only experimental backing in the paper, it is important to be able to judge its ability to report on the D290-K412 salt bridge. A comparison experiment demonstrating other crosslinks that do not favour the open state would have been helpful in this regard e.g. if crossbridging at similar locations (but not predicted to change interaction during gating) had little effect on I/Imax, then the result may be bolstered. Are there existing mutagenesis experiments that may suggest the importance of these residues (as well as for other key interaction distances identified)?

      Negative results in cross bridging and cysteine accessibility studies in general are difficult to interpret, as the lack of a cadmium-specific effect may be due to inaccessibility of the site to cadmium, a pairwise distance too far to bridge by cadmium, or bridging of the specified site without a functional effect. However, as reviewer 2 pointed out below, the Yellen group has performed extensive cross bridging experiments in the S4-S5 to C-linker region in spHCN, and in most of these positions, the pairs favoring the open state are closer together in our models than pairs favoring the closed state or those without functional effect. We have added Videos 1-6 to highlight this comparison on our open state models and describe it in our updated discussion section.

      Rotation of the V390 side chain from a position facing the pore lumen to a position facing I302 on S5 is coupled to an increase of the pore radius at V390, an increased hydration of the pore intracellular gate, and K+ ion movement. Perhaps 5 or 6 ions cross in that single simulation. As K channel ion permeation can depend critically on starting ion configs (as well as the model/force field), reproducibility of this finding is important but does not appear to have been tested. How can we be sure that periods of permeation or no permeation in individual simulations are reliable?

      As mentioned in our response to essential revisions point 1, we have modified the collective variable set used in ABMD, and repeated the simulations in 4 replicates. Whereas the number of permeation events is low in each simulation (Figure 4 S1), the consistency across repeats indicates that these open pore models indeed represent conductive states. Given how short the simulations are, however, it appears unreasonable to infer conductance values from these observations.

      Reviewer #3 (Public Review):

      In this work, Elbahnsi and colleagues use enhanced sampling MD simulation, to recapitulate step by step, the electromechanical coupling between VSD and the pore in HCN1 channels. Building on the available cryoEM structures of HCN1 with the VSD in resting and active state, the authors characterize by MD a subset of interactions that seemingly stabilize the open channel. This subset is, in turn, used in enhanced-sampling simulations to guide channel opening. The main findings are that S4 movement induces a rearrangement of the hydrophobic interaction at the level of S1- S4- and S5 interfaces. Occupancy of lipids seems therefore state-dependent and highlights their regulatory role in HCN gating.

      The approach is rather innovative, and it apparently allows the reconstruction of the whole mechanism of gating, pushing the predictive power of MD simulation well beyond its actual temporal limitations. At the same time, the initial choice of interactions is crucial for this approach, because the result cannot differ from the inputs. And reading the paper it does not emerge clearly how the correctness of the reconstructed gating pathway can be verified, if not by functional validation.

      We thank the reviewer for this thoughtful review. It has pushed us to reconsider our approach to enhance the sampling of channel activation and gating. Please refer to the detailed response below as well as the response in particular to essential revisions point 1.

      Here are my comments on the main interactions that were used to feed the final MD simulation:

      1) W281-N300: this interaction, previously identified and studied in SpH channels (Ramentol et al, 2020; Wu et al, 2021), has been elegantly confirmed in this paper. Its inclusion in the initial subset seems appropriate. In the other two cases, the choice of interactions requires further explanations and experimental validation.

      2) D290 and K412: the validation of this interaction shown in Figure 3 and suppl Figure 1 is missing a control, i.e., the effect of the addition of Cd++ on the wt channel. Please add.

      We have performed the control suggested. Please also refer to the answer to essential revisions point 2.

      3) Modelling the open state of HCN1 pore (page 18), is done on the structure of the distantly related hERG rather than on the available open pore structure of HCN4. This choice is justified as follows by the authors:

      a) "Available structures in the CNBD channel family for which representative structures have been solved in closed and open states".

      b) "The structural mechanism of pore gating (i.e. the ⍺ to 𝜋 helix occurring at the glycine657 hinge in hERG) observed in rEAG/hERG may be a conserved gating transition in the CNBD family of channels"

      I encourage the authors to consider the following:

      a) The structure of hERG channel is not available in the closed/open configuration, indeed the comparison must be done with the closed configuration of the related channel rEAG. On the contrary, HCN4 is available in the closed/open configurations. Moreover, one of the open pore structures shows S4-S5-S6 in a very similar conformation to the lock open mutant (F186C/S264C) of HCN1 (Saponaro et al, 2021). With an available HCN4 open structure, forcing HCN1 to the open pore structure of hERG channel (which opens in depolarization and is not regulated by cAMP) seems not necessary.

      In response to this point, we reconsidered our approach and chose to instead use a biasing distance that is consistently increased in CNBD channels of resolved structures, that between neighboring and cross-subunits V390. We have detailed our rationale in the response to essential revisions point 1.

      To my knowledge, hERG is the only channel of the CNBD family for which the transition ⍺ to 𝜋 helix reported by the Authors, occurs in S6. It is not reported for other CNBD family members, in particular for the CNG channels mentioned by the Authors (Zheng et al., 2020; Xue et al., 2021, 2022). TAX-4 (Zheng et al) does not show it. Its pore opens by a right-handed twist of S6 at glycine 399, a conserved glycine in all CNG. Human CNGA1 too, opens the pore by a rotational movement of S6 hinged at the equivalent glycine (glycine 385) (Xue et al, 2021). And the same occurs in the non-symmetrical channel CNGA1/B1 (Xue et al, 2022). So, it seems that CNG channels do not show the ⍺ to 𝜋 helix transition in the open pore. Moreover, hERG excluded, all other members of the CNBD family, CNG, EAG, and HCN4 included, do not bend at the hinge glycine 657 of hERG, but at another glycine (gly 648 in hERG numbering) located upstream. Further, their opening is due to a rotation of S6 associated with an outward movement, rather than to the lifting of the lower part of S6, as in hERG.

      After considering this reviewer’s comment, we were surprised to see that HCN1 is apparently prone to secondary structure deformation in S6, even when biasing the aforementioned distances, and thus enforcing no rotation at all in S6. We are intrigued by this observation and eagerly await experimental validation or disproval. In the meantime, we have made clear in the text that this hypothesis remains based exclusively on modeling work.

      4) V390-I302: this interaction is predicted to stabilize the open pore configuration and was included in the subset. The contact between V390 on S6 and I302 on S5 is observed in the homology model discussed above when the S6 is twisted at the glycine hinge, rotating the preceding residue (V390) out of its pore-lining position. Again, I can only disagree with this hypothesis because it has been experimentally demonstrated (Cheng et al, J Pharmacol Exp Ther. 2007 Sep;322(3):931-9) that the side chain of Valine390 is inside the cavity of the open pore of HCN1 channels as it controls the affinity for the pore blocker ZD7288.

      In accordance with other comments above, we have eliminated the bias applied to the V390-I302 distance. However, the new ABMD simulations with bias applied to encourage dilation at position 390 still involve rotation of V390 away from the central pore axis, albeit with bending of S6 at the upper glycine mentioned by this reviewer. The degree of rotation is lower than in our previous simulations so that V390 still lines the inner vestibule in the open state, consistent with the observation that this position influences the apparent affinity of open pore blockers.

      In conclusion, modelling the open state pore of HCN1 on hERG rather than on that of HCN4 seems not justified based on accumulated evidence in the published literature. Therefore, the choice of the authors to use it as the open pore model of HCN1 channels needs to be experimentally validated. One possibility is to mutate the glycine hinge, gly391 in HCN1, into an Alanine in order to remove the flexible hinge. If this mutation alters pore gating, it will support the choice of the Authors.

      Once more, we thank the reviewer for the comments, which have led us to reconsider a large part of our modeling work.

    1. Author Response:

      We would like to thank the reviewers for their thorough evaluation of the presented manuscript and herewith would like to address their comments and suggestions.

      This study was funded by a NSF-grant awarded to Prof. Celio. The animal experimentation license (including animal husbandry, breeding and experiments) that is required by law to perform animal experiments was also issued to Prof. Celio. Therefore, with the retirement of Prof. Celio, the funding for the project was discontinued and the animal license was terminated. We are thus unable to answer the reviewers’ open questions with follow-up experiments. We would however like to discuss some of the reviewers’ open questions or concerns and hope this might be insightful to the interested reader.

      Reviewer #1 (Public Review):

      “First, they reported that chemogenetic activation of Foxb1 hypothalamic cell groups led to tachypnea. The authors tend to attribute this effect to the activation of hM3Dq expressed in the parvofox Foxb1 but did not rule out the participation of the PMd Foxb1 cell group which may as well have expressed hM3Dq, particularly considering the large volume (200 nl) of the viral construct injected. It is also noteworthy that the activation of the Foxb1 hypothalamic cell groups in this experiment did not alter the gross locomotor activity, such as time spent immobile state.”

      Because an AAV2 serotype was used for expression of the chemogenetic tools, the spread of viral infection was much more restricted to the injection site in chemogenetic animals than was observed with the AAV5-based expression of optogenetic tools. The more restricted spread of viral infection with AAV2 serotypes has previously been shown by a range of other groups (e.g. see https://doi.org/10.3389/fnana.2019.00093). This limited spread of the AAV2 serotype in our chemogenetic animals, together with the absence of the very strong locomotor phenotype observed during optogenetic stimulation experiments, leads us to hypothesize that the respiratory phenotype is largely attributable to the Parvafox Foxb1 neurons.

      “In the second experiment, the authors applied optogenetic ChR2-mediated excitation of the Foxb1+ cell bodies' axonal endings in the dlPAG leading to freezing […]. Here it is important to consider that optogenetic ChR2-mediated excitation of the axonal endings is likely to have activated the cell bodies originating these fibers, and one cannot ascertain whether the behavioral effects are related to the activation of the terminals in the PAGdl or the cell bodies originating the projection.”

      We did not consider the possibility of backpropagation induced by optogenetic axon terminal stimulation at the time of experiments. We acknowledge that this is the major limitation of our optogenetic experiments that would have to be investigated with further animal experiments.

      Reviewer #2 (Public Review):

      “3) Fig. 5, a great effort has been made to illustrate the point that CCK and Foxb1 are differentially expressed. Why not just perform a double in situ experiment to directly illustrate the point?”

      We came across the publication reporting that Cck-expressing PMd neurons control escape behaviors only when we were drafting the manuscript. Because this was already after the retirement of Prof. Celio and we were not able to conduct further experiments involving animals, we leveraged in silico methods and the publicly available high-quality dataset on the gene expression of the posterior hypothalamic area. The applied in silico method of dimensionality reduction and cluster assignment is well established and widely accepted. We believe in the quality of the dataset and the reliability of these in silico results, but we agree with the reviewer that an alternative would have been to illustrate the expression patterns of Cck and Foxb1 by in situ hybridisation.

      “4) Fig. 7 data on optogenetic stimulation on immobility and breathing, since not all mice showed the same phenotype, what is the criterion for allocating these mice to hit or no hit groups?”

      We defined the group allocation criteria in the section titled “Optogenetic modulation of Foxb1 terminal in the dlPAG induces immobility” as follows:

      “OnTarget_antPAG animals had the tip of the optic fiber implant located above the dlPAG at an anterior-posterior level AP -4.04mm (from bregma) or proximal. The OffTarget group contains animals with fiber tips located below (i.e. ventral to) the dlPAG and/or located more distal than AP -4.04mm.”

    1. Author Response

      Reviewer #1 (Public Review):

      The manuscript of Parab et al. reports a beautiful phenotype analysis of the vascular brain/meningeal anatomy in a variety of reporter lines and mutants for Wnt/β-catenin signaling and angiogenic cues (Vegfaa, Vegfab, Vegfc, Vegfd) during zebrafish development. The present study extends the previous work of the same Parab, Quick, and Matsuoka, that focused on fenestrated vessel formation in the zebrafish myelencephalic choroid plexus (mCP). Vegfs were shown to regulate fenestrated vessel formation in combination, but not individually, and with only little effect on neighboring non-fenestrated brain vessel development. The fenestrated endothelium is thus known to have specific angiogenic requirements.

      The scale of investigation has now changed, and fenestrated vessel formation has been examined throughout the brain, in both circumventricular organs (organum vasculosum of lamina terminalis) and other choroid plexuses (CPs) including the diencephalic CP and its interface with the pineal gland, the eye choroid (choriocapillaris), and the hypophysis vasculature. The original finding is that a region-specific code of angiogenic cues controls fenestrated vessel formation. The authors show that fenestrated vessels form independently of Wnt/β-catenin signaling and BBB vascular development but require different combinations of Vegfa and Vegfc/d-dependent angiogenesis within and across brain regions. A previously unappreciated function of autocrine and paracrine Vegfc signaling is demonstrated in this brain region-specific regulation of fenestrated capillary development.

      Twenty-one different fish lines accurately genotyped and characterized and including a new Reck mutant, have been instrumental to conduct vascular pattern analysis, using confocal and stereomicroscopy imaging combined with transmission EM. High-quality illustration and robust quantification methods, previously validated, have been used. The study is well organized and reflects the high expertise and strong methodology of the investigators. Data are presented in nine dense figures and the contribution of angiogenic ligands to fenestrated vessel formation can hardly be studied more in-depth.

      However, and this will be my only main concern, no information is provided on the regional diversity of angiogenic receptor expression that may correlate with the regional angiogenic factor code. Without asking for a spatial transcriptomic study, the combination of Vegfr-reporter lines or in situ hybridization with a combination of receptor probes would allow for generating a comprehensive set of ligand/receptor data relative to the regional angiogenic signaling pattern involved in fenestrated vessel formation.

      We appreciate this reviewer’s positive and encouraging comments highlighting both the quality and significance of our study. As we commented in response to the Essential Revisions point #1, we anticipate that a detailed expression analysis of all four Vegf receptors at different developmental stages during CP and CVO vascularization will be best addressed with new technologies combined with optimizations of existing tools/protocols. Thus, we have provided a paragraph of discussion on our perspectives for potential Vegf receptors involved in CP and CVO vascularization in the current study.

      We address each of the points raised by the reviewer below.

      Reviewer #2 (Public Review):

      Building on their previous studies, Parab et al used a larger collection of genetically modified zebrafish lines to map the precise expression domains of different VEGF isoforms in the brain and demonstrated that different combinations of VEGF isoforms differentially control the formation of fenestrated vessels at different locations in the brain.

      The authors used three Wnt signaling mutants to convincingly show wnt signaling is essential for parenchymal angiogenesis, but not required for fenestrated vessel development, such as those in choroid plexus, suggesting fenestrated vessel and barrier vessel are differentially regulated. The previous work from this group has established that VEGF isoforms are critical for myelencephalic choroid plexus development. In this study, they carefully documented the developmental vessel patterning in the diencephalic choroid plexus/pineal gland interface. They also documented the local expression pattern of VEGF isoforms with a set of BAC transgenic fish, together with the phenotype of a series of VEGF mutant fish, the data well support that different combinations of VEGF isoforms regulate fenestrated vessel development at different brain locations.

      Given that, over a larger temporal and spatial domain, VEGFs are critical for all forms of vessel development, there are potentially redundancy mechanisms that maintain homeostasis of VEGF signaling. In this study, no data is provided to address whether LOF of one form of VEGF affects the expression of other isoforms.

      This work provided detailed evidence that different isoform combinations of VEGF regulate formation/patterning of the fenestrated vessel at CP, OVLT, and NH in zebrafish. It will be interesting to follow in the mammalian system how well these findings are conserved. For example, which isoform of VEGF is critical for vascular patterning during the developmental stages of the pineal gland? How do VEGF isoforms participate in choroid plexus development at different ventricle regions and in subsequent secretory function maintenance? However, these tasks are challenging without a good genetic tool to locally manipulate VEGF isoform expression during mammalian brain vessel development.

      We appreciate this reviewer’s favorable and encouraging comments highlighting both the quality and impact of our study. We also acknowledge the great importance of the points raised by the reviewer, including the Vegf redundancy mechanisms and also our results’ conservation in mammals.

      Reviewer #3 (Public Review):

      Parab et al. investigate the requirement of specific Vegf ligands during the embryonic development of new blood vessels in different brain regions. The authors implement their previously published experimental paradigm (Parab et al 2021 eLife) combined with new transgenic and mutant zebrafish lines to show that vegf ligands (vegfaa, vegfab, vegfc, and vegfd) are required in various combinations to drive angiogenesis in distinct brain regions. Specifically, they show that individual loss of different vegf ligands causes either undetectable or partial effects in angiogenesis, while combined loss of vegf ligands results in severe defects in brain region-specific angiogenesis. As different blood vessel types (i.e. arteries, veins, lymphatics) require specific angiogenic cues, this study provides interesting new data on how the combination of these signals drives brain region-specific vascular development.

      While the conclusions of the paper are generally well supported by the data, the authors overstate some of their findings, particularly with respect to the development of fenestrated capillaries. In this study, the authors use the zebrafish transgenic reporter line, plvap:EGFP, as an indicator of fenestrations. However, the authors do not provide any evidence of fenestrations of the blood vessels of the choroid plexuses or the cranial vessels used for quantification (Figures 1, 3, and 4). While expression of Plvap protein is often used as a marker for non-blood brain barrier endothelial cells, as Plvap is the major component of the diaphragms of fenestrated capillaries, plvap:EGFP expression alone does not indicate fenestrations. This is an important point because previous work has demonstrated that targeted deletion of Plvap does not cause a loss of fenestrations, but instead a loss of the diaphragms associated with fenestrations (Stan et al 2012 Dev Cell; Gordon et al 2019 Development). Similarly, Plvap expression alone does not necessarily indicate fenestrations as an expression of Plvap is not sufficient for fenestration formation. In fact, Plvap has initially been expressed in brain endothelial cells during initial angiogenesis to the brain without evidence of fenestrations, and subsequently, Plvap expression disappears during the maturation of the BBB. Thus, to conclude that specific vegf ligands are required for the development of fenestrated capillaries, transmission electron microscopy (TEM) should be used on the capillaries examined in this study or the language describing the results should be modified accordingly. Conversely, the authors did show TEM for the choriocapillaris (Figure 5A-C) but did not show plvap:EGFP expression in these vessels.

      Additionally, the authors' usage of the phrase "development of fenestrated vessels" suggests that the study was examining signals that regulate the formation of fenestrations and not angiogenesis of vessels that may become fenestrated as demonstrated here. Therefore, as Plvap expression does not necessarily equate fenestrations (and vice-versa), the title and some of the major claims of the study are somewhat overstated.

      We appreciate this reviewer’s constructive comments and suggestions to improve this study. We agree with the reviewer that the descriptions of our findings in the original manuscript were not strictly accurate in some aspects. We have now addressed the concern of the Tg(plvap:EGFP) reporter specificity by conducting additional molecular and functional characterizations of Tg(plvap:EGFP)+ vs Tg(glut1b:mCherry)+ brain vasculature, as we have commented in response to the Essential Revisions point #2. In addition, we have made substantial revisions in describing our findings, including 1) the change of the phrase "development of fenestrated vessels" into a more appropriate phrase and 2) the clarification of the primary focus of this manuscript on “angiogenesis/vascularization”. We believe that our revised manuscript now more clearly conveys the finding of signals involved in angiogenesis/vascularization of CP and CVO vascular beds.

    1. Author Response:

      The following is the authors' response to the original reviews.

      We are very glad that the editor and reviewers found our paper of broad interest to the community of population, evolutionary, and ecological genetics. We thank them for their positive feedback and insightful comments and suggestions. We have revised our manuscript to address some of the issues raised by the review. The main change we made was providing a detailed discussion of limitations of simulated genomes, focusing on considerations one needs to make when selecting a demographic model. This can be found in a new section “Limitations of simulated genomes” (pages 9-10). We made a few additional adjustments in other parts of the text based on the reviewers’ suggestions. They are all listed in the detailed point-by-point response to the reviewers’ comments and questions below.

      Editor:

      1) It was noted that demographic models (or genomic parameters) that are inferred based on certain aspects of the genomic data (e.g., site frequency spectrum, haplotype structure) may not recapitulate other aspects of the data. In other words, any inferred demographic models are expected to reliably reproduce only some aspects of the genetic variation data but not necessarily all. It would be helpful to emphasize this limitation in the manuscript and to include a table summarizing the types of variation that the demographic models for the catalogued species were based on.

      This is a very important point, which we addressed in the revision by adding a section entitled “Limitations of simulated genomes”. This section discusses the considerations that one should make when selecting an inferred demographic model to implement in simulation. This includes the samples used in analysis, the method used for inference, as well as various filters. In this section we also point to the documentation page of the stdpopsim catalog, which provides information about each demographic model that can help users decide whether it is appropriate for their needs. We decided not to summarize this information in a succinct table in the manuscript because it is not straightforward to summarize the strengths and potential limitations of each model in a table. Instead, we will expand the summary provided for each demographic model in the documentation page to provide additional information. See response to the second reviewer’s comment on this topic for more details.

      2) It will make stdpopsim more user-friendly to include an automated module that can visualize a demographic model given the corresponding parameters (or simulation scripts).

      As mentioned in the response to the first reviewer’s comment on this subject, the documentation page of the stdpopsim catalog provides a brief summary for each demographic model, including a graphical representation. See response below for more details.

      Reviewer #1:

      In the introduction, the authors cite numerous efforts to generate high-quality reference genomes. That's not an issue in itself, but leading with this might send the message to some readers that it is these reference genome efforts that are driving the need for population genomics analysis and simulation tools, which is not really the case - why not instead give some citation attention to actual population genomics projects aiming to address the types of evolutionary questions this paper is concerned with? The reference genome citations would fit better in the section dealing with reference genomes, where they already appear.

      Indeed, the desire to answer complex evolutionary questions is the main motivation for sequencing these genomes and also for generating realistic genome simulations. The reason we chose to lead with the genome-sequencing efforts is that high quality genome data is an important prerequisite for obtaining parameters for chromosome-scale simulations. So, with that perspective, these efforts which we cite are the driving force behind expansion of stdpopsim in the near future. Thus, we decided to leave these citations in the introduction. To balance things out, we now start the introduction with a statement about broad questions in population genetics. Moreover, after we list the genome sequencing efforts, we added a list of specific types of questions that can be addressed by these newly emerging genomes, with relevant citations. The beginning of the introduction now reads:

      “Population genetics allows us to answer questions across scales from deep evolutionary time to ongoing ecological dynamics, and dramatic reductions in sequencing costs enable the generation of unprecedented amounts of genomic data that can be used to address these questions (Ellegren, 2014). Ongoing efforts to systematically sequence life on Earth by initiatives such as the Earth Biogenome (Lewin et al., 2022) and its affiliated project networks, such as Vertebrate Genomes (Rhie et al., 2021), 10,000 Plants (Cheng et al., 2018) and others (Darwin Tree of Life Project Consortium, 2022), are providing the backbone for enormous increases in the amount of population-level genomic data available for model and non-model species. These data are being used, among other things, in inference of population history and demographic parameters (Beichman et al., 2018), studying adaptive introgression (Gower et al., 2021), distinguishing adaptation from drift (e.g. Hsieh et al., 2021), and understanding the implications of deleterious variation in populations of conservation concern (e.g. Robinson et al., 2023).”

      Something that would be useful for the stdpopsim resource in general, though not necessarily something for the paper, would be some kind of more human-friendly representation of the demographic models implemented in the curated library. Perhaps I'm not looking in the right place, but as far as I can tell, if I want to study the curated demographic models, I need to go into the Python scripts on the stdpopsim GitHub page (e.g. https://github.com/popsim-consortium/stdpopsim/tree/main/stdpopsim/catalog/BosTau). Here the various parameters and demographic events are hard-coded into the scripts. To understand the model being implemented, one thus needs to go dig into these scripts - something which is not necessarily very accessible to all researchers. Visual representations, such as the one for Anopheles gambiae in Fig 2. in the paper, are more widely accessible. I wonder if such figures could be produced for all the curated models and included in the GitHub folders alongside the scripts, perhaps aided by an existing model visualization software such as POPdemog. Again, I would not suggest that this is necessary for the paper, but if practically feasible I think it would be a useful addition to the resource in the longer term.

      This is a very good point. The stdpopsim catalog actually has a documentation page that provides a brief summary for each demographic model, including a graphical representation. This graphical representation is generated using demesdraw applied to the demographic model object implemented in the code. Thus, potential users do not have to dig through the Python code to figure out the details of the demographic model. We used a similar approach to generate the image of the demographic history of A. gambiae for Fig. 2 of the paper. The documentation page is an important part of the stdpopsim catalog, and we have now added a link to it in section “Data availability”, and we mention it in key places in the manuscript, such as the caption of Fig 2.

      Reviewer #2:

      An important update to the stdpopsim software is the capacity for researchers to annotate coding regions of the genome, permitting distributions of fitness effects and linked selection to be modeled. However, though this novel feature expands the breadth of processes that can be evaluated as well as is applicable to all species within the stdpopsim framework, the authors do not provide significant detail regarding this feature, stating that they will provide more details about it in a forthcoming publication. Compared to this feature, the additions of extra species, finite-site substitution models, and non-crossover recombination are more specialized updates to the software.

      It would be helpful to provide additional information regarding the coding annotation (and associated distribution of fitness effects and linked selection) that is implemented in the current version of stdpopsim, but will be detailed in a forthcoming paper. This is not to take away from the forthcoming paper, but I believe this is the most important update to the software, and the current manuscript only brushes over it.

      We agree that implementation of selection in simulations is a significant addition to stdpopsim. However, our intention in this manuscript is to focus on the separate effort we made in the last two years to expand the utility of stdpopsim to a more diverse set of species. We think the manuscript stands firmly even without discussing in detail the new features that allow modeling selection. The main reason we briefly mention these features in sections “Additions to stdpopsim” and “Basic setup for chromosome-level simulations” is because the released version of stdpopsim contains implemented DFEs for a few species, and we did not want to completely ignore this. We thus added a brief comment at the end of the “Basic setup” section (page 8) mentioning the three model species for which the stdpopsim catalog currently has annotations and implemented DFE models. We think that a more detailed description of these features and how they should be used is best left to the manuscript that the PopSim community is currently writing (preprint expected later this year).

      When it comes to simulating realistic genomic data, the authors clearly lay out that parameters obtained from the literature must be compatible, such as the same recombination and mutation rates used to infer a demographic history should also be used within stdpopsim if employing that demographic history for simulation. This is a highly important point, which is often overlooked. However, it is also important that readers understand that depending on the method used to estimate the demographic history, different demographic models within stdpopsim may not reproduce certain patterns of genetic variation well. The authors do touch on this a bit, providing the example that a constant size demographic history will be unable to capture variation expected from recent size changes (e.g., excess of low-frequency alleles). However, depending on the data used to estimate a demographic history, certain types of variation may be unreliably modeled (Beichman et al. 2017; G3, 7:3605-3620). For example, if a site frequency spectrum method was used to estimate a demographic history, then the simulations under this model from stdpopsim may not recapitulate the haplotype structure well in the observed species. Similarly, if a method such as PSMC applied to a single diploid genome was used to estimate a demographic history, then the simulations under this model from stdpopsim may not recapitulate the site frequency spectrum well in the observed species. Though the authors indicate that citations are given to each demographic model and model parameter for each species, this may not be sufficient for a novice researcher in this field to understand what forms of genomic variation the models may be capable of reliably producing.
A potential worry is that the inclusion of a species within stdpopsim may serve as an endorsement to users regarding the available simulation models (though I understand this is not the case by the authors), and it would be helpful if users and readers were guided on the type of variation the models should be able to reliably reproduce for each species and demographic history available for each species. It would be helpful to include a table with types of observed variation that the current set of 21 species (and associated demographic histories) are likely and unlikely to recapitulate well.

      This is a very important point, which we now address in the section “Limitations of simulated genomes”, which we added to the manuscript. In this section, we expand on this topic and discuss various things that will affect the way simulated genomes reflect true sequence variation. This includes the choice of demographic inference method, but also the analyzed samples, and various filters. The main message of this section is that one should consider various things when deciding to implement a demographic model in simulation (or selecting a model among those implemented in stdpopsim). We also cite studies (including Beichman, et al. 2017), which compared different approaches to demography inference. However, we note that the conclusions of these comparisons are not as straightforward as the reviewer suggests. In particular, methods that make use of the site frequency spectrum (such as dadi) should be able to capture some aspects of haplotype structure, because this information is encoded in the demographic history. Furthermore, a demographic history inferred from a single genome (e.g., using PSMC) should do a reasonable job approximating some aspects of the site frequency spectrum. In other words, the aspects of genetic variation not modeled well by a given demographic inference method are not always predicted in a straightforward way. This is why we avoid summarizing this information in a table in the manuscript. The 2nd paragraph of the “Limitations of simulated genomes” section addresses some of these subtle considerations. In particular, we suggest that considering a demographic model for simulation requires some familiarity with the inference method and the way it was applied to data. Regarding the demographic models currently implemented in stdpopsim, we provide some information about each model in the documentation page of the catalog. 
When selecting a demographic model from the catalog, users should make use of this documentation to guide their decision. This is mentioned in the 3rd paragraph of the “Limitations of simulated genomes” section. Following-up on this issue, we intend to review the documentation and make sure it provides sufficient information for each demographic model. See this GitHub issue.

      Reviewer #3:

      - p5, 2nd paragraph: I think many Biologists, myself included, will think of horizontal gene transfer mostly as plasmids being transferred among bacteria and adding extra genetic material, not as homologous bacterial recombination. This made me confused about modelling horizontal gene transfer in the same way as gene conversion. It may be helpful for some readers if you specify that you are modelling this particular type of horizontal gene transfer. Some explanation along the lines of what is in Cury et al (2022) would be enough.

      This is a good point. We modified the text in that sentence in the 2nd paragraph on page 5 to clarify that we are modeling non-crossover homologous recombination, and not incorporation of exogenous DNA (e.g., via plasmid transfer). The relevant part of the text now says:

      “In bacteria and archaea, genetic material can be exchanged through horizontal gene transfer, which can add new genetic material (e.g., via the transfer of plasmids) or replace homologous sequences through homologous recombination (Thomas and Nielsen, 2005; Didelot and Maiden, 2010; Gophna and Altman-Price, 2022). However, the initial version of stdpopsim used crossover recombination to stand in for these processes. Although we cannot currently simulate varying gene content (as would be required to simulate the addition of new genetic material by horizontal gene transfer), the msprime and SLiM simulation engines now allow gene conversion, which has the same effect as non-crossover homologous recombination.

      Following (Cury et al., 2022), we use this to include non-crossover homologous recombination in bacterial and archaeal species.”

      - p5, 3rd paragraph: When you say gene conversion is turned off by default, you could refer to table 1 and briefly mention the consequence of ignoring gene conversion.

      We agree that it is important to note that not modeling gene conversion may lead to faulty lengths of shared haplotypes across individuals. This is implied by the statement we make at the beginning of the 3rd paragraph on page 5, where we lay out the motivation for modeling gene conversion in simulation. Following the reviewer’s suggestion, we have now added a statement about this at the end of that paragraph:

      “Note that ignoring gene conversion may result in a slightly skewed distribution of shared haplotypes between individuals (see Table 1)”

      -  p7, item 1 and p9, 1st paragraph: I am not sure what you mean by genetic map here, can you define this term? I am not sure if it is synonymous with gene annotations, a recombination map, or something else. The linkage map doesn't seem to make sense to me here.

      The term ‘genetic map’ referred to the recombination map whenever it was used in the manuscript. To avoid any confusion, we now removed all mentions of ‘genetic map’, and use ‘recombination map’ instead. The recombination map is relevant in item 1 of page 7 because in species with poor assemblies you will not be able to reliably estimate recombination maps, making chromosome-scale simulations less effective. In the 1st paragraph of page 9, we discuss the issue of lifting over coordinates from one assembly to another, and if you have a recombination map estimated in one assembly, you might need to lift it over to another assembly to apply it in your simulation.

      -  Table 1, last row, middle column: when you say "simulated population", I think it is a bit ambiguous. You mean "the true population that we are trying to simulate", but could be read as "the population data that was generated by simulation". I would delete the word simulated here.

      What we mean here is that the selected effective population size should reflect the observed genetic diversity in real genomic data. We realize that the previous wording was confusing, and changed this to the following:

      “Set the effective population size (Ne) to a value that reflects the observed genetic diversity”
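
      One common way to turn this guideline into a number is the neutral, constant-size expectation that nucleotide diversity equals 4·Ne·μ in a diploid population (2·Ne·μ in a haploid one). The sketch below rearranges that relation; it is only an illustration, not code from stdpopsim, and the function name and example values are hypothetical.

```python
def effective_size_from_diversity(pi: float, mu: float, ploidy: int = 2) -> float:
    """Return the Ne implied by pi = 2 * ploidy * Ne * mu.

    pi     : observed nucleotide diversity (per base)
    mu     : mutation rate (per base per generation)
    ploidy : 2 for diploids (pi = 4*Ne*mu), 1 for haploids (pi = 2*Ne*mu)

    Assumes neutrality and a constant population size, so the result is
    only a rough summary when the true history includes size changes.
    """
    if mu <= 0:
        raise ValueError("mutation rate must be positive")
    if pi < 0:
        raise ValueError("nucleotide diversity cannot be negative")
    return pi / (2 * ploidy * mu)


if __name__ == "__main__":
    # Hypothetical values chosen only to show the arithmetic.
    print(effective_size_from_diversity(pi=0.001, mu=1e-8))
```

      Because the relation assumes a constant size, a species with a strong recent bottleneck may have no single value that is satisfactory, which is why such a number is best read as a default rather than as a demographic model.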

      -  Figure 2, and other places when you refer to mutation and recombination rate (eg p11, last paragraph), can you include the units (e.g. per base pair, per generation)?

      Throughout the manuscript, rates are always specified per base per generation. In Figure 2, this is specified in the caption (3rd line). We added units in other places in section “Examples of added species” on pages 12-13, where they were indeed missing.

      -  p11, "default effective population size": can you use a more descriptive word instead of the default? Maybe the historical average? Also, what is this value used for in the simulations when there is a demographic model specified (as in the case of Anopheles)?

      We think that “default effective population size” is the most appropriate term to use here, since we are referring to the parameter in the species model in stdpopsim. It is correct that the value of this parameter should reflect the historical average size in some sense, but it is really unclear what this should be in the case of a species like Bos taurus, which experienced a very dramatic bottleneck in the recent past. We address this subtle, yet important, issue in the sentence preceding this one. If a demographic model is specified in simulation, it overrides the default effective population size, and its value is ignored (which is why we refer to it as ‘default’). We added a short sentence clarifying this in the 2nd paragraph of the “Bos Taurus” section (now page 12).

      “Note that the default Ne is only used in simulation if a demographic model is not specified.”

      -  p8, when you say "Such simulations are useful for a number of purposes, but they cannot be used to model the influence of natural selection on patterns of genetic variation.": You may want to bring up the discussion that many of these neutral parameters taken from the literature could have been estimated assuming genome-wide neutrality, and thus ignoring the effect of background selection. Therefore the parameter values might reflect some effect of background selection that was unaccounted for during their estimation.

      This is an important subtle point, which we now address in the section “Limitations of simulated genomes”, which we added to the revised manuscript. In that section, we discuss various limitations of simulations, focusing on inferred demographic models. We address the potential influence of the segments selected for analysis toward the end of 2nd paragraph in that section (page 9):

      “... all methods assume that the input sequences are neutrally evolving. This implies that technical choices, such as the specific genomic segments analyzed and various filters, may also influence the inferred model and its ability to model observed genetic variation.”

      Interestingly, background selection in itself typically does not have a strong effect on the inferred model. This is something that is examined in the forthcoming publication that presents simulations with natural selection in stdpopsim.

      -  Why are some concepts written in bold (eg effective population size, demographic model)? Were you planning to make a vocabulary box? I think this is a good idea given that you are aiming for a public that can include people who are not very familiar with some population genetics concepts.

      In the “Examples of added species” section, we use boldface fonts to highlight the model parameters that were determined for each species. We added a statement clarifying this in the beginning of this section (page 11), and made sure that all the relevant parameters were consistently highlighted throughout this section. In other sections, we use boldface fonts only for titles. A few cases that did not conform to this rule were removed in the current version. We did not intend on adding a vocabulary box, but considered this when revising the manuscript, due to the reviewer’s suggestion. However, we found it difficult to converge on a small (yet comprehensive) set of terms with accurate and succinct definitions. We think that the important terms are adequately defined within the text of the manuscript, providing sufficient information also for readers who are not expert population geneticists.

      - p4, 2nd paragraph: Are these automated scripts that are used to compare models publicly available? If you are suggesting that people use this approach generally when coming up with a simulation model (p8, penultimate paragraph), it would be helpful to have access to these automated scripts.

      The scripts are part of the public stdpopsim repository on GitHub, and may be used by anyone. Some components of these scripts are easier to apply in general, such as comparing a demographic model with one implemented separately by the reviewer. This step, for example, is achieved by application of the Demography.is_equivalent method in msprime. Other parts of the comparison depend on the specific structure of python objects used by stdpopsim, so they are not likely to be useful when implementing simulations outside the framework of stdpopsim.

      -  p9, 1st paragraph, and p.12 2nd paragraph: instead of adjusting the mutation rate to fit the demographic model (and using an old estimate of the mutation rate), would it be ok to adjust the demographic model to fit the new mutation rate? E.g. with a new mutation rate that is the double of a previous estimate, would it be ok to just divide Ne by 2 such that Ne*mu is constant (in a constant population size model)? I imagine this could get complicated with population size changes.

      In principle, this could be done if you were simulating neutrally evolving sequences (without modeling natural selection). Since the coalescent is scale-free, you can scale down all population sizes and divergence times by a multiplicative factor, scale up migration rates and the mutation rate by the same factor, and get exactly the same distribution over the output sequences. However, getting the scaling right is tricky and error-prone, especially because it has to be redone every time the mutation rate of a species is updated. Moreover, once you start modeling natural selection, this scale-free property no longer holds. Thus, the simple solution we came up with in stdpopsim is to attach to each demographic model the mutation rate used in its inference.
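
      The rescaling described in this response can be sketched as follows. This is a toy illustration under stated assumptions, not stdpopsim code: the `ToyModel` container, its field names, and the example values are hypothetical, and the check covers only the neutral case.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ToyModel:
    """Hypothetical container for a neutral two-population model."""
    sizes: Dict[str, float]   # population sizes (diploid individuals)
    split_time: float         # divergence time, in generations
    migration_rate: float     # migration rate, per generation
    mutation_rate: float      # mutation rate, per base per generation


def rescale(model: ToyModel, factor: float) -> ToyModel:
    """Divide sizes and times by `factor`; multiply rates by `factor`."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return ToyModel(
        sizes={name: n / factor for name, n in model.sizes.items()},
        split_time=model.split_time / factor,
        migration_rate=model.migration_rate * factor,
        mutation_rate=model.mutation_rate * factor,
    )


def compound_parameters(model: ToyModel) -> Dict[str, float]:
    """Scaled quantities that the neutral coalescent actually depends on."""
    n_ref = next(iter(model.sizes.values()))  # first population as reference
    params = {
        "tau": model.split_time / (2 * n_ref),   # time in coalescent units
        "M": 4 * n_ref * model.migration_rate,   # scaled migration rate
    }
    for name, n in model.sizes.items():
        params[f"theta_{name}"] = 4 * n * model.mutation_rate
    return params


if __name__ == "__main__":
    original = ToyModel(
        sizes={"A": 10_000.0, "B": 5_000.0},
        split_time=2_000.0,
        migration_rate=1e-4,
        mutation_rate=1e-8,
    )
    # Doubling the mutation rate while halving sizes and times.
    halved = rescale(original, 2.0)
    print(compound_parameters(original))
    print(compound_parameters(halved))
```

      The two printed dictionaries agree up to floating-point rounding, which is the sense in which the neutral model is scale-free. Every size, time, and rate in a model must be transformed consistently, so a single missed parameter silently changes the model; attaching the original mutation rate to each demographic model avoids that step altogether.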

    1. Author Response:

      The following is the authors' response to the original reviews.

      We sincerely thank all the editors and reviewers for taking the time to evaluate this study. Here is our point-by-point response to the reviewers’ comments and concerns.

      Reviewer #1 (Public Review): The study by Oikawa and colleagues demonstrates for the first time that a descending inhibitory pathway for nociception exists in non-mammalian organisms, such as Drosophila. This descending inhibitory pathway is mediated by a Drosophila neuropeptide called Drosulfakinin (DSK), which is homologous to mammalian cholecystokinin (CCK). The study creates and uses several Drosophila mutants to convincingly show that DSK negatively regulates nociception. They then use several sophisticated transgenic manipulations to demonstrate that a descending inhibitory pathway for nociception exists in Drosophila.

      […]

      Weaknesses:

      A minor weakness in the study is that it is unclear how DSK negatively regulates nociception. An earlier study at the Drosophila nmj shows that loss of DSK signaling impairs neurotransmission and synaptic growth. In the current study, loss of CCKLR-17D1 in Goro neurons seems to increase intracellular calcium levels in the presence of noxious heat. An interesting future study would be the examination of the underlying mechanisms for this increase in intracellular calcium.

      We thank the reviewer for the kind and very positive evaluation of our manuscript. We agree that this study has not elucidated the intracellular molecular pathway(s) downstream of CCKLR-17D1 that are involved in the regulation of the activity of Goro neurons, and we think that it would definitely be an interesting topic for future research.       

      Reviewer #1 (Recommendations For The Authors):

      The response latencies for the control yw larvae seem large, with many larvae appearing to be insensitive to the thermal stimulus. Is this just an effect of the yw genetic background? A brief discussion of this might be helpful.

      We thank the reviewer for pointing this out. We have also noticed that the yw control larvae tend to show longer response latencies than the other control strains, and in the revised manuscript, we have added the following sentence in the Results section (Lines 91–94):

      “We have noticed that the yw control strain, which was used by us to generate the dsk and receptor deletion mutants, showed relatively longer response latencies to the 42 °C probe compared to the other control strains used in this study. This may be attributed to the effect of the genetic background, although, presently, the cause for this difference is unknown.”

      Reviewer #2 (Public Review):

      This is an exceptional study that provides conclusive evidence for the existence of a descending pathway from the brain that inhibits nociceptive behavioral outputs in larvae of Drosophila melanogaster. […] The study raises many interesting questions for future study such as what behavioral contexts might depend on this pathway. Using the CAMPARI approach, the authors do not find that the DSK neurons are activated in response to nociceptive input but instead suggest that these cells may be tonically active in gating nociception. Future studies may find contexts in which the output of the DSK neurons is inhibited to facilitate nociception, or contexts in which the cells are more active to inhibit nociception.

      Reviewer #2 (Recommendations For The Authors): I have no recommendations for the authors as this is a very complete and thoroughly executed study. The writing is crystal clear.

      We thank the reviewer for the kind and very positive evaluation of our manuscript. We are happy to know that our current manuscript was deemed to be clear and convincing by the reviewer.

      Reviewer #3 (Public Review): […] Overall the authors use clean logic to establish a role for DSK and its receptor in regulating nociception. I have made a few suggestions that I believe would strengthen the manuscript as this is an important discovery.

      Major comments:

      1. It's not completely clear why the authors are staining animals with an FLRFa antibody. Can the authors stain WT and DSK KO animals with a DSK antibody? Also, can the authors show in supplemental what antigen the FLRFa antibody was raised against, and what part of that peptide sequence is retained in the DSK sequence? This overall seems like a weakness in the study that could be improved on in some way by using DSK-specific tools.

      We thank the reviewer for this query. We would like to clarify that we first tried the FLRFa antibody to visualize an RFamide-type neuropeptide other than DSK in Drosophila and found that the staining pattern is quite similar to that of anti-DSK, as shown by Nichols et al. [1]. According to the original paper describing the anti-FLRFa antisera [2] (already cited in the reviewed manuscript), the antigen used to raise it was the Phe-Met-Arg-Phe-NH2 peptide conjugated with succinylated thyroglobulin, and the study experimentally shows that the antibody binds well to peptides containing the Met-Arg-Phe-NH2 or Leu-Arg-Phe-NH2 sequence and has 100% cross-reactivity to FLRFa. As DSK contains the Met-Arg-Phe-NH2 sequence [3], the cross-reaction of this antibody to DSK is consistent with the description of the original study.

      Although we were unable to use an antibody specific to DSK, our staining data with dsk deletion mutants and the expression pattern of DSK-2A-GAL4 corroborate each other (Figure 2 and Figure 2-figure supplement 1), which we believe provides compelling evidence that DSK is specifically expressed in MP1 and Sv neurons, and that DSK-2A-GAL4 is a reasonably effective tool to specifically manipulate DSK-expressing neurons.

      2. What is the phenotype of DSK-Gal4 x UAS-TET animals? They should be hyper-reactive. If it's lethal maybe try an inducible approach.

      We thank the reviewer for this question. Unfortunately, we have not attempted this experiment, although we agree that this would be a nice addition to further strengthen the study if TET worked well in the DSKergic neurons.

      3. Figure 9. This was not totally clear, but I think the authors were evaluating spontaneous (i.e. TRPA1-driven) rolling at 35C. The critical question is "does activating DSK-expressing neurons suppress acute heat nociception" and this hasn't really been addressed. The inclusion of PPK Gal4 + DSK Gal4 in the same animal kind of clouds the overall conclusions the reader can draw. The essential experiment is to express UAS-dTRPA1 in DSK-Gal4 or GORO-Gal4 cells, heat the animals to ~29C, and then test latency to a thermal heat probe (over a range of sub and noxious temperatures). Basically prove the model in Figure 10 showing ectopic activation or inhibition for each major step, then test heat probe responses.

      We thank the reviewer for suggesting ideas for alternative experiments to potentially strengthen our conclusion. Regarding experiments using heat probes, previous studies have demonstrated that (i) Blocking ppk1.9-GAL4-positive C4da neurons almost completely abolishes the larval nociceptive response to local heat stimulations [4]; (ii) Local heat stimuli above 39 °C readily activate C4da neurons and larval nociceptive rolling [5-9]; and (iii) Thermogenetically or optogenetically activating these neurons is sufficient to trigger Goro neurons and larval rolling [4, 10-12]. Thus, it has now been made clear that heat probes induce larval nociceptive rolling via excitation of the C4da pathway, and we believe that our experiments using thermogenetic activation of C4da neurons can be safely interpreted as an alternative to experiments using heat probes. Using heat probes demands a more complicated experimental set-up to be combined with CaMPARI imaging experiments, and this is another reason why we preferred to take the thermogenetic approach.

      We have also considered the experiment using Goro-GAL4 instead of ppk-GAL4. However, if dTRPA1 artificially activates Goro neurons far downstream of the neuronal mechanism by which MP1 activation suppresses Goro neuron activity, the effect of MP1 activation may be bypassed and masked. As we currently do not know the epistasis between dTRPA1 function and the effect of MP1 activation in modulating the activity of Goro neurons, we chose instead to activate C4da neurons by using ppk-GAL4, which likely resulted in more natural activation of Goro neurons than dTRPA1-triggered direct activation.

      4. It would also then be interesting to see how strong the descending inhibition circuit is in the context of UV burn. If this is a real descending circuit, it should presumably be able to override sensitization after injury.

      Indeed, this is an interesting avenue to explore in future studies to understand the type of situation in which the DSKergic descending system functions to control nociception.

      Reviewer #3 (Recommendations For The Authors): Overall this is a good story and the claims are generally supported with experimental evidence. The way to really improve this study would be to use more precise and definitive tools, like specific antibodies, specifically targeted genes, and better temporal control of the descending circuit to prove this is inducible sufficient to suppress acute thermal nociception and this occurs only via a descending pathway, etc. However this would be exponentially more work, and so the authors I guess need to weigh the cost-benefit of definitive proof vs. strong evidence for their claims. Overall I think this study will be the beginning of a new line of inquiry in the field that has the potential to guide our understanding also of mammalian descending pathways, and as such, this study is of value to the community.

      We appreciate the reviewer’s many interesting ideas for experiments that could further reinforce our findings. We agree that some of the suggested experiments would potentially strengthen this work if added. However, as mentioned above, we think that the suggested experiments are either outside the scope of this paper or offer no significant benefit over the experiments already conducted, and hence are not essential to the present study.

      References

      1. Nichols, R. and I.A. Lim, Spatial and temporal immunocytochemical analysis of drosulfakinin (Dsk) gene products in the Drosophila melanogaster central nervous system. Cell Tissue Res, 1996. 283(1): p. 107-16.

      2. Marder, E., et al., Distribution and partial characterization of FMRFamide-like peptides in the stomatogastric nervous systems of the rock crab, Cancer borealis, and the spiny lobster, Panulirus interruptus. J Comp Neurol, 1987. 259(1): p. 150-63.

      3. Nassel, D.R. and M.J. Williams, Cholecystokinin-like peptide (DSK) in Drosophila, not only for satiety signaling. Front Endocrinol, 2014. 5.

      4. Hwang, R.Y., et al., Nociceptive neurons protect Drosophila larvae from parasitoid wasps. Curr Biol, 2007. 17(24): p. 2105-2116.

      5. Tracey, W.D., Jr., et al., painless, a Drosophila gene essential for nociception. Cell, 2003. 113(2): p. 261-73.

      6. Xiang, Y., et al., Light-avoidance-mediating photoreceptors tile the Drosophila larval body wall. Nature, 2010. 468(7326): p. 921-6.

      7. Burgos, A., et al., Nociceptive interneurons control modular motor pathways to promote escape behavior in Drosophila. eLife, 2018. 7.

      8. Honjo, K. and W.D. Tracey, Jr., BMP signaling downstream of the Highwire E3 ligase sensitizes nociceptors. PLoS Genet, 2018. 14(7): p. e1007464.

      9. Im, S.H., et al., Tachykinin acts upstream of autocrine Hedgehog signaling during nociceptive sensitization in Drosophila. eLife, 2015. 4: p. e10735.

      10. Ohyama, T., et al., A multilevel multimodal circuit enhances action selection in Drosophila. Nature, 2015. 520(7549): p. 633-9.

      11. Honjo, K., R.Y. Hwang, and W.D. Tracey, Jr., Optogenetic manipulation of neural circuits and behavior in Drosophila larvae. Nat Protoc, 2012. 7(8): p. 1470-8.

      12. Zhong, L., et al., Thermosensory and non-thermosensory isoforms of Drosophila melanogaster TRPA1 reveal heat sensor domains of a thermoTRP channel. Cell Rep, 2012. 1(1): p. 43-55.

    1. Author Response:

      The following is the authors' response to the original reviews.

      We’d like to thank the three reviewers for reviewing in depth our work and providing insightful comments and suggestions.

      Reviewer #1 (Recommendations For The Authors):

      1) The evidence that MS023 is actually working in vivo in their last experiment (Fig 6) needs to be strengthened. This could be due to the timing of the experiment. Tail tips were collected 48 h after the final injection and analyzed by Western for ADMA and SDMA levels. They do see subtle changes, in the right directions, of SDMA and ADMA (but these changes are really not very obvious). Perhaps the inhibitor has already been largely metabolized two days after injection. Have they looked at MMA levels?

      We have quantified the ADMA and SDMA levels of Fig. S6. We have not measured MMA levels. The text has been edited as follows:

      “The average ADMA relative expression was 0.95 for vehicle treated mice and 0.83 for MS023 treated mice (p < 0.00041). The average SDMA relative expression was 0.92 for vehicle treated mice and 0.94 for MS023 treated mice (p < 0.17). These whole-body measurements as measured by tail biopsies show MS023 promotes the decrease of proteins with ADMA and a slight increase in proteins with SDMA. It is known that inhibition of type I PRMTs or PRMT1 deletion diminishes ADMA and increases SDMA due to substrate scavenging (Dhar et al, 2013).”

      2) The authors need to explain why they would expect an increase in SDMA levels in these mice after MS023-treatment. 

      We have edited the text as follows:

      “It is known that inhibition of type I PRMTs or PRMT1 deletion diminishes ADMA and increases SDMA due to substrate scavenging (Dhar et al, 2013).”

      3) In the discussion, it would be valuable to address the types of CRISPR-screens that could be performed in these MS023-expanded MSCs. They mention this as a benefit in the introduction, but to expand on this idea in the discussion.

      The idea here was not necessarily to perform a CRISPR screen on the MS023-treated cells (although it is an interesting idea), but rather to correct the genetic mutation by CRISPR-Cas9 to enhance the success of genetically corrected autologous cell transplantation. The addition of MS023 to MuSCs in vitro would allow the cells to be expanded while maintaining their self-renewal potential, thereby providing the opportunity to correct the mutation in the dystrophin gene using technologies such as CRISPR prime editing (Mbakam et al., 2022 Mol Ther Nucleic Acids 30:272-285). Our results demonstrating that MS023 enhances cell engraftment suggest that this method could be used to improve autologous cell transplantation efficiency. We have edited the text in the discussion as follows:

      “Our findings suggest that type I PRMT inhibitors may have therapeutic potential for treating certain skeletal muscle diseases. For instance, to improve the efficacy of autologous cell therapy, the dystrophin-deficient MuSCs collected from a DMD patient and corrected by CRISPR prime editing (Happi Mbakam et al, 2022) could be treated with MS023 to maintain their stemness and enhance their cell engraftment capacity.”

      4) Also, could they address the potential value of MSC culture and expansion using a combination of SETD7 inhibition and PRMT1 inhibition?

      Agreed. We have edited the text as follows:

      “These findings suggest that inhibiting methyltransferases can affect MuSC fate and perhaps a combination of Setd7 and MS023 inhibitors would provide a more favorable combination to promote the expansion of MuSCs while maintaining their stem cell-like properties.”

      Reviewer #2 (Recommendations For The Authors): 

      In figure 2 the authors show that upon removal of MS023, the cells differentiate more efficiently. In figure 5E-F they show that the mice that received MS023-treated cells had more GFP mature muscle fibers. However, in figure 5C-D these cells have the same capacities to terminally differentiate. This reviewer was wondering if these cells would differentiate faster? Have the authors looked into this?

      The reviewer raises an interesting point. Our in vitro experiments shown in Supplemental Figure S1 indicate that MS023-treated cells are more actively cycling (more Ki67+ cells) and are less committed to differentiation (fewer Pax7-MyoD+ cells), which would suggest that they would need to exit the cell cycle and differentiate faster to reach the same fusion capacity after 3 days of differentiation without MS023. Future experiments with a time course including earlier time points will be needed to confirm whether these cells differentiate faster.

      Reviewer #3 (Recommendations For The Authors): 

      1) MS023 is a non-selective inhibitor of type I PRMTs. It has comparable IC50 values for PRMT1 and PRMT4 (CARM1), and lower IC50 values for PRMT6 and PRMT8. The authors argue that the cellular phenotype caused by MS023 is solely mediated via PRMT1, since the specific PRMT4-inhibitor TP-064 has no effects on MuSC expansion. TP-064 treatment was not used as a control for the transplantation and muscle strength measurement experiments. Are PRMT6 and PRMT8 expressed in MuSC and are they inhibited by the applied concentrations of MS023? Kawabe et al reported that CARM1 methylates Pax7, thereby inducing Myf5 transcription during the asymmetric division of MuSC (PMID: 22863532). Is the expression of Myf5 reduced upon MS023 treatment? scRNAseq of MuSC 4-day after culture is too late to address this question, since the majority of the cells are already committed to differentiation. Staining for Myf5 using ex vivo cultured fibers or regenerating muscles in vivo should be used.

      Indeed, we mention throughout the text that MS023 is a type I PRMT inhibitor. We have edited the text as follows to indicate that the effects are most likely mediated by inhibition of PRMT1 in vivo.

      “Treatment of MuSCs with MS023 resulted in metabolic reprogramming of MuSCs, supporting a role for type I PRMTs as metabolic regulators. In vitro, MS023 has been shown to inhibit several type I enzymes at nM concentrations (Eram et al., 2016). It is well-documented that PRMT1 is the major cellular type I enzyme (Pawlak et al, 2000) and this is why loss of PRMT1, but not of the other type I PRMTs, is embryonic lethal in mice (Guccione & Richard, 2019). The numerous published data in cellulo with MS023 are thus far only reproduced by PRMT1-deficiency by siRNA or knockout, suggesting that MS023 actions in vivo are predominantly mediated by inhibiting PRMT1 (Gao et al, 2019; Plotnikov et al, 2020; Wu et al, 2022; Zhu et al, 2019). Thus, the effects of MS023 on MuSCs are most likely mediated by inhibition of PRMT1.”

      Moreover, we investigated the expression of other type I PRMTs as suggested by the reviewer. We examined their expression in a publicly available single cell RNAseq dataset (Oprescu SN et al, iScience 2020, 23:100993), which analyzed skeletal muscle at different time points post-cardiotoxin injury (uninjured, and 0.5, 2, 3.5, 5, 10, 21 days post-injury). The findings show that Prmt1 is by far the most expressed type I PRMT in MuSCs at every time point tested. Carm1 (Prmt4) is expressed at a high level in a small/moderate subset of cells, especially during regeneration. Prmt6 is expressed at a low level in a small proportion of cells, while Prmt8 expression was not detected. These findings are consistent with our observation that Prmt1 is the predominant type I Prmt in MuSCs, which further supports our hypothesis that it is the main target of MS023. These findings were added in Suppl. Fig 1B.

      The expression of Myf5 during asymmetric division is indeed well characterized on muscle fiber-associated MuSCs (Dumont et al., 2015 Nat Med 21:1455; Kawabe et al., 2012 Cell Stem Cell 11:333). As the reviewer states, the 4-day time point is too late to investigate Myf5 expression. Additionally, these cells were cultured ex-vivo and were not fiber-associated. Therefore, scRNAseq is not an ideal method to address the question of whether MS023 treatment modulates Myf5 expression, and further experiments would be required to examine Myf5 in an appropriate context (i.e. on ex-vivo cultured myofibers).

      2) Figure 2 is not very informative, while the second paragraph of the result parts is excessive and too complicated. The extensive description of differential gene expression in each potential subpopulation is neither very informative nor helpful to convince the reader that the M3/M5 population has acquired more stemness-like features due to the MS023 treatment. From my point of view, the data just reflect the increased proliferative capacity of MS023-treated cells with elevated cell cycle markers, ribosomal protein, and metabolic state. Do the M1-M5 populations show any different distribution along the trajectory? The authors need to show cell trajectories for each sample and cluster in Figure S3A. It is also imperative to present the distribution of signature genes for each individual cluster. Essentially, M1-M5 all located together in one cloud. What justifies segregation into different subclusters? The color code for the different clusters (including the trajectories) to allow better distinction. 

      MS023 treated MuSCs contain a subpopulation with higher Pax7 expression (Supplementary Figure S2F, S2G), which is consistent with the IF results in Figure 1 and emphasized in the abstract. Why are these data in the supplements and not in a main figure (e.g. in figure 2)?

      We appreciate the thoughtful and detailed comments on our single-cell data. Please see below for a response to each point:

      To address the concern that the results section is excessive, our intention was to simply provide the reader with a descriptive overview of the identity of each subcluster that the software identified. In fact, to ensure clarity and conciseness, we elected to provide only the names of a select few cluster markers rather than list all of the significant cluster markers that were generated. We kindly refer the reviewer to Supplementary Table S1 for a more extensive list of markers.

      In response to the reviewer’s comment: “The color code for the different clusters (including the trajectories) to allow better distinction,” we agree that colour-coding is helpful, please refer to Figure 2A for a colour-coded map of the clusters.

      To address the reviewer’s question regarding what justifies segregation into different subclusters for M1-M5, refer to Supplementary Table S1 for a list of uniquely enriched markers for each cluster. This list was filtered to include marker genes that were present only in a given cluster, thus contributing to its uniqueness and explains why that cluster was identified as being distinct from another given cluster.

      Lastly, since the elevated Pax7 levels in MS023-treated MuSCs were already presented and discussed thoroughly in Figure 1, we elected to avoid repetition in the main Figures and presented the ridge plots showing elevated Pax7 in the Supplementary Material for Figure 2.

      3) The same group has reported previously that PRMT1-deficient MSCs show reduced expression of MyoD due to disruption of Eya1/Six1 recruitment to the MyoD promoter (PMID: 27849571). However, the scRNAseq result does not reflect this finding. MyoD levels are not significantly changed in d4 MS023 compared with d4 (Supplementary figure S2G). The authors need to provide an explanation. Furthermore, the authors previously described that "the majority of PRMT1-deficient MSCs repressed Pax7 expression at day 3 while being Ki67 positive" (Fig. 5B). How does that fit to the current observations, which indicate an increase of Pax7+ cells after MS023 treatment? This discrepancy needs to be resolved.

      While the scRNAseq does not show a reduction in overall MyoD expression in MS023-treated MuSCs, there is indeed a reduction in the proportion of MyoD+ myofiber-associated MuSCs (Figure 1C, 1D). Supplemental Figure S2G further shows a subpopulation in the d4MS023 group with lower MyoD expression that was not present in the d4 group, reflective of the findings in Figures 1C and 1D. Therefore, although the average expression was not shifted significantly with MS023, there was indeed a subpopulation of MuSCs with lower MyoD expression.

      The reviewer additionally points out that Fig. 5B from a previous study (Blanc et al., 2017 MCB 37:e00457) performed by our group, shows that Pax7 expression was repressed at day 3 of culture in PRMT1-null MuSCs. However, this quantification was based on immunofluorescence staining where cells are marked positive or negative for Pax7 expression and does not look at the intensity of Pax7 expression levels. In our current study, we examine the expression levels of Pax7 in discrete subpopulations of MuSCs and found that there is a subpopulation of MuSCs that emerges with MS023 treatment that has higher Pax7 expression than untreated counterparts. Therefore, the results of the two experiments are not directly comparable. 

      4) I do have a major problem with the interpretation of the metabolic changes in MS023-treated MuSC. In the abstract, the authors wrote, "These findings suggest that type I PRMT inhibition metabolically reprograms MuSCs resulting in improved self-renewal and muscle regeneration fitness." There is simply no causal evidence to support this claim, which is solely based on a correlation. If the authors want to maintain this claim they either need to stimulate OXPHOS and glycolysis by other means to see whether such a manipulation recapitulates the effects of MS023 or attenuate OXPHOS and glycolysis to see whether this abrogates the effects of MS023. To prove whether increased OXPHOS is a cause for improved self-renewal, the authors might simply co-treat MuSC with MS023 and an OXPHOS inhibitor and analyze consequences for the Pax7+/MyoD- population.

      We thank the reviewer for the excellent suggestions of experiments that would solidify a causal relationship between increased metabolism and increased self-renewal. We will certainly consider them for future studies. We agree that the relationship in the present study is correlative, and the text has been modified in the abstract as follows:

      “Single cell RNA sequencing (scRNAseq) of ex vivo cultured MuSCs revealed the emergence of subpopulations in MS023-treated cells which are defined by elevated Pax7 expression and markers of MuSC quiescence, both features of enhanced self-renewal. Furthermore, the scRNAseq identified MS023-specific subpopulations to be metabolically altered with upregulated glycolysis and oxidative phosphorylation (OxPhos). Transplantation of MuSCs treated with MS023 had a better ability to repopulate the MuSC niche and contributed efficiently to muscle regeneration following injury. Interestingly, the preclinical mouse model of Duchenne muscular dystrophy had increased bilateral grip strength 10 days after a single intraperitoneal dose of MS023. Our findings show that inhibition of type I PRMTs increased the proliferation capabilities of MuSCs with altered cellular metabolism, while maintaining their stem-like properties such as self-renewal and engraftment potential.”

      5) Ryall et al reported that MuSCs undergo a metabolic switch from fatty acid oxidation to glycolysis with reduced intracellular NAD+ levels and reduced activity of SIRT1, leading to elevated H4K16 acetylation. Here, both OXPHOS and glycolysis are increased after treatment of MuSC with MS023. Are the NAD+ and H4K16ac levels changed in MS023-treated MuSC? 

      This is another excellent experiment that would help to support a causal relationship between MS023 treatment and increased OXPHOS and glycolysis, and it could certainly be addressed in future studies.

      6) In Ryall et al.'s results, there was no difference in the basal mitochondrial OCR between freshly isolated MuSCs and cultured MuSCs. Importantly, stimulation of OXPHOS will increase ROS concentration, resulting in premature differentiation of MuSC (PMID: 30106373). Furthermore, increased ROS levels will most likely enhance DNA damage rather than improve self-renewal. The authors have to address these issues and also monitor ROS and DNA damage levels. 

      The lack of cell death upon treatment with MS023 in the present study would indicate that there is no major ROS-induced DNA damage occurring. Additionally, the propensity of MS023-treated MuSCs to retain their stemness while in long-term culture (Supplemental figure S1E) would indicate that in this context, premature differentiation is not a concern.

      7) The authors used FACS-analysis of MuSCs three weeks after transplantation to demonstrate that MS023 treatment enables better engraftment into the MuSC niche. The six-fold increase of transplanted cells in the MuSC niche is difficult to understand. Why should transplanted cells compete so efficiently with endogenous MuSC for repopulation of the niche? Is it possible that some of the transplanted MuSC are still lingering within the interstitium and erroneously counted as bona fide MuSC? The authors have to determine the localization of transplanted MuSC. Are all transplanted cells indeed situated in the proper niche or are they also present outside the basal lamina of muscle fibers?

      The hindlimbs which received the engraftment were irradiated 24h prior to engraftment, therefore the ability of endogenous MuSCs to compete is compromised. Additionally, Figure 5E shows that the regenerated muscle indeed has GFP negative fibers that would have been generated from endogenous MuSCs, indicating that MS023-treated MuSCs did not fully outcompete endogenous MuSCs.

      8) The authors reported that an only 3-day treatment with MS023 is sufficient to dramatically improve muscle function in mdx mice even 30 days later, which is hard to swallow. What is the evidence that such strong effects are primarily mediated by stimulation of MuSC expansion? Are there other pathways or cells that respond to MS023 treatment and stimulate muscle strength? To support the claim of a 'better' stem cell function as the major cause for MS023-dependent stimulation of muscle strength in mdx mice, the authors need to determine the total number of Pax7+ cells, Pax7+/Ki67+, Pax7+/MyoD+, Pax7+/MyoD-, Pax7-/MyoD+ and myonuclei. It is also absolutely mandatory to include wildtype controls in the muscle strength measurements. Does MS023 treatment also increase muscle strength in wild-type controls? 

      Agreed. We cannot determine whether the effect is mediated by an expansion of the MuSC pool or by an effect on other cell types, such as a direct impact on the myofibers. The manuscript has been modified to include the following text:

      “Furthermore, our findings show that injection of MS023 in the dystrophic mouse model mdx led to enhanced muscle strength with effects lasting up to 30 days. We cannot determine whether the effect of MS023 was mediated by an expansion of the MuSC pool or by an effect on other cell types, such as a direct impact on the myofibers. The goal of this experiment was to provide a therapeutic perspective for the possible use of type I PRMT inhibitors for the treatment of DMD.”

      The goal of this figure was to provide a therapeutic perspective for the use of type I PRMT inhibitors for the treatment of DMD. Muscle wasting/weakness in DMD is a complex and multifactorial process (e.g., myofiber fragility, MuSC defects, chronic inflammation, fibrofatty accumulation). If MS023 can target multiple aspects of the physiopathology of the disease, this would increase its therapeutic applicability. Further studies will be needed to determine the exact mechanism by which MS023 mediates its beneficial effect. These future studies could include the use of wild-type controls, as the reviewer suggests, to investigate the role of MS023 in a non-muscle degenerative context.

      9) Ideally, a genetic inactivation-reactivation of PRMT1 should be done to validate the results with MS023 and to make sure that indeed the transient inhibition of PRMT1 is responsible for the beneficial effects of MS023. Of course, this would be a major effort when done in genetically manipulated mice and therefore is not adequate to ask for. However, it should be possible to use PRMT1-deficient MuSC, which the authors have in hand, and re-express PRMT1 in these cells with an AAV or a lentivirus. 

      We agree that genetic ablation of PRMT1 is a key experiment to validate MS023 results. Please refer to previous work from our group, which shows that PRMT1-KO MuSCs have an enhanced self-renewal phenotype (Blanc et al., 2017 MCB 37:e00457), similar to what was observed in the present study with MS023 treatment.

      10) Some claims are overstated and/or too aggressive. E.g.: "Therefore, through repression of type I PRMTs with MS023, we have reprogramed MuSCs to acquire a unique and previously uncharacterized identity." I do not see clear evidence that MS023 treatment 'reprograms' MuSC to a 'unique identity'. The observed changes are in large parts compatible with a simple stimulation of proliferation.

      The unique finding in our data is that treatment with MS023 resulted in a shift in identity as compared to the DMSO-treated proliferating MuSCs (M1, M2 and M4), creating transcriptionally distinct M3 and M5 clusters. M3 and M5 had elevated markers for metabolism (e.g. Eno1, Atp5k, etc.) and early activation (e.g. Fos, Jun), while the untreated MuSCs in clusters M1, M2 and M4 did not. Furthermore, M3 and M5 had higher baseline levels of Pax7 expression when compared to untreated cells. Together, these findings describe a transitional subpopulation of MuSCs unique to MS023 treatment which not only harbours the stem-like/early activation markers Pax7, Fos and Jun, but also shows elevated proliferative markers related to cell cycle and energy metabolism. This particular combination of characteristics is unique to the MS023-treated MuSCs, thus identifying a unique subtype of MuSC identity. In accordance with our scRNAseq data, we validated experimentally that MS023-treated cells have higher energy metabolism and increased self-renewal potential, thereby confirming that the unique transcriptomic signature of these cells also leads to a different cell fate decision.

    1. Author Response:

      The following is the authors' response to the original reviews.

      1) l. 80: "evolved from a fourth domain of cellular life": I am worried a little bit about putting together what I believe are two distinct hypotheses: (i) NCLDV deriving from a complex (ancestral) cellular life form (possibly proto-eukaryotic) by reductive evolution, and (ii) NCLDV forming or deriving from a fourth domain of cellular life. To clarify for non-expert readers, I would suggest rephrasing as "evolved by reductive evolution, possibly from a fourth domain of cellular life...".

      Following the reviewer’s recommendation, we have clarified the sentence by writing: “These observations are at odds with the suggestion that NCLDVs originated by reductive evolution, possibly from a fourth domain of cellular life (Colson et al., 2018; Legendre et al., 2012; Patil and Kondabagil, 2021).”.

      2) l. 187-198: Please provide more information on which tool (with version number and parameter) was used to search genomes for MCPs. When I downloaded the HMM model and the faa file for the MCP from the figshare repository and tried to match the two, only a small number (4) of the MCP sequences actually matched the MCP HMM model with significant e-value, but I am not sure why? (for reference, I was using hmmsearch 3.3.2, default parameters)

      We used HMMER version 3.3.2 using the default parameters (hmmbuild and hmmsearch algorithms). We now include this information in the relevant section of the Methods: “Next, we constructed a set of Hidden Markov Models (HMMER version 3.3.2, hmmbuild/hmmsearch using the default parameters) for each of the 4 core proteins involved in virion morphogenesis”.

      We were able to reproduce the reviewer’s observation that the Major capsid curated HMM model returns 4 significant hits when used on the Major capsid multiple alignment file provided in FigShare (significant matches: 1. maverick2_NW_021681489.1_105940131438, 2. ncbincldv_NC_011335.1, 3. ncbincldv_NC_038553.1, 4. yutin_PLVACE1). This curated HMM model was one of the models used for searching homologous protein sequences and was built from a preliminary multiple sequence alignment comprising a different set of taxa (N. taxa = 48). In contrast, the multiple sequence alignment provided in Figshare is the final multiple sequence alignment of major capsid proteins that was used in phylogenetic analyses (N. taxa = 54). Therefore, we should not expect an exact match between the two files.

      We have updated the Figshare repository with a compressed file containing all the HMMs used for searching protein homologues (n = 38), which can be validated with hmmsearch on the European Bioinformatics Institute’s website (https://www.ebi.ac.uk/Tools/hmmer/search/hmmsearch). A separate compressed file contains the final multiple sequence alignments that were used in phylogenetic inference and hypothesis testing.
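
      The reviewer's count of significant matches can be reproduced from a tabular hmmsearch report. The sketch below is illustrative only and is not the authors' pipeline: it assumes the report was written with `hmmsearch --tblout`, that the full-sequence E-value is the fifth whitespace-separated field of each non-comment row, and it uses a hypothetical cutoff (`max_evalue`) together with made-up sequence and model names.

```python
from typing import List, Tuple


def significant_hits(tblout_text: str, max_evalue: float = 1e-3) -> List[Tuple[str, float]]:
    """Return (target name, full-sequence E-value) pairs at or below max_evalue.

    Assumes the per-target table layout written by `hmmsearch --tblout`:
    comment lines start with '#', and the 5th field is the full-sequence E-value.
    The default cutoff is an illustrative assumption, not a value from the paper.
    """
    hits = []
    for line in tblout_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/footer comments
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits.append((target, evalue))
    return hits


# Synthetic report with hypothetical names; real reports come from HMMER itself.
example = """\
# target name accession query name accession E-value score bias E-value score bias exp reg clu ov env dom rep inc description
seq_A - MCP_model - 1.2e-40 140.3 0.1 3.0e-40 139.0 0.1 1.0 1 0 0 1 1 1 1 hypothetical capsid protein
seq_B - MCP_model - 0.5 8.1 0.0 0.9 7.3 0.0 1.2 1 0 0 1 1 0 0 hypothetical protein
"""

print(significant_hits(example))
```

      Counting hits this way makes it easy to see how a model built from one alignment can recover only a subset of sequences in a later, larger alignment.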

      3) Figure 4: The acronyms should be explained in the legend (pPOLB, MCP, mCP, pro, atp, int, TIRs, etc)

      We now provide an explanation of the acronyms used for the traits matrix on Figure 4: “Acronyms refer to genes and genomic features present in the viral genomes: pPOLB (protein-primed DNA polymerase B), MCP (major capsid protein), mCP (minor capsid protein), int (rve-type integrase), pro (adenoviral-like protease), atp (FtsK/HerA DNA packaging ATPase), TIRs (terminal inverted repeats).”

      4) Figure 4: I believe that "TIRs" should be "Present in some members" for the virophages, based on https://doi.org/10.1186/s13062-015-0054-9? Interestingly, this group is typically the one that branches the deepest within virophages, which would be consistent with TIRs being an ancestral trait of the Maveriviricetes class (formerly Lavidaviridae family).

      As suggested, we updated the terminal inverted repeats (TIRs) trait for virophages to “Present in some members” to account for the Rumen virophages described by Yutin, Kapitonov and Koonin (2015, doi: 10.1186/s13062-015-0054-9).

      Additional changes:

      1) Figure 1 has been updated and now shows a polytomy between Mavericks 1/2 and PLVs. This reflects more closely the conceptual framework for our analyses since the specific branching of these groups was not specified in the phylogenetic models.

      2) We have added an Acknowledgements section to the end of the manuscript:

      Acknowledgements

      We wish to thank Peter Simmonds and Alexander Suh for their critical reading and comments on the manuscript, which served to improve this work. We also thank the reviewers for their recommendations and feedback. This work was supported by a doctoral scholarship (Dr. Jose Gregorio Hernandez Award) to JGNB made by the National Academy of Medicine of Venezuela and Pembroke College, Oxford.

    1. Author Response

      eLife assessment

      Tilk and colleagues present a valuable computational analysis of tumor transcriptomes to investigate the hypothesis that the large number of somatic mutations in some tumors is detrimental such that these detrimental effects are mitigated by an up-regulation of pathways and mechanisms that prevent protein misfolding. The authors address this question by fitting a model that explains the log expression of a gene as a linear function of the log number of mutations in the tumor and show that specific categories of genes (proteasome, chaperones, ...) tend to be upregulated in tumors with a large number of somatic mutations. Some of the associations presented could arise through confounding, but overall the authors present solid evidence that mutational load is associated with higher expression of genes involved in mitigation of protein misfolding - an important finding with general implications for our understanding of cancer evolution.

      We thank the reviewers for these kind words. The summary statement and public review highlight our work in understanding how human tumors phenotypically respond to mutational load by assessing changes in gene expression. This work provides a mechanistic underpinning to our previous finding that the accumulation of passenger mutations in tumors creates a substantial cost because even substantially damaging passenger mutations can fix in non-recombining clonal tumor lineages. At the same time, we believe the summary statement and the public review do not mention a key remaining part of our paper that validates our findings and establishes causal connections between protein misfolding due to coding passenger mutations and tumor fitness. Specifically, we replicate and cross-validate our findings in human tumors by examining expression responses in an independent dataset of cancer cell lines (CCLE), where we demonstrate similar expression responses to an accumulation of mutations, indicating generic, cell intrinsic responses. We then establish a causal link by demonstrating that mitigation of protein misfolding through protein degradation and re-folding is necessary for high mutational load cancer cells to maintain viability through perturbation experiments via shRNA knock-down and treatment with targeted agents. These analyses and results are important because they show that the adaptive responses we observe are evidence of a generic, cell intrinsic phenomenon that cannot be explained by organismal effects, such as aging, changes in the immune system or microenvironment.

      Joint Public Review:

      Tilk and colleagues present a computational investigation of tumor transcriptomes to investigate the hypothesis that the large number of somatic mutations in some tumors is detrimental and that these detrimental effects are mitigated by an up-regulation of pathways and mechanisms that prevent protein misfolding.

      The authors address this question by fitting a model that explains the log expression of a gene as a linear function of the log number of mutations in the tumor and additional effects for tumor homogeneity and type. This analysis identified a large number of genes (5000) that are more highly expressed at high mutational load at an FDR of 0.05. These genes are enriched in many core categories, most prominently in the proteasome, translation, and mitochondrial translation. The authors then proceed to investigate specific categories of upregulated genes further.
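The model described in the paragraph above can be written out as a rough sketch. The notation is ours and is an assumption, not taken from the manuscript: $E_{gi}$ is the expression of gene $g$ in tumor $i$, $\mathrm{ML}_i$ is the number of mutations in that tumor, $P_i$ is the tumor homogeneity covariate, and $u_{g,c(i)}$ is an effect for the cancer type $c(i)$ of tumor $i$ (a random effect if the model is fit as a mixed model). The base of the logarithm and any pseudocount are left unspecified.

```latex
\log E_{gi} \;=\; \beta_{0g} \;+\; \beta_{1g}\,\log \mathrm{ML}_i \;+\; \beta_{2g}\,P_i \;+\; u_{g,c(i)} \;+\; \varepsilon_{gi}
```

Under this reading, a gene counts as "more highly expressed at high mutational load" when its slope $\beta_{1g}$ is positive and significant after multiple-testing correction.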

      The individual reviews, and the discussion among the reviewers, raised several issues that could potentially undermine or weaken some of the findings presented in this paper.

      1) Systematic differences in expression of some genes from one tumor class to another might generate spurious associations with mutational load (ML), which would affect the results presented in Figs 1 and 3. The case of a causal link between ML and over-expression of genes that mitigate deleterious effects of misfolding would be stronger if these results were replicated within single cancer types with many samples with different ML (similar to how Fig S6 relates to Fig 3). A related concern might be an association between increased variance of expression and ML. The compositional nature of expression data could generate trends like the ones shown in Fig. 2 with changing variance.

      We agree with the reviewers that possible confounders should be considered since TCGA data is heterogeneous. In this paper, we investigated possible confounders such as multicollinearity with different mutational types (SNVs and CNVs), controlled for expression responses within cancer types in the GLMM, and used the jackknifing procedure to ensure that no one cancer type dominates the signal. However, in principle unknown hidden confounders could remain, which is why a large part of our paper was focused on validating these effects in an independent dataset (CCLE) where many other covariates are not relevant (immune system, donor variability, stage, age, sex, etc.). Importantly, we also used data from perturbation screens that are completely orthogonal to expression responses in CCLE to establish cause and effect.

      Our reasoning for using all of the data in Figure 1 while controlling for differences due to cancer type in the GLMM was to maximize the variation in mutational load across all of the samples in this dataset, so that we could identify which genes increase in expression as mutational load increases over 5 orders of magnitude. As noted here, we have also already validated that the signal we observe in Figure 1 remains robust for our gene sets of interest within cancer types in Supplemental Figure 6.

      2) Fig 4, Fig S5 and Fig S8 show results for the regression coefficient of expression on ML after leaving out one cancer at a time. All of us initially read this as results for 'one cancer at a time', rather than 'leave-one-out'. These figures are used to argue that the results are not driven by specific cancer types. However, this analysis would not reveal if the signal was driven by a (small) subset of cancer types. To justify claims like "significant negative relationship between mutational load and cell viability across almost all cancer types", one needs to analyze individual cancer types. Results for specific genes, rather than broad groups would also help interpret these results.

      We grouped genes together in Figure 4 because the shRNA screen was done on a single gene at a time, and we were interested in measuring the joint effect on viability after knocking down all of the genes in a given complex.

      Given that the expression responses in Figure 3 are already validated within cancer types in TCGA (Supplemental Figure 6), we believe it is very unlikely that the signal we observe is driven by individual cancer types or smaller groups of cancer types. In addition, we did not perform a within-cancer analysis in CCLE for Figure 4, because not all available cancer types in CCLE were profiled evenly in the shRNA screen (Total < 300). The vast majority of cancer types in CCLE for the shRNA screen (23/26) have sample sizes <20 within each group, and we believe that such small groups are unlikely to yield meaningful results that are not driven by noise.

      3) You use a different model architecture for the TCGA and CCLE analyses because you suspect that the sample size imbalance in the latter might mean that a GLMM cannot capture the different variance components accurately. Did you test this? Could you downsample to avoid this? Cancer type is likely a strong confounder of ML.

      That was indeed our reasoning: within-group sample sizes in CCLE are too low to robustly estimate variance within cancer types. Given that many cancer types have <20 samples within each group, we don’t think that evenly downsampling would enable us to get an estimate not driven by noise. As noted above, our approach to control for this was to perform a jackknifing procedure that eliminates a single cancer type at a time and re-estimates the effect.
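The leave-one-cancer-type-out procedure described above can be illustrated with a minimal sketch. This is not the analysis code from the manuscript: the function names, the plain least-squares slope (standing in for whatever model was actually fit), and the toy data are all assumptions made for illustration.

```python
def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


def leave_one_type_out(samples):
    """Re-estimate the pooled slope with each cancer type removed in turn.

    samples: list of (cancer_type, log_mutational_load, response) tuples.
    Returns {held_out_type: slope estimated without that type}.
    """
    types = sorted({t for t, _, _ in samples})
    slopes = {}
    for held_out in types:
        kept = [(x, y) for t, x, y in samples if t != held_out]
        xs, ys = zip(*kept)
        slopes[held_out] = ols_slope(xs, ys)
    return slopes


# Toy data (invented): three hypothetical cancer types "A", "B", "C",
# with a viability-like response that declines as mutational load rises.
samples = [
    ("A", 1.0, 0.9), ("A", 2.0, 0.7), ("A", 3.0, 0.4),
    ("B", 1.5, 0.8), ("B", 2.5, 0.6),
    ("C", 2.0, 0.75), ("C", 4.0, 0.2),
]

slopes = leave_one_type_out(samples)
for held_out, slope in slopes.items():
    print(f"without {held_out}: slope = {slope:.3f}")
```

Each entry is a pooled slope over the remaining cancer types, not a slope estimated within a single type, which is the distinction between "leave-one-out" and "one cancer at a time" raised in point 2.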

      4) In the splicing analysis (Fig 2 and Fig S4), you report a 10% variation in splicing for a 100-fold variation in ML. This weak trend is replicated in very similar ways for many different types of alternative splicing events. It is not clear why different events (exon skipping, intron retention, etc) should respond in the same way to ML. A weak but homogeneous effect like the one shown here might result from some common confounder (see point 1). Similarly, it is not clear why with increasing intron retention PSI threshold the fraction of under-expressed transcripts would decrease and not increase.

      We agree that the effects of all the different alternative splicing events are complex. Our focus was on intron retention, which is known to occur in cancer (Lindeboom et al., 2016, Nature Genetics), and our analysis is consistent with the idea that damaging passenger mutations can shift cellular phenotypic states that require the use of many different mechanisms to mitigate protein misfolding.

      For Figure S4, as the PSI threshold for calling an alternative splicing event increases, fewer samples are called as having an intron retention event in the gene. This uniformly decreases the numerator across all the mutational load bins, so that when the threshold is increased the fraction of under-expressed transcripts with intron retention events is lower.
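The threshold argument above can be made concrete with a small sketch. It assumes, purely for illustration, that the fraction is computed as (under-expressed transcripts called as intron-retained at the chosen PSI threshold) divided by a denominator that does not depend on the threshold. The function name and the toy records are hypothetical and are not taken from the manuscript.

```python
def fraction_called(records, psi_threshold):
    """Fraction of records that are under-expressed AND called as intron retention.

    records: list of (psi, under_expressed) pairs, one per transcript-sample pair.
    The denominator (all records) is assumed not to depend on the threshold.
    """
    called = sum(1 for psi, under in records if under and psi >= psi_threshold)
    return called / len(records)


# Toy records (invented): PSI value and whether the transcript is under-expressed.
records = [
    (0.05, False), (0.2, True), (0.4, True),
    (0.6, False), (0.8, True), (0.9, True),
]

thresholds = (0.1, 0.5, 0.85)
fractions = [fraction_called(records, t) for t in thresholds]
for t, f in zip(thresholds, fractions):
    print(f"PSI >= {t}: fraction = {f:.3f}")
```

Because raising the threshold can only remove calls from the numerator, the fraction cannot increase under this assumed definition.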